sort by:
Revision Author Date Message Commit Date
2e82669 powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled This fix the below build error arch/powerpc/mm/hash_utils_64.c: In function ‘flush_hash_hugepage’: arch/powerpc/mm/hash_utils_64.c:1381:1: error: label at end of compound statement tm_abort: ^ make[1]: *** [arch/powerpc/mm/hash_utils_64.o] Error 1 Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> 23 April 2015, 07:42:14 UTC
a7e73e7 powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error kvm_no_guest() calls power7_wakeup_loss() to put the thread into the deepest supported idle state. power7_wakeup_loss() is defined in arch/powerpc/kernel/idle_power7.S, which is compiled only when PPC_P7_NAP=y. And PPC_P7_NAP is selected when PPC_POWERNV=y. Hence in cases where PPC_POWERNV=n and KVM_BOOK3S_64_HV=y we see the following error: arch/powerpc/kvm/built-in.o: In function `kvm_no_guest': arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to `power7_wakeup_loss' Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> 17 April 2015, 01:57:36 UTC
7d6e7f7 powerpc/mm/thp: Return pte address if we find trans_splitting. For THP that is marked trans splitting, we return the pte. This require the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> 17 April 2015, 01:23:40 UTC
691e95f powerpc/mm/thp: Make page table walk safe against thp split/collapse We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> 17 April 2015, 01:23:39 UTC
dac5657 KVM: PPC: Remove page table walk helpers This patch remove helpers which we had used only once in the code. Limiting page table walk variants help in ensuring that we won't end up with code walking page table with wrong assumptions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> 17 April 2015, 01:23:24 UTC
5e1d44a KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer pte can get updated from other CPUs as part of multiple activities like THP split, huge page collapse, unmap. We need to make sure we don't reload the pte value again and again for different checks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> 17 April 2015, 01:23:24 UTC
1cbee46 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes 17 April 2015, 01:22:51 UTC
d19d5ef Merge tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc updates from Michael Ellerman: - Numerous minor fixes, cleanups etc. - More EEH work from Gavin to remove its dependency on device_nodes. - Memory hotplug implemented entirely in the kernel from Nathan Fontenot. - Removal of redundant CONFIG_PPC_OF by Kevin Hao. - Rewrite of VPHN parsing logic & tests from Greg Kurz. - A fix from Nish Aravamudan to reduce memory usage by clamping nodes_possible_map. - Support for pstore on powernv from Hari Bathini. - Removal of old powerpc specific byte swap routines by David Gibson. - Fix from Vasant Hegde to prevent the flash driver telling you it was flashing your firmware when it wasn't. - Patch from Ben Herrenschmidt to add an OPAL heartbeat driver. - Fix for an oops causing get/put_cpu_var() imbalance in perf by Jan Stancek. - Some fixes for migration from Tyrel Datwyler. - A new syscall to switch the cpu endian by Michael Ellerman. - Large series from Wei Yang to implement SRIOV, reviewed and acked by Bjorn. - A fix for the OPAL sensor driver from Cédric Le Goater. - Fixes to get STRICT_MM_TYPECHECKS building again by Michael Ellerman. - Large series from Daniel Axtens to make our PCI hooks per PHB rather than per machine. - Small patch from Sam Bobroff to explicitly abort non-suspended transactions on syscalls, plus a test to exercise it. - Numerous reworks and fixes for the 24x7 PMU from Sukadev Bhattiprolu. - Small patch to enable the hard lockup detector from Anton Blanchard. - Fix from Dave Olson for missing L2 cache information on some CPUs. - Some fixes from Michael Ellerman to get Cell machines booting again. - Freescale updates from Scott: Highlights include BMan device tree nodes, an MSI erratum workaround, a couple minor performance improvements, config updates, and misc fixes/cleanup. * tag 'powerpc-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (196 commits) powerpc/powermac: Fix build error seen with powermac smp builds powerpc/pseries: Fix compile of memory hotplug without CONFIG_MEMORY_HOTREMOVE powerpc: Remove PPC32 code from pseries specific find_and_init_phbs() powerpc/cell: Fix iommu breakage caused by controller_ops change powerpc/eeh: Fix crash in eeh_add_device_early() on Cell powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails powerpc/pseries: Correct memory hotplug locking powerpc: Fix missing L2 cache size in /sys/devices/system/cpu powerpc: Add ppc64 hard lockup detector support oprofile: Disable oprofile NMI timer on ppc64 powerpc/perf/hv-24x7: Add missing put_cpu_var() powerpc/perf/hv-24x7: Break up single_24x7_request powerpc/perf/hv-24x7: Define update_event_count() powerpc/perf/hv-24x7: Whitespace cleanup powerpc/perf/hv-24x7: Define add_event_to_24x7_request() powerpc/perf/hv-24x7: Rename hv_24x7_event_update powerpc/perf/hv-24x7: Move debug prints to separate function powerpc/perf/hv-24x7: Drop event_24x7_request() powerpc/perf/hv-24x7: Use pr_devel() to log message ... Conflicts: tools/testing/selftests/powerpc/Makefile tools/testing/selftests/powerpc/tm/Makefile 16 April 2015, 18:53:32 UTC
34c9a0f crypto: fix broken crypto_register_instance() module handling Commit 9c521a200bc3 ("crypto: api - remove instance when test failed") tried to grab a module reference count before the module was even set. Worse, it then goes on to free the module reference count after it is set so you quickly end up with a negative module reference count which prevents people from using any instances belonging to that module. This patch moves the module initialisation before the reference count. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 16 April 2015, 03:26:16 UTC
eea3a00 Merge branch 'akpm' (patches from Andrew) Merge second patchbomb from Andrew Morton: - the rest of MM - various misc bits - add ability to run /sbin/reboot at reboot time - printk/vsprintf changes - fiddle with seq_printf() return value * akpm: (114 commits) parisc: remove use of seq_printf return value lru_cache: remove use of seq_printf return value tracing: remove use of seq_printf return value cgroup: remove use of seq_printf return value proc: remove use of seq_printf return value s390: remove use of seq_printf return value cris fasttimer: remove use of seq_printf return value cris: remove use of seq_printf return value openrisc: remove use of seq_printf return value ARM: plat-pxa: remove use of seq_printf return value nios2: cpuinfo: remove use of seq_printf return value microblaze: mb: remove use of seq_printf return value ipc: remove use of seq_printf return value rtc: remove use of seq_printf return value power: wakeup: remove use of seq_printf return value x86: mtrr: if: remove use of seq_printf return value linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK MAINTAINERS: CREDITS: remove Stefano Brivio from B43 .mailmap: add Ricardo Ribalda CREDITS: add Ricardo Ribalda Delgado ... 15 April 2015, 23:39:15 UTC
e693d73 parisc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
d50f8f8 lru_cache: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
962e370 tracing: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Miscellanea: o Remove unused return value from trace_lookup_stack Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
94ff212 cgroup: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
25ce319 proc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
c2f0b61 s390: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
dc640a8 cris fasttimer: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Miscellanea: o Coalesce formats, realign arguments Signed-off-by: Joe Perches <joe@perches.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
1336d42 cris: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
58a1aa7 openrisc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Jonas Bonn <jonas@southpole.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
cd2b293 ARM: plat-pxa: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, (as it is here, it doesn't return # of chars emitted) will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
4122669 nios2: cpuinfo: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Ley Foon Tan <lftan@altera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
81f0cd9 microblaze: mb: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
7f032d6 ipc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
4395eb1 rtc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
9f6a240 power: wakeup: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
3ac62bc x86: mtrr: if: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
89c1e79 linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional, which will generally lead to slightly better generated code (221 bytes saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL). As a small bonus, this also ensures that the nbits parameter is expanded exactly once. In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed to assume it is positive (or divisible by BITS_PER_LONG), and hence just do the simple mask. It doesn't seem to use this, and even on an architecture like x86 where the shift only depends on the lower 5 or 6 bits, and these bits are not affected by the signedness of the expression, gcc still generates code to compute the C99 mandated value of start % BITS_PER_LONG. So just use a mask explicitly, also for consistency with BITMAP_LAST_WORD_MASK. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tejun Heo <tj@kernel.org> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
8a72ed6 MAINTAINERS: CREDITS: remove Stefano Brivio from B43 This email address isn't working anymore Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
01c7e5a .mailmap: add Ricardo Ribalda Work and Home computer had different settings in the mail client. Some contributions appear as Ricardo Ribalda, others as Ricardo Ribalda Delgado (and one as just Ricardo). Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
3ca7bf8 CREDITS: add Ricardo Ribalda Delgado Add personal details to CREDITS file. Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
49e7d9d MAINTAINERS: Use tabs consistently Consistently use a single tab after the "specifier:" type. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
41416f2 lib/string_helpers.c: change semantics of string_escape_mem The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
3aeddc7 lib/string_helpers.c: refactor string_escape_mem When printf is given the format specifier %pE, it needs a way of obtaining the total output size that would be generated if the buffer was large enough, and string_escape_mem doesn't easily provide that. This is a refactorization of string_escape_mem in preparation of changing its external API to provide that information. The somewhat ugly early returns and subsequent seemingly redundant conditionals are to make the following patch touch as little as possible in string_helpers.c while still preserving the current behaviour of never outputting partial escape sequences. That behaviour must also change for %pE to work as one expects from every other printf specifier. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
9c98f23 lib/vsprintf.c: fix potential NULL deref in hex_string The helper hex_string() is broken in two ways. First, it doesn't increment buf regardless of whether there is room to print, so callers such as kasprintf() that try to probe the correct storage to allocate will get a too small return value. But even worse, kasprintf() (and likely anyone else trying to find the size of the result) pass NULL for buf and 0 for size, so we also have end == NULL. But this means that the end-1 in hex_string() is (char*)-1, so buf < end-1 is true and we get a NULL pointer deref. I double-checked this with a trivial kernel module that just did a kasprintf(GFP_KERNEL, "%14ph", "CrashBoomBang"). Nobody seems to be using %ph with kasprintf, but we might as well fix it before it hits someone. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
900cca2 lib/vsprintf: add %pC{,n,r} format specifiers for clocks Add format specifiers for printing struct clk: - '%pC' or '%pCn': name (Common Clock Framework) or address (legacy clock framework) of the clock, - '%pCr': rate of the clock. [akpm@linux-foundation.org: omit code if !CONFIG_HAVE_CLK] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
e8a7ba5 lib/vsprintf: Move integer format types to the top Move the format types for 64-bit integers and configurable size integers to the top, so they're next to the other integer format types. While at it, add the missing format types for s32 and u32. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7330660 lib/vsprintf: document %p parameters passed by reference This patch series improves the documentation for printk() formats, and adds support for printing clocks. The latter has always been a hassle if you wanted to support both the common and legacy clock frameworks. - '%pC' and '%pCn' print the name (Common Clock Framework) or address (legacy clock framework) of a clock, - '%pCr' prints the current clock rate. This patch (of 3): Make sure all %p extensions that take parameters by references are documented to do so. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
d1c1b12 lib/vsprintf.c: another small hack Making ZEROPAD == '0'-' ', we can eliminate a few more instructions. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
3ea8d44 lib/vsprintf.c: eliminate duplicate hex string array gcc doesn't merge or overlap const char[] objects with identical contents (probably language lawyers would also insist that these things have different addresses), but there's no reason to have the string "0123456789ABCDEF" occur in multiple places. hex_asc_upper is declared in kernel.h and defined in lib/hexdump.c, which is unconditionally compiled in. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
e26c12c lib/vsprintf.c: reduce stack use in number() At least since the initial git commit, when base was passed as a separate parameter, number() has only been called with bases 8, 10 and 16. I'm guessing that 66 was to accommodate 64 0/1, a sign and a '\0', but the buffer is only used for the actual digits. Octal digits carry 3 bits of information, so 24 is enough. Spell that 3*sizeof(num) so one less place needs to be changed should long long ever be 128 bits. Also remove the commented-out code that would handle an arbitrary base. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
51be17d lib/vsprintf.c: eliminate some branches Since FORMAT_TYPE_INT is simply 1 more than FORMAT_TYPE_UINT, and similarly for BYTE/UBYTE, SHORT/USHORT, LONG/ULONG, we can eliminate a few instructions by making SIGN have the value 1 instead of 2, and then use arithmetic instead of branches for computing the right spec->type. It's a little hacky, but certainly in the same spirit as SMALL needing to have the value 0x20. For example for the spec->qualifier == 'l' case, gcc now generates 75e: 0f b6 53 01 movzbl 0x1(%rbx),%edx 762: 83 e2 01 and $0x1,%edx 765: 83 c2 09 add $0x9,%edx 768: 88 13 mov %dl,(%rbx) instead of 763: 0f b6 53 01 movzbl 0x1(%rbx),%edx 767: 83 e2 02 and $0x2,%edx 76a: 80 fa 01 cmp $0x1,%dl 76d: 19 d2 sbb %edx,%edx 76f: 83 c2 0a add $0xa,%edx 772: 88 13 mov %dl,(%rbx) Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7b1460e printk: comment pr_cont() stating it is only to continue a line KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used more often, and it lacks the comment stating what it is used for. It can be confused as continuing the log level, but that is not its purpose. Its purpose is to continue a line that had no newline enclosed. This should be documented by pr_cont() as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
e243304 powerpc/powernv: reboot when requested by firmware Use orderly_reboot so userspace will to shut itself down via the reboot path. This is required for graceful reboot initiated by the BMC, such as when a user uses ipmitool to issue a 'chassis power cycle' command. Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7a54f46 kernel/reboot.c: add orderly_reboot for graceful reboot The kernel has orderly_poweroff which allows the kernel to initiate a graceful shutdown of userspace, by running /sbin/poweroff. This adds orderly_reboot that will cause userspace to shut itself down by calling /sbin/reboot. This will be used for shutdown initiated by a system controller on platforms that do not use ACPI. orderly_reboot() should be used when the system wants to allow userspace to gracefully shut itself down. For cases where the system may imminently catch on fire, the existing emergency_restart() provides an immediate reboot without involving userspace. Signed-off-by: Joel Stanley <joel@jms.id.au> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7975a9b drivers/sbus/char/envctrl.c: ignore orderly_poweroff return value orderly_poweroff() unconditionally returns 0, so remove the dead code that checks the return value. A future patch will change the return type to void. Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: David S. Miller <davem@davemloft.net> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
972fae6 kernel/hung_task.c: change hung_task.c to use for_each_process_thread() In check_hung_uninterruptible_tasks() avoid the use of deprecated while_each_thread(). The "max_count" logic will prevent a livelock - see commit 0c740d0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Having said this let's use for_each_process_thread(). Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Wysochanski <dwysocha@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
96831c0 kernel/resource.c: remove deprecated __check_region() and friends All users of __check_region(), check_region(), and check_mem_region() are gone. We got rid of the last user in v4.0-rc1. Remove them. bloat-o-meter on x86_64 shows: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102) function old new delta __kstrtab___check_region 15 - -15 __ksymtab___check_region 16 - -16 __check_region 71 - -71 Signed-off-by: Jakub Sitnicki <jsitnicki@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
2813893 kernel: conditionally support non-root users, groups and capabilities There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much. The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
c79574a lib/test-hexdump.c: fix initconst confusion const char *...[] is not const, but an array of pointer to const. So these arrays cannot be __initconst, but must be __initdata This fixes section conflicts with LTO. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
946e879 paride: fix the "verbose" module param The verbose module parameter can be set to 2 for extremely verbose messages so the type should be int instead of bool. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tim Waugh <tim@cyberelk.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
23f40a9 include/linux: remove empty conditionals Commit 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux") left behind some empty conditional blocks. Since they are useless and may cause a reader to wonder whether something is missing, remove them. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
e4bc332 /proc/PID/status: show all sets of pid according to ns If some issues occurred inside a container guest, host user could not know which process is in trouble just by guest pid: the users of container guest only knew the pid inside containers. This will bring obstacle for trouble shooting. This patch adds four fields: NStgid, NSpid, NSpgid and NSsid: a) In init_pid_ns, nothing changed; b) In one pidns, will tell the pid inside containers: NStgid: 21776 5 1 NSpid: 21776 5 1 NSpgid: 21776 5 1 NSsid: 21729 1 0 ** Process id is 21776 in level 0, 5 in level 1, 1 in level 2. c) If pidns is nested, it depends on which pidns are you in. NStgid: 5 1 NSpid: 5 1 NSpgid: 5 1 NSsid: 1 0 ** Views from level 1 [akpm@linux-foundation.org: add CONFIG_PID_NS ifdef] Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Tested-by: Serge Hallyn <serge.hallyn@canonical.com> Tested-by: Nathan Scott <nathans@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
201c7b7 zram: fix error return code Return a negative error code on failure. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
160a117 zsmalloc: remove extra cond_resched() in __zs_compact Do not perform cond_resched() before the busy compaction loop in __zs_compact(), because this loop does it when needed. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
81da9b1 zsmalloc: fix fatal corruption due to wrong size class selection There is no point in overriding the size class below. It causes fatal corruption on the next chunk on the 3264-bytes size class, which is the last size class that is not huge. For example, if the requested size was exactly 3264 bytes, current zsmalloc allocates and returns a chunk from the size class of 3264 bytes, not 4096. User access to this chunk may overwrite head of the next adjacent chunk. Here is the panic log captured when freelist was corrupted due to this: Kernel BUG at ffffffc00030659c [verbose debug info unavailable] Internal error: Oops - BUG: 96000006 [#1] PREEMPT SMP Modules linked in: exynos-snapshot: core register saved(CPU:5) CPUMERRSR: 0000000000000000, L2MERRSR: 0000000000000000 exynos-snapshot: context saved(CPU:5) exynos-snapshot: item - log_kevents is disabled CPU: 5 PID: 898 Comm: kswapd0 Not tainted 3.10.61-4497415-eng #1 task: ffffffc0b8783d80 ti: ffffffc0b71e8000 task.ti: ffffffc0b71e8000 PC is at obj_idx_to_offset+0x0/0x1c LR is at obj_malloc+0x44/0xe8 pc : [<ffffffc00030659c>] lr : [<ffffffc000306604>] pstate: a0000045 sp : ffffffc0b71eb790 x29: ffffffc0b71eb790 x28: ffffffc00204c000 x27: 000000000001d96f x26: 0000000000000000 x25: ffffffc098cc3500 x24: ffffffc0a13f2810 x23: ffffffc098cc3501 x22: ffffffc0a13f2800 x21: 000011e1a02006e3 x20: ffffffc0a13f2800 x19: ffffffbc02a7e000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000feb x15: 0000000000000000 x14: 00000000a01003e3 x13: 0000000000000020 x12: fffffffffffffff0 x11: ffffffc08b264000 x10: 00000000e3a01004 x9 : ffffffc08b263fea x8 : ffffffc0b1e611c0 x7 : ffffffc000307d24 x6 : 0000000000000000 x5 : 0000000000000038 x4 : 000000000000011e x3 : ffffffbc00003e90 x2 : 0000000000000cc0 x1 : 00000000d0100371 x0 : ffffffbc00003e90 Reported-by: Sooyong Suk <s.suk@samsung.com> Signed-off-by: Heesub Shin <heesub.shin@samsung.com> Tested-by: Sooyong Suk <s.suk@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
839373e zsmalloc: remove unnecessary insertion/removal of zspage in compaction In putback_zspage, we don't need to insert a zspage into list of zspage in size_class again to just fix fullness group. We could do directly without reinsertion so we could save some instuctions. Reported-by: Heesub Shin <heesub.shin@samsung.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Ganesh Mahendran <opensource.ganesh@gmail.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Juneho Choi <juno.choi@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
495819e zsmalloc: micro-optimize zs_object_copy() A micro-optimization. Avoid additional branching and reduce (a bit) registry pressure (f.e. s_off += size; d_off += size; may be calculated twise: first for >= PAGE_SIZE check and later for offset update in "else" clause). scripts/bloat-o-meter shows some improvement add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10) function old new delta zs_object_copy 550 540 -10 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
1ec7cfb zsmalloc: remove synchronize_rcu from zs_compact() Do not synchronize rcu in zs_compact(). Neither zsmalloc not zram use rcu. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
8f7d282 zram: deprecate zram attrs sysfs nodes Add Documentation/ABI/obsolete/sysfs-block-zram file and list obsolete and deprecated attributes there. The patch also adds additional information to zram documentation and describes the basic strategy: - the existing RW nodes will be downgraded to WO nodes (in 4.11) - deprecated RO sysfs nodes will eventually be removed (in 4.11) Users will be additionally notified about deprecated attr usage by pr_warn_once() (added to every deprecated attr _show()), as suggested by Minchan Kim. User space is advised to use zram<id>/stat, zram<id>/io_stat and zram<id>/mm_stat files. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
4f2109f zram: export new 'mm_stat' sysfs attrs Per-device `zram<id>/mm_stat' file provides mm statistics of a particular zram device in a format similar to block layer statistics. The file consists of a single line and represents the following stats (separated by whitespace): orig_data_size compr_data_size mem_used_total mem_limit mem_used_max zero_pages num_migrated Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
2f6a3be zram: export new 'io_stat' sysfs attrs Per-device `zram<id>/io_stat' file provides accumulated I/O statistics of particular zram device in a format similar to block layer statistics. The file consists of a single line and represents the following stats (separated by whitespace): failed_reads failed_writes invalid_io notify_free Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
77ba015 zram: describe device attrs in documentation Briefly describe exported device stat attrs in zram documentation. We will eventually get rid of per-stat sysfs nodes and, thus, clean up Documentation/ABI/testing/sysfs-block-zram file, which is the only source of information about device sysfs nodes. Add `num_migrated' description, since there is no independent `num_migrated' sysfs node (and no corresponding sysfs-block-zram entry), it will be exported via zram<id>/mm_stat file. At this point we can provide minimal description, because sysfs-block-zram still contains detailed information. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
8811a94 zram: use generic start/end io accounting Use bio generic_start_io_acct() and generic_end_io_acct() to account device's block layer statistics. This will let users to monitor zram activities using sysstat and similar packages/tools. Apart from the usual per-stat sysfs attr, zram IO stats are now also available in '/sys/block/zram<id>/stat' and '/proc/diskstats' files. We will slowly get rid of per-stat sysfs files. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
c72c616 zram: move compact_store() to sysfs functions area A cosmetic change. We have a new code layout and keep zram per-device sysfs store and show functions in one place. Move compact_store() to that handlers block to conform to current layout. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
10447b6 zram: remove `num_migrated' device attr This patch introduces rework to zram stats. We have per-stat sysfs nodes, and it makes things a bit hard to use in user space: it doesn't give an immediate stats 'snapshot', it requires user space to use more syscalls - open, read, close for every stat file, with appropriate error checks on every step, etc. First, zram now accounts block layer statistics, available in /sys/block/zram<id>/stat and /proc/diskstats files. So some new stats are available (see Documentation/block/stat.txt), besides, zram's activities now can be monitored by sysstat's iostat or similar tools. Example: cat /sys/block/zram0/stat 248 0 1984 0 251029 0 2008232 5120 0 5116 5116 Second, group currently exported on per-stat basis nodes into two categories (files): -- zram<id>/io_stat accumulates device's IO stats, that are not accounted by block layer, and contains: failed_reads failed_writes invalid_io notify_free Example: cat /sys/block/zram0/io_stat 0 0 0 652572 -- zram<id>/mm_stat accumulates zram mm stats and contains: orig_data_size compr_data_size mem_used_total mem_limit mem_used_max zero_pages num_migrated Example: cat /sys/block/zram0/mm_stat 434634752 270288572 279158784 0 579895296 15060 0 per-stat sysfs nodes are now considered to be deprecated and we plan to remove them (and clean up some of the existing stat code) in two years (as of now, there is no warning printed to syslog about deprecated stats being used). User space is advised to use the above mentioned 3 files. This patch (of 7): Remove sysfs `num_migrated' attribute. We are moving away from per-stat device attrs towards 3 stat files that will accumulate io and mm stats in a format similar to block layer statistics in /sys/block/<dev>/stat. That will be easier to use in user space, and reduce the number of syscalls needed to read zram device statistics. `num_migrated' will return back in zram<id>/mm_stat file. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
888fa37 mm/zsmalloc.c: fix comment for get_pages_per_zspage Signed-off-by: Yinghao Xie <yinghao.xie@sumsung.com> Suggested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
d02be50 zsmalloc: zsmalloc documentation Create zsmalloc doc which explains design concept and stat information. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
248ca1b zsmalloc: add fullness into stat During investigating compaction, fullness information of each class is helpful for investigating how the compaction works well. With that, we could know how compaction works well more clear on each size class. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
7b60a68 zsmalloc: record handle in page->private for huge object We store handle on header of each allocated object so it increases the size of each object by sizeof(unsigned long). If zram stores 4096 bytes to zsmalloc(ie, bad compression), zsmalloc needs 4104B-class to add handle. However, 4104B-class has 1-pages_per_zspage so wasted size by internal fragment is 8192 - 4104, which is terrible. So this patch records the handle in page->private on such huge object(ie, pages_per_zspage == 1 && maxobj_per_zspage == 1) instead of header of each object so we could use 4096B-class, not 4104B-class. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
4e3ba87 zram: support compaction Now that zsmalloc supports compaction, zram can use it. For the first step, this patch exports compact knob via sysfs so user can do compaction via "echo 1 > /sys/block/zram0/compact". Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
d3d07c9 zsmalloc: adjust ZS_ALMOST_FULL Curretly, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if the zspage has under 1/4 used objects(ie, fullness_threshold_frac). It could make result in loose packing since zsmalloc migrates only ZS_ALMOST_EMPTY zspage out. This patch changes the rule so that zsmalloc makes zspage which has above 3/4 used object ZS_ALMOST_FULL so it could make tight packing. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
312fcae zsmalloc: support compaction This patch provides core functions for migration of zsmalloc. Migraion policy is simple as follows. for each size class { while { src_page = get zs_page from ZS_ALMOST_EMPTY if (!src_page) break; dst_page = get zs_page from ZS_ALMOST_FULL if (!dst_page) dst_page = get zs_page from ZS_ALMOST_EMPTY if (!dst_page) break; migrate(from src_page, to dst_page); } } For migration, we need to identify which objects in zspage are allocated to migrate them out. We could know it by iterating of freed objects in a zspage because first_page of zspage keeps free objects singly-linked list but it's not efficient. Instead, this patch adds a tag(ie, OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check whether the object is allocated easily. This patch adds another status bit in handle to synchronize between user access through zs_map_object and migration. During migration, we cannot move objects user are using due to data coherency between old object and new object. [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
c780626 zsmalloc: factor out obj_[malloc|free] In later patch, migration needs some part of functions in zs_malloc and zs_free so this patch factor out them. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
2e40e16 zsmalloc: decouple handle and object Recently, we started to use zram heavily and some of issues popped. 1) external fragmentation I got a report from Juneho Choi that fork failed although there are plenty of free pages in the system. His investigation revealed zram is one of the culprit to make heavy fragmentation so there was no more contiguous 16K page for pgd to fork in the ARM. 2) non-movable pages Other problem of zram now is that inherently, user want to use zram as swap in small memory system so they use zRAM with CMA to use memory efficiently. However, unfortunately, it doesn't work well because zRAM cannot use CMA's movable pages unless it doesn't support compaction. I got several reports about that OOM happened with zram although there are lots of swap space and free space in CMA area. 3) internal fragmentation zRAM has started support memory limitation feature to limit memory usage and I sent a patchset(https://lkml.org/lkml/2014/9/21/148) for VM to be harmonized with zram-swap to stop anonymous page reclaim if zram consumed memory up to the limit although there are free space on the swap. One problem for that direction is zram has no way to know any hole in memory space zsmalloc allocated by internal fragmentation so zram would regard swap is full although there are free space in zsmalloc. For solving the issue, zram want to trigger compaction of zsmalloc before it decides full or not. This patchset is first step to support above issues. For that, it adds indirect layer between handle and object location and supports manual compaction to solve 3th problem first of all. After this patchset got merged, next step is to make VM aware of zsmalloc compaction so that generic compaction will move zsmalloced-pages automatically in runtime. In my imaginary experiment(ie, high compress ratio data with heavy swap in/out on 8G zram-swap), data is as follows, Before = zram allocated object : 60212066 bytes zram total used: 140103680 bytes ratio: 42.98 percent MemFree: 840192 kB Compaction After = frag ratio after compaction zram allocated object : 60212066 bytes zram total used: 76185600 bytes ratio: 79.03 percent MemFree: 901932 kB Juneho reported below in his real platform with small aging. So, I think the benefit would be bigger in real aging system for a long time. - frag_ratio increased 3% (ie, higher is better) - memfree increased about 6MB - In buddy info, Normal 2^3: 4, 2^2: 1: 2^1 increased, Highmem: 2^1 21 increased frag ratio after swap fragment used : 156677 kbytes total: 166092 kbytes frag_ratio : 94 meminfo before compaction MemFree: 83724 kB Node 0, zone Normal 13642 1364 57 10 61 17 9 5 4 0 0 Node 0, zone HighMem 425 29 1 0 0 0 0 0 0 0 0 num_migrated : 23630 compaction done frag ratio after compaction used : 156673 kbytes total: 160564 kbytes frag_ratio : 97 meminfo after compaction MemFree: 89060 kB Node 0, zone Normal 14076 1544 67 14 61 17 9 5 4 0 0 Node 0, zone HighMem 863 50 1 0 0 0 0 0 0 0 0 This patchset adds more logics(about 480 lines) in zsmalloc but when I tested heavy swapin/out program, the regression for swapin/out speed is marginal because most of overheads were caused by compress/decompress and other MM reclaim stuff. This patch (of 7): Currently, handle of zsmalloc encodes object's location directly so it makes support of migration hard. This patch decouples handle and object via adding indirect layer. For that, it allocates handle dynamically and returns it to user. The handle is the address allocated by slab allocation so it's unique and we could keep object's location in the memory space allocated for handle. With it, we can change object's position without changing handle itself. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
018e9a4 mm/compaction.c: fix "suitable_migration_target() unused" warning mm/compaction.c:250:13: warning: 'suitable_migration_target' defined but not used [-Wunused-function] Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
be64f88 dax: unify ext2/4_{dax,}_file_operations The original dax patchset split the ext2/4_file_operations because of the two NULL splice_read/splice_write in the dax case. In the vfs if splice_read/splice_write are NULL we then call default_splice_read/write. What we do here is make generic_file_splice_read aware of IS_DAX() so the original ext2/4_file_operations can be used as is. For write it appears that iter_file_splice_write is just fine. It uses the regular f_op->write(file,..) or new_sync_write(file, ...). Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
0e3b210 dax: use pfn_mkwrite to update c/mtime + freeze protection From: Yigal Korman <yigal@plexistor.com> [v1] Without this patch, c/mtime is not updated correctly when mmap'ed page is first read from and then written to. A new xfstest is submitted for testing this (generic/080) [v2] Jan Kara has pointed out that if we add the sb_start/end_pagefault pair in the new pfn_mkwrite we are then fixing another bug where: A user could start writing to the page while filesystem is frozen. Signed-off-by: Yigal Korman <yigal@plexistor.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
dd90618 mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to get notified when access is a write to a read-only PFN. This can happen if we mmap() a file then first mmap-read from it to page-in a read-only PFN, than we mmap-write to the same page. We need this functionality to fix a DAX bug, where in the scenario above we fail to set ctime/mtime though we modified the file. An xfstest is attached to this patchset that shows the failure and the fix. (A DAX patch will follow) This functionality is extra important for us, because upon dirtying of a pmem page we also want to RDMA the page to a remote cluster node. We define a new pfn_mkwrite and do not reuse page_mkwrite because 1 - The name ;-) 2 - But mainly because it would take a very long and tedious audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP users. To make sure they do not now CRASH. For example current DAX code (which this is for) would crash. If we would want to reuse page_mkwrite, We will need to first patch all users, so to not-crash-on-no-page. Then enable this patch. But even if I did that I would not sleep so well at night. Adding a new vector is the safest thing to do, and is not that expensive. an extra pointer at a static function vector per driver. Also the new vector is better for performance, because else we Will call all current Kernel vectors, so to: check-ha-no-page-do-nothing and return. No need to call it from do_shared_fault because do_wp_page is called to change pte permissions anyway. Signed-off-by: Yigal Korman <yigal@plexistor.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
2682582 mm/memory: also print a_ops->readpage in print_bad_pte() A lot of filesystems use generic_file_mmap() and filemap_fault(), f_op->mmap and vm_ops->fault aren't enough to identify filesystem. This prints file name, vm_ops->fault, f_op->mmap and a_ops->readpage (which is almost always implemented and filesystem-specific). Example: [ 23.676410] BUG: Bad page map in process sh pte:1b7e6025 pmd:19bbd067 [ 23.676887] page:ffffea00006df980 count:4 mapcount:1 mapping:ffff8800196426c0 index:0x97 [ 23.677481] flags: 0x10000000000000c(referenced|uptodate) [ 23.677896] page dumped because: bad pte [ 23.678205] addr:00007f52fcb17000 vm_flags:00000075 anon_vma: (null) mapping:ffff8800196426c0 index:97 [ 23.678922] file:libc-2.19.so fault:filemap_fault mmap:generic_file_readonly_mmap readpage:v9fs_vfs_readpage [akpm@linux-foundation.org: use pr_alert, per Kirill] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
9239361 mm/mempool.c: kasan: poison mempool elements Mempools keep allocated objects in reserved for situations when ordinary allocation may not be possible to satisfy. These objects shouldn't be accessed before they leave the pool. This patch poison elements when get into the pool and unpoison when they leave it. This will let KASan to detect use-after-free of mempool's elements. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Chernenkov <drcheren@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
bda6d33 mm/cma_debug.c: remove blank lines before DEFINE_SIMPLE_ATTRIBUTE() Like EXPORT_SYMBOL(): the positioning communicates that the macro pertains to the immediately preceding function. Cc: Dmitry Safonov <d.safonov@partner.samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Stefan Strogin <stefan.strogin@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pintu Kumar <pintu.k@samsung.com> Cc: Weijie Yang <weijie.yang@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Vyacheslav Tyrtov <v.tyrtov@samsung.com> Cc: Aleksei Mateosian <a.mateosian@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
2e32b94 mm: cma: add functions to get region pages counters Here are two functions that provide interface to compute/get used size and size of biggest free chunk in cma region. Add that information to debugfs. [akpm@linux-foundation.org: move debug code from cma.c into cma_debug.c] [stefan.strogin@gmail.com: move code from cma_get_used() and cma_get_maxchunk() to cma_used_get() and cma_maxchunk_get()] Signed-off-by: Dmitry Safonov <d.safonov@partner.samsung.com> Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pintu Kumar <pintu.k@samsung.com> Cc: Weijie Yang <weijie.yang@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Vyacheslav Tyrtov <v.tyrtov@samsung.com> Cc: Aleksei Mateosian <a.mateosian@samsung.com> Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
79553da thp: cleanup khugepaged startup Few trivial cleanups: - no need to call set_recommended_min_free_kbytes() from late_initcall() -- start_khugepaged() calls it; - no need to call set_recommended_min_free_kbytes() from start_khugepaged() if khugepaged is not started; - there isn't much point in running start_khugepaged() if we've just set transparent_hugepage_flags to zero; - start_khugepaged() is misnamed -- it also used to stop the thread; Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
e39155e mm: uninline and cleanup page-mapping related helpers Most-used page->mapping helper -- page_mapping() -- has already uninlined. Let's uninline also page_rmapping() and page_anon_vma(). It saves us depending on configuration around 400 bytes in text: text data bss dec hex filename 660318 99254 410000 1169572 11d8a4 mm/built-in.o-before 659854 99254 410000 1169108 11d6d4 mm/built-in.o I also tried to make code a bit more clean. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
99e8ea6 mm: cma: add trace events for CMA allocations and freeings Add trace events for cma_alloc() and cma_release(). The cma_alloc tracepoint is used both for successful and failed allocations, in case of allocation failure pfn=-1UL is stored and printed. Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mpn@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
cdd7875 include/linux/mm.h: simplify flag check Flip the flag test so that it is the simplest. No functional change, just a small readability improvement: No code changed: # arch/x86/kernel/sys_x86_64.o: text data bss dec hex filename 1551 24 0 1575 627 sys_x86_64.o.before 1551 24 0 1575 627 sys_x86_64.o.after md5: 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.before.asm 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
6a4055b mm/memblock.c: add debug output for memblock_add() memblock_reserve() calls memblock_reserve_region() which prints debugging information if 'memblock=debug' was passed on the command line. This patch adds the same behaviour, but for memblock_add function(). [akpm@linux-foundation.org: s/memblock_memory/memblock_add/ in message] Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Emil Medve <Emilian.Medve@freescale.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
7e1f049 mm: hugetlb: cleanup using paeg_huge_active() Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
bcc5422 mm: hugetlb: introduce page_huge_active We are not safe from calling isolate_huge_page() on a hugepage concurrently, which can make the victim hugepage in invalid state and results in BUG_ON(). The root problem of this is that we don't have any information on struct page (so easily accessible) about hugepages' activeness. Note that hugepages' activeness means just being linked to hstate->hugepage_activelist, which is not the same as normal pages' activeness represented by PageActive flag. Normal pages are isolated by isolate_lru_page() which prechecks PageLRU before isolation, so let's do similarly for hugetlb with a new paeg_huge_active(). set/clear_page_huge_active() should be called within hugetlb_lock. But hugetlb_cow() and hugetlb_no_page() don't do this, being justified because in these functions set_page_huge_active() is called right after the hugepage is allocated and no other thread tries to isolate it. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/, make it return bool] [fengguang.wu@intel.com: set_page_huge_active() can be static] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
822fc61 mm: don't call __page_cache_release for hugetlb __put_compound_page() calls __page_cache_release() to do some freeing work, but it's obviously for thps, not for hugetlb. We don't care because PageLRU is always cleared and page->mem_cgroup is always NULL for hugetlb. But it's not correct and has potential risks, so let's make it conditional. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
9fcd145 mm/mmap.c: use while instead of if+goto The creators of the C language gave us the while keyword. Let's use that instead of synthesizing it from if+goto. Made possible by 6597d783397a ("mm/mmap.c: replace find_vma_prepare() with clearer find_vma_links()"). [akpm@linux-foundation.org: fix 80-col overflows] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
215ba78 mm, selftests: test return value of munmap for MAP_HUGETLB memory When MAP_HUGETLB memory is unmapped, the length must be hugepage aligned, otherwise it fails with -EINVAL. All tests currently behave correctly, but it's better to explcitly test the return value for completeness and document the requirement, especially if users copy map_hugetlb.c as a sample implementation. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joern Engel <joern@logfs.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Eric B Munson <emunson@akamai.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
80d6b94 mm, doc: cleanup and clarify munmap behavior for hugetlb memory munmap(2) of hugetlb memory requires a length that is hugepage aligned, otherwise it may fail. Add this to the documentation. This also cleans up the documentation and separates it into logical units: one part refers to MAP_HUGETLB and another part refers to requirements for shared memory segments. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joern Engel <joern@logfs.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
ae7efa5 thp: do not adjust zone water marks if khugepaged is not started set_recommended_min_free_kbytes() adjusts zone water marks to be suitable for khugepaged. We avoid doing this if khugepaged is disabled, but don't catch the case when khugepaged is failed to start. Let's address this by checking khugepaged_thread instead of khugepaged_enabled() in set_recommended_min_free_kbytes(). It's NULL if the kernel thread is stopped or failed to start. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
65ebb64 thp: handle errors in hugepage_init() properly We miss error-handling in few cases hugepage_init(). Let's fix that. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
bdfedb7 mm, mempool: poison elements backed by slab allocator Mempools keep elements in a reserved pool for contexts in which allocation may not be possible. When an element is allocated from the reserved pool, its memory contents is the same as when it was added to the reserved pool. Because of this, elements lack any free poisoning to detect use-after-free errors. This patch adds free poisoning for elements backed by the slab allocator. This is possible because the mempool layer knows the object size of each element. When an element is added to the reserved pool, it is poisoned with POISON_FREE. When it is removed from the reserved pool, the contents are checked for POISON_FREE. If there is a mismatch, a warning is emitted to the kernel log. This is only effective for configs with CONFIG_DEBUG_SLAB or CONFIG_SLUB_DEBUG_ON. [fabio.estevam@freescale.com: use '%zu' for printing 'size_t' variable] [arnd@arndb.de: add missing include] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
e244c9e mm, mempool: disallow mempools based on slab caches with constructors All occurrences of mempools based on slab caches with object constructors have been removed from the tree, so disallow creating them. We can only dereference mem->ctor in mm/mempool.c without including mm/slab.h in include/linux/mempool.h. So simply note the restriction, just like the comment restricting usage of __GFP_ZERO, and warn on kernels with CONFIG_DEBUG_VM() if such a mempool is allocated from. We don't want to incur this check on every element allocation, so use VM_BUG_ON(). Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
ee14624 fs, jfs: remove slab object constructor Mempools based on slab caches with object constructors are risky because element allocation can happen either from the slab cache itself, meaning the constructor is properly called before returning, or from the mempool reserve pool, meaning the constructor is not called before returning, depending on the allocation context. For this reason, we should disallow creating mempools based on slab caches that have object constructors. Callers of mempool_alloc() will be responsible for properly initializing the returned element. Then, it doesn't matter if the element came from the slab cache or the mempool reserved pool. The only occurrence of a mempool being based on a slab cache with an object constructor in the tree is in fs/jfs/jfs_metapage.c. Remove it and properly initialize the element in alloc_metapage(). At the same time, META_free is never used, so remove it as well. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
4db0c3c mm: remove rest of ACCESS_ONCE() usages We converted some of the usages of ACCESS_ONCE to READ_ONCE in the mm/ tree since it doesn't work reliably on non-scalar types. This patch removes the rest of the usages of ACCESS_ONCE, and use the new READ_ONCE API for the read accesses. This makes things cleaner, instead of using separate/multiple sets of APIs. Signed-off-by: Jason Low <jason.low2@hp.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
9d8c47e mm: use READ_ONCE() for non-scalar types Commit 38c5ce936a08 ("mm/gup: Replace ACCESS_ONCE with READ_ONCE") converted ACCESS_ONCE usage in gup_pmd_range() to READ_ONCE, since ACCESS_ONCE doesn't work reliably on non-scalar types. This patch also fixes the other ACCESS_ONCE usages in gup_pte_range() and __get_user_pages_fast() in mm/gup.c Signed-off-by: Jason Low <jason.low2@hp.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
back to top