sort by:
Revision Author Date Message Commit Date
6fd4b15 crypto: caam - improve initalization for context state saves Multiple function in asynchronous hashing use a saved-state block, a.k.a. struct caam_hash_state, which holds a stash of information between requests (init/update/final). Certain values in this state block are loaded for processing using an inline-if, and when this is done, the potential for uninitialized data can pose conflicts. Therefore, this patch improves initialization of state data to prevent false assignments using uninitialized data in the state block. This patch addresses the following traceback, originating in ahash_final_ctx(), although a problem like this could certainly exhibit other symptoms: kernel BUG at arch/arm/mm/dma-mapping.c:465! Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 80004000 [00000000] *pgd=00000000 Internal error: Oops: 805 [#1] PREEMPT SMP Modules linked in: CPU: 0 Not tainted (3.0.15-01752-gdd441b9-dirty #40) PC is at __bug+0x1c/0x28 LR is at __bug+0x18/0x28 pc : [<80043240>] lr : [<8004323c>] psr: 60000013 sp : e423fd98 ip : 60000013 fp : 0000001c r10: e4191b84 r9 : 00000020 r8 : 00000009 r7 : 88005038 r6 : 00000001 r5 : 2d676572 r4 : e4191a60 r3 : 00000000 r2 : 00000001 r1 : 60000093 r0 : 00000033 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c53c7d Table: 1000404a DAC: 00000015 Process cryptomgr_test (pid: 1306, stack limit = 0xe423e2f0) Stack: (0xe423fd98 to 0xe4240000) fd80: 11807fd1 80048544 fda0: 88005000 e4191a00 e5178040 8039dda0 00000000 00000014 2d676572 e4191008 fdc0: 88005018 e4191a60 00100100 e4191a00 00000000 8039ce0c e423fea8 00000007 fde0: e4191a00 e4227000 e5178000 8039ce18 e419183c 80203808 80a94a44 00000006 fe00: 00000000 80207180 00000000 00000006 e423ff08 00000000 00000007 e5178000 fe20: e41918a4 80a949b4 8c4844e2 00000000 00000049 74227000 8c4844e2 00000e90 fe40: 0000000e 74227e90 ffff8c58 80ac29e0 e423fed4 8006a350 8c81625c e423ff5c fe60: 00008576 e4002500 00000003 00030010 e4002500 00000003 e5180000 e4002500 fe80: e5178000 800e6d24 007fffff 00000000 00000010 e4001280 e4002500 60000013 fea0: 000000d0 804df078 00000000 00000000 00000000 00000000 00000000 00000000 fec0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 fee0: 00000000 00000000 e4227000 e4226000 e4753000 e4752000 e40a5000 e40a4000 ff00: e41e7000 e41e6000 00000000 00000000 00000000 e423ff14 e423ff14 00000000 ff20: 00000400 804f9080 e5178000 e4db0b40 00000000 e4db0b80 0000047c 00000400 ff40: 00000000 8020758c 00000400 ffffffff 0000008a 00000000 e4db0b40 80206e00 ff60: e4049dbc 00000000 00000000 00000003 e423ffa4 80062978 e41a8bfc 00000000 ff80: 00000000 e4049db4 00000013 e4049db0 00000013 00000000 00000000 00000000 ffa0: e4db0b40 e4db0b40 80204cbc 00000013 00000000 00000000 00000000 80204cfc ffc0: e4049da0 80089544 80040a40 00000000 e4db0b40 00000000 00000000 00000000 ffe0: e423ffe0 e423ffe0 e4049da0 800894c4 80040a40 80040a40 00000000 00000000 [<80043240>] (__bug+0x1c/0x28) from [<80048544>] (___dma_single_dev_to_cpu+0x84) [<80048544>] (___dma_single_dev_to_cpu+0x84/0x94) from [<8039dda0>] (ahash_fina) [<8039dda0>] (ahash_final_ctx+0x180/0x428) from [<8039ce18>] (ahash_final+0xc/0) [<8039ce18>] (ahash_final+0xc/0x10) from [<80203808>] (crypto_ahash_op+0x28/0xc) [<80203808>] (crypto_ahash_op+0x28/0xc0) from [<80207180>] (test_hash+0x214/0x5) [<80207180>] (test_hash+0x214/0x5b8) from [<8020758c>] (alg_test_hash+0x68/0x8c) [<8020758c>] (alg_test_hash+0x68/0x8c) from [<80206e00>] (alg_test+0x7c/0x1b8) [<80206e00>] (alg_test+0x7c/0x1b8) from [<80204cfc>] (cryptomgr_test+0x40/0x48) [<80204cfc>] (cryptomgr_test+0x40/0x48) from [<80089544>] (kthread+0x80/0x88) [<80089544>] (kthread+0x80/0x88) from [<80040a40>] (kernel_thread_exit+0x0/0x8) Code: e59f0010 e1a01003 eb126a8d e3a03000 (e5833000) ---[ end trace d52a403a1d1eaa86 ]--- Cc: stable@vger.kernel.org Signed-off-by: Steve Cornelius <steve.cornelius@freescale.com> Signed-off-by: Victoria Milhoan <vicki.milhoan@freescale.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 16 June 2015, 06:20:38 UTC
f858c7b crypto: algif_aead - Disable AEAD user-space for now The newly added AEAD user-space isn't quite ready for prime time just yet. In particular it is conflicting with the AEAD single SG list interface change so this patch disables it now. Once the SG list stuff is completely done we can then renable this interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 26 May 2015, 07:51:45 UTC
a1cae34 crypto: s390/ghash - Fix incorrect ghash icv buffer handling. Multitheaded tests showed that the icv buffer in the current ghash implementation is not handled correctly. A move of this working ghash buffer value to the descriptor context fixed this. Code is tested and verified with an multithreaded application via af_alg interface. Cc: stable@vger.kernel.org Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com> Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 22 May 2015, 03:23:03 UTC
7b2a18e crypto: algif_aead - fix invalid sgl linking This patch fixes it. Also minor updates to comments. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 18 May 2015, 04:13:06 UTC
ec59a65 crypto: arm64/sha2-ce - prevent asm code finalization in final() path Ensure that the asm code finalization path is not triggered when invoked via final(), since it already takes care of that itself. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 07 May 2015, 03:16:26 UTC
bf7883e crypto: arm64/sha1-ce - prevent asm code finalization in final() path Ensure that the asm code finalization path is not triggered when invoked via final(), since it already takes care of that itself. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 07 May 2015, 03:16:25 UTC
ac02c6e crypto: arm64/crc32 - bring in line with generic CRC32 The arm64 CRC32 (not CRC32c) implementation was not quite doing the same thing as the generic one. Fix that. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 07 May 2015, 03:16:24 UTC
f440c4e hwrng: bcm63xx - Fix driver compilation - s/clk_didsable_unprepare/clk_disable_unprepare - s/prov/priv - s/error/ret (bcm63xx_rng_probe) Fixes: 6229c16060fe ("hwrng: bcm63xx - make use of devm_hwrng_register") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 04 May 2015, 09:49:52 UTC
7829fb0 lib: make memzero_explicit more robust against dead store elimination In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be elimiated as dead store. While using barrier() works well for the case of gcc, recent efforts from LLVMLinux people suggest to use llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus elimitate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug. Based on some experiments, icc is a bit special on its own, while it doesn't seem to eliminate the memset(), it could do so with an own implementation, and then result in similar findings as with llvm. The fix in this patch now works for all three compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible with gcc/llvm though, with the below code: if we would have used barrier() only here, llvm would have omitted clearing, not so with barrier_data() variant: static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); barrier_data(s); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } $ gcc -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) 0x000000000040041f <+31>: xor %eax,%eax 0x0000000000400421 <+33>: retq End of assembler dump. $ clang -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax 0x0000000000400505 <+21>: xor %eax,%eax 0x0000000000400507 <+23>: retq End of assembler dump. As gcc, clang, but also icc defines __GNUC__, it's sufficient to define this in compiler-gcc.h only to be picked up. For a fallback or otherwise unsupported compiler, we define it as a barrier. Similarly, for ecc which does not support gcc inline asm. Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 Reported-by: Stephan Mueller <smueller@chronox.de> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Stephan Mueller <smueller@chronox.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: mancha security <mancha1@zoho.com> Cc: Mark Charlebois <charlebm@gmail.com> Cc: Behan Webster <behanw@converseincode.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 04 May 2015, 09:49:51 UTC
8c98ebd crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA If NO_DMA=y: drivers/built-in.o: In function `img_hash_write_via_dma_stop': img-hash.c:(.text+0xa2b822): undefined reference to `dma_unmap_sg' drivers/built-in.o: In function `img_hash_xmit_dma': img-hash.c:(.text+0xa2b8d8): undefined reference to `dma_map_sg' img-hash.c:(.text+0xa2b948): undefined reference to `dma_unmap_sg' Also move the "depends" section below the "tristate" line while we're at it. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 24 April 2015, 12:14:09 UTC
00425bb crypto: x86/sha512_ssse3 - fixup for asm function prototype change Patch e68410ebf626 ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer") changed the prototypes of the core asm SHA-512 implementations so that they are compatible with the prototype used by the base layer. However, in one instance, the register that was used for passing the input buffer was reused as a scratch register later on in the code, and since the input buffer param changed places with the digest param -which needs to be written back before the function returns- this resulted in the scratch register to be dereferenced in a memory write operation, causing a GPF. Fix this by changing the scratch register to use the same register as the input buffer param again. Fixes: e68410ebf626 ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer") Reported-By: Bobby Powers <bobbypowers@gmail.com> Tested-By: Bobby Powers <bobbypowers@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 24 April 2015, 12:09:01 UTC
34c9a0f crypto: fix broken crypto_register_instance() module handling Commit 9c521a200bc3 ("crypto: api - remove instance when test failed") tried to grab a module reference count before the module was even set. Worse, it then goes on to free the module reference count after it is set so you quickly end up with a negative module reference count which prevents people from using any instances belonging to that module. This patch moves the module initialisation before the reference count. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 16 April 2015, 03:26:16 UTC
eea3a00 Merge branch 'akpm' (patches from Andrew) Merge second patchbomb from Andrew Morton: - the rest of MM - various misc bits - add ability to run /sbin/reboot at reboot time - printk/vsprintf changes - fiddle with seq_printf() return value * akpm: (114 commits) parisc: remove use of seq_printf return value lru_cache: remove use of seq_printf return value tracing: remove use of seq_printf return value cgroup: remove use of seq_printf return value proc: remove use of seq_printf return value s390: remove use of seq_printf return value cris fasttimer: remove use of seq_printf return value cris: remove use of seq_printf return value openrisc: remove use of seq_printf return value ARM: plat-pxa: remove use of seq_printf return value nios2: cpuinfo: remove use of seq_printf return value microblaze: mb: remove use of seq_printf return value ipc: remove use of seq_printf return value rtc: remove use of seq_printf return value power: wakeup: remove use of seq_printf return value x86: mtrr: if: remove use of seq_printf return value linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK MAINTAINERS: CREDITS: remove Stefano Brivio from B43 .mailmap: add Ricardo Ribalda CREDITS: add Ricardo Ribalda Delgado ... 15 April 2015, 23:39:15 UTC
e693d73 parisc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
d50f8f8 lru_cache: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
962e370 tracing: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Miscellanea: o Remove unused return value from trace_lookup_stack Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
94ff212 cgroup: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
25ce319 proc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
c2f0b61 s390: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
dc640a8 cris fasttimer: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Miscellanea: o Coalesce formats, realign arguments Signed-off-by: Joe Perches <joe@perches.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
1336d42 cris: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
58a1aa7 openrisc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Jonas Bonn <jonas@southpole.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
cd2b293 ARM: plat-pxa: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, (as it is here, it doesn't return # of chars emitted) will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Russell King <linux@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:25 UTC
4122669 nios2: cpuinfo: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Ley Foon Tan <lftan@altera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
81f0cd9 microblaze: mb: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
7f032d6 ipc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
4395eb1 rtc: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
9f6a240 power: wakeup: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <len.brown@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
3ac62bc x86: mtrr: if: remove use of seq_printf return value The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: Joe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
89c1e79 linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional, which will generally lead to slightly better generated code (221 bytes saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL). As a small bonus, this also ensures that the nbits parameter is expanded exactly once. In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed to assume it is positive (or divisible by BITS_PER_LONG), and hence just do the simple mask. It doesn't seem to use this, and even on an architecture like x86 where the shift only depends on the lower 5 or 6 bits, and these bits are not affected by the signedness of the expression, gcc still generates code to compute the C99 mandated value of start % BITS_PER_LONG. So just use a mask explicitly, also for consistency with BITMAP_LAST_WORD_MASK. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tejun Heo <tj@kernel.org> Reviewed-by: George Spelvin <linux@horizon.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
8a72ed6 MAINTAINERS: CREDITS: remove Stefano Brivio from B43 This email address isn't working anymore Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
01c7e5a .mailmap: add Ricardo Ribalda Work and Home computer had different settings in the mail client. Some contributions appear as Ricardo Ribalda, others as Ricardo Ribalda Delgado (and one as just Ricardo). Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
3ca7bf8 CREDITS: add Ricardo Ribalda Delgado Add personal details to CREDITS file. Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
49e7d9d MAINTAINERS: Use tabs consistently Consistently use a single tab after the "specifier:" type. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
41416f2 lib/string_helpers.c: change semantics of string_escape_mem The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:24 UTC
3aeddc7 lib/string_helpers.c: refactor string_escape_mem When printf is given the format specifier %pE, it needs a way of obtaining the total output size that would be generated if the buffer was large enough, and string_escape_mem doesn't easily provide that. This is a refactorization of string_escape_mem in preparation of changing its external API to provide that information. The somewhat ugly early returns and subsequent seemingly redundant conditionals are to make the following patch touch as little as possible in string_helpers.c while still preserving the current behaviour of never outputting partial escape sequences. That behaviour must also change for %pE to work as one expects from every other printf specifier. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
9c98f23 lib/vsprintf.c: fix potential NULL deref in hex_string The helper hex_string() is broken in two ways. First, it doesn't increment buf regardless of whether there is room to print, so callers such as kasprintf() that try to probe the correct storage to allocate will get a too small return value. But even worse, kasprintf() (and likely anyone else trying to find the size of the result) pass NULL for buf and 0 for size, so we also have end == NULL. But this means that the end-1 in hex_string() is (char*)-1, so buf < end-1 is true and we get a NULL pointer deref. I double-checked this with a trivial kernel module that just did a kasprintf(GFP_KERNEL, "%14ph", "CrashBoomBang"). Nobody seems to be using %ph with kasprintf, but we might as well fix it before it hits someone. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
900cca2 lib/vsprintf: add %pC{,n,r} format specifiers for clocks Add format specifiers for printing struct clk: - '%pC' or '%pCn': name (Common Clock Framework) or address (legacy clock framework) of the clock, - '%pCr': rate of the clock. [akpm@linux-foundation.org: omit code if !CONFIG_HAVE_CLK] Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
e8a7ba5 lib/vsprintf: Move integer format types to the top Move the format types for 64-bit integers and configurable size integers to the top, so they're next to the other integer format types. While at it, add the missing format types for s32 and u32. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7330660 lib/vsprintf: document %p parameters passed by reference This patch series improves the documentation for printk() formats, and adds support for printing clocks. The latter has always been a hassle if you wanted to support both the common and legacy clock frameworks. - '%pC' and '%pCn' print the name (Common Clock Framework) or address (legacy clock framework) of a clock, - '%pCr' prints the current clock rate. This patch (of 3): Make sure all %p extensions that take parameters by references are documented to do so. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mike Turquette <mturquette@linaro.org> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
d1c1b12 lib/vsprintf.c: another small hack Making ZEROPAD == '0'-' ', we can eliminate a few more instructions. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
3ea8d44 lib/vsprintf.c: eliminate duplicate hex string array gcc doesn't merge or overlap const char[] objects with identical contents (probably language lawyers would also insist that these things have different addresses), but there's no reason to have the string "0123456789ABCDEF" occur in multiple places. hex_asc_upper is declared in kernel.h and defined in lib/hexdump.c, which is unconditionally compiled in. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
e26c12c lib/vsprintf.c: reduce stack use in number() At least since the initial git commit, when base was passed as a separate parameter, number() has only been called with bases 8, 10 and 16. I'm guessing that 66 was to accommodate 64 0/1, a sign and a '\0', but the buffer is only used for the actual digits. Octal digits carry 3 bits of information, so 24 is enough. Spell that 3*sizeof(num) so one less place needs to be changed should long long ever be 128 bits. Also remove the commented-out code that would handle an arbitrary base. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
51be17d lib/vsprintf.c: eliminate some branches Since FORMAT_TYPE_INT is simply 1 more than FORMAT_TYPE_UINT, and similarly for BYTE/UBYTE, SHORT/USHORT, LONG/ULONG, we can eliminate a few instructions by making SIGN have the value 1 instead of 2, and then use arithmetic instead of branches for computing the right spec->type. It's a little hacky, but certainly in the same spirit as SMALL needing to have the value 0x20. For example for the spec->qualifier == 'l' case, gcc now generates 75e: 0f b6 53 01 movzbl 0x1(%rbx),%edx 762: 83 e2 01 and $0x1,%edx 765: 83 c2 09 add $0x9,%edx 768: 88 13 mov %dl,(%rbx) instead of 763: 0f b6 53 01 movzbl 0x1(%rbx),%edx 767: 83 e2 02 and $0x2,%edx 76a: 80 fa 01 cmp $0x1,%dl 76d: 19 d2 sbb %edx,%edx 76f: 83 c2 0a add $0xa,%edx 772: 88 13 mov %dl,(%rbx) Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7b1460e printk: comment pr_cont() stating it is only to continue a line KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used more often, and it lacks the comment stating what it is used for. It can be confused as continuing the log level, but that is not its purpose. Its purpose is to continue a line that had no newline enclosed. This should be documented by pr_cont() as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
e243304 powerpc/powernv: reboot when requested by firmware Use orderly_reboot so userspace will to shut itself down via the reboot path. This is required for graceful reboot initiated by the BMC, such as when a user uses ipmitool to issue a 'chassis power cycle' command. Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7a54f46 kernel/reboot.c: add orderly_reboot for graceful reboot The kernel has orderly_poweroff which allows the kernel to initiate a graceful shutdown of userspace, by running /sbin/poweroff. This adds orderly_reboot that will cause userspace to shut itself down by calling /sbin/reboot. This will be used for shutdown initiated by a system controller on platforms that do not use ACPI. orderly_reboot() should be used when the system wants to allow userspace to gracefully shut itself down. For cases where the system may imminently catch on fire, the existing emergency_restart() provides an immediate reboot without involving userspace. Signed-off-by: Joel Stanley <joel@jms.id.au> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
7975a9b drivers/sbus/char/envctrl.c: ignore orderly_poweroff return value orderly_poweroff() unconditionally returns 0, so remove the dead code that checks the return value. A future patch will change the return type to void. Signed-off-by: Joel Stanley <joel@jms.id.au> Acked-by: David S. Miller <davem@davemloft.net> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:23 UTC
972fae6 kernel/hung_task.c: change hung_task.c to use for_each_process_thread() In check_hung_uninterruptible_tasks() avoid the use of deprecated while_each_thread(). The "max_count" logic will prevent a livelock - see commit 0c740d0a ("introduce for_each_thread() to replace the buggy while_each_thread()"). Having said this let's use for_each_process_thread(). Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Wysochanski <dwysocha@redhat.com> Cc: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
96831c0 kernel/resource.c: remove deprecated __check_region() and friends All users of __check_region(), check_region(), and check_mem_region() are gone. We got rid of the last user in v4.0-rc1. Remove them. bloat-o-meter on x86_64 shows: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102) function old new delta __kstrtab___check_region 15 - -15 __ksymtab___check_region 16 - -16 __check_region 71 - -71 Signed-off-by: Jakub Sitnicki <jsitnicki@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
2813893 kernel: conditionally support non-root users, groups and capabilities There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much. The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
c79574a lib/test-hexdump.c: fix initconst confusion const char *...[] is not const, but an array of pointer to const. So these arrays cannot be __initconst, but must be __initdata This fixes section conflicts with LTO. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
946e879 paride: fix the "verbose" module param The verbose module parameter can be set to 2 for extremely verbose messages so the type should be int instead of bool. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Tim Waugh <tim@cyberelk.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
23f40a9 include/linux: remove empty conditionals Commit 607ca46e97a1 ("UAPI: (Scripted) Disintegrate include/linux") left behind some empty conditional blocks. Since they are useless and may cause a reader to wonder whether something is missing, remove them. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
e4bc332 /proc/PID/status: show all sets of pid according to ns If some issues occurred inside a container guest, host user could not know which process is in trouble just by guest pid: the users of container guest only knew the pid inside containers. This will bring obstacle for trouble shooting. This patch adds four fields: NStgid, NSpid, NSpgid and NSsid: a) In init_pid_ns, nothing changed; b) In one pidns, will tell the pid inside containers: NStgid: 21776 5 1 NSpid: 21776 5 1 NSpgid: 21776 5 1 NSsid: 21729 1 0 ** Process id is 21776 in level 0, 5 in level 1, 1 in level 2. c) If pidns is nested, it depends on which pidns are you in. NStgid: 5 1 NSpid: 5 1 NSpgid: 5 1 NSsid: 1 0 ** Views from level 1 [akpm@linux-foundation.org: add CONFIG_PID_NS ifdef] Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Tested-by: Serge Hallyn <serge.hallyn@canonical.com> Tested-by: Nathan Scott <nathans@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
201c7b7 zram: fix error return code Return a negative error code on failure. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier ret; expression e1,e2; @@ ( if (\(ret < 0\|ret != 0\)) { ... return ret; } | ret = 0 ) ... when != ret = e1 when != &ret *if(...) { ... when != ret = e2 when forall return ret; } // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
160a117 zsmalloc: remove extra cond_resched() in __zs_compact Do not perform cond_resched() before the busy compaction loop in __zs_compact(), because this loop does it when needed. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
81da9b1 zsmalloc: fix fatal corruption due to wrong size class selection There is no point in overriding the size class below. It causes fatal corruption on the next chunk on the 3264-bytes size class, which is the last size class that is not huge. For example, if the requested size was exactly 3264 bytes, current zsmalloc allocates and returns a chunk from the size class of 3264 bytes, not 4096. User access to this chunk may overwrite head of the next adjacent chunk. Here is the panic log captured when freelist was corrupted due to this: Kernel BUG at ffffffc00030659c [verbose debug info unavailable] Internal error: Oops - BUG: 96000006 [#1] PREEMPT SMP Modules linked in: exynos-snapshot: core register saved(CPU:5) CPUMERRSR: 0000000000000000, L2MERRSR: 0000000000000000 exynos-snapshot: context saved(CPU:5) exynos-snapshot: item - log_kevents is disabled CPU: 5 PID: 898 Comm: kswapd0 Not tainted 3.10.61-4497415-eng #1 task: ffffffc0b8783d80 ti: ffffffc0b71e8000 task.ti: ffffffc0b71e8000 PC is at obj_idx_to_offset+0x0/0x1c LR is at obj_malloc+0x44/0xe8 pc : [<ffffffc00030659c>] lr : [<ffffffc000306604>] pstate: a0000045 sp : ffffffc0b71eb790 x29: ffffffc0b71eb790 x28: ffffffc00204c000 x27: 000000000001d96f x26: 0000000000000000 x25: ffffffc098cc3500 x24: ffffffc0a13f2810 x23: ffffffc098cc3501 x22: ffffffc0a13f2800 x21: 000011e1a02006e3 x20: ffffffc0a13f2800 x19: ffffffbc02a7e000 x18: 0000000000000000 x17: 0000000000000000 x16: 0000000000000feb x15: 0000000000000000 x14: 00000000a01003e3 x13: 0000000000000020 x12: fffffffffffffff0 x11: ffffffc08b264000 x10: 00000000e3a01004 x9 : ffffffc08b263fea x8 : ffffffc0b1e611c0 x7 : ffffffc000307d24 x6 : 0000000000000000 x5 : 0000000000000038 x4 : 000000000000011e x3 : ffffffbc00003e90 x2 : 0000000000000cc0 x1 : 00000000d0100371 x0 : ffffffbc00003e90 Reported-by: Sooyong Suk <s.suk@samsung.com> Signed-off-by: Heesub Shin <heesub.shin@samsung.com> Tested-by: Sooyong Suk <s.suk@samsung.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
839373e zsmalloc: remove unnecessary insertion/removal of zspage in compaction In putback_zspage, we don't need to insert a zspage into list of zspage in size_class again to just fix fullness group. We could do directly without reinsertion so we could save some instuctions. Reported-by: Heesub Shin <heesub.shin@samsung.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Ganesh Mahendran <opensource.ganesh@gmail.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Juneho Choi <juno.choi@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
495819e zsmalloc: micro-optimize zs_object_copy() A micro-optimization. Avoid additional branching and reduce (a bit) registry pressure (f.e. s_off += size; d_off += size; may be calculated twise: first for >= PAGE_SIZE check and later for offset update in "else" clause). scripts/bloat-o-meter shows some improvement add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10) function old new delta zs_object_copy 550 540 -10 Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:22 UTC
1ec7cfb zsmalloc: remove synchronize_rcu from zs_compact() Do not synchronize rcu in zs_compact(). Neither zsmalloc not zram use rcu. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
8f7d282 zram: deprecate zram attrs sysfs nodes Add Documentation/ABI/obsolete/sysfs-block-zram file and list obsolete and deprecated attributes there. The patch also adds additional information to zram documentation and describes the basic strategy: - the existing RW nodes will be downgraded to WO nodes (in 4.11) - deprecated RO sysfs nodes will eventually be removed (in 4.11) Users will be additionally notified about deprecated attr usage by pr_warn_once() (added to every deprecated attr _show()), as suggested by Minchan Kim. User space is advised to use zram<id>/stat, zram<id>/io_stat and zram<id>/mm_stat files. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
4f2109f zram: export new 'mm_stat' sysfs attrs Per-device `zram<id>/mm_stat' file provides mm statistics of a particular zram device in a format similar to block layer statistics. The file consists of a single line and represents the following stats (separated by whitespace): orig_data_size compr_data_size mem_used_total mem_limit mem_used_max zero_pages num_migrated Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
2f6a3be zram: export new 'io_stat' sysfs attrs Per-device `zram<id>/io_stat' file provides accumulated I/O statistics of particular zram device in a format similar to block layer statistics. The file consists of a single line and represents the following stats (separated by whitespace): failed_reads failed_writes invalid_io notify_free Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
77ba015 zram: describe device attrs in documentation Briefly describe exported device stat attrs in zram documentation. We will eventually get rid of per-stat sysfs nodes and, thus, clean up Documentation/ABI/testing/sysfs-block-zram file, which is the only source of information about device sysfs nodes. Add `num_migrated' description, since there is no independent `num_migrated' sysfs node (and no corresponding sysfs-block-zram entry), it will be exported via zram<id>/mm_stat file. At this point we can provide minimal description, because sysfs-block-zram still contains detailed information. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
8811a94 zram: use generic start/end io accounting Use bio generic_start_io_acct() and generic_end_io_acct() to account device's block layer statistics. This will let users to monitor zram activities using sysstat and similar packages/tools. Apart from the usual per-stat sysfs attr, zram IO stats are now also available in '/sys/block/zram<id>/stat' and '/proc/diskstats' files. We will slowly get rid of per-stat sysfs files. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
c72c616 zram: move compact_store() to sysfs functions area A cosmetic change. We have a new code layout and keep zram per-device sysfs store and show functions in one place. Move compact_store() to that handlers block to conform to current layout. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
10447b6 zram: remove `num_migrated' device attr This patch introduces rework to zram stats. We have per-stat sysfs nodes, and it makes things a bit hard to use in user space: it doesn't give an immediate stats 'snapshot', it requires user space to use more syscalls - open, read, close for every stat file, with appropriate error checks on every step, etc. First, zram now accounts block layer statistics, available in /sys/block/zram<id>/stat and /proc/diskstats files. So some new stats are available (see Documentation/block/stat.txt), besides, zram's activities now can be monitored by sysstat's iostat or similar tools. Example: cat /sys/block/zram0/stat 248 0 1984 0 251029 0 2008232 5120 0 5116 5116 Second, group currently exported on per-stat basis nodes into two categories (files): -- zram<id>/io_stat accumulates device's IO stats, that are not accounted by block layer, and contains: failed_reads failed_writes invalid_io notify_free Example: cat /sys/block/zram0/io_stat 0 0 0 652572 -- zram<id>/mm_stat accumulates zram mm stats and contains: orig_data_size compr_data_size mem_used_total mem_limit mem_used_max zero_pages num_migrated Example: cat /sys/block/zram0/mm_stat 434634752 270288572 279158784 0 579895296 15060 0 per-stat sysfs nodes are now considered to be deprecated and we plan to remove them (and clean up some of the existing stat code) in two years (as of now, there is no warning printed to syslog about deprecated stats being used). User space is advised to use the above mentioned 3 files. This patch (of 7): Remove sysfs `num_migrated' attribute. We are moving away from per-stat device attrs towards 3 stat files that will accumulate io and mm stats in a format similar to block layer statistics in /sys/block/<dev>/stat. That will be easier to use in user space, and reduce the number of syscalls needed to read zram device statistics. `num_migrated' will return back in zram<id>/mm_stat file. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Nitin Gupta <ngupta@vflare.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
888fa37 mm/zsmalloc.c: fix comment for get_pages_per_zspage Signed-off-by: Yinghao Xie <yinghao.xie@sumsung.com> Suggested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
d02be50 zsmalloc: zsmalloc documentation Create zsmalloc doc which explains design concept and stat information. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
248ca1b zsmalloc: add fullness into stat During investigating compaction, fullness information of each class is helpful for investigating how the compaction works well. With that, we could know how compaction works well more clear on each size class. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
7b60a68 zsmalloc: record handle in page->private for huge object We store handle on header of each allocated object so it increases the size of each object by sizeof(unsigned long). If zram stores 4096 bytes to zsmalloc(ie, bad compression), zsmalloc needs 4104B-class to add handle. However, 4104B-class has 1-pages_per_zspage so wasted size by internal fragment is 8192 - 4104, which is terrible. So this patch records the handle in page->private on such huge object(ie, pages_per_zspage == 1 && maxobj_per_zspage == 1) instead of header of each object so we could use 4096B-class, not 4104B-class. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
4e3ba87 zram: support compaction Now that zsmalloc supports compaction, zram can use it. For the first step, this patch exports compact knob via sysfs so user can do compaction via "echo 1 > /sys/block/zram0/compact". Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:21 UTC
d3d07c9 zsmalloc: adjust ZS_ALMOST_FULL Curretly, zsmalloc regards a zspage as ZS_ALMOST_EMPTY if the zspage has under 1/4 used objects(ie, fullness_threshold_frac). It could make result in loose packing since zsmalloc migrates only ZS_ALMOST_EMPTY zspage out. This patch changes the rule so that zsmalloc makes zspage which has above 3/4 used object ZS_ALMOST_FULL so it could make tight packing. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
312fcae zsmalloc: support compaction This patch provides core functions for migration of zsmalloc. Migraion policy is simple as follows. for each size class { while { src_page = get zs_page from ZS_ALMOST_EMPTY if (!src_page) break; dst_page = get zs_page from ZS_ALMOST_FULL if (!dst_page) dst_page = get zs_page from ZS_ALMOST_EMPTY if (!dst_page) break; migrate(from src_page, to dst_page); } } For migration, we need to identify which objects in zspage are allocated to migrate them out. We could know it by iterating of freed objects in a zspage because first_page of zspage keeps free objects singly-linked list but it's not efficient. Instead, this patch adds a tag(ie, OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check whether the object is allocated easily. This patch adds another status bit in handle to synchronize between user access through zs_map_object and migration. During migration, we cannot move objects user are using due to data coherency between old object and new object. [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()] Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
c780626 zsmalloc: factor out obj_[malloc|free] In later patch, migration needs some part of functions in zs_malloc and zs_free so this patch factor out them. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
2e40e16 zsmalloc: decouple handle and object Recently, we started to use zram heavily and some of issues popped. 1) external fragmentation I got a report from Juneho Choi that fork failed although there are plenty of free pages in the system. His investigation revealed zram is one of the culprit to make heavy fragmentation so there was no more contiguous 16K page for pgd to fork in the ARM. 2) non-movable pages Other problem of zram now is that inherently, user want to use zram as swap in small memory system so they use zRAM with CMA to use memory efficiently. However, unfortunately, it doesn't work well because zRAM cannot use CMA's movable pages unless it doesn't support compaction. I got several reports about that OOM happened with zram although there are lots of swap space and free space in CMA area. 3) internal fragmentation zRAM has started support memory limitation feature to limit memory usage and I sent a patchset(https://lkml.org/lkml/2014/9/21/148) for VM to be harmonized with zram-swap to stop anonymous page reclaim if zram consumed memory up to the limit although there are free space on the swap. One problem for that direction is zram has no way to know any hole in memory space zsmalloc allocated by internal fragmentation so zram would regard swap is full although there are free space in zsmalloc. For solving the issue, zram want to trigger compaction of zsmalloc before it decides full or not. This patchset is first step to support above issues. For that, it adds indirect layer between handle and object location and supports manual compaction to solve 3th problem first of all. After this patchset got merged, next step is to make VM aware of zsmalloc compaction so that generic compaction will move zsmalloced-pages automatically in runtime. In my imaginary experiment(ie, high compress ratio data with heavy swap in/out on 8G zram-swap), data is as follows, Before = zram allocated object : 60212066 bytes zram total used: 140103680 bytes ratio: 42.98 percent MemFree: 840192 kB Compaction After = frag ratio after compaction zram allocated object : 60212066 bytes zram total used: 76185600 bytes ratio: 79.03 percent MemFree: 901932 kB Juneho reported below in his real platform with small aging. So, I think the benefit would be bigger in real aging system for a long time. - frag_ratio increased 3% (ie, higher is better) - memfree increased about 6MB - In buddy info, Normal 2^3: 4, 2^2: 1: 2^1 increased, Highmem: 2^1 21 increased frag ratio after swap fragment used : 156677 kbytes total: 166092 kbytes frag_ratio : 94 meminfo before compaction MemFree: 83724 kB Node 0, zone Normal 13642 1364 57 10 61 17 9 5 4 0 0 Node 0, zone HighMem 425 29 1 0 0 0 0 0 0 0 0 num_migrated : 23630 compaction done frag ratio after compaction used : 156673 kbytes total: 160564 kbytes frag_ratio : 97 meminfo after compaction MemFree: 89060 kB Node 0, zone Normal 14076 1544 67 14 61 17 9 5 4 0 0 Node 0, zone HighMem 863 50 1 0 0 0 0 0 0 0 0 This patchset adds more logics(about 480 lines) in zsmalloc but when I tested heavy swapin/out program, the regression for swapin/out speed is marginal because most of overheads were caused by compress/decompress and other MM reclaim stuff. This patch (of 7): Currently, handle of zsmalloc encodes object's location directly so it makes support of migration hard. This patch decouples handle and object via adding indirect layer. For that, it allocates handle dynamically and returns it to user. The handle is the address allocated by slab allocation so it's unique and we could keep object's location in the memory space allocated for handle. With it, we can change object's position without changing handle itself. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
018e9a4 mm/compaction.c: fix "suitable_migration_target() unused" warning mm/compaction.c:250:13: warning: 'suitable_migration_target' defined but not used [-Wunused-function] Reported-by: Fengguang Wu <fengguang.wu@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
be64f88 dax: unify ext2/4_{dax,}_file_operations The original dax patchset split the ext2/4_file_operations because of the two NULL splice_read/splice_write in the dax case. In the vfs if splice_read/splice_write are NULL we then call default_splice_read/write. What we do here is make generic_file_splice_read aware of IS_DAX() so the original ext2/4_file_operations can be used as is. For write it appears that iter_file_splice_write is just fine. It uses the regular f_op->write(file,..) or new_sync_write(file, ...). Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Dave Chinner <dchinner@redhat.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
0e3b210 dax: use pfn_mkwrite to update c/mtime + freeze protection From: Yigal Korman <yigal@plexistor.com> [v1] Without this patch, c/mtime is not updated correctly when mmap'ed page is first read from and then written to. A new xfstest is submitted for testing this (generic/080) [v2] Jan Kara has pointed out that if we add the sb_start/end_pagefault pair in the new pfn_mkwrite we are then fixing another bug where: A user could start writing to the page while filesystem is frozen. Signed-off-by: Yigal Korman <yigal@plexistor.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
dd90618 mm: new pfn_mkwrite same as page_mkwrite for VM_PFNMAP This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to get notified when access is a write to a read-only PFN. This can happen if we mmap() a file then first mmap-read from it to page-in a read-only PFN, than we mmap-write to the same page. We need this functionality to fix a DAX bug, where in the scenario above we fail to set ctime/mtime though we modified the file. An xfstest is attached to this patchset that shows the failure and the fix. (A DAX patch will follow) This functionality is extra important for us, because upon dirtying of a pmem page we also want to RDMA the page to a remote cluster node. We define a new pfn_mkwrite and do not reuse page_mkwrite because 1 - The name ;-) 2 - But mainly because it would take a very long and tedious audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP users. To make sure they do not now CRASH. For example current DAX code (which this is for) would crash. If we would want to reuse page_mkwrite, We will need to first patch all users, so to not-crash-on-no-page. Then enable this patch. But even if I did that I would not sleep so well at night. Adding a new vector is the safest thing to do, and is not that expensive. an extra pointer at a static function vector per driver. Also the new vector is better for performance, because else we Will call all current Kernel vectors, so to: check-ha-no-page-do-nothing and return. No need to call it from do_shared_fault because do_wp_page is called to change pte permissions anyway. Signed-off-by: Yigal Korman <yigal@plexistor.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
2682582 mm/memory: also print a_ops->readpage in print_bad_pte() A lot of filesystems use generic_file_mmap() and filemap_fault(), f_op->mmap and vm_ops->fault aren't enough to identify filesystem. This prints file name, vm_ops->fault, f_op->mmap and a_ops->readpage (which is almost always implemented and filesystem-specific). Example: [ 23.676410] BUG: Bad page map in process sh pte:1b7e6025 pmd:19bbd067 [ 23.676887] page:ffffea00006df980 count:4 mapcount:1 mapping:ffff8800196426c0 index:0x97 [ 23.677481] flags: 0x10000000000000c(referenced|uptodate) [ 23.677896] page dumped because: bad pte [ 23.678205] addr:00007f52fcb17000 vm_flags:00000075 anon_vma: (null) mapping:ffff8800196426c0 index:97 [ 23.678922] file:libc-2.19.so fault:filemap_fault mmap:generic_file_readonly_mmap readpage:v9fs_vfs_readpage [akpm@linux-foundation.org: use pr_alert, per Kirill] Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
9239361 mm/mempool.c: kasan: poison mempool elements Mempools keep allocated objects in reserved for situations when ordinary allocation may not be possible to satisfy. These objects shouldn't be accessed before they leave the pool. This patch poison elements when get into the pool and unpoison when they leave it. This will let KASan to detect use-after-free of mempool's elements. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Chernenkov <drcheren@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
bda6d33 mm/cma_debug.c: remove blank lines before DEFINE_SIMPLE_ATTRIBUTE() Like EXPORT_SYMBOL(): the positioning communicates that the macro pertains to the immediately preceding function. Cc: Dmitry Safonov <d.safonov@partner.samsung.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Stefan Strogin <stefan.strogin@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pintu Kumar <pintu.k@samsung.com> Cc: Weijie Yang <weijie.yang@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Vyacheslav Tyrtov <v.tyrtov@samsung.com> Cc: Aleksei Mateosian <a.mateosian@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
2e32b94 mm: cma: add functions to get region pages counters Here are two functions that provide interface to compute/get used size and size of biggest free chunk in cma region. Add that information to debugfs. [akpm@linux-foundation.org: move debug code from cma.c into cma_debug.c] [stefan.strogin@gmail.com: move code from cma_get_used() and cma_get_maxchunk() to cma_used_get() and cma_maxchunk_get()] Signed-off-by: Dmitry Safonov <d.safonov@partner.samsung.com> Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pintu Kumar <pintu.k@samsung.com> Cc: Weijie Yang <weijie.yang@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Vyacheslav Tyrtov <v.tyrtov@samsung.com> Cc: Aleksei Mateosian <a.mateosian@samsung.com> Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:20 UTC
79553da thp: cleanup khugepaged startup Few trivial cleanups: - no need to call set_recommended_min_free_kbytes() from late_initcall() -- start_khugepaged() calls it; - no need to call set_recommended_min_free_kbytes() from start_khugepaged() if khugepaged is not started; - there isn't much point in running start_khugepaged() if we've just set transparent_hugepage_flags to zero; - start_khugepaged() is misnamed -- it also used to stop the thread; Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
e39155e mm: uninline and cleanup page-mapping related helpers Most-used page->mapping helper -- page_mapping() -- has already uninlined. Let's uninline also page_rmapping() and page_anon_vma(). It saves us depending on configuration around 400 bytes in text: text data bss dec hex filename 660318 99254 410000 1169572 11d8a4 mm/built-in.o-before 659854 99254 410000 1169108 11d6d4 mm/built-in.o I also tried to make code a bit more clean. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
99e8ea6 mm: cma: add trace events for CMA allocations and freeings Add trace events for cma_alloc() and cma_release(). The cma_alloc tracepoint is used both for successful and failed allocations, in case of allocation failure pfn=-1UL is stored and printed. Signed-off-by: Stefan Strogin <stefan.strogin@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mpn@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
cdd7875 include/linux/mm.h: simplify flag check Flip the flag test so that it is the simplest. No functional change, just a small readability improvement: No code changed: # arch/x86/kernel/sys_x86_64.o: text data bss dec hex filename 1551 24 0 1575 627 sys_x86_64.o.before 1551 24 0 1575 627 sys_x86_64.o.after md5: 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.before.asm 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.after.asm Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
6a4055b mm/memblock.c: add debug output for memblock_add() memblock_reserve() calls memblock_reserve_region() which prints debugging information if 'memblock=debug' was passed on the command line. This patch adds the same behaviour, but for memblock_add function(). [akpm@linux-foundation.org: s/memblock_memory/memblock_add/ in message] Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Emil Medve <Emilian.Medve@freescale.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
7e1f049 mm: hugetlb: cleanup using paeg_huge_active() Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
bcc5422 mm: hugetlb: introduce page_huge_active We are not safe from calling isolate_huge_page() on a hugepage concurrently, which can make the victim hugepage in invalid state and results in BUG_ON(). The root problem of this is that we don't have any information on struct page (so easily accessible) about hugepages' activeness. Note that hugepages' activeness means just being linked to hstate->hugepage_activelist, which is not the same as normal pages' activeness represented by PageActive flag. Normal pages are isolated by isolate_lru_page() which prechecks PageLRU before isolation, so let's do similarly for hugetlb with a new paeg_huge_active(). set/clear_page_huge_active() should be called within hugetlb_lock. But hugetlb_cow() and hugetlb_no_page() don't do this, being justified because in these functions set_page_huge_active() is called right after the hugepage is allocated and no other thread tries to isolate it. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/, make it return bool] [fengguang.wu@intel.com: set_page_huge_active() can be static] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
822fc61 mm: don't call __page_cache_release for hugetlb __put_compound_page() calls __page_cache_release() to do some freeing work, but it's obviously for thps, not for hugetlb. We don't care because PageLRU is always cleared and page->mem_cgroup is always NULL for hugetlb. But it's not correct and has potential risks, so let's make it conditional. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
9fcd145 mm/mmap.c: use while instead of if+goto The creators of the C language gave us the while keyword. Let's use that instead of synthesizing it from if+goto. Made possible by 6597d783397a ("mm/mmap.c: replace find_vma_prepare() with clearer find_vma_links()"). [akpm@linux-foundation.org: fix 80-col overflows] Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
215ba78 mm, selftests: test return value of munmap for MAP_HUGETLB memory When MAP_HUGETLB memory is unmapped, the length must be hugepage aligned, otherwise it fails with -EINVAL. All tests currently behave correctly, but it's better to explcitly test the return value for completeness and document the requirement, especially if users copy map_hugetlb.c as a sample implementation. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joern Engel <joern@logfs.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Eric B Munson <emunson@akamai.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
80d6b94 mm, doc: cleanup and clarify munmap behavior for hugetlb memory munmap(2) of hugetlb memory requires a length that is hugepage aligned, otherwise it may fail. Add this to the documentation. This also cleans up the documentation and separates it into logical units: one part refers to MAP_HUGETLB and another part refers to requirements for shared memory segments. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Joern Engel <joern@logfs.org> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Eric B Munson <emunson@akamai.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
ae7efa5 thp: do not adjust zone water marks if khugepaged is not started set_recommended_min_free_kbytes() adjusts zone water marks to be suitable for khugepaged. We avoid doing this if khugepaged is disabled, but don't catch the case when khugepaged is failed to start. Let's address this by checking khugepaged_thread instead of khugepaged_enabled() in set_recommended_min_free_kbytes(). It's NULL if the kernel thread is stopped or failed to start. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:19 UTC
65ebb64 thp: handle errors in hugepage_init() properly We miss error-handling in few cases hugepage_init(). Let's fix that. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
bdfedb7 mm, mempool: poison elements backed by slab allocator Mempools keep elements in a reserved pool for contexts in which allocation may not be possible. When an element is allocated from the reserved pool, its memory contents is the same as when it was added to the reserved pool. Because of this, elements lack any free poisoning to detect use-after-free errors. This patch adds free poisoning for elements backed by the slab allocator. This is possible because the mempool layer knows the object size of each element. When an element is added to the reserved pool, it is poisoned with POISON_FREE. When it is removed from the reserved pool, the contents are checked for POISON_FREE. If there is a mismatch, a warning is emitted to the kernel log. This is only effective for configs with CONFIG_DEBUG_SLAB or CONFIG_SLUB_DEBUG_ON. [fabio.estevam@freescale.com: use '%zu' for printing 'size_t' variable] [arnd@arndb.de: add missing include] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
e244c9e mm, mempool: disallow mempools based on slab caches with constructors All occurrences of mempools based on slab caches with object constructors have been removed from the tree, so disallow creating them. We can only dereference mem->ctor in mm/mempool.c without including mm/slab.h in include/linux/mempool.h. So simply note the restriction, just like the comment restricting usage of __GFP_ZERO, and warn on kernels with CONFIG_DEBUG_VM() if such a mempool is allocated from. We don't want to incur this check on every element allocation, so use VM_BUG_ON(). Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 April 2015, 23:35:18 UTC
back to top