Revision history – Software Heritage archive

Revision	Author	Date	Message	Commit Date
3768f0b	Joe Perches	15 December 2009, 02:00:54 UTC	MAINTAINERS: Update file patterns for WOLFSON MICROELECTRONICS PMIC DRIVERS One of the includes pointed to a non-existent directory Add Documentation/hwmon/wm83?? Add sound/soc/codecs/wm(8350\|8400).h files Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:29 UTC
e74b98e	Joe Perches	15 December 2009, 02:00:53 UTC	MAINTAINERS: remove file pattern from KERNEL VIRTUAL MACHINE (KVM) FOR AMD-V Commit 6c8166a77c98f473eb91e96a61c3cf78ac617278 ("KVM: SVM: Fold kvm_svm.h info svm.c") folded this file away. Signed-off-by: Joe Perches <joe@perches.com> Cc: Avi Kivity <avi@redhat.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:29 UTC
b57fe92	Joe Perches	15 December 2009, 02:00:52 UTC	MAINTAINERS: rename PALM TREO section and file patterns Signed-off-by: Joe Perches <joe@perches.com> Cc: Tomas Cech <sleep_walker@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:29 UTC
d1f2895	KOSAKI Motohiro	15 December 2009, 02:00:52 UTC	MAINTAINERS: mark cifs mailing list as "moderated for non-subscribers" If non-subscribers post bug report to CIFS mailing list, they will get following messages. Your mail to 'linux-cifs-client' with the subject [PATCH x/x] cifs: xxxxxxxxxxxxx Is being held until the list moderator can review it for approval. The reason it is being held: Post by non-member to a members-only list Either the message will get posted to the list, or you will receive notification of the moderator's decision. If you would like to cancel this posting, please visit the following URL: members-only list should be written as so in MAINTAINERS file. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:29 UTC
60db31a	Joe Perches	15 December 2009, 02:00:50 UTC	scripts/get_maintainer.pl: support multiple VCSs - add mercurial Restructure a bit for multiple version control systems support. Use a hash for each supported VCS that contains the commands and patterns used to find commits, logs, and signers. --git command line options are still used for hg except for --git-since. Use --hg-since instead. The number of commits can differ for git and hg, so --rolestats might be different. Style changes: Use common push style push(@foo...), simplify a return Bumped version to 0.23. Signed-off-by: Joe Perches <joe@perches.com> Cc: Marti Raudsepp <marti@juffo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:29 UTC
a8af243	Joe Perches	15 December 2009, 02:00:49 UTC	scripts/get_maintainer.pl: fix --non with --git-blame and cleanups Fix email matching without name --n and --git-blame Using --non and --git-blame caused maintainer signature matching to fail. Fixed that by adding 3rd argument to sub format_email to control show/hide name portion of address Slurp -f file instead of reading line-by-line for K: pattern matching. Suggested by Wolfram Sang as more efficient Refactor git command execution Break into 2 functions, execute/analyze Share code between --git and --git-blame Don't warn multiple times when git isn't installed Improve stats reporting --git-min-percent and -- rolestats now count the total number of commits for either the period of --git-since or if using --git-blame the commits used by the current file and calculate commit % as # of commits signed / total commits * 100 Code style cleaning Use consistent sub foo { my (args...) = @_; Signed-off-by: Joe Perches <joe@perches.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Greg KH <greg@kroah.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
3c7385b	Joe Perches	15 December 2009, 02:00:46 UTC	scripts/get_maintainer.pl: add --roles and --rolestats --roles shows the role of each email address, i.e. why it was selected. --rolestats selects --roles and adds git log/blame signers #'s and % Multiple roles are possible (supporter, maintainer, git-signer...) --roles or --rolestats is meant to help identify appropriate maintainers to notify and should not be used with "git send-email --cc-cmd" Example output: Existing: $ ./scripts/get_maintainer.pl -f arch/x86/kernel/acpi/boot.c Corentin Chary <corentincj@iksaif.net> Karol Kozimor <sziwan@users.sourceforge.net> Len Brown <len.brown@intel.com> Pavel Machek <pavel@ucw.cz> Rafael J. Wysocki <rjw@sisk.pl> Thomas Gleixner <tglx@linutronix.de> Ingo Molnar <mingo@redhat.com> H. Peter Anvin <hpa@zytor.com> x86@kernel.org Yinghai Lu <yhlu.kernel@gmail.com> Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> acpi4asus-user@lists.sourceforge.net linux-pm@lists.linux-foundation.org linux-kernel@vger.kernel.org With --roles $ ./scripts/get_maintainer.pl --roles -f arch/x86/kernel/acpi/boot.c Corentin Chary <corentincj@iksaif.net> (maintainer:ASUS ACPI EXTRAS...) Karol Kozimor <sziwan@users.sourceforge.net> (maintainer:ASUS ACPI EXTRAS...) Len Brown <len.brown@intel.com> (supporter:SUSPEND TO RAM,git-signer) Pavel Machek <pavel@ucw.cz> (supporter:SUSPEND TO RAM) Rafael J. Wysocki <rjw@sisk.pl> (supporter:SUSPEND TO RAM) Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE...) Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE...,git-signer) H. Peter Anvin <hpa@zytor.com> (maintainer:X86 ARCHITECTURE...) x86@kernel.org (maintainer:X86 ARCHITECTURE...) Yinghai Lu <yhlu.kernel@gmail.com> (git-signer) Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> (git-signer) acpi4asus-user@lists.sourceforge.net (open list:ASUS ACPI EXTRAS...) linux-pm@lists.linux-foundation.org (open list:SUSPEND TO RAM) linux-kernel@vger.kernel.org (open list) With --rolestats $ ./scripts/get_maintainer.pl --rolestats -f arch/x86/kernel/acpi/boot.c Corentin Chary <corentincj@iksaif.net> (maintainer:ASUS ACPI EXTRAS...) Karol Kozimor <sziwan@users.sourceforge.net> (maintainer:ASUS ACPI EXTRAS...) Len Brown <len.brown@intel.com> (supporter:SUSPEND TO RAM,git-signer:16/79=20%) Pavel Machek <pavel@ucw.cz> (supporter:SUSPEND TO RAM) Rafael J. Wysocki <rjw@sisk.pl> (supporter:SUSPEND TO RAM) Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE...) Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE...,git-signer:29/79=37%) H. Peter Anvin <hpa@zytor.com> (maintainer:X86 ARCHITECTURE...) x86@kernel.org (maintainer:X86 ARCHITECTURE...) Yinghai Lu <yhlu.kernel@gmail.com> (git-signer:12/79=15%) Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> (git-signer:6/79=8%) acpi4asus-user@lists.sourceforge.net (open list:ASUS ACPI EXTRAS...) linux-pm@lists.linux-foundation.org (open list:SUSPEND TO RAM) linux-kernel@vger.kernel.org (open list) With --rolestats and --git-blame $ ./scripts/get_maintainer.pl --rolestats --git-blame -f arch/x86/kernel/acpi/boot.c Corentin Chary <corentincj@iksaif.net> (maintainer:ASUS ACPI EXTRAS...) Karol Kozimor <sziwan@users.sourceforge.net> (maintainer:ASUS ACPI EXTRAS...) Len Brown <len.brown@intel.com> (supporter:SUSPEND TO RAM,git-signer:16/79=20%,commits:22/154=14%) Pavel Machek <pavel@ucw.cz> (supporter:SUSPEND TO RAM) Rafael J. Wysocki <rjw@sisk.pl> (supporter:SUSPEND TO RAM) Thomas Gleixner <tglx@linutronix.de> (maintainer:X86 ARCHITECTURE...) Ingo Molnar <mingo@redhat.com> (maintainer:X86 ARCHITECTURE...,git-signer:29/79=37%,commits:36/154=23%) H. Peter Anvin <hpa@zytor.com> (maintainer:X86 ARCHITECTURE...) x86@kernel.org (maintainer:X86 ARCHITECTURE...) Yinghai Lu <yhlu.kernel@gmail.com> (git-signer:12/79=15%,commits:9/154=6%) Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> (git-signer:6/79=8%) Andi Kleen <ak@suse.de> (commits:11/154=7%) Andrew Morton <akpm@osdl.org> (commits:10/154=6%) acpi4asus-user@lists.sourceforge.net (open list:ASUS ACPI EXTRAS...) linux-pm@lists.linux-foundation.org (open list:SUSPEND TO RAM) linux-kernel@vger.kernel.org (open list) Other changes: Format git-signers email addresses a bit to reduce bad signatures Command line bad arguments emitted a verbose usage(), just show --help Version number bumped to .22 Ben Hutchings had the idea and created a good deal of this implementation. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
5ada918	Bernhard Walle	15 December 2009, 02:00:43 UTC	vt: introduce and use vt_kmsg_redirect() function The kernel offers with TIOCL_GETKMSGREDIRECT ioctl() the possibility to redirect the kernel messages to a specific console. However, since it's not possible to switch to the kernel message console after a panic(), it would be nice if the kernel would print the panic message on the current console. This patch series adds a new interface to access the global kmsg_redirect variable by a function to be able to use it in code where CONFIG_VT_CONSOLE is not set (kernel/panic.c). This patch: Instead of using and exporting a global value kmsg_redirect, introduce a function vt_kmsg_redirect() that both can set and return the console where messages are printed. Change all users of kmsg_redirect (the VT code itself and kernel/power.c) to the new interface. The main advantage is that vt_kmsg_redirect() can also be used when CONFIG_VT_CONSOLE is not set. Signed-off-by: Bernhard Walle <bernhard@bwalle.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
c95d1e5	Andres Salomon	15 December 2009, 02:00:41 UTC	cs5535: drop the Geode-specific MFGPT/GPIO code With generic modular drivers handling all of this stuff, the geode-specific code can go away. The cs5535-gpio, cs5535-mfgpt, and cs5535-clockevt drivers now handle this. Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
f3a57a6	Andres Salomon	15 December 2009, 02:00:40 UTC	cs5535: define lxfb/gxfb MSRs in linux/cs5535.h ..and include them in the lxfb/gxfb drivers rather than asm/geode.h (where possible). Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
f060f27	Andres Salomon	15 December 2009, 02:00:40 UTC	cs5535: move VSA2 checks into linux/cs5535.h Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
2e8c124	Andres Salomon	15 December 2009, 02:00:39 UTC	cs5535: move the DIVIL MSR definition into linux/cs5535.h The only thing that uses this is the reboot_fixups code. Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
c30d7d2	Andres Salomon	15 December 2009, 02:00:38 UTC	cs5535: add a generic clock event MFGPT driver This is based on the old code in arch/x86/kernel/mfgpt_32.c, but is modular and not Geode-specific. There's no reason why the clock event device needs to be registered so early at boot; the clockevent code is perfectly capable of dynamic switching. [akpm@linux-foundation.org: add linux/irq.h include] Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
82dca61	Andres Salomon	15 December 2009, 02:00:37 UTC	cs5535: add a generic MFGPT driver This is based on the old code on arch/x86/kernel/mfgpt_32.c, except it's not x86 specific, it's modular, and it makes use of a PCI BAR rather than a random MSR. Currently module unloading is not supported; it's uncertain whether or not it can be made work with the hardware. [akpm@linux-foundation.org: add X86 dependency] Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:28 UTC
3c55494	Andres Salomon	15 December 2009, 02:00:36 UTC	ALSA: cs5535audio: free OLPC quirks from reliance on MGEODE_LX cpu optimization Previously, OLPC support for the mic extensions was only enabled in the ALSA driver if CONFIG_OLPC and CONFIG_MGEODE_LX were both set. This was because the old geode GPIO code was written in a manner that assumed CONFIG_MGEODE_LX. With the new cs553x-gpio driver, this is no longer the case; as such, we can drop the requirement on CONFIG_MGEODE_LX and instead include a requirement on GPIOLIB. We use the generic GPIO API rather than the cs553x-specific API. Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
1ea3fa7	Tobias Mueller	15 December 2009, 02:00:35 UTC	cs5535-gpio: request function, mask & names added Changed number of gpio pins to 32 (according to datasheet) Added mask to disable some pins Added gpio_request for checking mask and disabling special pin functions Added pin names [dilinger@collabora.co.uk: make printk usage consistent] Signed-off-by: Tobias Mueller <Tobias_Mueller@twam.info> Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: David Brownell <david-b@pacbell.net> Cc: Alessandro Zummo <alessandro.zummo@towertech.it> Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
5f0a96b	Andres Salomon	15 December 2009, 02:00:32 UTC	cs5535-gpio: add AMD CS5535/CS5536 GPIO driver support This creates a CS5535/CS5536 GPIO driver which uses a gpio_chip backend (allowing GPIO users to use the generic GPIO API if desired) while also allowing architecture-specific users directly (via the cs5535_gpio_* functions). Tested on an OLPC machine. Some Leemotes also use CS5536 (with a mips cpu), which is why this is in drivers/gpio rather than arch/x86. Currently, it conflicts with older geode GPIO support; once MFGPT support is reworked to also be more generic, the older geode code will be removed. Signed-off-by: Andres Salomon <dilinger@collabora.co.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jordan Crouse <jordan@cosmicpenguin.net> Cc: David Brownell <david-b@pacbell.net> Reviewed-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
1f2f38d	Thadeu Lima de Souza Cascardo	15 December 2009, 02:00:31 UTC	drivers/char/misc.c: use bitmap/bitops functions for dynamic minor number allocation Use DECLARE_BITMAP(), find_first_zero_bit(), set_bit() and clear_bit() instead of rewriting code to do it with the minor number dynamic allocation bitmap. We need to invert the bit position to keep the code behaviour of using the last minor numbers first, since we don't have a find_last_zero_bit. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
4ae717d	Thadeu Lima de Souza Cascardo	15 December 2009, 02:00:30 UTC	drivers/char/misc.c: clear allocation bit in minor bitmap when device register fails If there's a failure creating the device (because there's already one with the same name, for example), the current implementation does not clear the bit for the allocated minor and that number is lost for future allocations. Second, the test currently in misc_deregister is broken, since it does not test for the 0 minor. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
603c4ba	Phil Carmody	15 December 2009, 02:00:29 UTC	err.h: add helper function to simplify pointer error checking There are quite a few instances in the kernel of checks of pointers both against NULL and against the errno range, handling both cases identically. This additional helper function would simplify such code. [akpm@linux-foundation.org: build fix] Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
9385565	Jean Delvare	15 December 2009, 02:00:28 UTC	ioc3/ioc4: fix error path on driver registration Two IOC3 and IOC4 drivers have broken error paths on registration. Fix them. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Pat Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
2ea5d35	Jean Delvare	15 December 2009, 02:00:27 UTC	ioc3/ioc4: various section fixes Several IOC3 and IOC4 drivers misuse the __devinit and __devexit section markers. Use __init and __exit instead as appropriate, then add __devinit and __devexit where they really belong for PCI drivers. Also make ioc4_serial_init static. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Pat Gefre <pfg@sgi.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
e4c570c	Hiroshi Shimamoto	15 December 2009, 02:00:26 UTC	task_struct: make journal_info conditional journal_info in task_struct is used in journaling file system only. So introduce CONFIG_FS_JOURNAL_INFO and make it conditional. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:27 UTC
8420e7e	Alexey Dobriyan	15 December 2009, 02:00:25 UTC	Make DEBUG_BUGVERBOSE default to y It's easy to lose useful DEBUG_BUGVERBOSE by switching EMBEDDED left and right. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
8a64f33	Joe Perches	15 December 2009, 02:00:25 UTC	kernel.h: add printk_ratelimited and pr_<level>_rl Add a printk_ratelimited statement expression macro that uses a per-call ratelimit_state so that multiple subsystems output messages are not suppressed by a global __ratelimit state. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: s/_rl/_ratelimited/g] Signed-off-by: Joe Perches <joe@perches.com> Cc: Naohiro Ooiwa <nooiwa@miraclelinux.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
2643434	Thadeu Lima de Souza Cascardo	15 December 2009, 02:00:23 UTC	misc: remove MAC pmu function declaration from misc device class Commit 8c8709334cec803368a432a33e0f2e116d48fe07 has removed the pmu_device_init call from misc_init, but unlike other similar commits, has not removed its declaration. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
dfc6a73	H Hartley Sweeten	15 December 2009, 02:00:22 UTC	kernel/sys.c: fix "warning: do-while statement is not a compound statement" noise do_each_thread/while_each_thread wrap a block of code that is in this format: for (...) do ... while If curly braces do not surround the inner loop the following warning is generated by sparse: warning: do-while statement is not a compound statement Fix the warning by adding the braces. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
948c1e2	Amerigo Wang	15 December 2009, 02:00:22 UTC	kallsyms: remove deprecated print_fn_descriptor_symbol() According to feature-removal-schedule.txt, it is the time to remove print_fn_descriptor_symbol(). And a quick grep shows that it no longer has any callers. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
29671f2	Amerigo Wang	15 December 2009, 02:00:21 UTC	rwsem: fix rwsem_is_locked() bugs rwsem_is_locked() tests ->activity without locks, so we should always keep ->activity consistent. However, the code in __rwsem_do_wake() breaks this rule, it updates ->activity after _all_ readers waken up, this may give some reader a wrong ->activity value, thus cause rwsem_is_locked() behaves wrong. Quote from Andrew: " - we have one or more processes sleeping in down_read(), waiting for access. - we wake one or more processes up without altering ->activity - they start to run and they do rwsem_is_locked(). This incorrectly returns "false", because the waker process is still crunching away in __rwsem_do_wake(). - the waker now alters ->activity, but it was too late. " So we need get a spinlock to protect this. And rwsem_is_locked() should not block, thus we use spin_trylock_irqsave(). [akpm@linux-foundation.org: simplify code] Reported-by: Brian Behlendorf <behlendorf1@llnl.gov> Cc: Ben Woodard <bwoodard@llnl.gov> Cc: David Howells <dhowells@redhat.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
118d52d	Amerigo Wang	15 December 2009, 02:00:20 UTC	rwsem-spinlock: remove useless function exports These functions need not to be exported, since no drivers should use them. __init_rwsem() is an exception, because init_rwsem(), which is a macro, is used. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
0b2749a	Joe Perches	15 December 2009, 02:00:19 UTC	kernel.h: remove initialization of bool in printk_once Don't initialize __print_once. Invert the test to reduce initialized data. defconfig before: $size vmlinux text data bss dec hex filename 6976022 679572 1359668 9015262 898fde vmlinux defconfig after: $size vmlinux text data bss dec hex filename 6976006 679508 1359700 9015214 898fae vmlinux Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
196a15b	H Hartley Sweeten	15 December 2009, 02:00:18 UTC	init/main.c: fix symbol shadows noise The symbol 'call' is a static symbol used for initcall_debug. This same symbol name is used locally by a couple functions and produces the following sparse warnings: warning: symbol 'call' shadows an earlier one Fix this noise by renaming the local symbols. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:26 UTC
4d00928	Daniel Mack	15 December 2009, 02:00:17 UTC	drivers/misc: add driver for Texas Instruments DAC7512 Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: "H Hartley Sweeten" <hartleys@visionengravers.com> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
c0f68c2	Xiao Guangrong	15 December 2009, 02:00:16 UTC	generic-ipi: cleanup for generic_smp_call_function_interrupt() Use smp_processor_id() instead of get_cpu() and put_cpu() in generic_smp_call_function_interrupt(), It's no need to disable preempt, because we must call generic_smp_call_function_interrupt() with interrupts disabled. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
4eb174b	Michael Hennerich	15 December 2009, 02:00:15 UTC	ad525x_dpot: new driver for AD525x digital potentiometers This driver supports the non-volatile digital potentiometers via I2C: AD5258, AD5259, AD5251, AD5252, AD5253, AD5254, and AD5255 It provides a sysfs interface to each device for reading/writing which is documented in Documentation/misc-devices/ad525x_dpot.txt. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Chris Verges <chrisv@cyberswitching.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
00b5586	Joe Perches	15 December 2009, 02:00:14 UTC	dynamic_debug.h/kernel.h: Remove KBUILD_MODNAME from dynamic_pr_debug If CONFIG_DYNAMIC_DEBUG is enabled and a source file has: #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt #include <linux/kernel.h> dynamic_debug.h will duplicate KBUILD_MODNAME in the output string. Remove the use of KBUILD_MODNAME from the output format string generated by dynamic_debug.h If CONFIG_DYNAMIC_DEBUG is not enabled, no compile-time check is done to printk/dev_printk arguments. Add it. Signed-off-by: Joe Perches <joe@perches.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
42f247c	Cesar Eduardo Barros	15 December 2009, 02:00:13 UTC	WARN_ONCE(): use bool for boolean flag Commit 70867453092297be9afb2249e712a1f960ec0a09 ("printk_once(): use bool for boolean flag") changed printk_once() to use bool instead of int for its guard variable. Do the same change to WARN_ONCE() and WARN_ON_ONCE(), for the same reasons. This resulted in a reduction of 1462 bytes on a x86-64 defconfig: text data bss dec hex filename 8101271 1207116 992764 10301151 9d2edf vmlinux.before 8100553 1207148 991988 10299689 9d2929 vmlinux.after Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Cc: Roland Dreier <rolandd@cisco.com> Cc: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
6613c5e	Alexey Dobriyan	15 December 2009, 02:00:11 UTC	uml: convert to seq_file/proc_fops Convert code away from ->read_proc/->write_proc interfaces. Switch to proc_create()/proc_create_data() which make addition of proc entries reliable wrt NULL ->proc_fops, NULL ->data and so on. Problem with ->read_proc et al is described here commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries" Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
2886a8b	Arjan van de Ven	15 December 2009, 02:00:11 UTC	floppy: Add an extra bound check on ioctl arguments gcc is not convinced that the floppy.c ioctl has sufficient bound checks: In function `copy_from_user', inlined from `fd_copyin' at drivers/block/floppy.c:3080, inlined from `fd_ioctl' at drivers/block/floppy.c:3503: arch/x86/include/asm/uaccess_32.h:211: warning: call to `copy_from_user_overflow' declared with attribute warning: copy_from_user buffer size is not provably correct And frankly, as a human I have a hard time proving the same more or less (the size comes from the ioctl argument. humpf. maybe. the code isn't very nice) This patch adds an explicit check to make 100% sure it's safe, better than finding out later that there indeed was a gap. [akpm@linux-foundation.org: add WARN_ON()] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
faa7b7d	Julia Lawall	15 December 2009, 02:00:09 UTC	drivers/cpuidle: Move dereference after NULL test It does not seem possible that ldev can be NULL, so drop the unnecessary test. If ldev can somehow be NULL, then the initialization of last_idx should be moved below the test. A simplified version of the semantic match that detects this problem is as follows (http://coccinelle.lip6.fr/): // <smpl> @match exists@ expression x, E; identifier fld; @@ * x->fld ... when != $x = E\\|&x$ * x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
4714521	Alexey Dobriyan	15 December 2009, 02:00:08 UTC	const: constify remaining dev_pm_ops Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:25 UTC
0ead0f8	Alexey Dobriyan	15 December 2009, 02:00:06 UTC	alpha: convert srm code to seq_file Convert code away from ->read_proc/->write_proc interfaces. Switch to proc_create()/proc_create_data() which make addition of proc entries reliable wrt NULL ->proc_fops, NULL ->data and so on. Problem with ->read_proc et al is described here commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries" Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
4614a69	john stultz	15 December 2009, 02:00:05 UTC	procfs: allow threads to rename siblings via /proc/pid/tasks/tid/comm Setting a thread's comm to be something unique is a very useful ability and is helpful for debugging complicated threaded applications. However currently the only way to set a thread name is for the thread to name itself via the PR_SET_NAME prctl. However, there may be situations where it would be advantageous for a thread dispatcher to be naming the threads its managing, rather then having the threads self-describe themselves. This sort of behavior is available on other systems via the pthread_setname_np() interface. This patch exports a task's comm via proc/pid/comm and proc/pid/task/tid/comm interfaces, and allows thread siblings to write to these values. [akpm@linux-foundation.org: cleanups] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Mike Fulton <fultonm@ca.ibm.com> Cc: Sean Foley <Sean_Foley@ca.ibm.com> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
7e1e0ef	Steven J. Magnani	15 December 2009, 02:00:04 UTC	procfs: use proper units for noMMU statm On no-MMU systems, sizes reported in /proc/n/statm have units of bytes. Per Documentation/filesystems/proc.txt, these values should be in pages. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
ea63763	Jie Zhang	15 December 2009, 02:00:02 UTC	nommu: fix malloc performance by adding uninitialized flag The NOMMU code currently clears all anonymous mmapped memory. While this is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: Jie Zhang <jie.zhang@analog.com> Signed-off-by: Robin Getz <rgetz@blackfin.uclinux.org> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Greg Ungerer <gerg@snapgear.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
5dc3764	Naoya Horiguchi	15 December 2009, 02:00:01 UTC	mm hugetlb: add hugepage support to pagemap This patch enables extraction of the pfn of a hugepage from /proc/pid/pagemap in an architecture independent manner. Details ------- My test program (leak_pagemap) works as follows: - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,) - read()/write() something on it, - call page-types with option -p, - munmap() and unlink() the file on hugetlbfs Without my patches ------------------ $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000086c 81 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 5 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 101 0 The output of page-types don't show any hugepage. With my patches --------------- $ ./leak_pagemap flags page-count MB symbolic-flags long-symbolic-flags 0x0000000000000000 1 0 __________________________________ 0x0000000000030000 51100 199 ________________TG________________ compound_tail,huge 0x0000000000028018 100 0 ___UD__________H_G________________ uptodate,dirty,compound_head,huge 0x0000000000000804 1 0 __R________M______________________ referenced,mmap 0x000000000000080c 1 0 __RU_______M______________________ referenced,uptodate,mmap 0x000000000000086c 80 0 __RU_lA____M______________________ referenced,uptodate,lru,active,mmap 0x0000000000005808 4 0 ___U_______Ma_b___________________ uptodate,mmap,anonymous,swapbacked 0x0000000000005868 12 0 ___U_lA____Ma_b___________________ uptodate,lru,active,mmap,anonymous,swapbacked 0x000000000000586c 1 0 __RU_lA____Ma_b___________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked total 51300 200 The output of page-types shows 51200 pages contributing to hugepages, containing 100 head pages and 51100 tail pages as expected. [akpm@linux-foundation.org: build fix] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
d33b9f4	Naoya Horiguchi	15 December 2009, 01:59:59 UTC	mm: hugetlb: fix hugepage memory leak in walk_page_range() Most callers of pmd_none_or_clear_bad() check whether the target page is in a hugepage or not, but walk_page_range() do not check it. So if we read /proc/pid/pagemap for the hugepage on x86 machine, the hugepage memory is leaked as shown below. This patch fixes it. Details ======= My test program (leak_pagemap) works as follows: - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,) - read()/write() something on it, - call page-types with option -p (walk around the page tables), - munmap() and unlink() the file on hugetlbfs Without my patches ------------------ $ cat /proc/meminfo \|grep "HugePage" HugePages_Total: 1000 HugePages_Free: 1000 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ./leak_pagemap [snip output] $ cat /proc/meminfo \|grep "HugePage" HugePages_Total: 1000 HugePages_Free: 900 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ls /hugetlbfs/ $ 100 hugepages are accounted as used while there is no file on hugetlbfs. With my patches --------------- $ cat /proc/meminfo \|grep "HugePage" HugePages_Total: 1000 HugePages_Free: 1000 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ./leak_pagemap [snip output] $ cat /proc/meminfo \|grep "HugePage" HugePages_Total: 1000 HugePages_Free: 1000 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ls /hugetlbfs $ No memory leaks. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
4f16fc1	Naoya Horiguchi	15 December 2009, 01:59:58 UTC	mm: hugetlb: fix hugepage memory leak in mincore() Most callers of pmd_none_or_clear_bad() check whether the target page is in a hugepage or not, but mincore() and walk_page_range() do not check it. So if we use mincore() on a hugepage on x86 machine, the hugepage memory is leaked as shown below. This patch fixes it by extending mincore() system call to support hugepages. Details ======= My test program (leak_mincore) works as follows: - creat() and mmap() a file on hugetlbfs (file size is 200MB == 100 hugepages,) - read()/write() something on it, - call mincore() for first ten pages and printf() the values of vec - munmap() and unlink() the file on hugetlbfs Without my patch ---------------- $ cat /proc/meminfo\| grep "HugePage" HugePages_Total: 1000 HugePages_Free: 1000 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ./leak_mincore vec[0] 0 vec[1] 0 vec[2] 0 vec[3] 0 vec[4] 0 vec[5] 0 vec[6] 0 vec[7] 0 vec[8] 0 vec[9] 0 $ cat /proc/meminfo \|grep "HugePage" HugePages_Total: 1000 HugePages_Free: 999 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ls /hugetlbfs/ $ Return values in vec from mincore() are set to 0, while the hugepage should be in memory, and 1 hugepage is still accounted as used while there is no file on hugetlbfs. With my patch ------------- $ cat /proc/meminfo\| grep "HugePage" HugePages_Total: 1000 HugePages_Free: 1000 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ./leak_mincore vec[0] 1 vec[1] 1 vec[2] 1 vec[3] 1 vec[4] 1 vec[5] 1 vec[6] 1 vec[7] 1 vec[8] 1 vec[9] 1 $ cat /proc/meminfo \|grep "HugePage" HugePages_Total: 1000 HugePages_Free: 1000 HugePages_Rsvd: 0 HugePages_Surp: 0 $ ls /hugetlbfs/ $ Return value in *vec set to 1 and no memory leaks. [akpm@linux-foundation.org: cleanup] [akpm@linux-foundation.org: build fix] Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
536240f	Mel Gorman	15 December 2009, 01:59:56 UTC	hugetlb: abort a hugepage pool resize if a signal is pending If a user asks for a hugepage pool resize but specified a large number, the machine can begin trashing. In response, they might hit ctrl-c but signals are ignored and the pool resize continues until it fails an allocation. This can take a considerable amount of time so this patch aborts a pool resize if a signal is pending. Suggested by Dave Hansen. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:24 UTC
6927c1d	Lee Schermerhorn	15 December 2009, 01:59:55 UTC	mlock: replace stale comments in munlock_vma_page() Cleanup stale comments on munlock_vma_page(). Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:23 UTC
418b27e	Lee Schermerhorn	15 December 2009, 01:59:54 UTC	mm: remove unevictable_migrate_page function unevictable_migrate_page() in mm/internal.h is a relic of the since removed UNEVICTABLE_LRU Kconfig option. This patch removes the function and open codes the test in migrate_page_copy(). Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:23 UTC
4eb2b1d	Mel Gorman	15 December 2009, 01:59:53 UTC	hugetlb: acquire the i_mmap_lock before walking the prio_tree to unmap a page When the owner of a mapping fails COW because a child process is holding a reference, the children VMAs are walked and the page is unmapped. The i_mmap_lock is taken for the unmapping of the page but not the walking of the prio_tree. In theory, that tree could be changing if the lock is not held. This patch takes the i_mmap_lock properly for the duration of the prio_tree walk. [hugh.dickins@tiscali.co.uk: Spotted the problem in the first place] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:23 UTC
70da234	Amerigo Wang	15 December 2009, 01:59:52 UTC	'sysctl_max_map_count' should be non-negative Jan Engelhardt reported we have this problem: setting max_map_count to a value large enough results in programs dying at first try. This is on 2.6.31.6: 15:59 borg:/proc/sys/vm # echo $[1<<31-1] >max_map_count 15:59 borg:/proc/sys/vm # cat max_map_count 1073741824 15:59 borg:/proc/sys/vm # echo $[1<<31] >max_map_count 15:59 borg:/proc/sys/vm # cat max_map_count Killed This is because we have a chance to make 'max_map_count' negative. but it's meaningless. Make it only accept non-negative values. Reported-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: James Morris <jmorris@namei.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:23 UTC
f096e59	Huang Shijie	15 December 2009, 01:59:51 UTC	include/linux/mm.h: remove unneeded ifdef The check code for CONFIG_SWAP is redundant, because there is a non-CONFIG_SWAP version for PageSwapCache() which just returns 0. Signed-off-by: Huang Shijie <shijie8@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:22 UTC
c9d0bf2	Magnus Damm	15 December 2009, 01:59:49 UTC	mm: uncached vma support with writenotify Modify the generic mmap() code to keep the cache attribute in vma->vm_page_prot regardless if writenotify is enabled or not. Without this patch the cache configuration selected by f_op->mmap() is overwritten if writenotify is enabled, making it impossible to keep the vma uncached. Needed by drivers such as drivers/video/sh_mobile_lcdcfb.c which uses deferred io together with uncached memory. Signed-off-by: Magnus Damm <damm@opensource.se> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaya Kumar <jayakumar.lkml@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:21 UTC
62c0c2f	Huang Shijie	15 December 2009, 01:59:48 UTC	vmscan: simplify code Simplify the code for shrink_inactive_list(). Signed-off-by: Huang Shijie <shijie8@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:21 UTC
b39415b	Rik van Riel	15 December 2009, 01:59:48 UTC	vmscan: do not evict inactive pages when skipping an active list scan In AIM7 runs, recent kernels start swapping out anonymous pages well before they should. This is due to shrink_list falling through to shrink_inactive_list if !inactive_anon_is_low(zone, sc), when all we really wanted to do is pre-age some anonymous pages to give them extra time to be referenced while on the inactive list. The obvious fix is to make sure that shrink_list does not fall through to scanning/reclaiming inactive pages when we called it to scan one of the active lists. This change should be safe because the loop in shrink_zone ensures that we will still shrink the anon and file inactive lists whenever we should. [kosaki.motohiro@jp.fujitsu.com: inactive_file_is_low() should be inactive_anon_is_low()] Reported-by: Larry Woodman <lwoodman@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Tomasz Chmielewski <mangoo@wpkg.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:21 UTC
8aa043d	Jan Beulich	15 December 2009, 01:59:46 UTC	mm/bootmem.c: properly __init-annotate helper functions Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
9ae49fa	David Rientjes	15 December 2009, 01:59:46 UTC	mm: slab-allocate memory section nodemask for large systems Nodemasks should not be allocated on the stack for large systems (when it is larger than 256 bytes) since there is a threat of overflow. This patch causes the unregister_mem_sect_under_nodes() nodemask to be allocated on the stack for smaller systems and be allocated by slab for larger systems. GFP_KERNEL is used since remove_memory_block() can block. Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Alex Chiang <achiang@hp.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
caed0f4	KOSAKI Motohiro	15 December 2009, 01:59:45 UTC	mm: simplify try_to_unmap_one() SWAP_MLOCK mean "We marked the page as PG_MLOCK, please move it to unevictable-lru". So, following code is easy confusable. if (vma->vm_flags & VM_LOCKED) { ret = SWAP_MLOCK; goto out_unmap; } Plus, if the VMA doesn't have VM_LOCKED, We don't need to check the needed of calling mlock_vma_page(). Also, add some commentary to try_to_unmap_one(). Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
23ce932	Rakib Mullick	15 December 2009, 01:59:44 UTC	mm: fix section mismatch in memory_hotplug.c __free_pages_bootmem() is a __meminit function - which has been called from put_pages_bootmem thus causes a section mismatch warning. We were warned by the following warning: LD mm/built-in.o WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference from the function put_page_bootmem() to the function .meminit.text:__free_pages_bootmem() The function put_page_bootmem() references the function __meminit __free_pages_bootmem(). This is often because put_page_bootmem lacks a __meminit annotation or the annotation of __free_pages_bootmem is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
b76c8cf	Larry Woodman	15 December 2009, 01:59:37 UTC	hugetlb: prevent deadlock in __unmap_hugepage_range() when alloc_huge_page() fails hugetlb_fault() takes the mm->page_table_lock spinlock then calls hugetlb_cow(). If the alloc_huge_page() in hugetlb_cow() fails due to an insufficient huge page pool it calls unmap_ref_private() with the mm->page_table_lock held. unmap_ref_private() then calls unmap_hugepage_range() which tries to acquire the mm->page_table_lock. [<ffffffff810928c3>] print_circular_bug_tail+0x80/0x9f [<ffffffff8109280b>] ? check_noncircular+0xb0/0xe8 [<ffffffff810935e0>] __lock_acquire+0x956/0xc0e [<ffffffff81093986>] lock_acquire+0xee/0x12e [<ffffffff8111a7a6>] ? unmap_hugepage_range+0x3e/0x84 [<ffffffff8111a7a6>] ? unmap_hugepage_range+0x3e/0x84 [<ffffffff814c348d>] _spin_lock+0x40/0x89 [<ffffffff8111a7a6>] ? unmap_hugepage_range+0x3e/0x84 [<ffffffff8111afee>] ? alloc_huge_page+0x218/0x318 [<ffffffff8111a7a6>] unmap_hugepage_range+0x3e/0x84 [<ffffffff8111b2d0>] hugetlb_cow+0x1e2/0x3f4 [<ffffffff8111b935>] ? hugetlb_fault+0x453/0x4f6 [<ffffffff8111b962>] hugetlb_fault+0x480/0x4f6 [<ffffffff8111baee>] follow_hugetlb_page+0x116/0x2d9 [<ffffffff814c31a7>] ? _spin_unlock_irq+0x3a/0x5c [<ffffffff81107b4d>] __get_user_pages+0x2a3/0x427 [<ffffffff81107d0f>] get_user_pages+0x3e/0x54 [<ffffffff81040b8b>] get_user_pages_fast+0x170/0x1b5 [<ffffffff81160352>] dio_get_page+0x64/0x14a [<ffffffff8116112a>] __blockdev_direct_IO+0x4b7/0xb31 [<ffffffff8115ef91>] blkdev_direct_IO+0x58/0x6e [<ffffffff8115e0a4>] ? blkdev_get_blocks+0x0/0xb8 [<ffffffff810ed2c5>] generic_file_aio_read+0xdd/0x528 [<ffffffff81219da3>] ? avc_has_perm+0x66/0x8c [<ffffffff81132842>] do_sync_read+0xf5/0x146 [<ffffffff8107da00>] ? autoremove_wake_function+0x0/0x5a [<ffffffff81211857>] ? security_file_permission+0x24/0x3a [<ffffffff81132fd8>] vfs_read+0xb5/0x126 [<ffffffff81133f6b>] ? fget_light+0x5e/0xf8 [<ffffffff81133131>] sys_read+0x54/0x8c [<ffffffff81011e42>] system_call_fastpath+0x16/0x1b This can be fixed by dropping the mm->page_table_lock around the call to unmap_ref_private() if alloc_huge_page() fails, its dropped right below in the normal path anyway. However, earlier in the that function, it's also possible to call into the page allocator with the same spinlock held. What this patch does is drop the spinlock before the page allocator is potentially entered. The check for page allocation failure can be made without the page_table_lock as well as the copy of the huge page. Even if the PTE changed while the spinlock was held, the consequence is that a huge page is copied unnecessarily. This resolves both the double taking of the lock and sleeping with the spinlock held. [mel@csn.ul.ie: Cover also the case where process can sleep with spinlock] Signed-off-by: Larry Woodman <lwooman@redhat.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
b4e655a	Andrew Morton	15 December 2009, 01:59:35 UTC	mm: memory_hotplug: make offline_pages() static It has no references outside memory_hotplug.c. Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andi Kleen <andi@firstfloor.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
d0f209f	Hugh Dickins	15 December 2009, 01:59:34 UTC	ksm: remove unswappable max_kernel_pages Now that ksm pages are swappable, and the known holes plugged, remove mention of unswappable kernel pages from KSM documentation and comments. Remove the totalram_pages/4 initialization of max_kernel_pages. In fact, remove max_kernel_pages altogether - we can reinstate it if removal turns out to break someone's script; but if we later want to limit KSM's memory usage, limiting the stable nodes would not be an effective approach. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
62b61f6	Hugh Dickins	15 December 2009, 01:59:33 UTC	ksm: memory hotremove migration only The previous patch enables page migration of ksm pages, but that soon gets into trouble: not surprising, since we're using the ksm page lock to lock operations on its stable_node, but page migration switches the page whose lock is to be used for that. Another layer of locking would fix it, but do we need that yet? Do we actually need page migration of ksm pages? Yes, memory hotremove needs to offline sections of memory: and since we stopped allocating ksm pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE candidates for migration. But KSM is currently unconscious of NUMA issues, happily merging pages from different NUMA nodes: at present the rule must be, not to use MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of ksm pages does not make sense yet. So, to complete support for ksm swapping we need to make hotremove safe. ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages are freed before migration reaches them, stable_nodes may be left still pointing to struct pages which have been removed from the system: the stable_node needs to identify a page by pfn rather than page pointer, then it can safely prune them when MEM_OFFLINE. And make NUMA migration skip PageKsm pages where it skips PageReserved. But it's only when we reach unmap_and_move() that the page lock is taken and we can be sure that raised pagecount has prevented a PageAnon from being upgraded: so add offlining arg to migrate_pages(), to migrate ksm page when offlining (has sufficient locking) but reject it otherwise. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
e9995ef	Hugh Dickins	15 December 2009, 01:59:31 UTC	ksm: rmap_walk to remove_migation_ptes A side-effect of making ksm pages swappable is that they have to be placed on the LRUs: which then exposes them to isolate_lru_page() and hence to page migration. Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c. Perhaps some consolidation with existing code is possible, but don't attempt that yet (try_to_unmap needs to handle nonlinears, but migration pte removal does not). rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like remove_anon_migration_ptes() which it replaces, avoids calling page_lock_anon_vma(), because that includes a page_mapped() test which fails when all migration ptes are in place. That was valid when NUMA page migration was introduced (holding mmap_sem provided the missing guarantee that anon_vma's slab had not already been destroyed), but I believe not valid in the memory hotremove case added since. For now do the same as before, and consider the best way to fix that unlikely race later on. When fixed, we can probably use rmap_walk() on hwpoisoned ksm pages too: for now, they remain among hwpoison's various exceptions (its PageKsm test comes before the page is locked, but its page_lock_anon_vma fails safely if an anon gets upgraded). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:20 UTC
407f9c8	Hugh Dickins	15 December 2009, 01:59:30 UTC	ksm: mem cgroup charge swapin copy But ksm swapping does require one small change in mem cgroup handling. When do_swap_page()'s call to ksm_might_need_to_copy() does indeed substitute a duplicate page to accommodate a different anon_vma (or a the !PageSwapCache check in mem_cgroup_try_charge_swapin(). That was returning success without charging, on the assumption that pte_same() would fail after, which is not the case here. Originally I proposed that success, so that an unshrinkable mem cgroup at its limit would not fail unnecessarily; but that's a minor point, and there are plenty of other places where we may fail an overallocation which might later prove unnecessary. So just go ahead and do what all the other exceptions do: proceed to charge current mm. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
80e1482	Hugh Dickins	15 December 2009, 01:59:29 UTC	ksm: share anon page without allocating When ksm pages were unswappable, it made no sense to include them in mem cgroup accounting; but now that they are swappable (although I see no strict logical connection) the principle of least surprise implies that they should be accounted (with the usual dissatisfaction, that a shared page is accounted to only one of the cgroups using it). This patch was intended to add mem cgroup accounting where necessary; but turned inside out, it now avoids allocating a ksm page, instead upgrading an anon page to ksm - which brings its existing mem cgroup accounting with it. Thus mem cgroups don't appear in the patch at all. This upgrade from PageAnon to PageKsm takes place under page lock (via a somewhat hacky NULL kpage interface), and audit showed only one place which needed to cope with the race - page_referenced() is sometimes used without page lock, so page_lock_anon_vma() needs an ACCESS_ONCE() to be sure of getting anon_vma and flags together (no problem if the page goes ksm an instant after, the integrity of that anon_vma list is unaffected). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
4035c07	Hugh Dickins	15 December 2009, 01:59:27 UTC	ksm: take keyhole reference to page There's a lamentable flaw in KSM swapping: the stable_node holds a reference to the ksm page, so the page to be freed cannot actually be freed until ksmd works its way around to removing the last rmap_item from its stable_node. Which in some configurations may take minutes: not quite responsive enough for memory reclaim. And we don't want to twist KSM and its locking more tightly into the rest of mm. What a pity. But although the stable_node needs to hold a pointer to the ksm page, does it actually need to raise the reference count of that page? No. It would need to do so if struct pages were ordinary kmalloc'ed objects; but they are more stable than that, and reused in particular ways according to particular rules. Access to stable_node from its pointer in struct page is no problem, so long as we never free a stable_node before the ksm page itself has been freed. Access to struct page from its pointer in stable_node: reintroduce get_ksm_page(), and let that peep out through its keyhole (the stable_node pointer to ksm page), to see if that struct page still holds the right key to open it (the ksm page mapping pointer back to this stable_node). This relies upon the established way in which free_hot_cold_page() sets an anon (including ksm) page->mapping to NULL; and relies upon no other user of a struct page to put something which looks like the original stable_node pointer (with two low bits also set) into page->mapping. It also needs get_page_unless_zero() technique pioneered by speculative pagecache; and uses rcu_read_lock() to keep the guarantees that gives. There are several drivers which put pointers of their own into page-> mapping; but none of those could coincide with our stable_node pointers, since KSM won't free a stable_node until it sees that the page has gone. The only problem case found is the pagetable spinlock USE_SPLIT_PTLOCKS places in struct page (my own abuse): to accommodate GENERIC_LOCKBREAK's break_lock on 32-bit, that spans both page->private and page->mapping. Since break_lock is only 0 or 1, again no confusion for get_ksm_page(). But what of DEBUG_SPINLOCK on 64-bit bigendian? When owner_cpu is 3 (matching PageKsm low bits), it might see 0xdead4ead00000003 in page-> mapping, which might coincide? We could get around that by... but a better answer is to suppress USE_SPLIT_PTLOCKS when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC, to stop bloating sizeof(struct page) in their case - already proposed in an earlier mm/Kconfig patch. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
db114b8	Hugh Dickins	15 December 2009, 01:59:25 UTC	ksm: hold anon_vma in rmap_item For full functionality, page_referenced_one() and try_to_unmap_one() need to know the vma: to pass vma down to arch-dependent flushes, or to observe VM_LOCKED or VM_EXEC. But KSM keeps no record of vma: nor can it, since vmas get split and merged without its knowledge. Instead, note page's anon_vma in its rmap_item when adding to stable tree: all the vmas which might map that page are listed by its anon_vma. page_referenced_ksm() and try_to_unmap_ksm() then traverse the anon_vma, first to find the probable vma, that which matches rmap_item's mm; but if that is not enough to locate all instances, traverse again to try the others. This catches those occasions when fork has duplicated a pte of a ksm page, but ksmd has not yet come around to assign it an rmap_item. But each rmap_item in the stable tree which refers to an anon_vma needs to take a reference to it. Andrea's anon_vma design cleverly avoided a reference count (an anon_vma was free when its list of vmas was empty), but KSM now needs to add that. Is a 32-bit count sufficient? I believe so - the anon_vma is only free when both count is 0 and list is empty. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
5ad6468	Hugh Dickins	15 December 2009, 01:59:24 UTC	ksm: let shared pages be swappable Initial implementation for swapping out KSM's shared pages: add page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when faced with a PageKsm page. Most of what's needed can be got from the rmap_items listed from the stable_node of the ksm page, without discovering the actual vma: so in this patch just fake up a struct vma for page_referenced_one() or try_to_unmap_one(), then refine that in the next patch. Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been implicit there (being only set with VM_SHARED, already excluded), but let's make it explicit, to help justify the lack of nonlinear unmap. Rely on the page lock to protect against concurrent modifications to that page's node of the stable tree. The awkward part is not swapout but swapin: do_swap_page() and page_add_anon_rmap() now have to allow for new possibilities - perhaps a ksm page still in swapcache, perhaps a swapcache page associated with one location in one anon_vma now needed for another location or anon_vma. (And the vma might even be no longer VM_MERGEABLE when that happens.) ksm_might_need_to_copy() checks for that case, and supplies a duplicate page when necessary, simply leaving it to a subsequent pass of ksmd to rediscover the identity and merge them back into one ksm page. Disappointingly primitive: but the alternative would have to accumulate unswappable info about the swapped out ksm pages, limiting swappability. Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the particular case it was handling, so just use it instead. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
73848b4	Hugh Dickins	15 December 2009, 01:59:22 UTC	ksm: fix mlockfreed to munlocked When KSM merges an mlocked page, it has been forgetting to munlock it: that's been left to free_page_mlock(), which reports it in /proc/vmstat as unevictable_pgs_mlockfreed instead of unevictable_pgs_munlocked (and whinges "Page flag mlocked set for process" in mmotm, whereas mainline is silently forgiving). Call munlock_vma_page() to fix that. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Chris Wright <chrisw@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
08beca4	Hugh Dickins	15 December 2009, 01:59:21 UTC	ksm: stable_node point to page and back Add a pointer to the ksm page into struct stable_node, holding a reference to the page while the node exists. Put a pointer to the stable_node into the ksm page's ->mapping. Then we don't need get_ksm_page() while traversing the stable tree: the page to compare against is sure to be present and correct, even if it's no longer visible through any of its existing rmap_items. And we can handle the forked ksm page case more efficiently: no need to memcmp our way through the tree to find its match. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
7b6ba2c	Hugh Dickins	15 December 2009, 01:59:20 UTC	ksm: separate stable_node Though we still do well to keep rmap_items in the unstable tree without a separate tree_item at the node, for several reasons it becomes awkward to keep rmap_items in the stable tree without a separate stable_node: lack of space in the nicely-sized rmap_item, the need for an anchor as rmap_items are removed, the need for a node even when temporarily no rmap_items are attached to it. So declare struct stable_node (rb_node to place it in the tree and hlist_head for the rmap_items hanging off it), and convert stable tree handling to use it: without yet taking advantage of it. Note how one stable_tree_insert() of a node now has _two_ stable_tree_append()s of the two rmap_items being merged. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
6514d51	Hugh Dickins	15 December 2009, 01:59:19 UTC	ksm: singly-linked rmap_list Free up a pointer in struct rmap_item, by making the mm_slot's rmap_list a singly-linked list: we always traverse that list sequentially, and we don't even lose any prefetches (but should consider adding a few later). Name it rmap_list throughout. Do we need to free up that pointer? Not immediately, and in the end, we could continue to avoid it with a union; but having done the conversion, let's keep it this way, since there's no downside, and maybe we'll want more in future (struct rmap_item is a cache-friendly 32 bytes on 32-bit and 64 bytes on 64-bit, so we shall want to avoid expanding it). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
8dd3557	Hugh Dickins	15 December 2009, 01:59:18 UTC	ksm: cleanup some function arguments Cleanup: make argument names more consistent from cmp_and_merge_page() down to replace_page(), so that it's easier to follow the rmap_item's page and the matching tree_page and the merged kpage through that code. In some places, e.g. break_cow(), pass rmap_item instead of separate mm and address. cmp_and_merge_page() initialize tree_page to NULL, to avoid a "may be used uninitialized" warning seen in one config by Anil SB. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:19 UTC
31e855e	Hugh Dickins	15 December 2009, 01:59:17 UTC	ksm: remove redundancies when merging page There is no need for replace_page() to calculate a write-protected prot vm_page_prot must already be write-protected for an anonymous page (see mm/memory.c do_anonymous_page() for similar reliance on vm_page_prot). There is no need for try_to_merge_one_page() to get_page and put_page on newpage and oldpage: in every case we already hold a reference to each of them. But some instinct makes me move try_to_merge_one_page()'s unlock_page of oldpage down after replace_page(): that doesn't increase contention on the ksm page, and makes thinking about the transition easier. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
93d1771	Hugh Dickins	15 December 2009, 01:59:16 UTC	ksm: three remove_rmap_item_from_tree cleanups 1. remove_rmap_item_from_tree() is called as a precaution from various places: don't dirty the rmap_item cacheline unnecessarily, just mask the flags out of the address when they have been set. 2. First get_next_rmap_item() removes an unstable rmap_item from its tree, then shortly afterwards cmp_and_merge_page() removes a stable rmap_item from its tree: it's easier just to do both at once (but definitely keep the BUG_ON(age > 1) which guards against a future omission). 3. When cmp_and_merge_page() moves an rmap_item from unstable to stable tree, it does its own rb_erase() and accounting: that's better expressed by remove_rmap_item_from_tree(). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
338fde9	KOSAKI Motohiro	15 December 2009, 01:59:15 UTC	vmscan: make consistent of reclaim bale out between do_try_to_free_page and shrink_zone Fix small inconsistent of ">" and ">=". Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
ece74b2	KOSAKI Motohiro	15 December 2009, 01:59:14 UTC	vmscan: kill sc.swap_cluster_max Now, All caller of reclaim use swap_cluster_max as SWAP_CLUSTER_MAX. Then, we can remove it perfectly. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
4f0ddfd	KOSAKI Motohiro	15 December 2009, 01:59:13 UTC	vmscan: zone_reclaim() don't use insane swap_cluster_max In old days, we didn't have sc.nr_to_reclaim and it brought sc.swap_cluster_max misuse. huge sc.swap_cluster_max might makes unnecessary OOM risk and no performance benefit. Now, we can stop its insane thing. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
7b51755	KOSAKI Motohiro	15 December 2009, 01:59:12 UTC	vmscan: kill hibernation specific reclaim logic and unify it shrink_all_zone() was introduced by commit d6277db4ab (swsusp: rework memory shrinker) for hibernate performance improvement. and sc.swap_cluster_max was introduced by commit a06fe4d307 (Speed freeing memory for suspend). commit a06fe4d307 said Without the patch: Freed 14600 pages in 1749 jiffies = 32.61 MB/s (Anomolous!) Freed 88563 pages in 14719 jiffies = 23.50 MB/s Freed 205734 pages in 32389 jiffies = 24.81 MB/s With the patch: Freed 68252 pages in 496 jiffies = 537.52 MB/s Freed 116464 pages in 569 jiffies = 798.54 MB/s Freed 209699 pages in 705 jiffies = 1161.89 MB/s At that time, their patch was pretty worth. However, Modern Hardware trend and recent VM improvement broke its worth. From several reason, I think we should remove shrink_all_zones() at all. detail: 1) Old days, shrink_zone()'s slowness was mainly caused by stupid io-throttle at no i/o congestion. but current shrink_zone() is sane, not slow. 2) shrink_all_zone() try to shrink all pages at a time. but it doesn't works fine on numa system. example) System has 4GB memory and each node have 2GB. and hibernate need 1GB. optimal) steal 500MB from each node. shrink_all_zones) steal 1GB from node-0. Oh, Cache balancing logic was broken. ;) Unfortunately, Desktop system moved ahead NUMA at nowadays. (Side note, if hibernate require 2GB, shrink_all_zones() never success on above machine) 3) if the node has several I/O flighting pages, shrink_all_zones() makes pretty bad result. schenario) hibernate need 1GB 1) shrink_all_zones() try to reclaim 1GB from Node-0 2) but it only reclaimed 990MB 3) stupidly, shrink_all_zones() try to reclaim 1GB from Node-1 4) it reclaimed 990MB Oh, well. it reclaimed twice much than required. In the other hand, current shrink_zone() has sane baling out logic. then, it doesn't make overkill reclaim. then, we lost shrink_zones()'s risk. 4) SplitLRU VM always keep active/inactive ratio very carefully. inactive list only shrinking break its assumption. it makes unnecessary OOM risk. it obviously suboptimal. Now, shrink_all_memory() is only the wrapper function of do_try_to_free_pages(). it bring good reviewability and debuggability, and solve above problems. side note: Reclaim logic unificication makes two good side effect. - Fix recursive reclaim bug on shrink_all_memory(). it did forgot to use PF_MEMALLOC. it mean the system be able to stuck into deadlock. - Now, shrink_all_memory() got lockdep awareness. it bring good debuggability. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
22fba33	KOSAKI Motohiro	15 December 2009, 01:59:10 UTC	vmscan: separate sc.swap_cluster_max and sc.nr_max_reclaim Currently, sc.scap_cluster_max has double meanings. 1) reclaim batch size as isolate_lru_pages()'s argument 2) reclaim baling out thresolds The two meanings pretty unrelated. Thus, Let's separate it. this patch doesn't change any behavior. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
cba5dd7	Alex Chiang	15 December 2009, 01:59:09 UTC	Documentation: ABI: /sys/devices/system/cpu/cpu#/node Describe NUMA node symlink created for CPUs when CONFIG_NUMA is set. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
1830794	Alex Chiang	15 December 2009, 01:59:08 UTC	mm: add numa node symlink for cpu devices in sysfs You can discover which CPUs belong to a NUMA node by examining /sys/devices/system/node/node#/ However, it's not convenient to go in the other direction, when looking at /sys/devices/system/cpu/cpu#/ Yes, you can muck about in sysfs, but adding these symlinks makes life a lot more convenient. Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:18 UTC
b9d52da	Alex Chiang	15 December 2009, 01:59:07 UTC	mm: refactor unregister_cpu_under_node() By returning early if the node is not online, we can unindent the interesting code by two levels. No functional change. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
f8246f3	Alex Chiang	15 December 2009, 01:59:06 UTC	mm: refactor register_cpu_under_node() By returning early if the node is not online, we can unindent the interesting code by one level. No functional change. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
dee5d0d	Alex Chiang	15 December 2009, 01:59:05 UTC	mm: add numa node symlink for memory section in sysfs Commit c04fc586c (mm: show node to memory section relationship with symlinks in sysfs) created symlinks from nodes to memory sections, e.g. /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 If you're examining the memory section though and are wondering what node it might belong to, you can find it by grovelling around in sysfs, but it's a little cumbersome. Add a reverse symlink for each memory section that points back to the node to which it belongs. Signed-off-by: Alex Chiang <achiang@hp.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: David Rientjes <rientjes@google.com> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
d99be1a	Hugh Dickins	15 December 2009, 01:59:04 UTC	mm: sigbus instead of abusing oom When do_nonlinear_fault() realizes that the page table must have been corrupted for it to have been called, it does print_bad_pte() and returns ... VM_FAULT_OOM, which is hard to understand. It made some sense when I did it for 2.6.15, when do_page_fault() just killed the current process; but nowadays it lets the OOM killer decide who to kill - so page table corruption in one process would be liable to kill another. Change it to return VM_FAULT_SIGBUS instead: that doesn't guarantee that the process will be killed, but is good enough for such a rare abnormality, accompanied as it is by the "BUG: Bad page map" message. And recent HWPOISON work has copied that code into do_swap_page(), when it finds an impossible swap entry: fix that to VM_FAULT_SIGBUS too. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
a70caa8	Hugh Dickins	15 December 2009, 01:59:02 UTC	mm: stop ptlock enlarging struct page CONFIG_DEBUG_SPINLOCK adds 12 or 16 bytes to a 32- or 64-bit spinlock_t, and CONFIG_DEBUG_LOCK_ALLOC adds another 12 or 24 bytes to it: lockdep enables both of those, and CONFIG_LOCK_STAT adds 8 or 16 bytes to that. When 2.6.15 placed the split page table lock inside struct page (usually sized 32 or 56 bytes), only CONFIG_DEBUG_SPINLOCK was a possibility, and we ignored the enlargement (but fitted in CONFIG_GENERIC_LOCKBREAK's 4 by letting the spinlock_t occupy both page->private and page->mapping). Should these debugging options be allowed to double the size of a struct page, when only one minority use of the page (as a page table) needs to fit a spinlock in there? Perhaps not. Take the easy way out: switch off SPLIT_PTLOCK_CPUS when DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC is in force. I've sometimes tried to be cleverer, kmallocing a cacheline for the spinlock when it doesn't fit, but given up each time. Falling back to mm->page_table_lock (as we do when ptlock is not split) lets lockdep check out the strictest path anyway. And now that some arches allow 8192 cpus, use 999999 for infinity. (What has this got to do with KSM swapping? It doesn't care about the size of struct page, but may care about random junk in page->mapping - to be explained separately later.) Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
1cb1729	Hugh Dickins	15 December 2009, 01:59:01 UTC	mm: pass address down to rmap ones KSM swapping will know where page_referenced_one() and try_to_unmap_one() should look. It could hack page->index to get them to do what it wants, but it seems cleaner now to pass the address down to them. Make the same change to page_mkclean_one(), since it follows the same pattern; but there's no real need in its case. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
af8e335	Hugh Dickins	15 December 2009, 01:58:59 UTC	mm: CONFIG_MMU for PG_mlocked Remove three degrees of obfuscation, left over from when we had CONFIG_UNEVICTABLE_LRU. MLOCK_PAGES is CONFIG_HAVE_MLOCKED_PAGE_BIT is CONFIG_HAVE_MLOCK is CONFIG_MMU. rmap.o (and memory-failure.o) are only built when CONFIG_MMU, so don't need such conditions at all. Somehow, I feel no compulsion to remove the CONFIG_HAVE_MLOCK* lines from 169 defconfigs: leave those to evolve in due course. Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
53f79ac	Hugh Dickins	15 December 2009, 01:58:58 UTC	mm: mlocking in try_to_unmap_one There's contorted mlock/munlock handling in try_to_unmap_anon() and try_to_unmap_file(), which we'd prefer not to repeat for KSM swapping. Simplify it by moving it all down into try_to_unmap_one(). One thing is then lost, try_to_munlock()'s distinction between when no vma holds the page mlocked, and when a vma does mlock it, but we could not get mmap_sem to set the page flag. But its only caller takes no interest in that distinction (and is better testing SWAP_MLOCK anyway), so let's keep the code simple and return SWAP_AGAIN for both cases. try_to_unmap_file()'s TTU_MUNLOCK nonlinear handling was particularly amusing: once unravelled, it turns out to have been choosing between two different ways of doing the same nothing. Ah, no, one way was actually returning SWAP_FAIL when it meant to return SWAP_SUCCESS. [kosaki.motohiro@jp.fujitsu.com: comment adding to mlocking in try_to_unmap_one] [akpm@linux-foundation.org: remove test of MLOCK_PAGES] Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
3ca7b3c	Hugh Dickins	15 December 2009, 01:58:57 UTC	mm: define PAGE_MAPPING_FLAGS At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set in page->mapping, with the higher bits a pointer to the anon_vma; and have defined PageKsm(page) as that with NULL anon_vma. But KSM swapping will need to store a pointer there: so in preparation for that, now define PAGE_MAPPING_FLAGS as the low two bits, including PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some other use for the bit emerges). Declare page_rmapping(page) to return the pointer part of page->mapping, and page_anon_vma(page) to return the anon_vma pointer when that's what it is. Use these in a few appropriate places: notably, unuse_vma() has been testing page->mapping, but is better to be testing page_anon_vma() (cases may be added in which flag bits are set without any pointer). Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Izik Eidus <ieidus@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nick Piggin <npiggin@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:17 UTC
bb3ab59	KOSAKI Motohiro	15 December 2009, 01:58:55 UTC	vmscan: stop kswapd waiting on congestion when the min watermark is not being met If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. [mel@csn.ul.ie: add stats to track how relevant the logic is] [mel@csn.ul.ie: make kswapd only check its own zones and rename the relevant counters] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:16 UTC
f50de2d	Mel Gorman	15 December 2009, 01:58:53 UTC	vmscan: have kswapd sleep for a short interval and double check it should be asleep After kswapd balances all zones in a pgdat, it goes to sleep. In the event of no IO congestion, kswapd can go to sleep very shortly after the high watermark was reached. If there are a constant stream of allocations from parallel processes, it can mean that kswapd went to sleep too quickly and the high watermark is not being maintained for sufficient length time. This patch makes kswapd go to sleep as a two-stage process. It first tries to sleep for HZ/10. If it is woken up by another process or the high watermark is no longer met, it's considered a premature sleep and kswapd continues work. Otherwise it goes fully to sleep. This adds more counters to distinguish between fast and slow breaches of watermarks. A "fast" premature sleep is one where the low watermark was hit in a very short time after kswapd going to sleep. A "slow" premature sleep indicates that the high watermark was breached after a very short interval. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Frans Pop <elendil@planet.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:16 UTC
273f047	Huang Shijie	15 December 2009, 01:58:51 UTC	rmap: move label `out' to a better place When the code jumps to the `out', `referenced' is still zero. So there is no need to check it. Signed-off-by: Huang Shijie <shijie8@gmail.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:16 UTC
7b51159	Huang Shijie	15 December 2009, 01:58:51 UTC	rmap: simplify try_to_unmap_file() Just simplify the code when `mlocked' is true. Signed-off-by: Huang Shijie <shijie8@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:16 UTC
8051be5	Huang Shijie	15 December 2009, 01:58:50 UTC	rmap: fix the comment for try_to_unmap_anon Fix the comment for try_to_unmap_anon() with the new arguments. Signed-off-by: Huang Shijie <shijie8@gmail.com> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:16 UTC
6aceb53	Vincent Li	15 December 2009, 01:58:49 UTC	mm/vmscan: change comment generic_file_write to __generic_file_aio_write Commit 543ade1fc9 ("Streamline generic_file_* interfaces and filemap cleanups") removed generic_file_write() in filemap. Change the comment in vmscan pageout() to __generic_file_aio_write(). Signed-off-by: Vincent Li <macli@brc.ubc.ca> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	15 December 2009, 16:53:16 UTC

Newer
Older