Revision fb59e9f1e9786635ea12e12bf6adbb132e10f979 authored by Hugh Dickins on 04 March 2008, 22:29:16 UTC, committed by Linus Torvalds on 05 March 2008, 00:35:15 UTC
While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
 I couldn't see what racing tasks on other cpus were doing, but surmise that
another must have been in mem_cgroup_charge_common on the same page, between
its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
which I'd almost changed to a kmalloc).

Normally such a race cannot happen, the ref_cnt prevents it, the final
uncharge cannot race with the initial charge.  But force_empty buggers the
ref_cnt, that's what it's all about; and thereafter forced pages are
vulnerable to races such as this (just think of a shared page also mapped into
an mm of another mem_cgroup than that just emptied).  And remain vulnerable
until they're freed indefinitely later.

This patch just fixes the oops by moving the unlock_page_cgroups down below
adding to and removing from the list (only possible given the previous patch);
and while we're at it, we might as well make it an invariant that
page->page_cgroup is always set while pc is on lru.

But this behaviour of force_empty seems highly unsatisfactory to me: why have
a ref_cnt if we always have to cope with it being violated (as in the earlier
page migration patch).  We may prefer force_empty to move pages to an orphan
mem_cgroup (could be the root, but better not), from which other cgroups could
recover them; we might need to reverse the locking again; but no time now for
such concerns.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 9b3c0a0
Raw File
Kconfig.preempt

choice
	prompt "Preemption Model"
	default PREEMPT_NONE

config PREEMPT_NONE
	bool "No Forced Preemption (Server)"
	help
	  This is the traditional Linux preemption model, geared towards
	  throughput. It will still provide good latencies most of the
	  time, but there are no guarantees and occasional longer delays
	  are possible.

	  Select this option if you are building a kernel for a server or
	  scientific/computation system, or if you want to maximize the
	  raw processing power of the kernel, irrespective of scheduling
	  latencies.

config PREEMPT_VOLUNTARY
	bool "Voluntary Kernel Preemption (Desktop)"
	help
	  This option reduces the latency of the kernel by adding more
	  "explicit preemption points" to the kernel code. These new
	  preemption points have been selected to reduce the maximum
	  latency of rescheduling, providing faster application reactions,
	  at the cost of slightly lower throughput.

	  This allows reaction to interactive events by allowing a
	  low priority process to voluntarily preempt itself even if it
	  is in kernel mode executing a system call. This allows
	  applications to run more 'smoothly' even when the system is
	  under load.

	  Select this if you are building a kernel for a desktop system.

config PREEMPT
	bool "Preemptible Kernel (Low-Latency Desktop)"
	help
	  This option reduces the latency of the kernel by making
	  all kernel code (that is not executing in a critical section)
	  preemptible.  This allows reaction to interactive events by
	  permitting a low priority process to be preempted involuntarily
	  even if it is in kernel mode executing a system call and would
	  otherwise not be about to reach a natural preemption point.
	  This allows applications to run more 'smoothly' even when the
	  system is under load, at the cost of slightly lower throughput
	  and a slight runtime overhead to kernel code.

	  Select this if you are building a kernel for a desktop or
	  embedded system with latency requirements in the milliseconds
	  range.

endchoice

config RCU_TRACE
	bool "Enable tracing for RCU - currently stats in debugfs"
	select DEBUG_FS
	default y
	help
	  This option provides tracing in RCU which presents stats
	  in debugfs for debugging RCU implementation.

	  Say Y here if you want to enable RCU tracing
	  Say N if you are unsure.
back to top