Revision fb59e9f1e9786635ea12e12bf6adbb132e10f979 authored by Hugh Dickins on 04 March 2008, 22:29:16 UTC, committed by Linus Torvalds on 05 March 2008, 00:35:15 UTC
While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
I couldn't see what racing tasks on other cpus were doing, but surmise that
another must have been in mem_cgroup_charge_common on the same page, between
its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
which I'd almost changed to a kmalloc).

Normally such a race cannot happen: the ref_cnt prevents it, so the final
uncharge cannot race with the initial charge.  But force_empty buggers the
ref_cnt, that's what it's all about; and thereafter forced pages are
vulnerable to races such as this (just think of a shared page also mapped into
an mm of another mem_cgroup than that just emptied).  And remain vulnerable
until they're freed indefinitely later.

This patch just fixes the oops by moving each unlock_page_cgroup down below
adding to and removing from the list (only possible given the previous patch);
and while we're at it, we might as well make it an invariant that
page->page_cgroup is always set while pc is on lru.
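
The ordering described above can be sketched in userspace C.  This is only a
rough model under stated assumptions, not the code in the patch: pc_model,
page_model, lru_model, charge_model and uncharge_model are hypothetical names,
and pthread mutexes stand in for lock_page_cgroup and the lru spinlock.  The
point it illustrates is that the per-page lock is held across both the pointer
update and the list update, so a pc on the lru always has its page pointer set.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's own structures. */
struct pc_model {
	struct pc_model *prev, *next;	/* lru linkage, NULL when off the list */
};

struct page_model {
	pthread_mutex_t lock;		/* plays the role of lock_page_cgroup() */
	struct pc_model *pc;		/* plays the role of page->page_cgroup */
};

struct lru_model {
	pthread_mutex_t lock;		/* plays the role of the lru spinlock */
	struct pc_model head;		/* circular list head */
};

void page_model_init(struct page_model *page)
{
	pthread_mutex_init(&page->lock, NULL);
	page->pc = NULL;
}

void lru_model_init(struct lru_model *lru)
{
	pthread_mutex_init(&lru->lock, NULL);
	lru->head.prev = lru->head.next = &lru->head;
}

/* Returns 1 if pc was charged to the page, 0 if the page was already charged. */
int charge_model(struct page_model *page, struct lru_model *lru,
		 struct pc_model *pc)
{
	pthread_mutex_lock(&page->lock);
	if (page->pc) {
		pthread_mutex_unlock(&page->lock);
		return 0;
	}
	page->pc = pc;			/* set the pointer before listing pc */

	pthread_mutex_lock(&lru->lock);
	pc->next = lru->head.next;
	pc->prev = &lru->head;
	lru->head.next->prev = pc;
	lru->head.next = pc;
	pthread_mutex_unlock(&lru->lock);

	/* The page lock is dropped only after pc is on the list. */
	pthread_mutex_unlock(&page->lock);
	return 1;
}

/* Returns the pc that was uncharged, or NULL if the page had none. */
struct pc_model *uncharge_model(struct page_model *page, struct lru_model *lru)
{
	struct pc_model *pc;

	pthread_mutex_lock(&page->lock);
	pc = page->pc;
	if (pc) {
		pthread_mutex_lock(&lru->lock);
		pc->prev->next = pc->next;
		pc->next->prev = pc->prev;
		pc->prev = pc->next = NULL;
		pthread_mutex_unlock(&lru->lock);

		page->pc = NULL;	/* clear the pointer after unlisting pc */
	}
	/* The page lock is dropped only after pc is off the list. */
	pthread_mutex_unlock(&page->lock);
	return pc;
}
```

In this model a racing uncharge_model either sees no pc at all or sees one that
is fully linked into the list, never one that is assigned but not yet listed.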

But this behaviour of force_empty seems highly unsatisfactory to me: why have
a ref_cnt if we always have to cope with it being violated (as in the earlier
page migration patch)?  We may prefer force_empty to move pages to an orphan
mem_cgroup (could be the root, but better not), from which other cgroups could
recover them; we might need to reverse the locking again; but no time now for
such concerns.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 9b3c0a0
