Revision 8aef18845266f5c05904c610088f2d1ed58f6be3 authored by Al Viro on 16 June 2011, 14:10:06 UTC, committed by Al Viro on 16 June 2011, 15:28:16 UTC
[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1 parent 50338b8
History
File Mode Size
ABI
DocBook
PCI
RCU
accounting
acpi
aoe
arm
auxdisplay
blackfin
block
blockdev
cdrom
cgroups
connector
console
cpu-freq
cpuidle
cris
crypto
development-process
device-mapper
devicetree
driver-model
dvb
early-userspace
fault-injection
fb
filesystems
firmware_class
frv
hid
hwmon
i2c
i2o
ia64
ide
infiniband
input
ioctl
isdn
ja_JP
kbuild
kdump
ko_KR
laptops
leds
m68k
make
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking
nfc
parisc
pcmcia
power
powerpc
pps
prctl
pti
ptp
rapidio
s390
scheduler
scsi
security
serial
sh
sound
sparc
spi
sysctl
target
telephony
thermal
timers
trace
usb
video4linux
virtual
vm
w1
watchdog
wimax
x86
zh_CN
.gitignore -rw-r--r-- 107 bytes
00-INDEX -rw-r--r-- 11.8 KB
BUG-HUNTING -rw-r--r-- 8.1 KB
Changes -rw-r--r-- 12.1 KB
CodingStyle -rw-r--r-- 29.1 KB
DMA-API-HOWTO.txt -rw-r--r-- 28.0 KB
DMA-API.txt -rw-r--r-- 26.5 KB
DMA-ISA-LPC.txt -rw-r--r-- 5.2 KB
DMA-attributes.txt -rw-r--r-- 1.3 KB
HOWTO -rw-r--r-- 27.2 KB
IPMI.txt -rw-r--r-- 28.5 KB
IRQ-affinity.txt -rw-r--r-- 2.5 KB
IRQ.txt -rw-r--r-- 962 bytes
Intel-IOMMU.txt -rw-r--r-- 3.8 KB
Makefile -rw-r--r-- 160 bytes
ManagementStyle -rw-r--r-- 12.9 KB
SAK.txt -rw-r--r-- 2.8 KB
SM501.txt -rw-r--r-- 2.8 KB
SecurityBugs -rw-r--r-- 1.8 KB
SubmitChecklist -rw-r--r-- 4.3 KB
SubmittingDrivers -rw-r--r-- 6.3 KB
SubmittingPatches -rw-r--r-- 28.5 KB
VGA-softcursor.txt -rw-r--r-- 2.0 KB
applying-patches.txt -rw-r--r-- 19.5 KB
atomic_ops.txt -rw-r--r-- 19.0 KB
bad_memory.txt -rw-r--r-- 1.1 KB
basic_profiling.txt -rw-r--r-- 1.7 KB
binfmt_misc.txt -rw-r--r-- 5.9 KB
braille-console.txt -rw-r--r-- 1.4 KB
bt8xxgpio.txt -rw-r--r-- 4.3 KB
btmrvl.txt -rw-r--r-- 2.9 KB
bus-virt-phys-mapping.txt -rw-r--r-- 7.9 KB
cachetlb.txt -rw-r--r-- 17.1 KB
circular-buffers.txt -rw-r--r-- 7.6 KB
coccinelle.txt -rw-r--r-- 7.9 KB
cpu-hotplug.txt -rw-r--r-- 14.6 KB
cpu-load.txt -rw-r--r-- 3.0 KB
cputopology.txt -rw-r--r-- 3.8 KB
dcdbas.txt -rw-r--r-- 3.6 KB
debugging-modules.txt -rw-r--r-- 954 bytes
debugging-via-ohci1394.txt -rw-r--r-- 7.4 KB
dell_rbu.txt -rw-r--r-- 4.9 KB
devices.txt -rw-r--r-- 115.8 KB
dmaengine.txt -rw-r--r-- 4.4 KB
dontdiff -rw-r--r-- 2.5 KB
dynamic-debug-howto.txt -rw-r--r-- 9.5 KB
edac.txt -rw-r--r-- 26.9 KB
eisa.txt -rw-r--r-- 7.1 KB
email-clients.txt -rw-r--r-- 8.8 KB
feature-removal-schedule.txt -rw-r--r-- 21.7 KB
flexible-arrays.txt -rw-r--r-- 5.5 KB
futex-requeue-pi.txt -rw-r--r-- 5.0 KB
gcov.txt -rw-r--r-- 7.5 KB
gpio.txt -rw-r--r-- 30.4 KB
highuid.txt -rw-r--r-- 2.4 KB
hw_random.txt -rw-r--r-- 3.5 KB
hwspinlock.txt -rw-r--r-- 11.7 KB
init.txt -rw-r--r-- 2.5 KB
initrd.txt -rw-r--r-- 14.1 KB
intel_txt.txt -rw-r--r-- 10.2 KB
io-mapping.txt -rw-r--r-- 3.2 KB
io_ordering.txt -rw-r--r-- 1.9 KB
iostats.txt -rw-r--r-- 7.9 KB
irqflags-tracing.txt -rw-r--r-- 2.6 KB
isapnp.txt -rw-r--r-- 433 bytes
java.txt -rw-r--r-- 10.7 KB
kernel-doc-nano-HOWTO.txt -rw-r--r-- 11.4 KB
kernel-docs.txt -rw-r--r-- 33.6 KB
kernel-parameters.txt -rw-r--r-- 92.5 KB
kmemcheck.txt -rw-r--r-- 29.8 KB
kmemleak.txt -rw-r--r-- 7.7 KB
kobject.txt -rw-r--r-- 17.6 KB
kprobes.txt -rw-r--r-- 29.5 KB
kref.txt -rw-r--r-- 6.1 KB
ldm.txt -rw-r--r-- 3.8 KB
local_ops.txt -rw-r--r-- 6.1 KB
lockdep-design.txt -rw-r--r-- 8.8 KB
lockstat.txt -rw-r--r-- 10.7 KB
logo.gif -rw-r--r-- 16.0 KB
logo.txt -rw-r--r-- 563 bytes
magic-number.txt -rw-r--r-- 9.7 KB
mca.txt -rw-r--r-- 11.3 KB
md.txt -rw-r--r-- 24.1 KB
media-framework.txt -rw-r--r-- 13.9 KB
memory-barriers.txt -rw-r--r-- 82.3 KB
memory-hotplug.txt -rw-r--r-- 15.0 KB
memory.txt -rw-r--r-- 1.2 KB
mono.txt -rw-r--r-- 2.5 KB
mutex-design.txt -rw-r--r-- 5.8 KB
nmi_watchdog.txt -rw-r--r-- 4.2 KB
nommu-mmap.txt -rw-r--r-- 12.7 KB
numastat.txt -rw-r--r-- 866 bytes
oops-tracing.txt -rw-r--r-- 12.5 KB
padata.txt -rw-r--r-- 7.3 KB
parport-lowlevel.txt -rw-r--r-- 32.2 KB
parport.txt -rw-r--r-- 8.8 KB
pi-futex.txt -rw-r--r-- 5.7 KB
pnp.txt -rw-r--r-- 6.8 KB
preempt-locking.txt -rw-r--r-- 5.2 KB
printk-formats.txt -rw-r--r-- 1.0 KB
prio_tree.txt -rw-r--r-- 5.2 KB
rbtree.txt -rw-r--r-- 8.8 KB
rfkill.txt -rw-r--r-- 4.8 KB
robust-futex-ABI.txt -rw-r--r-- 8.7 KB
robust-futexes.txt -rw-r--r-- 9.4 KB
rt-mutex-design.txt -rw-r--r-- 32.8 KB
rt-mutex.txt -rw-r--r-- 3.5 KB
rtc.txt -rw-r--r-- 15.5 KB
serial-console.txt -rw-r--r-- 4.0 KB
sgi-ioc4.txt -rw-r--r-- 2.0 KB
sgi-visws.txt -rw-r--r-- 678 bytes
sparse.txt -rw-r--r-- 3.0 KB
spinlocks.txt -rw-r--r-- 7.5 KB
stable_api_nonsense.txt -rw-r--r-- 9.2 KB
stable_kernel_rules.txt -rw-r--r-- 3.6 KB
svga.txt -rw-r--r-- 14.1 KB
sysfs-rules.txt -rw-r--r-- 8.1 KB
sysrq.txt -rw-r--r-- 11.6 KB
unaligned-memory-access.txt -rw-r--r-- 10.0 KB
unicode.txt -rw-r--r-- 6.5 KB
unshare.txt -rw-r--r-- 13.1 KB
vgaarbiter.txt -rw-r--r-- 8.1 KB
video-output.txt -rw-r--r-- 1.1 KB
volatile-considered-harmful.txt -rw-r--r-- 5.6 KB
workqueue.txt -rw-r--r-- 15.6 KB
xz.txt -rw-r--r-- 5.7 KB
zorro.txt -rw-r--r-- 2.8 KB

back to top