Revision - 8aef188 - VFS: Fix vfsmount overput on simultaneous automount

Revision 8aef18845266f5c05904c610088f2d1ed58f6be3 authored by Al Viro on 16 June 2011, 14:10:06 UTC, committed by Al Viro on 16 June 2011, 15:28:16 UTC

VFS: Fix vfsmount overput on simultaneous automount

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1 parent 50338b8

Files
Changes

Permalinks

SM501.txt

			SM501 Driver
			============

Copyright 2006, 2007 Simtec Electronics

The Silicon Motion SM501 multimedia companion chip is a multifunction device
which may provide numerous interfaces including USB host controller USB gadget,
asynchronous serial ports, audio functions, and a dual display video interface.
The device may be connected by PCI or local bus with varying functions enabled.

Core
----

The core driver in drivers/mfd provides common services for the
drivers which manage the specific hardware blocks. These services
include locking for common registers, clock control and resource
management.

The core registers drivers for both PCI and generic bus based
chips via the platform device and driver system.

On detection of a device, the core initialises the chip (which may
be specified by the platform data) and then exports the selected
peripheral set as platform devices for the specific drivers.

The core re-uses the platform device system as the platform device
system provides enough features to support the drivers without the
need to create a new bus-type and the associated code to go with it.


Resources
---------

Each peripheral has a view of the device which is implicitly narrowed to
the specific set of resources that peripheral requires in order to
function correctly.

The centralised memory allocation allows the driver to ensure that the
maximum possible resource allocation can be made to the video subsystem
as this is by-far the most resource-sensitive of the on-chip functions.

The primary issue with memory allocation is that of moving the video
buffers once a display mode is chosen. Indeed when a video mode change
occurs the memory footprint of the video subsystem changes.

Since video memory is difficult to move without changing the display
(unless sufficient contiguous memory can be provided for the old and new
modes simultaneously) the video driver fully utilises the memory area
given to it by aligning fb0 to the start of the area and fb1 to the end
of it. Any memory left over in the middle is used for the acceleration
functions, which are transient and thus their location is less critical
as it can be moved.


Configuration
-------------

The platform device driver uses a set of platform data to pass
configurations through to the core and the subsidiary drivers
so that there can be support for more than one system carrying
an SM501 built into a single kernel image.

The PCI driver assumes that the PCI card behaves as per the Silicon
Motion reference design.

There is an errata (AB-5) affecting the selection of the
of the M1XCLK and M1CLK frequencies. These two clocks
must be sourced from the same PLL, although they can then
be divided down individually. If this is not set, then SM501 may
lock and hang the whole system. The driver will refuse to
attach if the PLL selection is different.

Showing with 0 additions and 0 deletions (0 / 0 diffs computed)

Computing file changes ...