Revision 8aef18845266f5c05904c610088f2d1ed58f6be3 authored by Al Viro on 16 June 2011, 14:10:06 UTC, committed by Al Viro on 16 June 2011, 15:28:16 UTC
[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_manage() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.

Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1 parent 50338b8
Raw File
Changes
Intro
=====

This document is designed to provide a list of the minimum levels of
software necessary to run the 2.6 kernels, as well as provide brief
instructions regarding any other "Gotchas" users may encounter when
trying life on the Bleeding Edge.  If upgrading from a pre-2.4.x
kernel, please consult the Changes file included with 2.4.x kernels for
additional information; most of that information will not be repeated
here.  Basically, this document assumes that your system is already
functional and running at least 2.4.x kernels.

This document is originally based on my "Changes" file for 2.0.x kernels
and therefore owes credit to the same people as that file (Jared Mauch,
Axel Boldt, Alessandro Sigala, and countless other users all over the
'net).

Current Minimal Requirements
============================

Upgrade to at *least* these software revisions before thinking you've
encountered a bug!  If you're unsure what version you're currently
running, the suggested command should tell you.

Again, keep in mind that this list assumes you are already
functionally running a Linux 2.4 kernel.  Also, not all tools are
necessary on all systems; obviously, if you don't have any ISDN
hardware, for example, you probably needn't concern yourself with
isdn4k-utils.

o  Gnu C                  3.2                     # gcc --version
o  Gnu make               3.80                    # make --version
o  binutils               2.12                    # ld -v
o  util-linux             2.10o                   # fdformat --version
o  module-init-tools      0.9.10                  # depmod -V
o  e2fsprogs              1.41.4                  # e2fsck -V
o  jfsutils               1.1.3                   # fsck.jfs -V
o  reiserfsprogs          3.6.3                   # reiserfsck -V
o  xfsprogs               2.6.0                   # xfs_db -V
o  squashfs-tools         4.0                     # mksquashfs -version
o  btrfs-progs            0.18                    # btrfsck
o  pcmciautils            004                     # pccardctl -V
o  quota-tools            3.09                    # quota -V
o  PPP                    2.4.0                   # pppd --version
o  isdn4k-utils           3.1pre1                 # isdnctrl 2>&1|grep version
o  nfs-utils              1.0.5                   # showmount --version
o  procps                 3.2.0                   # ps --version
o  oprofile               0.9                     # oprofiled --version
o  udev                   081                     # udevd --version
o  grub                   0.93                    # grub --version || grub-install --version
o  mcelog                 0.6                     # mcelog --version
o  iptables               1.4.2                   # iptables -V


Kernel compilation
==================

GCC
---

The gcc version requirements may vary depending on the type of CPU in your
computer.

Make
----

You will need Gnu make 3.80 or later to build the kernel.

Binutils
--------

Linux on IA-32 has recently switched from using as86 to using gas for
assembling the 16-bit boot code, removing the need for as86 to compile
your kernel.  This change does, however, mean that you need a recent
release of binutils.

Perl
----

You will need perl 5 and the following modules: Getopt::Long, Getopt::Std,
File::Basename, and File::Find to build the kernel.


System utilities
================

Architectural changes
---------------------

DevFS has been obsoleted in favour of udev
(http://www.kernel.org/pub/linux/utils/kernel/hotplug/)

32-bit UID support is now in place.  Have fun!

Linux documentation for functions is transitioning to inline
documentation via specially-formatted comments near their
definitions in the source.  These comments can be combined with the
SGML templates in the Documentation/DocBook directory to make DocBook
files, which can then be converted by DocBook stylesheets to PostScript,
HTML, PDF files, and several other formats.  In order to convert from
DocBook format to a format of your choice, you'll need to install Jade as
well as the desired DocBook stylesheets.

Util-linux
----------

New versions of util-linux provide *fdisk support for larger disks,
support new options to mount, recognize more supported partition
types, have a fdformat which works with 2.4 kernels, and similar goodies.
You'll probably want to upgrade.

Ksymoops
--------

If the unthinkable happens and your kernel oopses, you may need the
ksymoops tool to decode it, but in most cases you don't.
In the 2.6 kernel it is generally preferred to build the kernel with
CONFIG_KALLSYMS so that it produces readable dumps that can be used as-is
(this also produces better output than ksymoops).
If for some reason your kernel is not build with CONFIG_KALLSYMS and
you have no way to rebuild and reproduce the Oops with that option, then
you can still decode that Oops with ksymoops.

Module-Init-Tools
-----------------

A new module loader is now in the kernel that requires module-init-tools
to use.  It is backward compatible with the 2.4.x series kernels.

Mkinitrd
--------

These changes to the /lib/modules file tree layout also require that
mkinitrd be upgraded.

E2fsprogs
---------

The latest version of e2fsprogs fixes several bugs in fsck and
debugfs.  Obviously, it's a good idea to upgrade.

JFSutils
--------

The jfsutils package contains the utilities for the file system.
The following utilities are available:
o fsck.jfs - initiate replay of the transaction log, and check
  and repair a JFS formatted partition.
o mkfs.jfs - create a JFS formatted partition.
o other file system utilities are also available in this package.

Reiserfsprogs
-------------

The reiserfsprogs package should be used for reiserfs-3.6.x
(Linux kernels 2.4.x). It is a combined package and contains working
versions of mkreiserfs, resize_reiserfs, debugreiserfs and
reiserfsck. These utils work on both i386 and alpha platforms.

Xfsprogs
--------

The latest version of xfsprogs contains mkfs.xfs, xfs_db, and the
xfs_repair utilities, among others, for the XFS filesystem.  It is
architecture independent and any version from 2.0.0 onward should
work correctly with this version of the XFS kernel code (2.6.0 or
later is recommended, due to some significant improvements).

PCMCIAutils
-----------

PCMCIAutils replaces pcmcia-cs (see below). It properly sets up
PCMCIA sockets at system startup and loads the appropriate modules
for 16-bit PCMCIA devices if the kernel is modularized and the hotplug
subsystem is used.

Pcmcia-cs
---------

PCMCIA (PC Card) support is now partially implemented in the main
kernel source. The "pcmciautils" package (see above) replaces pcmcia-cs
for newest kernels.

Quota-tools
-----------

Support for 32 bit uid's and gid's is required if you want to use
the newer version 2 quota format.  Quota-tools version 3.07 and
newer has this support.  Use the recommended version or newer
from the table above.

Intel IA32 microcode
--------------------

A driver has been added to allow updating of Intel IA32 microcode,
accessible as a normal (misc) character device.  If you are not using
udev you may need to:

mkdir /dev/cpu
mknod /dev/cpu/microcode c 10 184
chmod 0644 /dev/cpu/microcode

as root before you can use this.  You'll probably also want to
get the user-space microcode_ctl utility to use with this.

Powertweak
----------

If you are running v0.1.17 or earlier, you should upgrade to
version v0.99.0 or higher. Running old versions may cause problems
with programs using shared memory.

udev
----
udev is a userspace application for populating /dev dynamically with
only entries for devices actually present.  udev replaces the basic
functionality of devfs, while allowing persistent device naming for
devices.

FUSE
----

Needs libfuse 2.4.0 or later.  Absolute minimum is 2.3.0 but mount
options 'direct_io' and 'kernel_cache' won't work.

Networking
==========

General changes
---------------

If you have advanced network configuration needs, you should probably
consider using the network tools from ip-route2.

Packet Filter / NAT
-------------------
The packet filtering and NAT code uses the same tools like the previous 2.4.x
kernel series (iptables).  It still includes backwards-compatibility modules
for 2.2.x-style ipchains and 2.0.x-style ipfwadm.

PPP
---

The PPP driver has been restructured to support multilink and to
enable it to operate over diverse media layers.  If you use PPP,
upgrade pppd to at least 2.4.0.

If you are not using udev, you must have the device file /dev/ppp
which can be made by:

mknod /dev/ppp c 108 0

as root.

Isdn4k-utils
------------

Due to changes in the length of the phone number field, isdn4k-utils
needs to be recompiled or (preferably) upgraded.

NFS-utils
---------

In 2.4 and earlier kernels, the nfs server needed to know about any
client that expected to be able to access files via NFS.  This
information would be given to the kernel by "mountd" when the client
mounted the filesystem, or by "exportfs" at system startup.  exportfs
would take information about active clients from /var/lib/nfs/rmtab.

This approach is quite fragile as it depends on rmtab being correct
which is not always easy, particularly when trying to implement
fail-over.  Even when the system is working well, rmtab suffers from
getting lots of old entries that never get removed.

With 2.6 we have the option of having the kernel tell mountd when it
gets a request from an unknown host, and mountd can give appropriate
export information to the kernel.  This removes the dependency on
rmtab and means that the kernel only needs to know about currently
active clients.

To enable this new functionality, you need to:

  mount -t nfsd nfsd /proc/fs/nfsd

before running exportfs or mountd.  It is recommended that all NFS
services be protected from the internet-at-large by a firewall where
that is possible.

mcelog
------

In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
as a regular cronjob similar to the x86-64 kernel to process and log
machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
events are errors reported by the CPU. Processing them is strongly encouraged.
All x86-64 kernels since 2.6.4 require the mcelog utility to
process machine checks.

Getting updated software
========================

Kernel compilation
******************

gcc
---
o  <ftp://ftp.gnu.org/gnu/gcc/>

Make
----
o  <ftp://ftp.gnu.org/gnu/make/>

Binutils
--------
o  <ftp://ftp.kernel.org/pub/linux/devel/binutils/>

System utilities
****************

Util-linux
----------
o  <ftp://ftp.kernel.org/pub/linux/utils/util-linux/>

Ksymoops
--------
o  <ftp://ftp.kernel.org/pub/linux/utils/kernel/ksymoops/v2.4/>

Module-Init-Tools
-----------------
o  <ftp://ftp.kernel.org/pub/linux/kernel/people/rusty/modules/>

Mkinitrd
--------
o  <https://code.launchpad.net/initrd-tools/main>

E2fsprogs
---------
o  <http://prdownloads.sourceforge.net/e2fsprogs/e2fsprogs-1.29.tar.gz>

JFSutils
--------
o  <http://jfs.sourceforge.net/>

Reiserfsprogs
-------------
o  <http://www.kernel.org/pub/linux/utils/fs/reiserfs/>

Xfsprogs
--------
o  <ftp://oss.sgi.com/projects/xfs/>

Pcmciautils
-----------
o  <ftp://ftp.kernel.org/pub/linux/utils/kernel/pcmcia/>

Pcmcia-cs
---------
o  <http://pcmcia-cs.sourceforge.net/>

Quota-tools
----------
o  <http://sourceforge.net/projects/linuxquota/>

DocBook Stylesheets
-------------------
o  <http://nwalsh.com/docbook/dsssl/>

XMLTO XSLT Frontend
-------------------
o  <http://cyberelk.net/tim/xmlto/>

Intel P6 microcode
------------------
o  <http://www.urbanmyth.org/microcode/>

Powertweak
----------
o  <http://powertweak.sourceforge.net/>

udev
----
o <http://www.kernel.org/pub/linux/utils/kernel/hotplug/udev.html>

FUSE
----
o <http://sourceforge.net/projects/fuse>

mcelog
------
o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/>

Networking
**********

PPP
---
o  <ftp://ftp.samba.org/pub/ppp/>

Isdn4k-utils
------------
o  <ftp://ftp.isdn4linux.de/pub/isdn4linux/utils/>

NFS-utils
---------
o  <http://sourceforge.net/project/showfiles.php?group_id=14>

Iptables
--------
o  <http://www.iptables.org/downloads.html>

Ip-route2
---------
o  <ftp://ftp.tux.org/pub/net/ip-routing/iproute2-2.2.4-now-ss991023.tar.gz>

OProfile
--------
o  <http://oprofile.sf.net/download/>

NFS-Utils
---------
o  <http://nfs.sourceforge.net/>

back to top