https://github.com/torvalds/linux
Revision 8e205f779d1443a94b5ae81aa359cb535dd3021e authored by Hugh Dickins on 23 July 2014, 21:00:10 UTC, committed by Linus Torvalds on 23 July 2014, 22:10:54 UTC
Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).

We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.

So return to the original umbrella approach, but keep away from i_mutex
this time.  We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.

This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.
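
The scheme described above can be sketched in userspace, which may help
when reading the patch.  This is only an analogy, not the kernel code:
pthread_mutex_t stands in for i_lock, an on-stack pthread_cond_t stands
in for the on-stack wait_queue_head, and struct fake_inode, struct
punch_state, fault_page(), punch_hole() and run_demo() are hypothetical
names invented for this illustration.  The fault side here loops and
rechecks, whereas the real fault path returns so the fault is retried.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct punch_state {              /* lives on the hole-puncher's stack */
    pthread_cond_t waitq;         /* plays the role of the on-stack wait_queue_head */
    size_t start, next;           /* page range [start, next) being punched */
};

struct fake_inode {
    pthread_mutex_t lock;         /* plays the role of i_lock */
    struct punch_state *punching; /* plays the role of inode->i_private */
    int waiters;                  /* demo bookkeeping only */
};

/* Fault side: sleep while pgoff lies inside a range being punched.
 * Returns true if the caller had to wait. */
static bool fault_page(struct fake_inode *inode, size_t pgoff)
{
    bool waited = false;

    pthread_mutex_lock(&inode->lock);
    for (;;) {
        struct punch_state *ps = inode->punching; /* re-read after every wakeup */

        if (!ps || pgoff < ps->start || pgoff >= ps->next)
            break;
        waited = true;
        inode->waiters++;
        pthread_cond_wait(&ps->waitq, &inode->lock);
        inode->waiters--;
    }
    pthread_mutex_unlock(&inode->lock);
    return waited;
}

/* Punch side: publish the on-stack state, do the work, unpublish and wake. */
static void punch_hole(struct fake_inode *inode, size_t start, size_t next,
                       void (*work)(void *), void *arg)
{
    struct punch_state ps = { .start = start, .next = next };

    pthread_cond_init(&ps.waitq, NULL);
    pthread_mutex_lock(&inode->lock);
    inode->punching = &ps;
    pthread_mutex_unlock(&inode->lock);

    work(arg);                    /* stand-in for unmapping and truncating */

    pthread_mutex_lock(&inode->lock);
    inode->punching = NULL;       /* later faults never see the stack object */
    pthread_cond_broadcast(&ps.waitq);
    pthread_mutex_unlock(&inode->lock);
    pthread_cond_destroy(&ps.waitq); /* all waiters were unblocked by the broadcast */
}

struct demo { struct fake_inode inode; pthread_t faulter; bool waited; };

static void *faulter_main(void *p)
{
    struct demo *d = p;

    d->waited = fault_page(&d->inode, 5); /* page 5 is inside the hole */
    return NULL;
}

/* Work callback: start a faulter mid-punch, hold the punch open until it sleeps. */
static void demo_work(void *p)
{
    struct demo *d = p;
    int n = 0;
    int rc = pthread_create(&d->faulter, NULL, faulter_main, d);

    assert(rc == 0);
    (void)rc;
    while (n == 0) {
        pthread_mutex_lock(&d->inode.lock);
        n = d->inode.waiters;
        pthread_mutex_unlock(&d->inode.lock);
        sched_yield();
    }
}

/* Returns 1 if the mid-punch fault waited and a later fault did not. */
static int run_demo(void)
{
    struct demo d = { .waited = false };
    bool late;

    pthread_mutex_init(&d.inode.lock, NULL);
    punch_hole(&d.inode, 0, 10, demo_work, &d);
    pthread_join(d.faulter, NULL);
    late = fault_page(&d.inode, 5); /* punch is over: must not wait */
    pthread_mutex_destroy(&d.inode.lock);
    return d.waited && !late;
}
```

Calling run_demo() from a main() built with -pthread exercises both
paths.  The point of the design is that the inode itself gains no new
field: the only per-punch state is on the puncher's stack, and the
shared lock is what makes it safe to publish and retract the pointer.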

This issue has been tagged with CVE-2014-4171, which will require commit
f00cdc6df7d7, this patch and the following patch to be backported: we
suggest backporting to 3.1+, though in fact the trinity forkbomb effect
might go back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or
might not, since much has changed, with i_mmap_mutex a spinlock before
3.0.  Anyone running trinity on 3.0 and earlier? I don't think we need
care.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>	[3.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent c118678
shmem: fix faulting into a hole, not taking i_mutex
File Mode Size
partitions
Kconfig -rw-r--r-- 3.6 KB
Kconfig.iosched -rw-r--r-- 1.6 KB
Makefile -rw-r--r-- 948 bytes
bio-integrity.c -rw-r--r-- 17.5 KB
bio.c -rw-r--r-- 49.5 KB
blk-cgroup.c -rw-r--r-- 29.5 KB
blk-cgroup.h -rw-r--r-- 16.7 KB
blk-core.c -rw-r--r-- 89.1 KB
blk-exec.c -rw-r--r-- 3.4 KB
blk-flush.c -rw-r--r-- 13.6 KB
blk-integrity.c -rw-r--r-- 11.7 KB
blk-ioc.c -rw-r--r-- 10.2 KB
blk-iopoll.c -rw-r--r-- 5.8 KB
blk-lib.c -rw-r--r-- 7.3 KB
blk-map.c -rw-r--r-- 8.2 KB
blk-merge.c -rw-r--r-- 14.6 KB
blk-mq-cpu.c -rw-r--r-- 1.6 KB
blk-mq-cpumap.c -rw-r--r-- 2.5 KB
blk-mq-sysfs.c -rw-r--r-- 10.5 KB
blk-mq-tag.c -rw-r--r-- 13.7 KB
blk-mq-tag.h -rw-r--r-- 2.0 KB
blk-mq.c -rw-r--r-- 46.9 KB
blk-mq.h -rw-r--r-- 3.1 KB
blk-settings.c -rw-r--r-- 27.2 KB
blk-softirq.c -rw-r--r-- 4.4 KB
blk-sysfs.c -rw-r--r-- 15.1 KB
blk-tag.c -rw-r--r-- 9.4 KB
blk-throttle.c -rw-r--r-- 45.8 KB
blk-timeout.c -rw-r--r-- 5.6 KB
blk.h -rw-r--r-- 7.9 KB
bounce.c -rw-r--r-- 6.5 KB
bsg-lib.c -rw-r--r-- 6.0 KB
bsg.c -rw-r--r-- 23.6 KB
cfq-iosched.c -rw-r--r-- 119.7 KB
cmdline-parser.c -rw-r--r-- 4.9 KB
compat_ioctl.c -rw-r--r-- 20.8 KB
deadline-iosched.c -rw-r--r-- 11.3 KB
elevator.c -rw-r--r-- 23.7 KB
genhd.c -rw-r--r-- 44.1 KB
ioctl.c -rw-r--r-- 10.7 KB
ioprio.c -rw-r--r-- 5.0 KB
noop-iosched.c -rw-r--r-- 2.7 KB
partition-generic.c -rw-r--r-- 14.0 KB
scsi_ioctl.c -rw-r--r-- 19.1 KB
