https://github.com/torvalds/linux
Revision 2b4f3b4987b56365b981f44a7e843efa5b6619b9 authored by Suren Baghdasaryan on 06 July 2023, 01:13:59 UTC, committed by Andrew Morton on 08 July 2023, 16:29:29 UTC
Patch series "Avoid memory corruption caused by per-VMA locks", v4.

A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.  Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.

Patch 1/2 in the series implements proper VMA locking during fork.  I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.

This fix can potentially regress some fork-heavy workloads.  Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Patch 2/2 disables per-VMA locks until the fix is tested and verified.


This patch (of 2):

When forking a child process, parent write-protects an anonymous page and
COW-shares it with the child being forked using copy_present_pte(). 
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap().  If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to some
stale writable TLB entries targeting the wrong (old) page.  Similar issue
happened in the past with userfaultfd (see flush_tlb_page() call inside
do_wp_page()).

Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue.  This
fix can potentially regress some fork-heavy workloads.  Kernel build time
did not show noticeable regression on a 56-core machine while a stress
test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7%
regression.  If such fork time regression is unacceptable, disabling
CONFIG_PER_VMA_LOCK should restore its performance.  Further optimizations
are possible if this regression proves to be problematic.

Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=3D217624
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 3fbff91
History
Tip revision: 2b4f3b4987b56365b981f44a7e843efa5b6619b9 authored by Suren Baghdasaryan on 06 July 2023, 01:13:59 UTC
fork: lock VMAs of the parent process when forking
Tip revision: 2b4f3b4
File Mode Size
Documentation
LICENSES
arch
block
certs
crypto
drivers
fs
include
init
io_uring
ipc
kernel
lib
mm
net
rust
samples
scripts
security
sound
tools
usr
virt
.clang-format -rw-r--r-- 20.1 KB
.cocciconfig -rw-r--r-- 59 bytes
.get_maintainer.ignore -rw-r--r-- 151 bytes
.gitattributes -rw-r--r-- 105 bytes
.gitignore -rw-r--r-- 2.0 KB
.mailmap -rw-r--r-- 28.2 KB
.rustfmt.toml -rw-r--r-- 369 bytes
COPYING -rw-r--r-- 496 bytes
CREDITS -rw-r--r-- 100.3 KB
Kbuild -rw-r--r-- 2.5 KB
Kconfig -rw-r--r-- 555 bytes
MAINTAINERS -rw-r--r-- 699.8 KB
Makefile -rw-r--r-- 69.9 KB
README -rw-r--r-- 727 bytes

README

back to top