swh:1:snp:c2847dfd741eae21606027cf29250d1ebcd63fb4
Revision a0582f26ec9dfd5360ea2f35dd9a1b026f8adda0 authored by Pavel Tatashin on 31 May 2017, 15:25:24 UTC, committed by David S. Miller on 06 June 2017, 20:45:29 UTC
The current wrap implementation has a race issue: it is called outside of
the ctx_alloc_lock, and also does not wait for all CPUs to complete the
wrap.  This means that a thread can get a new context with a new version
and another thread might still be running with the same context. The
problem is especially severe on CPUs with shared TLBs, like sun4v. I used
the following test to very quickly reproduce the problem:
- start over 8K processes (must be more than context IDs)
- write and read values at a  memory location in every process.

Very quickly memory corruptions start happening, and what we read back
does not equal what we wrote.

Several approaches were explored before settling on this one:

Approach 1:
Move smp_new_mmu_context_version() inside ctx_alloc_lock, and wait for
every process to complete the wrap. (Note: every CPU must WAIT before
leaving smp_new_mmu_context_version_client() until every one arrives).

This approach ends up with deadlocks, as some threads own locks which other
threads are waiting for, and they never receive softint until these threads
exit smp_new_mmu_context_version_client(). Since we do not allow the exit,
deadlock happens.

Approach 2:
Handle wrap right during mondo interrupt. Use etrap/rtrap to enter into
into C code, and issue new versions to every CPU.
This approach adds some overhead to runtime: in switch_mm() we must add
some checks to make sure that versions have not changed due to wrap while
we were loading the new secondary context. (could be protected by PSTATE_IE
but that degrades performance as on M7 and older CPUs as it takes 50 cycles
for each access). Also, we still need a global per-cpu array of MMs to know
where we need to load new contexts, otherwise we can change context to a
thread that is going way (if we received mondo between switch_mm() and
switch_to() time). Finally, there are some issues with window registers in
rtrap() when context IDs are changed during CPU mondo time.

The approach in this patch is the simplest and has almost no impact on
runtime.  We use the array with mm's where last secondary contexts were
loaded onto CPUs and bump their versions to the new generation without
changing context IDs. If a new process comes in to get a context ID, it
will go through get_new_mmu_context() because of version mismatch. But the
running processes do not need to be interrupted. And wrap is quicker as we
do not need to xcall and wait for everyone to receive and complete wrap.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 7a5b4bb
History
Tip revision: 53d5fc89d66a778577295020dc57bb3ccec84354 authored by Linus Torvalds on 01 October 2021, 21:45:23 UTC
Merge tag 's390-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Tip revision: 53d5fc8
File Mode Size
Documentation
arch
block
certs
crypto
drivers
firmware
fs
include
init
ipc
kernel
lib
mm
net
samples
scripts
security
sound
tools
usr
virt
.cocciconfig -rw-r--r-- 59 bytes
.get_maintainer.ignore -rw-r--r-- 31 bytes
.gitattributes -rw-r--r-- 30 bytes
.gitignore -rw-r--r-- 1.3 KB
.mailmap -rw-r--r-- 8.1 KB
COPYING -rw-r--r-- 18.3 KB
CREDITS -rw-r--r-- 96.2 KB
Kbuild -rw-r--r-- 2.2 KB
Kconfig -rw-r--r-- 252 bytes
MAINTAINERS -rw-r--r-- 397.7 KB
Makefile -rw-r--r-- 58.6 KB
README -rw-r--r-- 722 bytes

README

back to top