https://github.com/torvalds/linux
Revision 0df9dab891ff0d9b646d82e4fe038229e4c02451 authored by Sean Christopherson on 16 September 2023, 00:39:15 UTC, committed by Paolo Bonzini on 23 September 2023, 09:35:48 UTC
Stop zapping invalidate TDP MMU roots via work queue now that KVM
preserves TDP MMU roots until they are explicitly invalidated.  Zapping
roots asynchronously was effectively a workaround to avoid stalling a vCPU
for an extended during if a vCPU unloaded a root, which at the time
happened whenever the guest toggled CR0.WP (a frequent operation for some
guest kernels).

While a clever hack, zapping roots via an unbound worker had subtle,
unintended consequences on host scheduling, especially when zapping
multiple roots, e.g. as part of a memslot.  Because the work of zapping a
root is no longer bound to the task that initiated the zap, things like
the CPU affinity and priority of the original task get lost.  Losing the
affinity and priority can be especially problematic if unbound workqueues
aren't affined to a small number of CPUs, as zapping multiple roots can
cause KVM to heavily utilize the majority of CPUs in the system, *beyond*
the CPUs KVM is already using to run vCPUs.

When deleting a memslot via KVM_SET_USER_MEMORY_REGION, the async root
zap can result in KVM occupying all logical CPUs for ~8ms, and result in
high priority tasks not being scheduled in in a timely manner.  In v5.15,
which doesn't preserve unloaded roots, the issues were even more noticeable
as KVM would zap roots more frequently and could occupy all CPUs for 50ms+.

Consuming all CPUs for an extended duration can lead to significant jitter
throughout the system, e.g. on ChromeOS with virtio-gpu, deleting memslots
is a semi-frequent operation as memslots are deleted and recreated with
different host virtual addresses to react to host GPU drivers allocating
and freeing GPU blobs.  On ChromeOS, the jitter manifests as audio blips
during games due to the audio server's tasks not getting scheduled in
promptly, despite the tasks having a high realtime priority.

Deleting memslots isn't exactly a fast path and should be avoided when
possible, and ChromeOS is working towards utilizing MAP_FIXED to avoid the
memslot shenanigans, but KVM is squarely in the wrong.  Not to mention
that removing the async zapping eliminates a non-trivial amount of
complexity.

Note, one of the subtle behaviors hidden behind the async zapping is that
KVM would zap invalidated roots only once (ignoring partial zaps from
things like mmu_notifier events).  Preserve this behavior by adding a flag
to identify roots that are scheduled to be zapped versus roots that have
already been zapped but not yet freed.

Add a comment calling out why kvm_tdp_mmu_invalidate_all_roots() can
encounter invalid roots, as it's not at all obvious why zapping
invalidated roots shouldn't simply zap all invalid roots.

Reported-by: Pattara Teerapong <pteerapong@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Yiwei Zhang<zzyiwei@google.com>
Cc: Paul Hsia <paulhsia@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230916003916.2545000-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
1 parent 441a5df
History
Tip revision: 0df9dab891ff0d9b646d82e4fe038229e4c02451 authored by Sean Christopherson on 16 September 2023, 00:39:15 UTC
KVM: x86/mmu: Stop zapping invalidated TDP MMU roots asynchronously
Tip revision: 0df9dab
File Mode Size
Documentation
LICENSES
arch
block
certs
crypto
drivers
fs
include
init
io_uring
ipc
kernel
lib
mm
net
rust
samples
scripts
security
sound
tools
usr
virt
.clang-format -rw-r--r-- 20.1 KB
.cocciconfig -rw-r--r-- 59 bytes
.get_maintainer.ignore -rw-r--r-- 151 bytes
.gitattributes -rw-r--r-- 105 bytes
.gitignore -rw-r--r-- 2.0 KB
.mailmap -rw-r--r-- 35.6 KB
.rustfmt.toml -rw-r--r-- 369 bytes
COPYING -rw-r--r-- 496 bytes
CREDITS -rw-r--r-- 100.0 KB
Kbuild -rw-r--r-- 2.5 KB
Kconfig -rw-r--r-- 555 bytes
MAINTAINERS -rw-r--r-- 709.5 KB
Makefile -rw-r--r-- 65.9 KB
README -rw-r--r-- 727 bytes

README

back to top