https://github.com/torvalds/linux
Revision 9a6e7c39810e4a8bc7fc95056cefb40583fe07ef authored by Wanpeng Li on 14 September 2017, 10:54:16 UTC, committed by Radim Krčmář on 14 September 2017, 16:43:43 UTC

KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously
qemu-system-x86-8600  [004] d..1  7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600  [004] ....  7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600  [004] ....  7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600  [004] ....  7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600  [004] .N..  7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
    kworker/4:2-7814  [004] ....  7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600  [004] ....  7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600  [004] d..1  7205.687711: kvm_entry: vcpu 2

After running a memory-intensive workload in the guest, I caught a case where
the kworker completes the GUP so quickly that a "Page Ready" #PF exception is
queued after the "Page not Present" exception but before the next vmentry, as
in the trace above. The two queued exceptions then result in a #DF being
injected into the guest.

This patch fixes it by clearing the queued "Page not Present" exception when
"Page Ready" occurs before the next vmentry, since the GUP has already
retrieved the required page and the shadow page table has already been fixed
by the "Page Ready" handler.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 7c90705bf2a3 ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

stack.c
#include <linux/export.h>
#include <linux/fs.h>
#include <linux/fs_stack.h>

/* does _NOT_ require i_mutex to be held.
 *
 * This function cannot be inlined since i_size_{read,write} is rather
 * heavy-weight on 32-bit systems
 */
void fsstack_copy_inode_size(struct inode *dst, struct inode *src)
{
	loff_t i_size;
	blkcnt_t i_blocks;

	/*
	 * i_size_read() includes its own seqlocking and protection from
	 * preemption (see include/linux/fs.h): we need nothing extra for
	 * that here, and prefer to avoid nesting locks rather than attempt
	 * to keep i_size and i_blocks in sync.
	 */
	i_size = i_size_read(src);

	/*
	 * But if CONFIG_LBDAF (on 32-bit), we ought to make an effort to
	 * keep the two halves of i_blocks in sync despite SMP or PREEMPT -
	 * though stat's generic_fillattr() doesn't bother, and we won't be
	 * applying quotas (where i_blocks does become important) at the
	 * upper level.
	 *
	 * We don't actually know what locking is used at the lower level;
	 * but if it's a filesystem that supports quotas, it will be using
	 * i_lock as in inode_add_bytes().
	 */
	if (sizeof(i_blocks) > sizeof(long))
		spin_lock(&src->i_lock);
	i_blocks = src->i_blocks;
	if (sizeof(i_blocks) > sizeof(long))
		spin_unlock(&src->i_lock);

	/*
	 * If CONFIG_SMP or CONFIG_PREEMPT on 32-bit, it's vital for
	 * fsstack_copy_inode_size() to hold some lock around
	 * i_size_write(), otherwise i_size_read() may spin forever (see
	 * include/linux/fs.h).  We don't necessarily hold i_mutex when this
	 * is called, so take i_lock for that case.
	 *
	 * And if CONFIG_LBDAF (on 32-bit), continue our effort to keep the
	 * two halves of i_blocks in sync despite SMP or PREEMPT: use i_lock
	 * for that case too, and do both at once by combining the tests.
	 *
	 * There is none of this locking overhead in the 64-bit case.
	 */
	if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long))
		spin_lock(&dst->i_lock);
	i_size_write(dst, i_size);
	dst->i_blocks = i_blocks;
	if (sizeof(i_size) > sizeof(long) || sizeof(i_blocks) > sizeof(long))
		spin_unlock(&dst->i_lock);
}
EXPORT_SYMBOL_GPL(fsstack_copy_inode_size);

/* copy all attributes */
void fsstack_copy_attr_all(struct inode *dest, const struct inode *src)
{
	dest->i_mode = src->i_mode;
	dest->i_uid = src->i_uid;
	dest->i_gid = src->i_gid;
	dest->i_rdev = src->i_rdev;
	dest->i_atime = src->i_atime;
	dest->i_mtime = src->i_mtime;
	dest->i_ctime = src->i_ctime;
	dest->i_blkbits = src->i_blkbits;
	dest->i_flags = src->i_flags;
	set_nlink(dest, src->i_nlink);
}
EXPORT_SYMBOL_GPL(fsstack_copy_attr_all);
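
For context, a hedged sketch of how a stacking filesystem typically uses the
two helpers together. The function name and calling convention below are
hypothetical (eCryptfs is a real in-tree caller of both helpers):

#include <linux/fs.h>
#include <linux/fs_stack.h>

/*
 * Hypothetical refresh helper for a stacking filesystem: after an
 * operation changes the lower inode, mirror its attributes and size
 * into the upper inode that userspace actually sees.
 */
static void stackfs_refresh_inode(struct inode *upper, struct inode *lower)
{
	fsstack_copy_attr_all(upper, lower);	/* mode, uid/gid, times, nlink */
	fsstack_copy_inode_size(upper, lower);	/* i_size and i_blocks, locked */
}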