Revision 6e2df0581f569038719cf2bc2b3baa3fcc83cab4 authored by Peter Zijlstra on 08 November 2019, 10:11:52 UTC, committed by Peter Zijlstra on 08 November 2019, 21:34:14 UTC
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertly introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().

The comments about dropping rq->lock, in for example
newidle_balance(), only mentions the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:

	prev->sched_class->put_prev_task(rq, prev, rf);
	if (!rq->nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Quentin Perret <qperret@google.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
1 parent e3b8b6a
History
File Mode Size
9p
adfs
affs
afs
autofs
befs
bfs
btrfs
cachefiles
ceph
cifs
coda
configfs
cramfs
crypto
debugfs
devpts
dlm
ecryptfs
efivarfs
efs
erofs
exportfs
ext2
ext4
f2fs
fat
freevxfs
fscache
fuse
gfs2
hfs
hfsplus
hostfs
hpfs
hugetlbfs
iomap
isofs
jbd2
jffs2
jfs
kernfs
lockd
minix
nfs
nfs_common
nfsd
nilfs2
nls
notify
ntfs
ocfs2
omfs
openpromfs
orangefs
overlayfs
proc
pstore
qnx4
qnx6
quota
ramfs
reiserfs
romfs
squashfs
sysfs
sysv
tracefs
ubifs
udf
ufs
unicode
verity
xfs
Kconfig -rw-r--r-- 7.6 KB
Kconfig.binfmt -rw-r--r-- 7.6 KB
Makefile -rw-r--r-- 4.4 KB
aio.c -rw-r--r-- 56.0 KB
anon_inodes.c -rw-r--r-- 4.6 KB
attr.c -rw-r--r-- 9.6 KB
bad_inode.c -rw-r--r-- 5.3 KB
binfmt_aout.c -rw-r--r-- 8.3 KB
binfmt_elf.c -rw-r--r-- 63.0 KB
binfmt_elf_fdpic.c -rw-r--r-- 47.1 KB
binfmt_em86.c -rw-r--r-- 2.8 KB
binfmt_flat.c -rw-r--r-- 28.0 KB
binfmt_misc.c -rw-r--r-- 18.5 KB
binfmt_script.c -rw-r--r-- 4.4 KB
block_dev.c -rw-r--r-- 55.9 KB
buffer.c -rw-r--r-- 90.2 KB
char_dev.c -rw-r--r-- 16.5 KB
compat.c -rw-r--r-- 3.2 KB
compat_binfmt_elf.c -rw-r--r-- 3.2 KB
compat_ioctl.c -rw-r--r-- 31.0 KB
coredump.c -rw-r--r-- 22.1 KB
d_path.c -rw-r--r-- 11.3 KB
dax.c -rw-r--r-- 45.9 KB
dcache.c -rw-r--r-- 83.9 KB
dcookies.c -rw-r--r-- 7.1 KB
direct-io.c -rw-r--r-- 40.8 KB
drop_caches.c -rw-r--r-- 1.8 KB
eventfd.c -rw-r--r-- 11.1 KB
eventpoll.c -rw-r--r-- 64.5 KB
exec.c -rw-r--r-- 46.9 KB
fcntl.c -rw-r--r-- 23.3 KB
fhandle.c -rw-r--r-- 6.8 KB
file.c -rw-r--r-- 24.2 KB
file_table.c -rw-r--r-- 10.2 KB
filesystems.c -rw-r--r-- 6.4 KB
fs-writeback.c -rw-r--r-- 74.4 KB
fs_context.c -rw-r--r-- 18.1 KB
fs_parser.c -rw-r--r-- 11.0 KB
fs_pin.c -rw-r--r-- 1.9 KB
fs_struct.c -rw-r--r-- 3.3 KB
fs_types.c -rw-r--r-- 2.5 KB
fsopen.c -rw-r--r-- 11.2 KB
inode.c -rw-r--r-- 60.7 KB
internal.h -rw-r--r-- 5.1 KB
io_uring.c -rw-r--r-- 95.7 KB
ioctl.c -rw-r--r-- 17.7 KB
libfs.c -rw-r--r-- 32.9 KB
locks.c -rw-r--r-- 78.9 KB
mbcache.c -rw-r--r-- 12.0 KB
mount.h -rw-r--r-- 4.0 KB
mpage.c -rw-r--r-- 21.1 KB
namei.c -rw-r--r-- 122.8 KB
namespace.c -rw-r--r-- 97.0 KB
no-block.c -rw-r--r-- 478 bytes
nsfs.c -rw-r--r-- 6.1 KB
open.c -rw-r--r-- 30.2 KB
pipe.c -rw-r--r-- 27.7 KB
pnode.c -rw-r--r-- 15.1 KB
pnode.h -rw-r--r-- 1.9 KB
posix_acl.c -rw-r--r-- 21.5 KB
proc_namespace.c -rw-r--r-- 7.8 KB
read_write.c -rw-r--r-- 51.6 KB
readdir.c -rw-r--r-- 13.3 KB
select.c -rw-r--r-- 34.2 KB
seq_file.c -rw-r--r-- 24.7 KB
signalfd.c -rw-r--r-- 9.0 KB
splice.c -rw-r--r-- 40.2 KB
stack.c -rw-r--r-- 2.5 KB
stat.c -rw-r--r-- 19.4 KB
statfs.c -rw-r--r-- 9.6 KB
super.c -rw-r--r-- 47.9 KB
sync.c -rw-r--r-- 10.4 KB
timerfd.c -rw-r--r-- 13.5 KB
userfaultfd.c -rw-r--r-- 51.2 KB
utimes.c -rw-r--r-- 7.3 KB
xattr.c -rw-r--r-- 23.5 KB

back to top