Revision 6e2df0581f569038719cf2bc2b3baa3fcc83cab4 authored by Peter Zijlstra on 08 November 2019, 10:11:52 UTC, committed by Peter Zijlstra on 08 November 2019, 21:34:14 UTC
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertently introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().

The comments about dropping rq->lock, for example in
newidle_balance(), only mention the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (for example in
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways: @p is still rq->curr,
so 'running' is true and put_prev_task() is called on it a second
time. This is exactly what the commit in question allows when it does:

	prev->sched_class->put_prev_task(rq, prev, rf);
	if (!rq->nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.
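
The interleaving can be illustrated with a small userspace model. This
is only a sketch: every toy_* name below is hypothetical and none of it
is kernel code. The remote 'change' pattern is simply called at the
point where newidle_balance() would drop rq->lock, and a counter
records each put_prev_task() that hits a task which has already been
put.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel structures; all names are illustrative. */
struct toy_task {
	bool queued;      /* models p->on_rq == TASK_ON_RQ_QUEUED */
	int  put_count;   /* put_prev_task() calls minus set_next_task() calls */
	int  violations;  /* put_prev_task() calls on an already-put task */
};

struct toy_rq {
	struct toy_task *curr;  /* models rq->curr */
};

static void toy_put_prev_task(struct toy_task *p)
{
	if (p->put_count > 0)
		p->violations++;  /* double put: class state is now inconsistent */
	p->put_count++;
}

static void toy_set_next_task(struct toy_task *p)
{
	p->put_count--;
}

/* The 'change' pattern, as another CPU would run it after taking rq->lock. */
static void toy_change_pattern(struct toy_rq *rq, struct toy_task *p)
{
	bool queued = p->queued;
	bool running = (rq->curr == p);

	if (queued)
		p->queued = false;       /* dequeue_task() */
	if (running)
		toy_put_prev_task(p);

	/* change task properties */

	if (queued)
		p->queued = true;        /* enqueue_task() */
	if (running)
		toy_set_next_task(p);
}

/* Broken ordering: put_prev_task() first, then the lock is dropped. */
static int toy_schedule_broken(struct toy_rq *rq, struct toy_task *prev)
{
	toy_put_prev_task(prev);
	/* newidle_balance() would drop rq->lock here; a remote change slips in */
	toy_change_pattern(rq, prev);
	return prev->violations;
}

/* Fixed ordering: the lock drop happens before put_prev_task(). */
static int toy_schedule_fixed(struct toy_rq *rq, struct toy_task *prev)
{
	toy_change_pattern(rq, prev);  /* prev has not been put yet, so this is safe */
	toy_put_prev_task(prev);
	return prev->violations;
}
```

In this model toy_schedule_broken() reports one violation for a task
that is rq->curr, while toy_schedule_fixed() reports none, because the
window in which the lock is dropped closes before the task is put.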

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().
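
The shape of such a pass can be sketched in userspace as follows. This
is a simplified model under stated assumptions, not the patch itself:
the toy_* names, the per-class counters and the "pull one task if the
class is empty" rule are all hypothetical. What it shows is the
ordering: walk the classes from prev's class downwards, let each one
balance (the only step that may drop the lock), stop at the first class
that has something to run, and only then put prev.

```c
#include <assert.h>
#include <stdbool.h>

/* Classes in priority order, highest first. Purely illustrative. */
enum { TOY_DL, TOY_RT, TOY_FAIR, TOY_NR_CLASSES };

struct toy_rq {
	int  nr[TOY_NR_CLASSES];      /* runnable tasks per class on this CPU */
	int  remote[TOY_NR_CLASSES];  /* tasks that could be pulled from other CPUs */
	bool prev_put;                /* has put_prev_task() run yet? */
	int  late_balances;           /* balance calls made after put_prev_task() */
};

/* Per-class balance: returns true if the class has something to run. */
static bool toy_balance(struct toy_rq *rq, int class)
{
	if (rq->prev_put)
		rq->late_balances++;  /* would reopen the race window */

	if (!rq->nr[class] && rq->remote[class]) {
		/* a real pull may drop rq->lock here */
		rq->remote[class]--;
		rq->nr[class]++;
	}
	return rq->nr[class] > 0;
}

static void toy_put_prev_task(struct toy_rq *rq)
{
	rq->prev_put = true;
}

/* Returns the class of the next task, or -1 for idle. */
static int toy_pick_next(struct toy_rq *rq, int prev_class)
{
	int class;

	/* Balance pass: start at prev's class, continue into lower classes. */
	for (class = prev_class; class < TOY_NR_CLASSES; class++) {
		if (toy_balance(rq, class))
			break;
	}

	/* No lock drops from here on, so putting prev is safe. */
	toy_put_prev_task(rq);

	for (class = 0; class < TOY_NR_CLASSES; class++) {
		if (rq->nr[class])
			return class;
	}
	return -1;
}
```

With an empty runqueue, prev in the DL class and one pullable RT task
elsewhere, toy_pick_next() falls through the DL balance into the RT
balance and returns TOY_RT, which models the RT-pull that was lost when
the last DL task was put; late_balances stays at zero because nothing
balances after put_prev_task().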

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Quentin Perret <qperret@google.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
1 parent e3b8b6a
File Mode Size
ABI
EDID
PCI
RCU
accounting
admin-guide
arm
arm64
block
bpf
cdrom
core-api
cpu-freq
crypto
dev-tools
devicetree
doc-guide
driver-api
fault-injection
fb
features
filesystems
firmware-guide
firmware_class
fpga
gpu
hid
hwmon
i2c
ia64
ide
iio
infiniband
input
ioctl
isdn
kbuild
kernel-hacking
leds
livepatch
locking
m68k
maintainer
media
mic
mips
misc-devices
netlabel
networking
nios2
openrisc
parisc
pcmcia
power
powerpc
process
riscv
s390
scheduler
scsi
security
sh
sound
sparc
sphinx
sphinx-static
spi
target
timers
trace
translations
usb
userspace-api
virt
virtual
vm
w1
watchdog
x86
xtensa
.gitignore -rw-r--r-- 13 bytes
COPYING-logo -rw-r--r-- 563 bytes
Changes l--------- 19 bytes
CodingStyle -rw-r--r-- 48 bytes
DMA-API-HOWTO.txt -rw-r--r-- 32.8 KB
DMA-API.txt -rw-r--r-- 27.3 KB
DMA-ISA-LPC.txt -rw-r--r-- 5.1 KB
DMA-attributes.txt -rw-r--r-- 6.9 KB
IPMI.txt -rw-r--r-- 29.7 KB
IRQ-affinity.txt -rw-r--r-- 2.5 KB
IRQ-domain.txt -rw-r--r-- 10.9 KB
IRQ.txt -rw-r--r-- 994 bytes
Kconfig -rw-r--r-- 360 bytes
Makefile -rw-r--r-- 5.3 KB
SubmittingPatches -rw-r--r-- 54 bytes
atomic_bitops.txt -rw-r--r-- 1.5 KB
atomic_t.txt -rw-r--r-- 6.9 KB
bus-virt-phys-mapping.txt -rw-r--r-- 8.0 KB
conf.py -rw-r--r-- 20.3 KB
crc32.txt -rw-r--r-- 8.6 KB
debugging-modules.txt -rw-r--r-- 954 bytes
debugging-via-ohci1394.txt -rw-r--r-- 7.5 KB
digsig.txt -rw-r--r-- 3.0 KB
docutils.conf -rw-r--r-- 159 bytes
dontdiff -rw-r--r-- 2.6 KB
futex-requeue-pi.txt -rw-r--r-- 5.1 KB
hwspinlock.txt -rw-r--r-- 15.1 KB
index.rst -rw-r--r-- 3.9 KB
io-mapping.txt -rw-r--r-- 3.3 KB
io_ordering.txt -rw-r--r-- 2.0 KB
irqflags-tracing.txt -rw-r--r-- 2.3 KB
kobject.txt -rw-r--r-- 18.5 KB
kprobes.txt -rw-r--r-- 30.3 KB
kref.txt -rw-r--r-- 8.9 KB
logo.gif -rw-r--r-- 16.0 KB
lzo.txt -rw-r--r-- 9.1 KB
mailbox.txt -rw-r--r-- 4.4 KB
memory-barriers.txt -rw-r--r-- 114.4 KB
nommu-mmap.txt -rw-r--r-- 12.4 KB
padata.txt -rw-r--r-- 7.3 KB
percpu-rw-semaphore.txt -rw-r--r-- 1.1 KB
pi-futex.txt -rw-r--r-- 5.7 KB
preempt-locking.txt -rw-r--r-- 5.5 KB
rbtree.txt -rw-r--r-- 14.8 KB
remoteproc.txt -rw-r--r-- 12.8 KB
robust-futex-ABI.txt -rw-r--r-- 8.7 KB
robust-futexes.txt -rw-r--r-- 9.5 KB
rpmsg.txt -rw-r--r-- 13.1 KB
speculation.txt -rw-r--r-- 2.8 KB
static-keys.txt -rw-r--r-- 13.0 KB
tee.txt -rw-r--r-- 5.2 KB
this_cpu_ops.txt -rw-r--r-- 11.2 KB
unaligned-memory-access.txt -rw-r--r-- 10.4 KB
xz.txt -rw-r--r-- 5.5 KB
