Revision 6e2df0581f569038719cf2bc2b3baa3fcc83cab4 authored by Peter Zijlstra on 08 November 2019, 10:11:52 UTC, committed by Peter Zijlstra on 08 November 2019, 21:34:14 UTC
Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertly introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().

The comments about dropping rq->lock, in for example
newidle_balance(), only mentions the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:

	prev->sched_class->put_prev_task(rq, prev, rf);
	if (!rq->nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Quentin Perret <qperret@google.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>
1 parent e3b8b6a
Raw File
dontdiff
*.a
*.aux
*.bc
*.bin
*.bz2
*.c.[012]*.*
*.cis
*.cpio
*.csp
*.dsp
*.dvi
*.elf
*.eps
*.fw
*.gcno
*.gcov
*.gen.S
*.gif
*.grep
*.grp
*.gz
*.html
*.i
*.jpeg
*.ko
*.ll
*.log
*.lst
*.lzma
*.lzo
*.mo
*.moc
*.mod
*.mod.c
*.o
*.o.*
*.order
*.orig
*.out
*.patch
*.pdf
*.plist
*.png
*.pot
*.ps
*.rej
*.s
*.sgml
*.so
*.so.dbg
*.symtypes
*.tab.c
*.tab.h
*.tex
*.ver
*.xml
*.xz
*_MODULES
*_vga16.c
*~
\#*#
*.9
.*
.*.d
.mm
53c700_d.h
CVS
ChangeSet
GPATH
GRTAGS
GSYMS
GTAGS
Image
Module.markers
Module.symvers
PENDING
SCCS
System.map*
TAGS
aconf
af_names.h
aic7*reg.h*
aic7*reg_print.c*
aic7*seq.h*
aicasm
aicdb.h*
altivec*.c
asm-offsets.h
asm_offsets.h
autoconf.h*
av_permissions.h
bbootsect
bin2c
binkernel.spec
bootsect
bounds.h
bsetup
btfixupprep
build
bvmlinux
bzImage*
capability_names.h
capflags.c
classlist.h*
comp*.log
compile.h*
conf
config
config-*
config.mak
config.mak.autogen
conmakehash
consolemap_deftbl.c*
cpustr.h
crc32table.h*
cscope.*
defkeymap.c
devlist.h*
devicetable-offsets.h
dnotify_test
dslm
dtc
elf2ecoff
elfconfig.h*
evergreen_reg_safe.h
fixdep
flask.h
fore200e_mkfirm
fore200e_pca_fw.c*
gconf
gconf-cfg
gen-devlist
gen_crc32table
gen_init_cpio
generated
genheaders
genksyms
*_gray256.c
hpet_example
hugepage-mmap
hugepage-shm
ihex2fw
inat-tables.c
initramfs_list
int16.c
int1.c
int2.c
int32.c
int4.c
int8.c
kallsyms
keywords.c
ksym.c*
ksym.h*
*lex.c
*lex.*.c
linux
logo_*.c
logo_*_clut224.c
logo_*_mono.c
mach-types
mach-types.h
machtypes.h
map
map_hugetlb
mconf
mconf-cfg
miboot*
mk_elfconfig
mkboot
mkbugboot
mkcpustr
mkdep
mkprep
mkregtable
mktables
mktree
mkutf8data
modpost
modules.builtin
modules.builtin.modinfo
modules.order
modversions.h*
nconf
nconf-cfg
ncscope.*
offset.h
oui.c*
page-types
parse.c
parse.h
patches*
pca200e.bin
pca200e_ecd.bin2
perf.data
perf.data.old
perf-archive
piggyback
piggy.gzip
piggy.S
pnmtologo
ppc_defs.h*
pss_boot.h
qconf
qconf-cfg
r100_reg_safe.h
r200_reg_safe.h
r300_reg_safe.h
r420_reg_safe.h
r600_reg_safe.h
randomize_layout_hash.h
randomize_layout_seed.h
recordmcount
relocs
rlim_names.h
rn50_reg_safe.h
rs600_reg_safe.h
rv515_reg_safe.h
series
setup
setup.bin
setup.elf
sortextable
sImage
sm_tbl*
split-include
syscalltab.h
tables.c
tags
test_get_len
tftpboot.img
timeconst.h
times.h*
trix_boot.h
utsrelease.h*
vdso-syms.lds
vdso.lds
vdso32-int80-syms.lds
vdso32-syms.lds
vdso32-syscall-syms.lds
vdso32-sysenter-syms.lds
vdso32.lds
vdso32.so.dbg
vdso64.lds
vdso64.so.dbg
version.h*
vmImage
vmlinux
vmlinux-*
vmlinux.aout
vmlinux.bin.all
vmlinux.lds
vmlinuz
voffset.h
vsyscall.lds
vsyscall_32.lds
wanxlfw.inc
uImage
unifdef
utf8data.h
wakeup.bin
wakeup.elf
wakeup.lds
zImage*
zoffset.h
back to top