Revision - 6e2df05 - sched: Fix pick_next_task() vs 'change' pattern race

Revision 6e2df0581f569038719cf2bc2b3baa3fcc83cab4 authored by Peter Zijlstra on 08 November 2019, 10:11:52 UTC, committed by Peter Zijlstra on 08 November 2019, 21:34:14 UTC

sched: Fix pick_next_task() vs 'change' pattern race

Commit 67692435c411 ("sched: Rework pick_next_task() slow-path")
inadvertly introduced a race because it changed a previously
unexplored dependency between dropping the rq->lock and
sched_class::put_prev_task().

The comments about dropping rq->lock, in for example
newidle_balance(), only mentions the task being current and ->on_cpu
being set. But when we look at the 'change' pattern (in for example
sched_setnuma()):

	queued = task_on_rq_queued(p); /* p->on_rq == TASK_ON_RQ_QUEUED */
	running = task_current(rq, p); /* rq->curr == p */

	if (queued)
		dequeue_task(...);
	if (running)
		put_prev_task(...);

	/* change task properties */

	if (queued)
		enqueue_task(...);
	if (running)
		set_next_task(...);

It becomes obvious that if we do this after put_prev_task() has
already been called on @p, things go sideways. This is exactly what
the commit in question allows to happen when it does:

	prev->sched_class->put_prev_task(rq, prev, rf);
	if (!rq->nr_running)
		newidle_balance(rq, rf);

The newidle_balance() call will drop rq->lock after we've called
put_prev_task() and that allows the above 'change' pattern to
interleave and mess up the state.

Furthermore, it turns out we lost the RT-pull when we put the last DL
task.

Fix both problems by extracting the balancing from put_prev_task() and
doing a multi-class balance() pass before put_prev_task().

Fixes: 67692435c411 ("sched: Rework pick_next_task() slow-path")
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Quentin Perret <qperret@google.com>
Tested-by: Valentin Schneider <valentin.schneider@arm.com>

1 parent e3b8b6a

Files
Changes

Permalinks

sparse.rst

.. Copyright 2004 Linus Torvalds
.. Copyright 2004 Pavel Machek <pavel@ucw.cz>
.. Copyright 2006 Bob Copeland <me@bobcopeland.com>

Sparse
======

Sparse is a semantic checker for C programs; it can be used to find a
number of potential problems with kernel code.  See
https://lwn.net/Articles/689907/ for an overview of sparse; this document
contains some kernel-specific sparse information.


Using sparse for typechecking
-----------------------------

"__bitwise" is a type attribute, so you have to do something like this::

        typedef int __bitwise pm_request_t;

        enum pm_request {
                PM_SUSPEND = (__force pm_request_t) 1,
                PM_RESUME = (__force pm_request_t) 2
        };

which makes PM_SUSPEND and PM_RESUME "bitwise" integers (the "__force" is
there because sparse will complain about casting to/from a bitwise type,
but in this case we really _do_ want to force the conversion). And because
the enum values are all the same type, now "enum pm_request" will be that
type too.

And with gcc, all the "__bitwise"/"__force stuff" goes away, and it all
ends up looking just like integers to gcc.

Quite frankly, you don't need the enum there. The above all really just
boils down to one special "int __bitwise" type.

So the simpler way is to just do::

        typedef int __bitwise pm_request_t;

        #define PM_SUSPEND ((__force pm_request_t) 1)
        #define PM_RESUME ((__force pm_request_t) 2)

and you now have all the infrastructure needed for strict typechecking.

One small note: the constant integer "0" is special. You can use a
constant zero as a bitwise integer type without sparse ever complaining.
This is because "bitwise" (as the name implies) was designed for making
sure that bitwise types don't get mixed up (little-endian vs big-endian
vs cpu-endian vs whatever), and there the constant "0" really _is_
special.

Using sparse for lock checking
------------------------------

The following macros are undefined for gcc and defined during a sparse
run to use the "context" tracking feature of sparse, applied to
locking.  These annotations tell sparse when a lock is held, with
regard to the annotated function's entry and exit.

__must_hold - The specified lock is held on function entry and exit.

__acquires - The specified lock is held on function exit, but not entry.

__releases - The specified lock is held on function entry, but not exit.

If the function enters and exits without the lock held, acquiring and
releasing the lock inside the function in a balanced way, no
annotation is needed.  The three annotations above are for cases where
sparse would otherwise report a context imbalance.

Getting sparse
--------------

You can get latest released versions from the Sparse homepage at
https://sparse.wiki.kernel.org/index.php/Main_Page

Alternatively, you can get snapshots of the latest development version
of sparse using git to clone::

        git://git.kernel.org/pub/scm/devel/sparse/sparse.git

Once you have it, just do::

        make
        make install

as a regular user, and it will install sparse in your ~/bin directory.

Using sparse
------------

Do a kernel make with "make C=1" to run sparse on all the C files that get
recompiled, or use "make C=2" to run sparse on the files whether they need to
be recompiled or not.  The latter is a fast way to check the whole tree if you
have already built it.

The optional make variable CF can be used to pass arguments to sparse.  The
build system passes -Wbitwise to sparse automatically.

Showing with 0 additions and 0 deletions (0 / 0 diffs computed)

Computing file changes ...