Revision 63cae12bce9861cec309798d34701cf3da20bc71 authored by Peter Zijlstra on 09 December 2016, 13:59:00 UTC, committed by Ingo Molnar on 14 January 2017, 09:56:10 UTC
There is problem with installing an event in a task that is 'stuck' on
an offline CPU.

Blocked tasks are not dis-assosciated from offlined CPUs, after all, a
blocked task doesn't run and doesn't require a CPU etc.. Only on
wakeup do we ammend the situation and place the task on a available
CPU.

If we hit such a task with perf_install_in_context() we'll loop until
either that task wakes up or the CPU comes back online, if the task
waking depends on the event being installed, we're stuck.

While looking into this issue, I also spotted another problem, if we
hit a task with perf_install_in_context() that is in the middle of
being migrated, that is we observe the old CPU before sending the IPI,
but run the IPI (on the old CPU) while the task is already running on
the new CPU, things also go sideways.

Rework things to rely on task_curr() -- outside of rq->lock -- which
is rather tricky. Imagine the following scenario where we're trying to
install the first event into our task 't':

CPU0            CPU1            CPU2

                (current == t)

t->perf_event_ctxp[] = ctx;
smp_mb();
cpu = task_cpu(t);

                switch(t, n);
                                migrate(t, 2);
                                switch(p, t);

                                ctx = t->perf_event_ctxp[]; // must not be NULL

smp_function_call(cpu, ..);

                generic_exec_single()
                  func();
                    spin_lock(ctx->lock);
                    if (task_curr(t)) // false

                    add_event_to_ctx();
                    spin_unlock(ctx->lock);

                                perf_event_context_sched_in();
                                  spin_lock(ctx->lock);
                                  // sees event

So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
Because if CPU2's load of that variable were to observe NULL, it would
not try to schedule the ctx and we'd have a task running without its
counter, which would be 'bad'.

As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
first and not see the event yet, then CPU0 must observe task_curr()
and retry. If the install happens first, then we must see the event on
sched-in and all is well.

I think we can translate the first part (until the 'must not be NULL')
of the scenario to a litmus test like:

  C C-peterz

  {
  }

  P0(int *x, int *y)
  {
          int r1;

          WRITE_ONCE(*x, 1);
          smp_mb();
          r1 = READ_ONCE(*y);
  }

  P1(int *y, int *z)
  {
          WRITE_ONCE(*y, 1);
          smp_store_release(z, 1);
  }

  P2(int *x, int *z)
  {
          int r1;
          int r2;

          r1 = smp_load_acquire(z);
	  smp_mb();
          r2 = READ_ONCE(*x);
  }

  exists
  (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)

Where:
  x is perf_event_ctxp[],
  y is our tasks's CPU, and
  z is our task being placed on the rq of CPU2.

The P0 smp_mb() is the one added by this patch, ordering the store to
perf_event_ctxp[] from find_get_context() and the load of task_cpu()
in task_function_call().

The smp_store_release/smp_load_acquire model the RCpc locking of the
rq->lock and the smp_mb() of P2 is the context switch switching from
whatever CPU2 was running to our task 't'.

This litmus test evaluates into:

  Test C-peterz Allowed
  States 7
  0:r1=0; 2:r1=0; 2:r2=0;
  0:r1=0; 2:r1=0; 2:r2=1;
  0:r1=0; 2:r1=1; 2:r2=1;
  0:r1=1; 2:r1=0; 2:r2=0;
  0:r1=1; 2:r1=0; 2:r2=1;
  0:r1=1; 2:r1=1; 2:r2=0;
  0:r1=1; 2:r1=1; 2:r2=1;
  No
  Witnesses
  Positive: 0 Negative: 7
  Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
  Observation C-peterz Never 0 7
  Hash=e427f41d9146b2a5445101d3e2fcaa34

And the strong and weak model agree.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: jeremy.linton@arm.com
Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent ad5013d
Raw File
vg468.h
/*
 * vg468.h 1.11 1999/10/25 20:03:34
 *
 * The contents of this file are subject to the Mozilla Public License
 * Version 1.1 (the "License"); you may not use this file except in
 * compliance with the License. You may obtain a copy of the License
 * at http://www.mozilla.org/MPL/
 *
 * Software distributed under the License is distributed on an "AS IS"
 * basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
 * the License for the specific language governing rights and
 * limitations under the License. 
 *
 * The initial developer of the original code is David A. Hinds
 * <dahinds@users.sourceforge.net>.  Portions created by David A. Hinds
 * are Copyright (C) 1999 David A. Hinds.  All Rights Reserved.
 *
 * Alternatively, the contents of this file may be used under the
 * terms of the GNU General Public License version 2 (the "GPL"), in which
 * case the provisions of the GPL are applicable instead of the
 * above.  If you wish to allow the use of your version of this file
 * only under the terms of the GPL and not to allow others to use
 * your version of this file under the MPL, indicate your decision by
 * deleting the provisions above and replace them with the notice and
 * other provisions required by the GPL.  If you do not delete the
 * provisions above, a recipient may use your version of this file
 * under either the MPL or the GPL.
 */

#ifndef _LINUX_VG468_H
#define _LINUX_VG468_H

/* Special bit in I365_IDENT used for Vadem chip detection */
#define I365_IDENT_VADEM	0x08

/* Special definitions in I365_POWER */
#define VG468_VPP2_MASK		0x0c
#define VG468_VPP2_5V		0x04
#define VG468_VPP2_12V		0x08

/* Unique Vadem registers */
#define VG469_VSENSE		0x1f	/* Card voltage sense */
#define VG469_VSELECT		0x2f	/* Card voltage select */
#define VG468_CTL		0x38	/* Control register */
#define VG468_TIMER		0x39	/* Timer control */
#define VG468_MISC		0x3a	/* Miscellaneous */
#define VG468_GPIO_CFG		0x3b	/* GPIO configuration */
#define VG469_EXT_MODE		0x3c	/* Extended mode register */
#define VG468_SELECT		0x3d	/* Programmable chip select */
#define VG468_SELECT_CFG	0x3e	/* Chip select configuration */
#define VG468_ATA		0x3f	/* ATA control */

/* Flags for VG469_VSENSE */
#define VG469_VSENSE_A_VS1	0x01
#define VG469_VSENSE_A_VS2	0x02
#define VG469_VSENSE_B_VS1	0x04
#define VG469_VSENSE_B_VS2	0x08

/* Flags for VG469_VSELECT */
#define VG469_VSEL_VCC		0x03
#define VG469_VSEL_5V		0x00
#define VG469_VSEL_3V		0x03
#define VG469_VSEL_MAX		0x0c
#define VG469_VSEL_EXT_STAT	0x10
#define VG469_VSEL_EXT_BUS	0x20
#define VG469_VSEL_MIXED	0x40
#define VG469_VSEL_ISA		0x80

/* Flags for VG468_CTL */
#define VG468_CTL_SLOW		0x01	/* 600ns memory timing */
#define VG468_CTL_ASYNC		0x02	/* Asynchronous bus clocking */
#define VG468_CTL_TSSI		0x08	/* Tri-state some outputs */
#define VG468_CTL_DELAY		0x10	/* Card detect debounce */
#define VG468_CTL_INPACK	0x20	/* Obey INPACK signal? */
#define VG468_CTL_POLARITY	0x40	/* VCCEN polarity */
#define VG468_CTL_COMPAT	0x80	/* Compatibility stuff */

#define VG469_CTL_WS_COMPAT	0x04	/* Wait state compatibility */
#define VG469_CTL_STRETCH	0x10	/* LED stretch */

/* Flags for VG468_TIMER */
#define VG468_TIMER_ZEROPWR	0x10	/* Zero power control */
#define VG468_TIMER_SIGEN	0x20	/* Power up */
#define VG468_TIMER_STATUS	0x40	/* Activity timer status */
#define VG468_TIMER_RES		0x80	/* Timer resolution */
#define VG468_TIMER_MASK	0x0f	/* Activity timer timeout */

/* Flags for VG468_MISC */
#define VG468_MISC_GPIO		0x04	/* General-purpose IO */
#define VG468_MISC_DMAWSB	0x08	/* DMA wait state control */
#define VG469_MISC_LEDENA	0x10	/* LED enable */
#define VG468_MISC_VADEMREV	0x40	/* Vadem revision control */
#define VG468_MISC_UNLOCK	0x80	/* Unique register lock */

/* Flags for VG469_EXT_MODE_A */
#define VG469_MODE_VPPST	0x03	/* Vpp steering control */
#define VG469_MODE_INT_SENSE	0x04	/* Internal voltage sense */
#define VG469_MODE_CABLE	0x08
#define VG469_MODE_COMPAT	0x10	/* i82365sl B or DF step */
#define VG469_MODE_TEST		0x20
#define VG469_MODE_RIO		0x40	/* Steer RIO to INTR? */

/* Flags for VG469_EXT_MODE_B */
#define VG469_MODE_B_3V		0x01	/* 3.3v for socket B */

#endif /* _LINUX_VG468_H */
back to top