swh:1:snp:77163734605b0ec556b01d897b7bb4a7e30d46b6

sort by:
Revision Author Date Message Commit Date
d2d4edb perf probe: Fix to show correct locations for events on modules Fix to show correct locations for events on modules by relocating given address instead of retrying after failure. This happens when the module text size is big enough, bigger than sh_addr, because the original code retries with given address + sh_addr if it failed to find CU DIE at the given address. Any address smaller than sh_addr always fails and it retries with the correct address, but addresses bigger than sh_addr will get a CU DIE which is on the given address (not adjusted by sh_addr). In my environment(x86-64), the sh_addr of ".text" section is 0x10030. Since i915 is a huge kernel module, we can see this issue as below. $ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1 ffffffffc0270000 t i915_switcheroo_can_switch [i915] ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check symbols cross this boundary. $ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\ | head -n 2 ffffffffc027ff80 t haswell_init_clock_gating [i915] ffffffffc0280110 t valleyview_init_clock_gating [i915] So setup probes on both function and see what happen. $ sudo ./perf probe -m i915 -a haswell_init_clock_gating \ -a valleyview_init_clock_gating Added new events: probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915) You can now use it in all perf tools, such as: perf record -e probe:valleyview_init_clock_gating -aR sleep 1 $ sudo ./perf probe -l probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915) As you can see, haswell_init_clock_gating is correctly shown, but valleyview_init_clock_gating is not. With this patch, both events are shown correctly. $ sudo ./perf probe -l probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) Committer notes: In my case: # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating Added new events: probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915) You can now use it in all perf tools, such as: perf record -e probe:valleyview_init_clock_gating -aR sleep 1 # perf probe -l probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915) probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915) # # readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name' [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 1] .text PROGBITS ffffffff81000000 200000 822fd3 00 AX 0 0 4096 # So both are b0rked, now with the fix: # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating Added new events: probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915) You can now use it in all perf tools, such as: perf record -e probe:valleyview_init_clock_gating -aR sleep 1 # perf probe -l probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915) # Both looks correct. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 16 January 2017, 18:14:06 UTC
18e7a45 perf/x86: Reject non sampling events with precise_ip As Peter suggested [1] rejecting non sampling PEBS events, because they dont make any sense and could cause bugs in the NMI handler [2]. [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava Signed-off-by: Ingo Molnar <mingo@kernel.org> 14 January 2017, 10:06:50 UTC
475113d perf/x86/intel: Account interrupts for PEBS errors It's possible to set up PEBS events to get only errors and not any data, like on SNB-X (model 45) and IVB-EP (model 62) via 2 perf commands running simultaneously: taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10 This leads to a soft lock up, because the error path of the intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt for error PEBS interrupts, so in case you're getting ONLY errors you don't have a way to stop the event when it's over the max_samples_per_tick limit: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816] ... RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140 ... Call Trace: ? trace_hardirqs_on_caller+0xf5/0x1b0 ? perf_cgroup_attach+0x70/0x70 perf_install_in_context+0x199/0x1b0 ? ctx_resched+0x90/0x90 SYSC_perf_event_open+0x641/0xf90 SyS_perf_event_open+0x9/0x10 do_syscall_64+0x6c/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 Add perf_event_account_interrupt() which does the interrupt and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s error path. We keep the pending_kill and pending_wakeup logic only in the __perf_event_overflow() path, because they make sense only if there's any data to deliver. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> 14 January 2017, 10:06:49 UTC
321027c perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race Di Shen reported a race between two concurrent sys_perf_event_open() calls where both try and move the same pre-existing software group into a hardware context. The problem is exactly that described in commit: f63a8daa5812 ("perf: Fix event->ctx locking") ... where, while we wait for a ctx->mutex acquisition, the event->ctx relation can have changed under us. That very same commit failed to recognise sys_perf_event_context() as an external access vector to the events and thereby didn't apply the established locking rules correctly. So while one sys_perf_event_open() call is stuck waiting on mutex_lock_double(), the other (which owns said locks) moves the group about. So by the time the former sys_perf_event_open() acquires the locks, the context we've acquired is stale (and possibly dead). Apply the established locking rules as per perf_event_ctx_lock_nested() to the mutex_lock_double() for the 'move_group' case. This obviously means we need to validate state after we acquire the locks. Reported-by: Di Shen (Keen Lab) Tested-by: John Dias <joaodias@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Min Chong <mchong@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> 14 January 2017, 09:56:11 UTC
63cae12 perf/core: Fix sys_perf_event_open() vs. hotplug There is problem with installing an event in a task that is 'stuck' on an offline CPU. Blocked tasks are not dis-assosciated from offlined CPUs, after all, a blocked task doesn't run and doesn't require a CPU etc.. Only on wakeup do we ammend the situation and place the task on a available CPU. If we hit such a task with perf_install_in_context() we'll loop until either that task wakes up or the CPU comes back online, if the task waking depends on the event being installed, we're stuck. While looking into this issue, I also spotted another problem, if we hit a task with perf_install_in_context() that is in the middle of being migrated, that is we observe the old CPU before sending the IPI, but run the IPI (on the old CPU) while the task is already running on the new CPU, things also go sideways. Rework things to rely on task_curr() -- outside of rq->lock -- which is rather tricky. Imagine the following scenario where we're trying to install the first event into our task 't': CPU0 CPU1 CPU2 (current == t) t->perf_event_ctxp[] = ctx; smp_mb(); cpu = task_cpu(t); switch(t, n); migrate(t, 2); switch(p, t); ctx = t->perf_event_ctxp[]; // must not be NULL smp_function_call(cpu, ..); generic_exec_single() func(); spin_lock(ctx->lock); if (task_curr(t)) // false add_event_to_ctx(); spin_unlock(ctx->lock); perf_event_context_sched_in(); spin_lock(ctx->lock); // sees event So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'. Because if CPU2's load of that variable were to observe NULL, it would not try to schedule the ctx and we'd have a task running without its counter, which would be 'bad'. As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it first and not see the event yet, then CPU0 must observe task_curr() and retry. If the install happens first, then we must see the event on sched-in and all is well. I think we can translate the first part (until the 'must not be NULL') of the scenario to a litmus test like: C C-peterz { } P0(int *x, int *y) { int r1; WRITE_ONCE(*x, 1); smp_mb(); r1 = READ_ONCE(*y); } P1(int *y, int *z) { WRITE_ONCE(*y, 1); smp_store_release(z, 1); } P2(int *x, int *z) { int r1; int r2; r1 = smp_load_acquire(z); smp_mb(); r2 = READ_ONCE(*x); } exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0) Where: x is perf_event_ctxp[], y is our tasks's CPU, and z is our task being placed on the rq of CPU2. The P0 smp_mb() is the one added by this patch, ordering the store to perf_event_ctxp[] from find_get_context() and the load of task_cpu() in task_function_call(). The smp_store_release/smp_load_acquire model the RCpc locking of the rq->lock and the smp_mb() of P2 is the context switch switching from whatever CPU2 was running to our task 't'. This litmus test evaluates into: Test C-peterz Allowed States 7 0:r1=0; 2:r1=0; 2:r2=0; 0:r1=0; 2:r1=0; 2:r2=1; 0:r1=0; 2:r1=1; 2:r2=1; 0:r1=1; 2:r1=0; 2:r2=0; 0:r1=1; 2:r1=0; 2:r2=1; 0:r1=1; 2:r1=1; 2:r2=0; 0:r1=1; 2:r1=1; 2:r2=1; No Witnesses Positive: 0 Negative: 7 Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0) Observation C-peterz Never 0 7 Hash=e427f41d9146b2a5445101d3e2fcaa34 And the strong and weak model agree. Reported-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Cc: jeremy.linton@arm.com Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> 14 January 2017, 09:56:10 UTC
ad5013d perf/x86/intel: Use ULL constant to prevent undefined shift behaviour When x86_pmu.num_counters is 32 the shift of the integer constant 1 is exceeding 32bit and therefor undefined behaviour. Fix this by shifting 1ULL instead of 1. Reported-by: CoverityScan CID#1192105 ("Bad bit shift operation") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: http://lkml.kernel.org/r/20170111114310.17928-1-colin.king@canonical.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 11 January 2017, 15:43:30 UTC
6d6daa2 perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot cpu. This works as long as the boot CPU is actually on the physical package 0, which is normaly the case after power on / reboot. But it fails with a NULL pointer dereference when a kdump kernel is started on a secondary socket which has a different physical package id because the locigal package translation for physical package 0 does not exist. Use the logical package id of the boot cpu instead of hard coded 0. [ tglx: Rewrote changelog once more ] Fixes: cf6d445f6897 ("perf/x86/uncore: Track packages, not per CPU data") Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 11 January 2017, 11:13:21 UTC
74545f6 perf/x86: Set pmu->module in Intel PMU modules The conversion of Intel PMU drivers into modules did not include reference counting. The machine will crash when attempting to access deleted code if an event from a module PMU is started and the module removed before the event is destroyed. i.e. this crashes the machine: $ insmod intel-rapl-perf.ko $ perf stat -e power/energy-cores/ -C 0 & $ rmmod intel-rapl-perf.ko Set THIS_MODULE to pmu->module in Intel module PMUs so that generic code can handle reference counting and deny rmmod while an event still exists. Signed-off-by: David Carrillo-Cisneros <davidcc@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1482455860-116269-1-git-send-email-davidcc@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> 05 January 2017, 08:13:55 UTC
4e06d4f Merge tag 'perf-urgent-for-mingo-4.10-20170104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo: Fixes: - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira) - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo) - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo) - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo) - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa) - Fix 'perf probe' for cross arch probing (Masami Hiramatsu) Improvement: - Show total scheduling time in 'perf sched timehist' (Namhyumg Kim) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 05 January 2017, 07:33:02 UTC
8a937a2 perf probe: Fix to probe on gcc generated symbols for offline kernel Fix perf-probe to show probe definition on gcc generated symbols for offline kernel (including cross-arch kernel image). gcc sometimes optimizes functions and generate new symbols with suffixes such as ".constprop.N" or ".isra.N" etc. Since those symbol names are not recorded in DWARF, we have to find correct generated symbols from offline ELF binary to probe on it (kallsyms doesn't correct it). For online kernel or uprobes we don't need it because those are rebased on _text, or a section relative address. E.g. Without this: $ perf probe -k build-arm/vmlinux -F __slab_alloc* __slab_alloc.constprop.9 $ perf probe -k build-arm/vmlinux -D __slab_alloc p:probe/__slab_alloc __slab_alloc+0 If you put above definition on target machine, it should fail because there is no __slab_alloc in kallsyms. With this fix, perf probe shows correct probe definition on __slab_alloc.constprop.9: $ perf probe -k build-arm/vmlinux -D __slab_alloc p:probe/__slab_alloc __slab_alloc.constprop.9+0 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/148350060434.19001.11864836288580083501.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 04 January 2017, 14:44:22 UTC
eebc509 perf probe: Fix --funcs to show correct symbols for offline module Fix --funcs (-F) option to show correct symbols for offline module. Since previous perf-probe uses machine__findnew_module_map() for offline module, even if user passes a module file (with full path) which is for other architecture, perf-probe always tries to load symbol map for current kernel module. This fix uses dso__new_map() to load the map from given binary as same as a map for user applications. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/148350053478.19001.15435255244512631545.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 04 January 2017, 14:15:09 UTC
7934c98 perf symbols: Robustify reading of build-id from sysfs Markus reported that perf segfaults when reading /sys/kernel/notes from a kernel linked with GNU gold, due to what looks like a gold bug, so do some bounds checking to avoid crashing in that case. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Report-Link: http://lkml.kernel.org/r/20161219161821.GA294@x4 Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ryhgs6a6jxvz207j2636w31c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 03 January 2017, 19:11:13 UTC
30a9c64 perf tools: Install tools/lib/traceevent plugins with install-bin Those are binaries as well, so should be installed by: make -C tools/perf install-bin' too. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-3841b37u05evxrs1igkyu6ks@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 03 January 2017, 19:11:13 UTC
0748591 tools lib traceevent: Fix prev/next_prio for deadline tasks Currently, the sched:sched_switch tracepoint reports deadline tasks with priority -1. But when reading the trace via perf script I've got the following output: # ./d & # (d is a deadline task, see [1]) # perf record -e sched:sched_switch -a sleep 1 # perf script ... swapper 0 [000] 2146.962441: sched:sched_switch: swapper/0:0 [120] R ==> d:2593 [4294967295] d 2593 [000] 2146.972472: sched:sched_switch: d:2593 [4294967295] R ==> g:2590 [4294967295] The task d reports the wrong priority [4294967295]. This happens because the "int prio" is stored in an unsigned long long val. Although it is set as a %lld, as int is shorter than unsigned long long, trace_seq_printf prints it as a positive number. The fix is just to cast the val as an int, and print it as a %d, as in the sched:sched_switch tracepoint's "format". The output with the fix is: # ./d & # perf record -e sched:sched_switch -a sleep 1 # perf script ... swapper 0 [000] 4306.374037: sched:sched_switch: swapper/0:0 [120] R ==> d:10941 [-1] d 10941 [000] 4306.383823: sched:sched_switch: d:10941 [-1] R ==> swapper/0:0 [120] [1] d.c --- #include <stdio.h> #include <unistd.h> #include <sys/syscall.h> #include <linux/types.h> #include <linux/sched.h> struct sched_attr { __u32 size, sched_policy; __u64 sched_flags; __s32 sched_nice; __u32 sched_priority; __u64 sched_runtime, sched_deadline, sched_period; }; int sched_setattr(pid_t pid, const struct sched_attr *attr, unsigned int flags) { return syscall(__NR_sched_setattr, pid, attr, flags); } int main(void) { struct sched_attr attr = { .size = sizeof(attr), .sched_policy = SCHED_DEADLINE, /* This creates a 10ms/30ms reservation */ .sched_runtime = 10 * 1000 * 1000, .sched_period = attr.sched_deadline = 30 * 1000 * 1000, }; if (sched_setattr(0, &attr, 0) < 0) { perror("sched_setattr"); return -1; } for(;;); } --- Committer notes: Got the program from the provided URL, http://bristot.me/lkml/d.c, trimmed it and included in the cset log above, so that we have everything needed to test it in one place. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/866ef75bcebf670ae91c6a96daa63597ba981f0d.1483443552.git.bristot@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 03 January 2017, 19:11:12 UTC
60437ac perf record: Fix --switch-output documentation and comment There's no --signal-trigger option, also adding the code comment into record man page. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Wang Nan <wangnan0@huawei.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1483431600-19887-4-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 03 January 2017, 14:11:38 UTC
efd2130 perf record: Make __record_options static There's no need for this one to be global. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Wang Nan <wangnan0@huawei.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1483431600-19887-3-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 03 January 2017, 14:11:10 UTC
b66fb1d tools lib subcmd: Add OPT_STRING_OPTARG_SET option To allow string options with a default argument and variable set when the option is used. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Wang Nan <wangnan0@huawei.com> Cc: David Ahern <dsahern@gmail.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1483431600-19887-2-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 03 January 2017, 14:10:38 UTC
1f2ed15 perf probe: Fix to get correct modname from elf header Since 'perf probe' supports cross-arch probes, it is possible to analyze different arch kernel image which has different bits-per-long. In that case, it fails to get the module name because it uses the MOD_NAME_OFFSET macro based on the host machine bits-per-long, instead of the target arch bits-per-long. This fixes above issue by changing modname-offset based on the target archs bit width. This is ok because linux kernel uses LP64 model on 64bit arch. E.g. without this (on x86_64, and target module is arm32): $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup p:probe/configfs_lookup :configfs_lookup+0 ^-Here is an empty module name. With this fix, you can see correct module name: $ perf probe -m build-arm/fs/configfs/configfs.ko -D configfs_lookup p:probe/configfs_lookup configfs:configfs_lookup+0 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/148337043836.6752.383495516397005695.stgit@devbox Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 02 January 2017, 17:09:17 UTC
0c744ea Linux 4.10-rc2 01 January 2017, 22:31:53 UTC
4759d38 Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull DAX updates from Dan Williams: "The completion of Jan's DAX work for 4.10. As I mentioned in the libnvdimm-for-4.10 pull request, these are some final fixes for the DAX dirty-cacheline-tracking invalidation work that was merged through the -mm, ext4, and xfs trees in -rc1. These patches were prepared prior to the merge window, but we waited for 4.10-rc1 to have a stable merge base after all the prerequisites were merged. Quoting Jan on the overall changes in these patches: "So I'd like all these 6 patches to go for rc2. The first three patches fix invalidation of exceptional DAX entries (a bug which is there for a long time) - without these patches data loss can occur on power failure even though user called fsync(2). The other three patches change locking of DAX faults so that ->iomap_begin() is called in a more relaxed locking context and we are safe to start a transaction there for ext4" These have received a build success notification from the kbuild robot, and pass the latest libnvdimm unit tests. There have not been any -next releases since -rc1, so they have not appeared there" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: ext4: Simplify DAX fault path dax: Call ->iomap_begin without entry lock during dax fault dax: Finish fault completely when loading holes dax: Avoid page invalidation races and unnecessary radix tree traversals mm: Invalidate DAX radix tree entries only if appropriate ext2: Return BH_New buffers for zeroed blocks 01 January 2017, 20:27:05 UTC
238d1d0 Merge tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux Pull documentation fixes from Jonathan Corbet: "Two small fixes: - A merge error on my part broke the DocBook build. I've requisitioned one of tglx's frozen sharks for appropriate disciplinary action and resolved to be more careful about testing the DocBook stuff as long as it's still around. - Fix an error in unaligned-memory-access.txt" * tag 'docs-4.10-rc1-fix' of git://git.lwn.net/linux: Documentation/unaligned-memory-access.txt: fix incorrect comparison operator docs: Fix build failure 30 December 2016, 17:32:26 UTC
f3de082 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a boot failure on some platforms when crypto self test is enabled along with the new acomp interface" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: testmgr - Use heap buffer for acomp test input 30 December 2016, 17:29:50 UTC
98473f9 mm/filemap: fix parameters to test_bit() mm/filemap.c: In function 'clear_bit_unlock_is_negative_byte': mm/filemap.c:933:9: error: too few arguments to function 'test_bit' return test_bit(PG_waiters); ^~~~~~~~ Fixes: b91e1302ad9b ('mm: optimize PageWaiters bit use for unlock_page()') Signed-off-by: Olof Johansson <olof@lixom.net> Brown-paper-bag-by: Linus Torvalds <dummy@duh.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 29 December 2016, 22:46:39 UTC
b91e130 mm: optimize PageWaiters bit use for unlock_page() In commit 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: Nick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 29 December 2016, 19:03:15 UTC
b6f4c66 samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Joe Stringer <joe@ovn.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 28 December 2016, 13:47:13 UTC
2d706e7 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a hash corruption bug in the marvell driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: marvell - Copy IVDIG before launching partial DMA ahash requests 28 December 2016, 01:51:36 UTC
ee12996 samples/bpf sock_example: Avoid getting ethhdr from two includes To avoid the following build failure on Alpine Linux 3.4, that has clang-3.8 with the bpf target: HOSTCC samples/bpf/sock_example.o In file included from /usr/include/net/ethernet.h:10:0, from /git/linux/samples/bpf/sock_example.h:7, from /git/linux/samples/bpf/sock_example.c:30: /usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct ethhdr' struct ethhdr { ^ In file included from /git/linux/samples/bpf/sock_example.c:26:0: ./usr/include/linux/if_ether.h:144:8: note: originally defined here struct ethhdr { ^ scripts/Makefile.host:124: recipe for target 'samples/bpf/sock_example.o' failed make[2]: *** [samples/bpf/sock_example.o] Error 1 /git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed So include net/if_ether.h for the needs of sock_example.h, using the same include that sock_example.c uses. Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Joe Stringer <joe@ovn.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 28 December 2016, 00:49:17 UTC
9396c9c perf sched timehist: Show total scheduling time Show length of analyzed sample time and rate of idle task running. This also takes care of time range given by --time option. $ perf sched timehist -sI | tail Samples do not have callchains. Idle stats: CPU 0 idle for 930.316 msec ( 92.93%) CPU 1 idle for 963.614 msec ( 96.25%) CPU 2 idle for 885.482 msec ( 88.45%) CPU 3 idle for 938.635 msec ( 93.76%) Total number of unique tasks: 118 Total number of context switches: 2337 Total run time (msec): 3718.048 Total scheduling time (msec): 1001.131 (x 4) Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20161222060350.17655-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 28 December 2016, 00:47:57 UTC
8f18e4d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar. The most important is to not assume the packet is RX just because the destination address matches that of the device. Such an assumption causes problems when an interface is put into loopback mode. 2) If we retry when creating a new tc entry (because we dropped the RTNL mutex in order to load a module, for example) we end up with -EAGAIN and then loop trying to replay the request. But we didn't reset some state when looping back to the top like this, and if another thread meanwhile inserted the same tc entry we were trying to, we re-link it creating an enless loop in the tc chain. Fix from Daniel Borkmann. 3) There are two different WRITE bits in the MDIO address register for the stmmac chip, depending upon the chip variant. Due to a bug we could set them both, fix from Hock Leong Kweh. 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: stmmac: fix incorrect bit set in gmac4 mdio addr register r8169: add support for RTL8168 series add-on card. net: xdp: remove unused bfp_warn_invalid_xdp_buffer() openvswitch: upcall: Fix vlan handling. ipv4: Namespaceify tcp_tw_reuse knob net: korina: Fix NAPI versus resources freeing net, sched: fix soft lockup in tc_classify net/mlx4_en: Fix user prio field in XDP forward tipc: don't send FIN message from connectionless socket ipvlan: fix multicast processing ipvlan: fix various issues in ipvlan_process_multicast() 28 December 2016, 00:04:37 UTC
36f671b Documentation/unaligned-memory-access.txt: fix incorrect comparison operator In the actual implementation ether_addr_equal function tests for equality to 0 when returning. It seems in commit 0d74c4 it is somehow overlooked to change this operator to reflect the actual function. Signed-off-by: Cihangir Akturk <cakturk@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> 27 December 2016, 20:08:42 UTC
6611533 docs: Fix build failure The 80211.tmpl DocBook file was removed in commit 819bf593767c ("docs-rst: sphinxify 802.11 documentation"), but the 80211.xml target was re-added to the Makefile by commit 7ddedebb03b7 ("ALSA: doc: ReSTize writing-an-alsa-driver document"), leading to a failure when building the documentation: *** No rule to make target 'Documentation/DocBook/80211.xml', needed by 'Documentation/DocBook/80211.aux.xml'. cc: stable@vger.kernel.org Signed-off-by: John Brooks <john@fastquake.com> Mea-culpa-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net> 27 December 2016, 20:05:36 UTC
54ab6db Merge tag 'v4.10-rc1' into docs-next Linux 4.10-rc1 27 December 2016, 19:53:44 UTC
5799fc9 net: stmmac: fix incorrect bit set in gmac4 mdio addr register Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of OR together with MII_WRITE. Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net> 27 December 2016, 17:28:08 UTC
610c908 r8169: add support for RTL8168 series add-on card. This chip is the same as RTL8168, but its device id is 0x8161. Signed-off-by: Chun-Hao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net> 27 December 2016, 17:28:07 UTC
be26727 net: xdp: remove unused bfp_warn_invalid_xdp_buffer() After commit 73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net: remove the warning before XDP linearizing"), there's no users for bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for commit f23bc46c30ca5ef58b8549434899fcbac41b2cfc. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> 27 December 2016, 17:28:07 UTC
df30f74 openvswitch: upcall: Fix vlan handling. Networking stack accelerate vlan tag handling by keeping topmost vlan header in skb. This works as long as packet remains in OVS datapath. But during OVS upcall vlan header is pushed on to the packet. When such packet is sent back to OVS datapath, core networking stack might not handle it correctly. Following patch avoids this issue by accelerating the vlan tag during flow key extract. This simplifies datapath by bringing uniform packet processing for packets from all code paths. Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets"). CC: Jarno Rajahalme <jarno@ovn.org> CC: Jiri Benc <jbenc@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net> 27 December 2016, 17:28:07 UTC
56ab6b9 ipv4: Namespaceify tcp_tw_reuse knob Different namespaces might have different requirements to reuse TIME-WAIT sockets for new connections. This might be required in cases where different namespace applications are in place which require TIME_WAIT socket connections to be reduced independently of the host. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net> 27 December 2016, 17:28:07 UTC
02608e0 crypto: testmgr - Use heap buffer for acomp test input Christopher Covington reported a crash on aarch64 on recent Fedora kernels: kernel BUG at ./include/linux/scatterlist.h:140! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162 Hardware name: linux,dummy-virt (DT) task: ffff80007c650080 task.stack: ffff800008910000 PC is at sg_init_one+0xa0/0xb8 LR is at sg_init_one+0x24/0xb8 ... [<ffff000008398db8>] sg_init_one+0xa0/0xb8 [<ffff000008350a44>] test_acomp+0x10c/0x438 [<ffff000008350e20>] alg_test_comp+0xb0/0x118 [<ffff00000834f28c>] alg_test+0x17c/0x2f0 [<ffff00000834c6a4>] cryptomgr_test+0x44/0x50 [<ffff0000080dac70>] kthread+0xf8/0x128 [<ffff000008082ec0>] ret_from_fork+0x10/0x50 The test vectors used for input are part of the kernel image. These inputs are passed as a buffer to sg_init_one which eventually blows up with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns false for the kernel image since virt_to_page will not return the correct page. Fix this by copying the input vectors to heap buffer before setting up the scatterlist. Reported-by: Christopher Covington <cov@codeaurora.org> Fixes: d7db7a882deb ("crypto: acomp - update testmgr with support for acomp") Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> 27 December 2016, 09:32:11 UTC
1db1754 ext4: Simplify DAX fault path Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we can use transaction starting in ext4_iomap_begin() and thus simplify ext4_dax_fault(). It also provides us proper retries in case of ENOSPC. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> 27 December 2016, 04:29:25 UTC
9f141d6 dax: Call ->iomap_begin without entry lock during dax fault Currently ->iomap_begin() handler is called with entry lock held. If the filesystem held any locks between ->iomap_begin() and ->iomap_end() (such as ext4 which will want to hold transaction open), this would cause lock inversion with the iomap_apply() from standard IO path which first calls ->iomap_begin() and only then calls ->actor() callback which grabs entry locks for DAX (if it faults when copying from/to user provided buffers). Fix the problem by nesting grabbing of entry lock inside ->iomap_begin() - ->iomap_end() pair. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> 27 December 2016, 04:29:25 UTC
f449b93 dax: Finish fault completely when loading holes The only case when we do not finish the page fault completely is when we are loading hole pages into a radix tree. Avoid this special case and finish the fault in that case as well inside the DAX fault handler. It will allow us for easier iomap handling. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> 27 December 2016, 04:29:25 UTC
e3fce68 dax: Avoid page invalidation races and unnecessary radix tree traversals Currently dax_iomap_rw() takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore in current place the invalidation is racy wrt page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_iomap_actor() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> 27 December 2016, 04:29:24 UTC
c6dcf52 mm: Invalidate DAX radix tree entries only if appropriate Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> 27 December 2016, 04:29:24 UTC
e568df6 ext2: Return BH_New buffers for zeroed blocks So far we did not return BH_New buffers from ext2_get_blocks() when we allocated and zeroed-out a block for DAX inode to avoid racy zeroing in DAX code. This zeroing is gone these days so we can remove the workaround. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com> 27 December 2016, 04:29:24 UTC
0dad3a3 x86/mce/AMD: Make the init code more robust If mce_device_init() fails then the mce device pointer is NULL and the AMD mce code happily dereferences it. Add a sanity check. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 27 December 2016, 01:30:24 UTC
b9d9d69 smp/hotplug: Undo tglxs brainfart The attempt to prevent overwriting an active state resulted in a disaster which effectively disables all dynamically allocated hotplug states. Cleanup the mess. Fixes: dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks") Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 27 December 2016, 01:30:24 UTC
b4b8664 arm64: don't pull uaccess.h into *.S Split asm-only parts of arm64 uaccess.h into a new header and use that from *.S. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> 26 December 2016, 18:05:17 UTC
e6afb1a net: korina: Fix NAPI versus resources freeing Commit beb0babfb77e ("korina: disable napi on close and restart") introduced calls to napi_disable() that were missing before, unfortunately this leaves a small window during which NAPI has a chance to run, yet we just freed resources since korina_free_ring() has been called: Fix this by disabling NAPI first then freeing resource, and make sure that we also cancel the restart task before doing the resource freeing. Fixes: beb0babfb77e ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis <alex@ozo.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 26 December 2016, 16:26:16 UTC
628185c net, sched: fix soft lockup in tc_classify Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with -EAGAIN. One reason why we need to drop there is, for example, that we need to request an action module to be loaded. This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning after we loaded and found the requested action, we need to redo the whole request so we don't race against others. While we had to unlock rtnl in that time, thread B's request was processed next on that CPU. Thread B added a new tp instance successfully to the classifier chain. When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN and destroying its tp instance which never got linked, we goto replay and redo A's request. This time when walking the classifier chain in tc_ctl_tfilter() for checking for existing tp instances we had a priority match and found the tp instance that was created and linked by thread B. Now calling again into tp->ops->change() with that tp was successful and returned without error. tp_created was never cleared in the second round, thus kernel thinks that we need to link it into the classifier chain (once again). tp and *back point to the same object due to the match we had earlier on. Thus for thread B's already public tp, we reset tp->next to tp itself and link it into the chain, which eventually causes the mentioned endless loop in tc_classify() once a packet hits the data path. Fix is to clear tp_created at the beginning of each request, also when we replay it. On the paths that can cause -EAGAIN we already destroy the original tp instance we had and on replay we really need to start from scratch. It seems that this issue was first introduced in commit 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup"). Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> 26 December 2016, 16:24:10 UTC
7ce7d89 Linux 4.10-rc1 26 December 2016, 00:13:08 UTC
8ae679c powerpc: Fix build warning on 32-bit PPC I am getting the following warning when I build kernel 4.9-git on my PowerBook G4 with a 32-bit PPC processor: AS arch/powerpc/kernel/misc_32.o arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef] This problem is evident after commit 989cea5c14be ("kbuild: prevent lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an error that has been in the code since 2005 when this source file was created. That was with commit 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S"). The offending line does not make a lot of sense. This error does not seem to cause any errors in the executable, thus I am not recommending that it be applied to any stable versions. Thanks to Nicholas Piggin for suggesting this solution. Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 26 December 2016, 00:12:20 UTC
d33d5a6 avoid spurious "may be used uninitialized" warning The timer type simplifications caused a new gcc warning: drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’: drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized] elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)); despite the actual use of "time_start" not having changed in any way. It appears that simply changing the type of ktime_t from a union to a plain scalar type made gcc check the use. The variable wasn't actually used uninitialized, but gcc apparently failed to notice that the conditional around the use was exactly the same as the conditional around the initialization of that variable. Add an unnecessary initialization just to shut up the compiler. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 25 December 2016, 22:56:58 UTC
3ddc76d Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t 25 December 2016, 22:30:04 UTC
b272f73 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug notifier removal from Thomas Gleixner: "This is the final cleanup of the hotplug notifier infrastructure. The series has been reintgrated in the last two days because there came a new driver using the old infrastructure via the SCSI tree. Summary: - convert the last leftover drivers utilizing notifiers - fixup for a completely broken hotplug user - prevent setup of already used states - removal of the notifiers - treewide cleanup of hotplug state names - consolidation of state space There is a sphinx based documentation pending, but that needs review from the documentation folks" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/armada-xp: Consolidate hotplug state space irqchip/gic: Consolidate hotplug state space coresight/etm3/4x: Consolidate hotplug state space cpu/hotplug: Cleanup state names cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions staging/lustre/libcfs: Convert to hotplug state machine scsi/bnx2i: Convert to hotplug state machine scsi/bnx2fc: Convert to hotplug state machine cpu/hotplug: Prevent overwriting of callbacks x86/msr: Remove bogus cleanup from the error path bus: arm-ccn: Prevent hotplug callback leak perf/x86/intel/cstate: Prevent hotplug callback leak ARM/imx/mmcd: Fix broken cpu hotplug handling scsi: qedi: Convert to hotplug state machine 25 December 2016, 22:05:56 UTC
10bbe75 Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown. * 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: remove obsolete -M, -m, -C, -c options tools/power turbostat: Make extensible via the --add parameter tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz tools/power turbostat: line up headers when -M is used tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding tools/power turbostat: Support Knights Mill (KNM) tools/power turbostat: Display HWP OOB status tools/power turbostat: fix Denverton BCLK tools/power turbostat: use intel-family.h model strings tools/power/turbostat: Add Denverton RAPL support tools/power/turbostat: Add Denverton support tools/power/turbostat: split core MSR support into status + limit tools/power turbostat: fix error case overflow read of slm_freq_table[] tools/power turbostat: Allocate correct amount of fd and irq entries tools/power turbostat: switch to tab delimited output tools/power turbostat: Gracefully handle ACPI S3 tools/power turbostat: tidy up output on Joule counter overflow 25 December 2016, 22:01:28 UTC
6290602 mm: add PageWaiters indicating tasks are waiting for a page bit Add a new page flag, PageWaiters, to indicate the page waitqueue has tasks waiting. This can be tested rather than testing waitqueue_active which requires another cacheline load. This bit is always set when the page has tasks on page_waitqueue(page), and is set and cleared under the waitqueue lock. It may be set when there are no tasks on the waitqueue, which will cause a harmless extra wakeup check that will clears the bit. The generic bit-waitqueue infrastructure is no longer used for pages. Instead, waitqueues are used directly with a custom key type. The generic code was not flexible enough to have PageWaiters manipulation under the waitqueue lock (which simplifies concurrency). This improves the performance of page lock intensive microbenchmarks by 2-3%. Putting two bits in the same word opens the opportunity to remove the memory barrier between clearing the lock bit and testing the waiters bit, after some work on the arch primitives (e.g., ensuring memory operand widths match and cover both bits). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 25 December 2016, 19:54:48 UTC
6326fec mm: Use owner_priv bit for PageSwapCache, valid when PageSwapBacked A page is not added to the swap cache without being swap backed, so PageSwapBacked mappings can use PG_owner_priv_1 for PageSwapCache. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 25 December 2016, 19:54:48 UTC
1f3a8e4 ktime: Get rid of ktime_equal() No point in going through loops and hoops instead of just comparing the values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> 25 December 2016, 16:21:23 UTC
8b0e195 ktime: Cleanup ktime_set() usage ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> 25 December 2016, 16:21:22 UTC
2456e85 ktime: Get rid of the union ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> 25 December 2016, 16:21:22 UTC
a5a1d1c clocksource: Use a plain u64 instead of cycle_t There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org> 25 December 2016, 10:04:12 UTC
008b69e irqchip/armada-xp: Consolidate hotplug state space The mpic is either the main interrupt controller or is cascaded behind a GIC. The mpic is single instance and the modes are mutually exclusive, so there is no reason to have seperate cpu hotplug states. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/20161221192112.333161745@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:44 UTC
6896bcd irqchip/gic: Consolidate hotplug state space Even if both drivers are compiled in only one instance can run on a given system depending on the available GIC version. So having seperate hotplug states for them is pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.252416267@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:44 UTC
36e5b0e coresight/etm3/4x: Consolidate hotplug state space Even if both drivers are compiled in only one instance can run on a given system depending on the available tracer cell. So having seperate hotplug states for them is pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Link: http://lkml.kernel.org/r/20161221192112.162765484@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:44 UTC
73c1b41 cpu/hotplug: Cleanup state names When the state names got added a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:44 UTC
530e9b7 cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(), register_hotcpu_notifier(), register_cpu_notifier(), __register_hotcpu_notifier(), __register_cpu_notifier(), unregister_hotcpu_notifier(), unregister_cpu_notifier(), __unregister_hotcpu_notifier(), __unregister_cpu_notifier() are unused now. Remove them and all related code. Remove also the now pointless cpu notifier error injection mechanism. The states can be executed step by step and error rollback is the same as cpu down, so any state transition can be tested w/o requiring the notifier error injection. Some CPU hotplug states are kept as they are (ab)used for hotplug state tracking. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:43 UTC
7b73796 staging/lustre/libcfs: Convert to hotplug state machine Install the callbacks via the state machine. No functional change. Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: devel@driverdev.osuosl.org Cc: Andreas Dilger <andreas.dilger@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Oleg Drokin <oleg.drokin@intel.com> Cc: rt@linutronix.de Cc: lustre-devel@lists.lustre.org Link: http://lkml.kernel.org/r/20161202110027.htzzeervzkoc4muv@linutronix.de Link: http://lkml.kernel.org/r/20161221192111.922872524@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:43 UTC
e210faa scsi/bnx2i: Convert to hotplug state machine Install the callbacks via the state machine. No functional change. This is the minimal fixup so we can remove the hotplug notifier mess completely. The real rework of this driver to use work queues is still stuck in review/testing on the SCSI mailing list. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Chad Dupuis <chad.dupuis@qlogic.com> Cc: QLogic-Storage-Upstream@qlogic.com Cc: Johannes Thumshirn <jth@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20161221192111.836895753@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:43 UTC
c53b005 scsi/bnx2fc: Convert to hotplug state machine Install the callbacks via the state machine. No functional change. This is the minimal fixup so we can remove the hotplug notifier mess completely. The real rework of this driver to use work queues is still stuck in review/testing on the SCSI mailing list. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: linux-scsi@vger.kernel.org Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Chad Dupuis <chad.dupuis@qlogic.com> Cc: QLogic-Storage-Upstream@qlogic.com Cc: Johannes Thumshirn <jth@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Link: http://lkml.kernel.org/r/20161221192111.757309869@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:42 UTC
dc280d9 cpu/hotplug: Prevent overwriting of callbacks Developers manage to overwrite states blindly without thought. That's fatal and hard to debug. Add sanity checks to make it fail. This requries to restructure the code so that the dynamic state allocation happens in the same lock protected section as the actual store. Otherwise the previous assignment of 'Reserved' to the name field would trigger the overwrite check. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:42 UTC
59fefd0 x86/msr: Remove bogus cleanup from the error path The error cleanup which is invoked when the hotplug state setup failed tries to remove the failed state, which is broken. Fixes: 8fba38c937cd ("x86/msr: Convert to hotplug state machine") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> 25 December 2016, 09:47:41 UTC
26242b3 bus: arm-ccn: Prevent hotplug callback leak In case the driver registration fails, the hotplug callback is leaked. Not fatal, because it's never invoked as there are no instances registered, but wrong nevertheless. Fixes: fdc15a36d84e ("bus/arm-ccn: Convert to hotplug statemachine") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> 25 December 2016, 09:47:40 UTC
834fcd2 perf/x86/intel/cstate: Prevent hotplug callback leak If the pmu registration fails the registered hotplug callbacks are not removed. Wrong in any case, but fatal in case of a modular driver. Replace the nonsensical state names with proper ones while at it. Fixes: 77c34ef1c319 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org 25 December 2016, 09:47:40 UTC
a051f22 ARM/imx/mmcd: Fix broken cpu hotplug handling The cpu hotplug support of this perf driver is broken in several ways: 1) It adds a instance before setting up the state. 2) The state for the instance is different from the state of the callback. It's just a randomly chosen state. 3) The instance registration is not error checked so nobody noticed that the call can never succeed. 4) The state for the multi install callbacks is chosen randomly and overwrites existing state. This is now prevented by the core code so the call is guaranteed to fail. 5) The error exit path in the init function leaves the instance registered and then frees the memory which contains the enqueued hlist node. 6) The remove function is removing the state and not the instance. Fix it by: - Setting up the state before adding instances. Use a dynamically allocated state for it. - Installing instances after the state has been set up - Removing the instance in the error path before freeing memory - Removing the instance not the state in the driver remove callback While at is use raw_cpu_processor_id(), because cpu_processor_id() cannot be used in preemptible context, and set the driver data after successful registration of the pmu. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Frank Li <frank.li@nxp.com> Cc: Zhengyu Shen <zhengyu.shen@nxp.com> Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 25 December 2016, 09:47:40 UTC
a98d1a0 scsi: qedi: Convert to hotplug state machine The CPU hotplug code is a trainwreck. It leaks a notifier in case of driver registration error and the per cpu loop is racy against cpu hotplug. Aside of that the driver should have been written and merged with the new state machine interfaces in the first place. Mop up the mess and Convert it to the hotplug state machine. Signed-off-by: Thomas Grumpy Gleixner <tglx@linutronix.de> Cc: Nilesh Javali <nilesh.javali@cavium.com> Cc: Adheer Chandravanshi <adheer.chandravanshi@qlogic.com> Cc: Chad Dupuis <chad.dupuis@cavium.com> Cc: Saurav Kashyap <saurav.kashyap@cavium.com> Cc: Arun Easi <arun.easi@cavium.com> Cc: Manish Rangankar <manish.rangankar@cavium.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> 25 December 2016, 09:47:18 UTC
6886fee tools/power turbostat: remove obsolete -M, -m, -C, -c options The new --add option has replaced the -M, -m, -C, -c options Eg. -M 0x10 is now --add msr0x10,raw -m 0x10 is now --add msr0x10,raw,u32 -C 0x10 is now --add msr0x10,delta -c 0x10 is now --add msr0x10,delta,u32 The --add option can be repeated to add any number of counters, while the previous options were limited to adding one of each type. In addition, the --add option can accept a column label, and can also display a counter as a percentage of elapsed cycles. Eg. --add msr0x3fe,core,percent,MY_CC3 Signed-off-by: Len Brown <len.brown@intel.com> 24 December 2016, 20:38:09 UTC
388e9c8 tools/power turbostat: Make extensible via the --add parameter Create the "--add" parameter. This can be used to teach an existing turbostat binary about any number of any type of counter. turbostat(8) details the syntax for --add. Signed-off-by: Len Brown <len.brown@intel.com> 24 December 2016, 20:16:10 UTC
7c0f6ba Replace <asm/uaccess.h> with <linux/uaccess.h> globally This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 24 December 2016, 19:46:01 UTC
1dd5c6b Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "This ncludes various cifs/smb3 bug fixes, mostly for stable as well. In the next week I expect that Germano will have some reconnection fixes, and also I expect to have the remaining pieces of the snapshot enablement and SMB3 ACLs, but wanted to get this set of bug fixes in" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs_get_root shouldn't use path with tree name Fix default behaviour for empty domains and add domainauto option cifs: use %16phN for formatting md5 sum cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack CIFS: Fix a possible double locking of mutex during reconnect CIFS: Fix a possible memory corruption during reconnect CIFS: Fix a possible memory corruption in push locks CIFS: Fix missing nls unload in smb2_reconnect() CIFS: Decrease verbosity of ioctl call SMB3: parsing for new snapshot timestamp mount parm 24 December 2016, 19:37:18 UTC
3a77fa8 Merge tag 'watchdog-for-linus-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull watchdog updates from Wim Van Sebroeck and Guenter Roeck: - new driver for Add Loongson1 SoC - minor cleanup and fixes in various drivers * tag 'watchdog-for-linus-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: watchdog: it87_wdt: add IT8620E ID watchdog: mpc8xxx: Remove unneeded linux/miscdevice.h include watchdog: octeon: Remove unneeded linux/miscdevice.h include watchdog: bcm2835_wdt: set WDOG_HW_RUNNING bit when appropriate watchdog: loongson1: Add Loongson1 SoC watchdog driver watchdog: cpwd: remove memory allocate failure message watchdog: da9062/61: watchdog driver intel-mid_wdt: Error code is just an integer intel-mid_wdt: make sure watchdog is not running at startup watchdog: mei_wdt: request stop on reboot to prevent false positive event watchdog: hpwdt: changed maintainer information watchdog: jz4740: Fix modular build watchdog: qcom: fix kernel panic due to external abort on non-linefetch watchdog: davinci: add support for deferred probing watchdog: meson: Remove unneeded platform MODULE_ALIAS watchdog: Standardize leading tabs and spaces in Kconfig file watchdog: max77620_wdt: fix module autoload watchdog: bcm7038_wdt: fix module autoload 24 December 2016, 19:27:45 UTC
01e0d60 Merge tag 'ntb-4.10' of git://github.com/jonmason/ntb Pull NTB update from Jon Mason: - NTB bug fixes for removing an unnecessary call to ntb_peer_spad_read, and correcting a free_irq inconsistency - add Intel SKX support - change the AMD NTB maintainer, and fix some bugs present there * tag 'ntb-4.10' of git://github.com/jonmason/ntb: ntb_transport: Remove unnecessary call to ntb_peer_spad_read NTB: Fix 'request_irq()' and 'free_irq()' inconsistancy ntb: fix SKX NTB config space size register offsets NTB: correct ntb_peer_spad_read for case when callback is not supplied. MAINTAINERS: Change in maintainer for AMD NTB ntb_transport: Limit memory windows based on available, scratchpads NTB: Register and offset values fix for memory window NTB: add support for hotplug feature ntb: Adding Skylake Xeon NTB support 24 December 2016, 19:23:24 UTC
6ac3bb1 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "There's a number of fixes: - a round of fixes for CPUID-less legacy CPUs - a number of microcode loader fixes - i8042 detection robustization fixes - stack dump/unwinder fixes - x86 SoC platform driver fixes - a GCC 7 warning fix - virtualization related fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) Revert "x86/unwind: Detect bad stack return address" x86/paravirt: Mark unused patch_default label x86/microcode/AMD: Reload proper initrd start address x86/platform/intel/quark: Add printf attribute to imr_self_test_result() x86/platform/intel-mid: Switch MPU3050 driver to IIO x86/alternatives: Do not use sync_core() to serialize I$ x86/topology: Document cpu_llc_id x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic x86/asm: Rewrite sync_core() to use IRET-to-self x86/microcode/intel: Replace sync_core() with native_cpuid() Revert "x86/boot: Fail the boot if !M486 and CPUID is missing" x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6 x86/tools: Fix gcc-7 warning in relocs.c x86/unwind: Dump stack data on warnings x86/unwind: Adjust last frame check for aligned function stacks x86/init: Fix a couple of comment typos x86/init: Remove i8042_detect() from platform ops Input: i8042 - Trust firmware a bit more when probing on X86 x86/init: Add i8042 state to the platform data ... 24 December 2016, 00:54:46 UTC
eb3e8d9 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "ARM/MOXA SoC clocksource driver fixes" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/moxart: Plug memory and mapping leaks 24 December 2016, 00:51:16 UTC
00198da Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "On the kernel side there's two x86 PMU driver fixes and a uprobes fix, plus on the tooling side there's a number of fixes and some late updates" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) perf sched timehist: Fix invalid period calculation perf sched timehist: Remove hardcoded 'comm_width' check at print_summary perf sched timehist: Enlarge default 'comm_width' perf sched timehist: Honour 'comm_width' when aligning the headers perf/x86: Fix overlap counter scheduling bug perf/x86/pebs: Fix handling of PEBS buffer overflows samples/bpf: Move open_raw_sock to separate header samples/bpf: Remove perf_event_open() declaration samples/bpf: Be consistent with bpf_load_program bpf_insn parameter tools lib bpf: Add bpf_prog_{attach,detach} samples/bpf: Switch over to libbpf perf diff: Do not overwrite valid build id perf annotate: Don't throw error for zero length symbols perf bench futex: Fix lock-pi help string perf trace: Check if MAP_32BIT is defined (again) samples/bpf: Make perf_event_read() static uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation samples/bpf: Make samples more libbpf-centric tools lib bpf: Add flags to bpf_create_map() tools lib bpf: use __u32 from linux/types.h ... 24 December 2016, 00:49:12 UTC
9004fda Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "A build warning fix with certain .config's" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/st: Mark st_irq_syscfg_resume() __maybe_unused 24 December 2016, 00:47:25 UTC
eb9def6 net/mlx4_en: Fix user prio field in XDP forward The user prio field is wrong (and overflows) in the XDP forward flow. This is a result of a bad value for num_tx_rings_p_up, which should account all XDP TX rings, as they operate for the same user prio. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 December 2016, 22:53:47 UTC
693c564 tipc: don't send FIN message from connectionless socket In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the check for socket type is in the wrong place, causing a closing socket to always send out a FIN message even when the socket was never connected. This is normally harmless, since the destination node for such messages most often is zero, and the message will be dropped, but it is still a wrong and confusing behavior. We fix this in this commit. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 December 2016, 22:53:47 UTC
e252536 ipvlan: fix multicast processing In an IPvlan setup when master is set in loopback mode e.g. ethtool -K eth0 set loopback on where eth0 is master device for IPvlan setup. The failure is caused by the faulty logic that determines if the packet is from TX-path vs. RX-path by just looking at the mac- addresses on the packet while processing multicast packets. In the loopback-mode where this crash was happening, the packets that are sent out are reflected by the NIC and are processed on the RX path, but mac-address check tricks into thinking this packet is from TX path and falsely uses dev_forward_skb() to pass packets to the slave (virtual) devices. This patch records the path while queueing packets and eliminates logic of looking at mac-addresses for the same decision. ------------[ cut here ]------------ kernel BUG at include/linux/skbuff.h:1737! Call Trace: [<ffffffff921fbbc2>] dev_forward_skb+0x92/0xd0 [<ffffffffc031ac65>] ipvlan_process_multicast+0x395/0x4c0 [ipvlan] [<ffffffffc031a9a7>] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan] [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91cdff09>] process_one_work+0x1a9/0x660 [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91ce086d>] worker_thread+0x11d/0x360 [<ffffffff91ce0750>] ? rescuer_thread+0x350/0x350 [<ffffffff91ce960b>] kthread+0xdb/0xe0 [<ffffffff91c05c70>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 [<ffffffff92348b7a>] ret_from_fork+0x9a/0xd0 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 December 2016, 22:53:47 UTC
b1227d0 ipvlan: fix various issues in ipvlan_process_multicast() 1) netif_rx() / dev_forward_skb() should not be called from process context. 2) ipvlan_count_rx() should be called with preemption disabled. 3) We should check if ipvlan->dev is up before feeding packets to netif_rx() 4) We need to prevent device from disappearing if some packets are in the multicast backlog. 5) One kfree_skb() should be a consume_skb() eventually Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 December 2016, 22:53:47 UTC
dfb7d24 ntb_transport: Remove unnecessary call to ntb_peer_spad_read The results were previously ignored, anyway. Signed-off-by: Steve Wahl <Steve.Wahl@dell.com> Fixes: e26a5843f7f5014ae4460030ca4de029a3ac35d3 Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:11:07 UTC
28734e8 NTB: Fix 'request_irq()' and 'free_irq()' inconsistancy 'request_irq()' and 'free_irq()' should have the same 'dev_id'. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:11:03 UTC
09e71a6 ntb: fix SKX NTB config space size register offsets The offsets for the SZ registers are wrong. Updated. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reported-by: Sandeep Mann <sandeep@purestorage.com> Tested-by: Zachary Ross <zacharyx.ross@intel.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:10:54 UTC
5c43c52 NTB: correct ntb_peer_spad_read for case when callback is not supplied. Correct ntb_peer_spad_read for case when callback is not supplied Signed-off-by: Steve Wahl <Steve.Wahl@dell.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:10:50 UTC
bc034e5 MAINTAINERS: Change in maintainer for AMD NTB I would like to take maintainership for AMD NTB Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Acked-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:10:42 UTC
b17faba ntb_transport: Limit memory windows based on available, scratchpads When the underlying NTB H/W driver advertises more memory windows than the number of scratchpads available to setup MW's, it is likely that we may end up filling the remaining memory windows with garbage. So to avoid that, lets limit the memory windows that transport driver can setup based on the available scratchpads. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:10:35 UTC
872deb2 NTB: Register and offset values fix for memory window Due to incorrect limit and translation register values, NTB link was going down when the memory window was setup. Made appropriate changes as per spec. Fix limit register values for BAR1, which was overlapping with the BAR23 address. Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:09:18 UTC
e5b0d2d NTB: add support for hotplug feature AMD NTB support hotplug under B2B mode. NTB will trigger link up/down interrupt event when doing plug add/remove, this patch implements the two interrupt event to support B2B hotplug function. Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com> Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:09:15 UTC
783dfa6 ntb: Adding Skylake Xeon NTB support The Skylake Xeon NTB hardware has made some changes to the register name, offset, and the way doorbells work. Adding driver support for the new hardware. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us> 23 December 2016, 21:09:10 UTC
c280f77 Revert "x86/unwind: Detect bad stack return address" Revert the following commit: b6959a362177 ("x86/unwind: Detect bad stack return address") ... because Andrey Konovalov reported an unwinder warning: WARNING: unrecognized kernel stack return address ffffffffa0000001 at ffff88006377fa18 in a.out:4467 The unwind was initiated from an interrupt which occurred while running in the generated code for a kprobe. The unwinder printed the warning because it expected regs->ip to point to a valid text address, but instead it pointed to the generated code. Eventually we may want come up with a way to identify generated kprobe code so the unwinder can know that it's a valid return address. Until then, just remove the warning. Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/02f296848fbf49fb72dfeea706413ecbd9d4caf6.1482418739.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> 23 December 2016, 19:32:30 UTC
3705b97 Merge tag 'perf-urgent-for-mingo-20161222' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: Fixes for 'perf sched timehist': (Namhyung Kim) - Define a larger initial alignment value for the COMM column and make it be more consistently honoured, for instance in the header. - Fix invalid period calculation when using the --time option to select a time slice, when events outside that slice were being considered for the per cpu idle stats summary. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> 23 December 2016, 19:23:29 UTC
back to top