sort by:
Revision Author Date Message Commit Date
d33a2a9 Merge tag 'pm-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two issues related to exposing the current CPU frequency to user space on x86. Specifics: - Disable interrupts around reading IA32_APERF and IA32_MPERF in aperfmperf_snapshot_khz() (introduced recently) to avoid excessive delays between the reads that may result from interrupt handling (Doug Smythies). - Fix the computation of the CPU frequency to be reported through the pstate_sample tracepoint in intel_pstate (Doug Smythies)" * tag 'pm-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: x86: Disable interrupts during MSRs reading cpufreq: intel_pstate: report correct CPU frequencies during trace 17 August 2017, 21:21:18 UTC
440105d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB Input: elan_i2c - add ELAN0608 to the ACPI table Input: trackpoint - assume 3 buttons when buttons detection fails 17 August 2017, 20:45:44 UTC
8179962 Merge branches 'intel_pstate-fix' and 'cpufreq-x86-fix' * intel_pstate-fix: cpufreq: intel_pstate: report correct CPU frequencies during trace * cpufreq-x86-fix: cpufreq: x86: Disable interrupts during MSRs reading 17 August 2017, 19:00:30 UTC
3bc6c90 Merge branch 'parisc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: - Fix PCI memory bar assignments with 64-bit kernels on machines with Dino/Cujo PCI chipsets. This makes PCI graphic cards work on such machines (from Thomas Bogendoerfer). - Fix documentation to be more clear about the difference between %pF and %pS printk format usage. There are still many places in the kernel which have it wrong (from Petr Mladek, Sergey Senozhatsky & me). * 'parisc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: printk-formats.txt: Better describe the difference between %pS and %pF parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo 17 August 2017, 18:39:54 UTC
99f781b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fix from Jan Kara: "A fix of a check for quota limit" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: correct space limit check 17 August 2017, 16:26:10 UTC
c8c03f1 pty: fix the cached path of the pty slave file descriptor in the master Christian Brauner reported that if you use the TIOCGPTPEER ioctl() to get a slave pty file descriptor, the resulting file descriptor doesn't look right in /proc/<pid>/fd/<fd>. In particular, he wanted to use readlink() on /proc/self/fd/<fd> to get the pathname of the slave pty (basically implementing "ptsname{_r}()"). The reason for that was that we had generated the wrong 'struct path' when we create the pty in ptmx_open(). In particular, the dentry was correct, but the vfsmount pointed to the mount of the ptmx node. That _can_ be correct - in case you use "/dev/pts/ptmx" to open the master - but usually is not. The normal case is to use /dev/ptmx, which then looks up the pts/ directory, and then the vfsmount of the ptmx node is obviously the /dev directory, not the /dev/pts/ directory. We actually did have the right vfsmount available, but in the wrong place (it gets looked up in 'devpts_acquire()' when we get a reference to the pts filesystem), and so ptmx_open() used the wrong mnt pointer. The end result of this confusion was that the pty worked fine, but when if you did TIOCGPTPEER to get the slave side of the pty, end end result would also work, but have that dodgy 'struct path'. And then when doing "d_path()" on to get the pathname, the vfsmount would not match the root of the pts directory, and d_path() would return an empty pathname thinking that the entry had escaped a bind mount into another mount. This fixes the problem by making devpts_acquire() return the vfsmount for the pts filesystem, allowing ptmx_open() to trivially just use the right mount for the pts dentry, and create the proper 'struct path'. Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 17 August 2017, 16:10:48 UTC
ac9a409 Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "A couple of minor fixes (st, ses) and some bigger driver fixes for qla2xxx (crash triggered by fw dump) and ipr (lockdep problems with mq)" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ses: Fix wrong page error scsi: ipr: Fix scsi-mq lockdep issue scsi: st: fix blk_get_queue usage scsi: qla2xxx: Fix system crash while triggering FW dump 17 August 2017, 00:21:20 UTC
422ce07 Merge tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fixes from Paul Moore: "Two small fixes to the audit code, both explained well in the respective patch descriptions, but the quick summary is one use-after-free fix, and one silly fanotify notification flag fix" * tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: Receive unmount event audit: Fix use after free in audit_remove_watch_rule() 16 August 2017, 23:48:34 UTC
d6957f3 printk-formats.txt: Better describe the difference between %pS and %pF Sometimes people seems unclear when to use the %pS or %pF printk format. For example, see commit 51d96dc2e2dc ("random: fix warning message on ia64 and parisc") which fixed such a wrong format string. The documentation should be more clear about the difference. Signed-off-by: Helge Deller <deller@gmx.de> [pmladek@suse.com: Restructure the entire section] Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> 16 August 2017, 19:09:45 UTC
4098116 parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo For 64bit kernels the lmmio_space_offset of the host bridge window isn't set correctly on systems with dino/cujo PCI host bridges. This leads to not assigned memory bars and failing drivers, which need to use these bars. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: <stable@vger.kernel.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de> 16 August 2017, 07:50:39 UTC
510c8a8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel Grumbach. 2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From Vivien Didelot. 3) Fix two regressions with bonding on wireless, from Andreas Born. 4) Fix build when busypoll is disabled, from Daniel Borkmann. 5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet. 6) Set SKB cached route properly in inet_rtm_getroute(), from Florian Westphal. 7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding Tianhong. 8) Fix module refcnt leak in ULP code, from Sabrina Dubroca. 9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric Dumazet. 10) Need to purge socket write queue in dccp_destroy_sock(), also from Eric Dumazet. 11) Make bpf_trace_printk() work properly on 32-bit architectures, from Daniel Borkmann. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits) bpf: fix bpf_trace_printk on 32 bit archs PCI: fix oops when try to find Root Port for a PCI device sfc: don't try and read ef10 data on non-ef10 NIC net_sched: remove warning from qdisc_hash_add net_sched/sfq: update hierarchical backlog when drop packet net_sched: reset pointers to tcf blocks in classful qdiscs' destructors ipv4: fix NULL dereference in free_fib_info_rcu() net: Fix a typo in comment about sock flags. ipv6: fix NULL dereference in ip6_route_dev_notify() tcp: fix possible deadlock in TCP stack vs BPF filter dccp: purge write queue in dccp_destroy_sock() udp: fix linear skb reception with PEEK_OFF ipv6: release rt6->rt6i_idev properly during ifdown af_key: do not use GFP_KERNEL in atomic contexts tcp: ulp: avoid module refcnt leak in tcp_set_ulp net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag PCI: Disable Relaxed Ordering Attributes for AMD A1100 PCI: Disable Relaxed Ordering for some Intel processors PCI: Disable PCIe Relaxed Ordering if unsupported ... 16 August 2017, 01:52:28 UTC
88a5c69 bpf: fix bpf_trace_printk on 32 bit archs James reported that on MIPS32 bpf_trace_printk() is currently broken while MIPS64 works fine: bpf_trace_printk() uses conditional operators to attempt to pass different types to __trace_printk() depending on the format operators. This doesn't work as intended on 32-bit architectures where u32 and long are passed differently to u64, since the result of C conditional operators follows the "usual arithmetic conversions" rules, such that the values passed to __trace_printk() will always be u64 [causing issues later in the va_list handling for vscnprintf()]. For example the samples/bpf/tracex5 test printed lines like below on MIPS32, where the fd and buf have come from the u64 fd argument, and the size from the buf argument: [...] 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688) Instead of this: [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512) One way to get it working is to expand various combinations of argument types into 8 different combinations for 32 bit and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64 as well that it resolves the issue. Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()") Reported-by: James Hogan <james.hogan@imgtec.com> Tested-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:32:15 UTC
0e40523 PCI: fix oops when try to find Root Port for a PCI device Eric report a oops when booting the system after applying the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."): [ 4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 4.247001] IP: pci_find_pcie_root_port+0x62/0x80 [ 4.253011] PGD 0 [ 4.253011] P4D 0 [ 4.253011] [ 4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC [ 4.262015] Modules linked in: [ 4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316 [ 4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016 [ 4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000 [ 4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80 [ 4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246 [ 4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006 [ 4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000 [ 4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000 [ 4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0 [ 4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818 [ 4.331002] FS: 0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000 [ 4.339002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0 [ 4.351002] Call Trace: [ 4.354012] ? pci_configure_device+0x19f/0x570 [ 4.359002] ? pci_conf1_read+0xb8/0xf0 [ 4.363002] ? raw_pci_read+0x23/0x40 [ 4.366011] ? pci_read+0x2c/0x30 [ 4.370014] ? pci_read_config_word+0x67/0x70 [ 4.374012] pci_device_add+0x28/0x230 [ 4.378012] ? pci_vpd_f0_read+0x50/0x80 [ 4.382014] pci_scan_single_device+0x96/0xc0 [ 4.386012] pci_scan_slot+0x79/0xf0 [ 4.389001] pci_scan_child_bus+0x31/0x180 [ 4.394014] acpi_pci_root_create+0x1c6/0x240 [ 4.398013] pci_acpi_scan_root+0x15f/0x1b0 [ 4.402012] acpi_pci_root_add+0x2e6/0x400 [ 4.406012] ? acpi_evaluate_integer+0x37/0x60 [ 4.411002] acpi_bus_attach+0xdf/0x200 [ 4.415002] acpi_bus_attach+0x6a/0x200 [ 4.418014] acpi_bus_attach+0x6a/0x200 [ 4.422013] acpi_bus_scan+0x38/0x70 [ 4.426011] acpi_scan_init+0x10c/0x271 [ 4.429001] acpi_init+0x2fa/0x348 [ 4.433004] ? acpi_sleep_proc_init+0x2d/0x2d [ 4.437001] do_one_initcall+0x43/0x169 [ 4.441001] kernel_init_freeable+0x1d0/0x258 [ 4.445003] ? rest_init+0xe0/0xe0 [ 4.449001] kernel_init+0xe/0x150 ====================== cut here ============================= It looks like the pci_find_pcie_root_port() was trying to find the Root Port for the PCI device which is the Root Port already, it will return NULL and trigger the problem, so check the highest_pcie_bridge to fix thie problem. Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported") Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:25:16 UTC
61deee9 sfc: don't try and read ef10 data on non-ef10 NIC The MAC stats command takes a port ID, which doesn't exist on pre-ef10 NICs (5000- and 6000- series). This is extracted from the NIC specific data; we misinterpret this as the ef10 data structure, causing us to read potentially unallocated data. With a KASAN kernel this can cause errors with: BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats Fixes: 0a2ab4d988d7 ("sfc: set the port-id when calling MC_CMD_MAC_STATS") Reported-by: Stefano Brivio <sbrivio@redhat.com> Tested-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:19:34 UTC
c90e951 net_sched: remove warning from qdisc_hash_add It was added in commit e57a784d8cae ("pkt_sched: set root qdisc before change() in attach_default_qdiscs()") to hide duplicates from "tc qdisc show" for incative deivices. After 59cc1f61f ("net: sched: convert qdisc linked list to hashtable") it triggered when classful qdisc is added to inactive device because default qdiscs are added before switching root qdisc. Anyway after commit ea3274695353 ("net: sched: avoid duplicates in qdisc dump") duplicates are filtered right in dumper. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:16:39 UTC
325d5dc net_sched/sfq: update hierarchical backlog when drop packet When sfq_enqueue() drops head packet or packet from another queue it have to update backlog at upper qdiscs too. Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:16:39 UTC
8989042 net_sched: reset pointers to tcf blocks in classful qdiscs' destructors Traffic filters could keep direct pointers to classes in classful qdisc, thus qdisc destruction first removes all filters before freeing classes. Class destruction methods also tries to free attached filters but now this isn't safe because tcf_block_put() unlike to tcf_destroy_chain() cannot be called second time. This patch set class->block to NULL after first tcf_block_put() and turn second call into no-op. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:16:39 UTC
187e5b3 ipv4: fix NULL dereference in free_fib_info_rcu() If fi->fib_metrics could not be allocated in fib_create_info() we attempt to dereference a NULL pointer in free_fib_info_rcu() : m = fi->fib_metrics; if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt)) kfree(m); Before my recent patch, we used to call kfree(NULL) and nothing wrong happened. Instead of using RCU to defer freeing while we are under memory stress, it seems better to take immediate action. This was reported by syzkaller team. Fixes: 3fb07daff8e9 ("ipv4: add reference counting to metrics") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:07:52 UTC
b3dc8f7 net: Fix a typo in comment about sock flags. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:07:17 UTC
12d94a8 ipv6: fix NULL dereference in ip6_route_dev_notify() Based on a syzkaller report [1], I found that a per cpu allocation failure in snmp6_alloc_dev() would then lead to NULL dereference in ip6_route_dev_notify(). It seems this is a very old bug, thus no Fixes tag in this submission. Let's add in6_dev_put_clear() helper, as we will probably use it elsewhere (once available/present in net-next) [1] kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff88019f456680 task.stack: ffff8801c6e58000 RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline] RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline] RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202 RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000 RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001 RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37 R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8 FS: 00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0 DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Call Trace: refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211 in6_dev_put include/net/addrconf.h:335 [inline] ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732 notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93 __raw_notifier_call_chain kernel/notifier.c:394 [inline] raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678 call_netdevice_notifiers net/core/dev.c:1694 [inline] rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107 rollback_registered+0x1be/0x3c0 net/core/dev.c:7149 register_netdevice+0xbcd/0xee0 net/core/dev.c:7587 register_netdev+0x1a/0x30 net/core/dev.c:7669 loopback_net_init+0x76/0x160 drivers/net/loopback.c:214 ops_init+0x10a/0x570 net/core/net_namespace.c:118 setup_net+0x313/0x710 net/core/net_namespace.c:294 copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418 create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206 SYSC_unshare kernel/fork.c:2347 [inline] SyS_unshare+0x653/0xfa0 kernel/fork.c:2297 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512c9 RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3 f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6 0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85 RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP: ffff8801c6e5f1b0 RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP: ffff8801c6e5f1b0 RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP: ffff8801c6e5f1b0 ---[ end trace e441d046c6410d31 ]--- Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 16 August 2017, 00:06:34 UTC
b5fed47 audit: Receive unmount event Although audit_watch_handle_event() can handle FS_UNMOUNT event, it is not part of AUDIT_FS_WATCH mask and thus such event never gets to audit_watch_handle_event(). Thus fsnotify marks are deleted by fsnotify subsystem on unmount without audit being notified about that which leads to a strange state of existing audit rules with dead fsnotify marks. Add FS_UNMOUNT to the mask of events to be received so that audit can clean up its state accordingly. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Paul Moore <paul@paul-moore.com> 15 August 2017, 20:03:00 UTC
d76036a audit: Fix use after free in audit_remove_watch_rule() audit_remove_watch_rule() drops watch's reference to parent but then continues to work with it. That is not safe as parent can get freed once we drop our reference. The following is a trivial reproducer: mount -o loop image /mnt touch /mnt/file auditctl -w /mnt/file -p wax umount /mnt auditctl -D <crash in fsnotify_destroy_mark()> Grab our own reference in audit_remove_watch_rule() earlier to make sure mark does not get freed under us. CC: stable@vger.kernel.org Reported-by: Tony Jones <tonyj@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Tested-by: Tony Jones <tonyj@suse.de> Signed-off-by: Paul Moore <paul@paul-moore.com> 15 August 2017, 19:58:17 UTC
40c6d1b Merge tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "This update consists of important compile and run-time error fixes to timers/freq-step, kmod, and sysctl tests" * tag 'linux-kselftest-4.13-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: timers: freq-step: fix compile error selftests: futex: fix run_tests target test_sysctl: fix sysctl.sh by making it executable test_kmod: fix kmod.sh by making it executable 15 August 2017, 19:49:43 UTC
0a6f041 Merge tag 'wireless-drivers-for-davem-2017-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.13 This time quite a few fixes for iwlwifi and one major regression fix for brcmfmac. For the iwlwifi aggregation bug a small change was needed for mac80211, but as Johannes is still away the mac80211 patch is taken via wireless-drivers tree. brcmfmac * fix firmware crash (a recent regression in bcm4343{0,1,8} iwlwifi * Some simple PCI HW ID fix-ups and additions for family 9000 * Remove a bogus warning message with new FWs (bug #196915) * Don't allow illegal channel options to be used (bug #195299) * A fix for checksum offload in family 9000 * A fix serious throughput degradation in 11ac with multiple streams * An old bug in SMPS where the firmware was not aware of SMPS changes * Fix a memory leak in the SAR code * Fix a stuck queue case in AP mode; * Convert a WARN to a simple debug in a legitimate race case (from which we can recover) * Fix a severe throughput aggregation on 9000-family devices due to aggregation issues, needed a small change in mac80211 ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 17:19:14 UTC
d624d27 tcp: fix possible deadlock in TCP stack vs BPF filter Filtering the ACK packet was not put at the right place. At this place, we already allocated a child and put it into accept queue. We absolutely need to call tcp_child_process() to release its spinlock, or we will deadlock at accept() or close() time. Found by syzkaller team (Thanks a lot !) Fixes: 8fac365f63c8 ("tcp: Add a tcp_filter hook before handle ack packet") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:31:27 UTC
7749d4f dccp: purge write queue in dccp_destroy_sock() syzkaller reported that DCCP could have a non empty write queue at dismantle time. WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 panic+0x1e4/0x417 kernel/panic.c:180 __warn+0x1c4/0x1d9 kernel/panic.c:541 report_bug+0x211/0x2d0 lib/bug.c:183 fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190 do_trap_no_signal arch/x86/kernel/traps.c:224 [inline] do_trap+0x260/0x390 arch/x86/kernel/traps.c:273 do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323 invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846 RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199 RSP: 0018:ffff8801d182f108 EFLAGS: 00010297 RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280 RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0 R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8 inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835 dccp_close+0x84d/0xc10 net/dccp/proto.c:1067 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425 sock_release+0x8d/0x1e0 net/socket.c:597 sock_close+0x16/0x20 net/socket.c:1126 __fput+0x327/0x7e0 fs/file_table.c:210 ____fput+0x15/0x20 fs/file_table.c:246 task_work_run+0x18a/0x260 kernel/task_work.c:116 exit_task_work include/linux/task_work.h:21 [inline] do_exit+0xa32/0x1b10 kernel/exit.c:865 do_group_exit+0x149/0x400 kernel/exit.c:969 get_signal+0x7e8/0x17e0 kernel/signal.c:2330 do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808 exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157 prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline] syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:28:18 UTC
42b7305 udp: fix linear skb reception with PEEK_OFF copy_linear_skb() is broken; both of its callers actually expect 'len' to be the amount we are trying to copy, not the offset of the end. Fix it keeping the meanings of arguments in sync with what the callers (both of them) expect. Also restore a saner behavior on EFAULT (i.e. preserving the iov_iter position in case of failure): The commit fd851ba9caa9 ("udp: harden copy_linear_skb()") avoids the more destructive effect of the buggy copy_linear_skb(), e.g. no more invalid memory access, but said function still behaves incorrectly: when peeking with offset it can fail with EINVAL instead of copying the appropriate amount of memory. Reported-by: Sasha Levin <alexander.levin@verizon.com> Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue") Fixes: fd851ba9caa9 ("udp: harden copy_linear_skb()") Signed-off-by: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Sasha Levin <alexander.levin@verizon.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:26:51 UTC
e5645f5 ipv6: release rt6->rt6i_idev properly during ifdown When a dst is created by addrconf_dst_alloc() for a host route or an anycast route, dst->dev points to loopback dev while rt6->rt6i_idev points to a real device. When the real device goes down, the current cleanup code only checks for dst->dev and assumes rt6->rt6i_idev->dev is the same. This causes the refcount leak on the real device in the above situation. This patch makes sure to always release the refcount taken on rt6->rt6i_idev during dst_dev_put(). Fixes: 587fea741134 ("ipv6: mark DST_NOGC and remove the operation of dst_free()") Reported-by: John Stultz <john.stultz@linaro.org> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:18:48 UTC
36f41f8 af_key: do not use GFP_KERNEL in atomic contexts pfkey_broadcast() might be called from non process contexts, we can not use GFP_KERNEL in these cases [1]. This patch partially reverts commit ba51b6be38c1 ("net: Fix RCU splat in af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock() section. [1] : syzkaller reported : in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439 3 locks held by syzkaller183439/2932: #0: (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649 #1: (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293 #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline] #2: (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028 CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994 __might_sleep+0x95/0x190 kernel/sched/core.c:5947 slab_pre_alloc_hook mm/slab.h:416 [inline] slab_alloc mm/slab.c:3383 [inline] kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559 skb_clone+0x1a0/0x400 net/core/skbuff.c:1037 pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207 pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281 dump_sp+0x3d6/0x500 net/key/af_key.c:2685 xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042 pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695 pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299 pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722 pfkey_process+0x606/0x710 net/key/af_key.c:2814 pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 ___sys_sendmsg+0x755/0x890 net/socket.c:2035 __sys_sendmsg+0xe5/0x210 net/socket.c:2069 SYSC_sendmsg net/socket.c:2080 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2076 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x445d79 RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79 RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008 RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700 R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000 R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000 Fixes: ba51b6be38c1 ("net: Fix RCU splat in af_key") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: David Ahern <dsa@cumulusnetworks.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:18:12 UTC
539a06b tcp: ulp: avoid module refcnt leak in tcp_set_ulp __tcp_ulp_find_autoload returns tcp_ulp_ops after taking a reference on the module. Then, if ->init fails, tcp_set_ulp propagates the error but nothing releases that reference. Fixes: 734942cc4ea6 ("tcp: ULP infrastructure") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:17:05 UTC
bae514a Merge branch 'Add-new-PCI_DEV_FLAGS_NO_RELAXED_ORDERING-flag' Ding Tianhong says: ==================== Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Some devices have problems with Transaction Layer Packets with the Relaxed Ordering Attribute set. This patch set adds a new PCIe Device Flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known devices with Relaxed Ordering issues, and a use of this new flag by the cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex Ports. It's been years since I've submitted kernel.org patches, I appolgise for the almost certain submission errors. v2: Alexander point out that the v1 was only a part of the whole solution, some platform which has some issues could use the new flag to indicate that it is not safe to enable relaxed ordering attribute, then we need to clear the relaxed ordering enable bits in the PCI configuration when initializing the device. So add a new second patch to modify the PCI initialization code to clear the relaxed ordering enable bit in the event that the root complex doesn't want relaxed ordering enabled. The third patch was base on the v1's second patch and only be changed to query the relaxed ordering enable bit in the PCI configuration space to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes set. This version didn't plan to drop the defines for Intel Drivers to use the new checking way to enable relaxed ordering because it is not the hardest part of the moment, we could fix it in next patchset when this patches reach the goal. v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration, If a PCIe device didn't enable the relaxed ordering attribute default, we should not do anything in the PCIe configuration, otherwise we should check if any of the devices above us do not support relaxed ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on the result if we get a return that indicate that the relaxed ordering is not supported we should update our device to disable relaxed ordering in configuration space. If the device above us doesn't exist or isn't the PCIe device, we shouldn't do anything and skip updating relaxed ordering because we are probably running in a guest. v4: Rename the functions pcie_get_relaxed_ordering and pcie_disable_relaxed_ordering according John's suggestion, and modify the description, use the true/false as the return value. We shouldn't enable relaxed ordering attribute by the setting in the root complex configuration space for PCIe device, so fix it for cxgb4. Fix some format issues. v5: Removed the unnecessary code for some function which only return the bool value, and add the check for VF device. Make this patch set base on 4.12-rc5. v6: Fix the logic error in the need to enable the relaxed ordering attribute for cxgb4. v7: The cxgb4 drivers will enable the PCIe Capability Device Control[Relaxed Ordering Enable] in PCI Probe() routine, this will break our current solution for some platform which has problematic when enable the relaxed ordering attribute. According to the latest recommendations, remove the enable_pcie_relaxed_ordering(), although it could not cover the Peer-to-Peer scene, but we agree to leave this problem until we really trigger it. Make this patch set base on 4.12 release version. v8: Change the second patch title and description to make it more reasonable, add the acked-by from Alex and Ashok. Add a new patch to enable the Relaxed Ordering Attribute for cxgb4vf driver. Make this patch set base on 4.13-rc2. v9: The document (https://software.intel.com/sites/default/files/managed/9e/ bc/64-ia-32-architectures-optimization-manual.pdf) indicate that the Xeon processors based on Broadwell/Haswell microarchitecture has the problem with Relaxed Ordering Attribute enabled, so add the whole list Device ID from Intel to the patch. v10: Significant rework based on Bjorn's feedback, reorganize the first 2 patches, now the Intel and AMD erratum soc has been divided to the different patches, rename the pcie_relaxed_ordering_supported() to pcie_relaxed_ordering_enabled(), and no need to check every intervening switch except the root ports, update some commits. v11: We shouldn't let the Intel engineer to acked the AMD's erratum patch, fix the funny mistake. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:14:51 UTC
b629276 net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag cxgb4vf Ethernet driver now queries PCIe configuration space to determine if it can send TLPs to it with the Relaxed Ordering Attribute set, just like the pf did. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:14:51 UTC
b0ba9d5 net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag cxgb4 Ethernet driver now queries PCIe configuration space to determine if it can send TLPs to it with the Relaxed Ordering Attribute set. Remove the enable_pcie_relaxed_ordering() to avoid enable PCIe Capability Device Control[Relaxed Ordering Enable] at probe routine, to make sure the driver will not send the Relaxed Ordering TLPs to the Root Complex which could not deal the Relaxed Ordering TLPs. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Reviewed-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:14:51 UTC
077fa19 PCI: Disable Relaxed Ordering Attributes for AMD A1100 Casey reported that the AMD ARM A1100 SoC has a bug in its PCIe Root Port where Upstream Transaction Layer Packets with the Relaxed Ordering Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering set, it would cause Data Corruption, so we need to disable Relaxed Ordering Attribute when Upstream TLPs to the Root Port. Reported-and-suggested-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:14:50 UTC
87e09cd PCI: Disable Relaxed Ordering for some Intel processors According to the Intel spec section 3.9.1 said: 3.9.1 Optimizing PCIe Performance for Accesses Toward Coherent Memory and Toward MMIO Regions (P2P) In order to maximize performance for PCIe devices in the processors listed in Table 3-6 below, the soft- ware should determine whether the accesses are toward coherent memory (system memory) or toward MMIO regions (P2P access to other devices). If the access is toward MMIO region, then software can command HW to set the RO bit in the TLP header, as this would allow hardware to achieve maximum throughput for these types of accesses. For accesses toward coherent memory, software can command HW to clear the RO bit in the TLP header (no RO), as this would allow hardware to achieve maximum throughput for these types of accesses. Table 3-6. Intel Processor CPU RP Device IDs for Processors Optimizing PCIe Performance Processor CPU RP Device IDs Intel Xeon processors based on 6F01H-6F0EH Broadwell microarchitecture Intel Xeon processors based on 2F01H-2F0EH Haswell microarchitecture It means some Intel processors has performance issue when use the Relaxed Ordering Attribute, so disable Relaxed Ordering for these root port. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:14:50 UTC
a99b646 PCI: Disable PCIe Relaxed Ordering if unsupported When bit4 is set in the PCIe Device Control register, it indicates whether the device is permitted to use relaxed ordering. On some platforms using relaxed ordering can have performance issues or due to erratum can cause data-corruption. In such cases devices must avoid using relaxed ordering. The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that Relaxed Ordering (RO) attribute should not be used for Transaction Layer Packets (TLP) targeted towards these affected root complexes. This patch checks if there is any node in the hierarchy that indicates that using relaxed ordering is not safe. In such cases the patch turns off the relaxed ordering by clearing the capability for this device. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Acked-by: Ashok Raj <ashok.raj@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> 15 August 2017, 05:14:50 UTC
7698869 Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB Add 2 new IDs (ELAN0609 and ELAN060B) to the list of ACPI IDs that should be handled by the driver. Signed-off-by: KT Liao <kt.liao@emc.com.tw> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> 15 August 2017, 03:22:11 UTC
1874064 Input: elan_i2c - add ELAN0608 to the ACPI table Similar to commit 722c5ac708b4f ("Input: elan_i2c - add ELAN0605 to the ACPI table"), ELAN0608 should be handled by elan_i2c. This touchpad can be found in Lenovo ideapad 320-14IKB. BugLink: https://bugs.launchpad.net/bugs/1708852 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> 15 August 2017, 03:22:02 UTC
fcd0735 Merge tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: "Fix several bugs: - fix a rcu stall issue introduced in 4.12 (Neil Brown) - fix two raid5 cache race conditions (Song Liu)" * tag 'md/4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: MD: not clear ->safemode for external metadata array md/r5cache: fix io_unit handling in r5l_log_endio() md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_set md: fix test in md_write_start() md: always clear ->safemode when md_check_recovery gets the mddev lock. 14 August 2017, 20:09:59 UTC
6b9d1c2 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "Fix an error path bug in ixp4xx as well as a read overrun in sha1-avx2" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/sha1 - Fix reads beyond the number of blocks passed crypto: ixp4xx - Fix error handling path in 'aead_perform()' 14 August 2017, 18:35:56 UTC
59a361b tipc: avoid inheriting msg_non_seq flag when message is returned In the function msg_reverse(), we reverse the header while trying to reuse the original buffer whenever possible. Those rejected/returned messages are always transmitted as unicast, but the msg_non_seq field is not explicitly set to zero as it should be. We have seen cases where multicast senders set the message type to "NOT dest_droppable", meaning that a multicast message shorter than one MTU will be returned, e.g., during receive buffer overflow, by reusing the original buffer. This has the effect that even the 'msg_non_seq' field is inadvertently inherited by the rejected message, although it is now sent as a unicast message. This again leads the receiving unicast link endpoint to steer the packet toward the broadcast link receive function, where it is dropped. The affected unicast link is thereafter (after 100 failed retransmissions) declared 'stale' and reset. We fix this by unconditionally setting the 'msg_non_seq' flag to zero for all rejected/returned messages. Reported-by: Canh Duc Luu <canh.d.luu@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> 14 August 2017, 18:20:36 UTC
fed5f57 tipc: accept PACKET_MULTICAST packets On L2 bearers, the TIPC broadcast function is sending out packets using the corresponding L2 broadcast address. At reception, we filter such packets under the assumption that they will also be delivered as broadcast packets. This assumption doesn't always hold true. Under high load, we have seen that a switch may convert the destination address and deliver the packet as a PACKET_MULTICAST, something leading to inadvertently dropped packets and a stale and reset broadcast link. We fix this by extending the reception filtering to accept packets of type PACKET_MULTICAST. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net> 14 August 2017, 18:19:25 UTC
2c87d63 ipv4: route: fix inet_rtm_getroute induced crash "ip route get $daddr iif eth0 from $saddr" causes: BUG: KASAN: use-after-free in ip_route_input_rcu+0x1535/0x1b50 Call Trace: ip_route_input_rcu+0x1535/0x1b50 ip_route_input_noref+0xf9/0x190 tcp_v4_early_demux+0x1a4/0x2b0 ip_rcv+0xbcb/0xc05 __netif_receive_skb+0x9c/0xd0 netif_receive_skb_internal+0x5a8/0x890 Problem is that inet_rtm_getroute calls either ip_route_input_rcu (if an iif was provided) or ip_route_output_key_hash_rcu. But ip_route_input_rcu, unlike ip_route_output_key_hash_rcu, already associates the dst_entry with the skb. This clears the SKB_DST_NOREF bit (i.e. skb_dst_drop will release/free the entry while it should not). Thus only set the dst if we called ip_route_output_key_hash_rcu(). I tested this patch by running: while true;do ip r get 10.0.1.2;done > /dev/null & while true;do ip r get 10.0.1.2 iif eth0 from 10.0.1.1;done > /dev/null & ... and saw no crash or memory leak. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: David Ahern <dsahern@gmail.com> Fixes: ba52d61e0ff ("ipv4: route: restore skb_dst_set in inet_rtm_getroute") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 14 August 2017, 18:08:29 UTC
e9bf53a brcmfmac: feature check for multi-scheduled scan fails on bcm4343x devices The firmware feature check introduced for multi-scheduled scan turned out to be failing for bcm4343{0,1,8} devices resulting in a firmware crash. The reason for this crash has not yet been root cause so this patch avoids the feature check for those device as a short-term fix. Reported-by: Stefan Wahren <stefan.wahren@i2se.com> Reported-by: Ian Molton <ian@mnementh.co.uk> Fixes: 9fe929aaace6 ("brcmfmac: add firmware feature detection for gscan feature") Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> 14 August 2017, 08:09:30 UTC
11e9d78 bonding: ratelimit failed speed/duplex update warning bond_miimon_commit() handles the UP transition for each slave of a bond in the case of MII. It is triggered 10 times per second for the default MII Polling interval of 100ms. For device drivers that do not implement __ethtool_get_link_ksettings() the call to bond_update_speed_duplex() fails persistently while the MII status could remain UP. That is, in this and other cases where the speed/duplex update keeps failing over a longer period of time while the MII state is UP, a warning is printed every MII polling interval. To address these excessive warnings net_ratelimit() should be used. Printing a warning once would not be sufficient since the call to bond_update_speed_duplex() could recover to succeed and fail again later. In that case there would be no new indication what went wrong. Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit phase) Signed-off-by: Andreas Born <futur.andy@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 14 August 2017, 03:01:38 UTC
ef95484 Linux 4.13-rc5 13 August 2017, 23:01:32 UTC
b2298fc Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS fixes from Ralf Baechle: "Another round of MIPS fixes: - compressed boot: Ignore a generated .c file - VDSO: Fix a register clobber list - DECstation: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression - Octeon: Fix recent cleanups that cleaned away a bit too much thus breaking the arch side of the EDAC and USB drivers. - uasm: Fix duplicate const in "const struct foo const bar[]" which GCC 7.1 no longer accepts. - Fix race on setting and getting cpu_online_mask - Fix preemption issue. To do so cleanly introduce macro to get the size of L3 cache line. - Revert include cleanup that sometimes results in build error - MicroMIPS uses bit 0 of the PC to indicate microMIPS mode. Make sure this bit is set for kernel entry as well. - Prevent configuring the kernel for both microMIPS and MT. There are no such CPUs currently and thus the combination is unsupported and results in build errors. This has been sitting in linux-next for a few days and has survived automated testing by Imagination's test farm. No known regressions pending except a number of issues that crept up due to lots of people switching to GCC 7.1" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: MIPS: Set ISA bit in entry-y for microMIPS kernels MIPS: Prevent building MT support for microMIPS kernels MIPS: PCI: Fix smp_processor_id() in preemptible MIPS: Introduce cpu_tcache_line_size MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression MIPS: VDSO: Fix clobber lists in fallback code paths Revert "MIPS: Don't unnecessarily include kmalloc.h into <asm/cache.h>." MIPS: OCTEON: Fix USB platform code breakage. MIPS: Octeon: Fix broken EDAC driver. MIPS: gitignore: ignore generated .c files MIPS: Fix race on setting and getting cpu_online_mask MIPS: mm: remove duplicate "const" qualifier on insn_table 13 August 2017, 22:34:28 UTC
c9dc281 Merge tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are three firmware core fixes for 4.13-rc5. All three of these fix reported issues and have been floating around for a few weeks. They have been in linux-next with no reported problems" * tag 'driver-core-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: firmware: avoid invalid fallback aborts by using killable wait firmware: fix batched requests - send wake up on failure on direct lookups firmware: fix batched requests - wake all waiters 13 August 2017, 19:44:18 UTC
ce7ba95 Merge tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are two patches for 4.13-rc5. One is a fix for a reported thunderbolt issue, and the other a fix for an MEI driver issue. Both have been in linux-next with no reported issues" * tag 'char-misc-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: thunderbolt: Do not enumerate more ports from DROM than the controller has mei: exclude device from suspend direct complete optimization 13 August 2017, 19:41:58 UTC
438630e Merge tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are two tty serial driver fixes for 4.13-rc5. One is a revert of a -rc1 patch that turned out to not be a good idea, and the other is a fix for the pl011 serial driver. Both have been in linux-next with no reported issues" * tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: Revert "serial: Delete dead code for CIR serial ports" tty: pl011: fix initialization order of QDF2400 E44 13 August 2017, 19:33:35 UTC
dd95f18 Merge tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/iio fixes from Greg KH: "Here are some Staging and IIO driver fixes for 4.13-rc5. Nothing major, just a number of small fixes for reported issues. All of these have been in linux-next for a while now with no reported issues. Full details are in the shortlog" * tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING iio: aspeed-adc: wait for initial sequence. iio: accel: bmc150: Always restore device to normal mode after suspend-resume staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0 iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits iio: accel: st_accel: add SPI-3wire support iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications" iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable iio: pressure: st_pressure_core: disable multiread by default for LPS22HB iio: light: tsl2563: use correct event code 13 August 2017, 19:30:17 UTC
10cec91 Merge tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of small USB driver fixes and new device ids for 4.13-rc5. There is the usual gadget driver fixes, some new quirks for "messy" hardware, and some new device ids. All have been in linux-next with no reported issues" * tag 'usb-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: USB: serial: pl2303: add new ATEN device id usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter USB: Check for dropped connection before switching to full speed usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume usb: renesas_usbhs: gadget: fix unused-but-set-variable warning usb: renesas_usbhs: Fix UGCTRL2 value for R-Car Gen3 usb: phy: phy-msm-usb: Fix usage of devm_regulator_bulk_get() usb: gadget: udc: renesas_usb3: Fix usb_gadget_giveback_request() calling usb: dwc3: gadget: Correct ISOC DATA PIDs for short packets USB: serial: option: add D-Link DWM-222 device ID usb: musb: fix tx fifo flush handling again usb: core: unlink urbs from the tail of the endpoint's urb_list usb-storage: fix deadlock involving host lock and scsi_done uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069 USB: hcd: Mark secondary HCD as dead if the primary one died USB: serial: cp210x: add support for Qivicon USB ZigBee dongle 13 August 2017, 19:27:42 UTC
89a5527 Merge tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd Pull another MTD fix from Brian Norris: "An mtdblock regression occurred in -rc1 (all writes were broken!), in the process of some block subsystem refactoring. Noticed and fixed last week, but I'm a little slow on the uptake" * tag 'for-linus-20170812' of git://git.infradead.org/linux-mtd: mtd: blkdevs: Fix mtd block write failure 12 August 2017, 23:19:43 UTC
9a51544 mtd: blkdevs: Fix mtd block write failure All the MTD block write requests are failing with following error messages mkfs.ext4 /dev/mtdblock0 print_req_error: I/O error, dev mtdblock0, sector 0 Buffer I/O error on dev mtdblock0, logical block 0, lost async page write The control is going to default case after block write request because of missing return. Fixes: commit 2a842acab109 ("block: introduce new block status code type") Signed-off-by: Abhishek Sahu <absahu@codeaurora.org> Signed-off-by: Brian Norris <computersforpeace@gmail.com> 12 August 2017, 21:53:24 UTC
a99bcdc Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target fixes from Nicholas Bellinger: "The highlights include: - Fix iscsi-target payload memory leak during ISCSI_FLAG_TEXT_CONTINUE (Varun Prakash) - Fix tcm_qla2xxx incorrect use of tcm_qla2xxx_free_cmd during ABORT (Pascal de Bruijn + Himanshu Madhani + nab) - Fix iscsi-target long-standing issue with parallel delete of a single network portal across multiple target instances (Gary Guo + nab) - Fix target dynamic se_node GPF during uncached shutdown regression (Justin Maggard + nab)" * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: target: Fix node_acl demo-mode + uncached dynamic shutdown regression iscsi-target: Fix iscsi_np reset hung task during parallel delete qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT (v2) cxgbit: fix sg_nents calculation iscsi-target: fix invalid flags in text response iscsi-target: fix memory leak in iscsit_setup_text_cmd() cxgbit: add missing __kfree_skb() tcmu: free old string on reconfig tcmu: Fix possible to/from address overflow when doing the memcpy 12 August 2017, 19:08:59 UTC
043cd07 Merge tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Some fixes for Xen: - a fix for a regression introduced in 4.13 for a Xen HVM-guest configured with KASLR - a fix for a possible deadlock in the xenbus driver when booting the system - a fix for lost interrupts in Xen guests" * tag 'for-linus-4.13b-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: Fix interrupt lost during irq_disable and irq_enable xen: avoid deadlock in xenbus xen: fix hvm guest with kaslr enabled xen: split up xen_hvm_init_shared_info() x86: provide an init_mem_mapping hypervisor hook 12 August 2017, 16:01:36 UTC
afc1f55 MD: not clear ->safemode for external metadata array ->safemode should be triggered by mdadm for external metadaa array, otherwise array's state confuses mdadm. Fixes: 33182d15c6bf(md: always clear ->safemode when md_check_recovery gets the mddev lock.) Cc: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> 12 August 2017, 03:42:06 UTC
fd851ba udp: harden copy_linear_skb() syzkaller got crashes with CONFIG_HARDENED_USERCOPY=y configs. Issue here is that recvfrom() can be used with user buffer of Z bytes, and SO_PEEK_OFF of X bytes, from a skb with Y bytes, and following condition : Z < X < Y kernel BUG at mm/usercopy.c:72! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 2917 Comm: syzkaller842281 Not tainted 4.13.0-rc3+ #16 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d2fa40c0 task.stack: ffff8801d1fe8000 RIP: 0010:report_usercopy mm/usercopy.c:64 [inline] RIP: 0010:__check_object_size+0x3ad/0x500 mm/usercopy.c:264 RSP: 0018:ffff8801d1fef8a8 EFLAGS: 00010286 RAX: 0000000000000078 RBX: ffffffff847102c0 RCX: 0000000000000000 RDX: 0000000000000078 RSI: 1ffff1003a3fded5 RDI: ffffed003a3fdf09 RBP: ffff8801d1fef998 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d1ea480e R13: fffffffffffffffa R14: ffffffff84710280 R15: dffffc0000000000 FS: 0000000001360880(0000) GS:ffff8801dc000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000202ecfe4 CR3: 00000001d1ff8000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: check_object_size include/linux/thread_info.h:108 [inline] check_copy_size include/linux/thread_info.h:139 [inline] copy_to_iter include/linux/uio.h:105 [inline] copy_linear_skb include/net/udp.h:371 [inline] udpv6_recvmsg+0x1040/0x1af0 net/ipv6/udp.c:395 inet_recvmsg+0x14c/0x5f0 net/ipv4/af_inet.c:793 sock_recvmsg_nosec net/socket.c:792 [inline] sock_recvmsg+0xc9/0x110 net/socket.c:799 SYSC_recvfrom+0x2d6/0x570 net/socket.c:1788 SyS_recvfrom+0x40/0x50 net/socket.c:1760 entry_SYSCALL_64_fastpath+0x1f/0xbe Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 22:00:45 UTC
6401f37 Merge branch 'bpf-Minor-fix-in-bpf_convert_ctx_access' Daniel Borkmann says: ==================== bpf: Minor fix in bpf_convert_ctx_access First one was found while trying to compile the kernel with !CONFIG_NET_RX_BUSY_POLL. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:59:24 UTC
2ed46ce bpf: fix two missing target_size settings in bpf_convert_ctx_access When CONFIG_NET_SCHED or CONFIG_NET_RX_BUSY_POLL is /not/ set and we try a narrow __sk_buff load of tc_index or napi_id, respectively, then verifier rightfully complains that it's misconfigured, because we need to set target_size in each of the two cases. The rewrite for the ctx access is just a dummy op, but needs to pass, so fix this up. Fixes: f96da09473b5 ("bpf: simplify narrower ctx access") Reported-by: Shubham Bansal <illusionist.neo@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:59:24 UTC
e4dde41 net: fix compilation when busy poll is not enabled MIN_NAPI_ID is used in various places outside of CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set we run into build errors such as: net/core/dev.c: In function 'dev_get_by_napi_id': net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function) if (napi_id < MIN_NAPI_ID) ^~~~~~~~~~~ Thus, have MIN_NAPI_ID always defined to fix these errors. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:59:24 UTC
54a6a04 mISDN: Fix null pointer dereference at mISDN_FsmNew If mISDN_FsmNew() fails to allocate memory for jumpmatrix then null pointer dereference will occur on any write to jumpmatrix. The patch adds check on successful allocation and corresponding error handling. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:56:23 UTC
bb3afda nfp: do not update MTU from BH in flower app The Flower app may receive a request to update the MTU of a representor netdev upon receipt of a control message from the firmware. This requires the RTNL lock which needs to be taken outside of the packet processing path. As a handling of this correctly seems a little to invasive for a fix simply skip setting the MTU for now. Relevant backtrace: [ 1496.288489] BUG: scheduling while atomic: kworker/0:3/373/0x00000100 [ 1496.294911] dca syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm mxm_wmi ahci pps_core libahci i2c_algo_bit wmi [last unloaded: nfp] [ 1496.294918] CPU: 0 PID: 373 Comm: kworker/0:3 Tainted: G OE 4.13.0-rc3+ #3 [ 1496.294919] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015 [ 1496.294923] Workqueue: events work_for_cpu_fn [ 1496.294924] Call Trace: [ 1496.294927] <IRQ> [ 1496.294931] dump_stack+0x63/0x82 [ 1496.294935] __schedule_bug+0x54/0x70 [ 1496.294937] __schedule+0x62f/0x890 [ 1496.294941] ? intel_unmap_sg+0x90/0x90 [ 1496.294942] schedule+0x36/0x80 [ 1496.294943] schedule_preempt_disabled+0xe/0x10 [ 1496.294945] __mutex_lock.isra.2+0x445/0x4a0 [ 1496.294947] ? device_is_rmrr_locked+0x12/0x50 [ 1496.294950] ? kfree+0x162/0x170 [ 1496.294952] ? device_is_rmrr_locked+0x12/0x50 [ 1496.294953] ? iommu_should_identity_map+0x50/0xe0 [ 1496.294954] __mutex_lock_slowpath+0x13/0x20 [ 1496.294955] ? iommu_no_mapping+0x48/0xd0 [ 1496.294956] ? __mutex_lock_slowpath+0x13/0x20 [ 1496.294957] mutex_lock+0x2f/0x40 [ 1496.294960] rtnl_lock+0x15/0x20 [ 1496.294979] nfp_flower_cmsg_rx+0xc8/0x150 [nfp] [ 1496.294986] nfp_ctrl_poll+0x286/0x350 [nfp] [ 1496.294989] tasklet_action+0xf6/0x110 [ 1496.294992] __do_softirq+0xed/0x278 [ 1496.294993] irq_exit+0xb6/0xc0 [ 1496.294994] do_IRQ+0x4f/0xd0 [ 1496.294996] common_interrupt+0x89/0x89 Fixes: 948faa46c05b ("nfp: add support for control messages for flower app") Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:50:09 UTC
fbca164 net: stmmac: Use the right logging function in stmmac_mdio_register Currently, the function stmmac_mdio_register() is only used by stmmac_dvr_probe() from stmmac_main.c, in order to register the MDIO bus and probe information about the PHY. As this function is called before calling register_netdev(), all messages logged from stmmac_mdio_register are prefixed by "(unnamed net_device)". The goal of netdev_info or netdev_err is to dump useful infos about a net_device, when this data structure is partially initialized, there is no point for using these functions. This commit fixes the issue by replacing all netdev_*() by the corresponding dev_*() function for logging. The last netdev_info is replaced by phy_attached_info(), as a valid phydev can be used at this point. Signed-off-by: Romain Perier <romain.perier@collabora.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:38:55 UTC
8d55373 net/sched/hfsc: allocate tcf block for hfsc root class Without this filters cannot be attached. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:30:30 UTC
ad729bc bonding: require speed/duplex only for 802.3ad, alb and tlb The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <futur.andy@googlemail.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 21:21:42 UTC
e71cb9e net: dsa: ksz: fix skb freeing The DSA layer frees the original skb when an xmit function returns NULL, meaning an error occurred. But if the tagging code copied the original skb, it is responsible of freeing the copy if an error occurs. The ksz tagging code currently has two issues: if skb_put_padto fails, the skb copy is not freed, and the original skb will be freed twice. To fix that, move skb_put_padto inside both branches of the skb_tailroom condition, before freeing the original skb, and free the copy on error. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 August 2017, 20:57:08 UTC
216e4a1 Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client fixes from Anna Schumaker: "A few more NFS client bugfixes from me for rc5. Dros has a stable fix for flexfiles to prevent leaking the nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a potential recovery loop situation with the TEST_STATEID operation, and Christoph fixed up the pNFS blocklayout Kconfig options to prevent unsafe use with kernels that don't have large block device support. Summary: Stable fix: - fix leaking nfs4_ff_ds_version array Other fixes: - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit compile errors" * tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs: pnfs/blocklayout: require 64-bit sector_t NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid() nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays 11 August 2017, 20:54:09 UTC
e0d0e04 Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A set of fixes that should go into this series. This contains: - Fix from Bart for blk-mq requeue queue running, preventing a continued loop of run/restart. - Fix for a bio/blk-integrity issue, in two parts. One from Christoph, fixing where verification happens, and one from Milan, for a NULL profile. - NVMe pull request, most of the changes being for nvme-fc, but also a few trivial core/pci fixes" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: fix directive command numd calculation nvme: fix nvme reset command timeout handling nvme-pci: fix CMB sysfs file removal in reset path lpfc: support nvmet_fc defer_rcv callback nvmet_fc: add defer_req callback for deferment of cmd buffer return nvme: strip trailing 0-bytes in wwid_show block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time bio-integrity: only verify integrity on the lowest stacked driver bio-integrity: Fix regression if profile verify_fn is NULL 11 August 2017, 19:26:49 UTC
0993133 Merge tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - fix lockdep splat when removing mmc_block module - fix the logic for setting eMMC HS400ES signal voltage MMC host: - omap_hsmmc: add CMD23 capability to fix -EIO errors" * tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: block: fix lockdep splat when removing mmc_block module mmc: mmc: correct the logic for setting HS400ES signal voltage mmc: host: omap_hsmmc: Add CMD23 capability to omap_hsmmc driver 11 August 2017, 18:56:54 UTC
7eb97ba Merge tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux Pull fbdev fixes from Bartlomiej Zolnierkiewicz: - allow user to disable write combined mapping in efifb driver (Dave Airlie) - fix use after free bugs on driver removal in imxfb driver (Dan Carpenter) - fix unused variable warning in omapfb driver (Arnd Bergmann) * tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux: efifb: allow user to disable write combined mapping. fbdev: omapfb: remove unused variable video: fbdev: imxfb: use after free in imxfb_remove() 11 August 2017, 18:44:18 UTC
2bfc37c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix a few bugs in fuse" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: set mapping error in writepage_locked when it fails fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio fuse: initialize the flock flag in fuse_file on allocation 11 August 2017, 18:20:48 UTC
7d7a827 Merge tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fix from Joerg Roedel: "Fix a NULL-pointer dereference in arm_smmu_add_device" * tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device 11 August 2017, 18:15:51 UTC
8a9d6e9 pnfs/blocklayout: require 64-bit sector_t The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for devices sizes, which makes it unsafe to use with a kernel that doesn't support large block devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> 11 August 2017, 18:10:13 UTC
8001a97 Merge tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "All fixes for code that went in this cycle. - a revert of an optimisation to the syscall exit path, which could lead to an oops on either older machines or machines with > 1TB of memory - disable some deep idle states if the firmware configuration for them fails - re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig change - six fairly small patches fixing bugs in our new watchdog code Thanks to: Gautham R Shenoy, Nicholas Piggin" * tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/watchdog: add locking around init/exit functions powerpc/watchdog: Fix marking of stuck CPUs powerpc/watchdog: Fix final-check recovered case powerpc/watchdog: Moderate touch_nmi_watchdog overhead powerpc/watchdog: Improve watchdog lock primitive powerpc: NMI IPI improve lock primitive powerpc/configs: Re-enable HARD/SOFT lockup detectors powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails Revert "powerpc/64: Avoid restore_math call if possible in syscall exit" 11 August 2017, 15:56:01 UTC
622b2fb selftests: timers: freq-step: fix compile error Fix compile error due to ksft_exit_skip() update to take var_args. freq-step.c: In function ‘init_test’: freq-step.c:234:3: error: too few arguments to function ‘ksft_exit_skip’ ksft_exit_skip(); ^~~~~~~~~~~~~~ In file included from freq-step.c:26:0: ../kselftest.h:167:19: note: declared here static inline int ksft_exit_skip(const char *msg, ...) ^~~~~~~~~~~~~~ <builtin>: recipe for target 'freq-step' failed Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> 11 August 2017, 15:28:37 UTC
a7990c6 iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device" removed fwspec assignment in legacy_binding path as redundant which is wrong. It needs to be updated after fwspec initialisation in arm_smmu_register_legacy_master() as it is dereferenced later. Without this there is a NULL-pointer dereference panic during boot on some hosts. Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> 11 August 2017, 14:56:51 UTC
020db9d xen/events: Fix interrupt lost during irq_disable and irq_enable Here is a device has xen-pirq-MSI interrupt. Dom0 might lost interrupt during driver irq_disable/irq_enable. Here is the scenario, 1. irq_disable -> disable_dynirq -> mask_evtchn(irq channel) 2. dev interrupt raised by HW and Xen mark its evtchn as pending 3. irq_enable -> startup_pirq -> eoi_pirq -> clear_evtchn(channel of irq) -> clear pending status 4. consume_one_event process the irq event without pending bit assert which result in interrupt lost once 5. No HW interrupt raising anymore. Now use enable_dynirq for enable_pirq of xen_pirq_chip to remove eoi_pirq when irq_enable. Signed-off-by: Liu Shuo <shuo.a.liu@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> 11 August 2017, 14:46:01 UTC
529871b xen: avoid deadlock in xenbus When starting the xenwatch thread a theoretical deadlock situation is possible: xs_init() contains: task = kthread_run(xenwatch_thread, NULL, "xenwatch"); if (IS_ERR(task)) return PTR_ERR(task); xenwatch_pid = task->pid; And xenwatch_thread() does: mutex_lock(&xenwatch_mutex); ... event->handle->callback(); ... mutex_unlock(&xenwatch_mutex); The callback could call unregister_xenbus_watch() which does: ... if (current->pid != xenwatch_pid) mutex_lock(&xenwatch_mutex); ... In case a watch is firing before xenwatch_pid could be set and the callback of that watch unregisters a watch, then a self-deadlock would occur. Avoid this by setting xenwatch_pid in xenwatch_thread(). Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com> 11 August 2017, 14:45:56 UTC
4a8b53b Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus Pull NVMe fixes from Christoph: "A few more small fixes - the fc/lpfc update is the biggest by far." 11 August 2017, 14:07:19 UTC
4ca83dc xen: fix hvm guest with kaslr enabled A Xen HVM guest running with KASLR enabled will die rather soon today because the shared info page mapping is using va() too early. This was introduced by commit a5d5f328b0e2baa5ee7c119fd66324eb79eeeb66 ("xen: allocate page for shared info page from low memory"). In order to fix this use early_memremap() to get a temporary virtual address for shared info until va() can be used safely. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> 11 August 2017, 13:50:26 UTC
10231f6 xen: split up xen_hvm_init_shared_info() Instead of calling xen_hvm_init_shared_info() on boot and resume split it up into a boot time function searching for the pfn to use and a mapping function doing the hypervisor mapping call. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> 11 August 2017, 13:50:24 UTC
c138d81 x86: provide an init_mem_mapping hypervisor hook Provide a hook in hypervisor_x86 called after setting up initial memory mapping. This is needed e.g. by Xen HVM guests to map the hypervisor shared info page. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Juergen Gross <jgross@suse.com> 11 August 2017, 13:50:21 UTC
9183976 fuse: set mapping error in writepage_locked when it fails This ensures that we see errors on fsync when writeback fails. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> 11 August 2017, 09:38:26 UTC
b2dbdf2 Merge tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Nothing too earth shattering here, it just seems like lots of little things all over the place. msm has probably the larger amount of changes, but they all seem fine, otherwise, some rockchip, i915, etnaviv and exynos fixes, along with one nouveau regression fix for some older GPUs" * tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux: (35 commits) drm/nouveau/disp/nv04: avoid creation of output paths drm: make DRM_STM default n drm/exynos: forbid creating framebuffers from too small GEM buffers drm/etnaviv: Fix off-by-one error in reloc checking drm/i915: fix backlight invert for non-zero minimum brightness drm/i915/shrinker: Wrap need_resched() inside preempt-disable drm/i915/perf: fix flex eu registers programming drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8 drm/i915/gvt: Initialize MMIO Block with HW state drm/rockchip: vop: report error when check resource error drm/rockchip: vop: round_up pitches to word align drm/rockchip: vop: fix NV12 video display error drm/rockchip: vop: fix iommu page fault when resume drm/i915/gvt: clean workload queue if error happened drm/i915/gvt: change resetting to resetting_eng drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM drm/msm/adreno: Prevent unclocked access when retrieving timestamps drm/msm: Remove __user from __u64 data types ... 11 August 2017, 05:33:47 UTC
8e2f3bc cpufreq: x86: Disable interrupts during MSRs reading According to Intel 64 and IA-32 Architectures SDM, Volume 3, Chapter 14.2, "Software needs to exercise care to avoid delays between the two RDMSRs (for example interrupts)". So, disable interrupts during reading MSRs IA32_APERF and IA32_MPERF. See also: commit 4ab60c3f32c7 (cpufreq: intel_pstate: Disable interrupts during MSRs reading). Signed-off-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 10 August 2017, 23:27:41 UTC
c587c79 cpufreq: intel_pstate: report correct CPU frequencies during trace The intel_pstate CPU frequency scaling driver has always calculated CPU frequency incorrectly. Recent changes have eliminted most of the issues, however the frequency reported in the trace buffer, if used, is incorrect. It remains desireable that cpu->pstate.scaling still be a nice round number for things such as when setting max and min frequencies. So the proposal is to just fix the reported frequency in the trace data. Fixes what remains of [1]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=96521 # [1] Signed-off-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> 10 August 2017, 23:25:53 UTC
27df704 Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits) userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage zram: rework copy of compressor name in comp_algorithm_store() rmap: do not call mmu_notifier_invalidate_page() under ptl mm: fix list corruptions on shmem shrinklist mm/balloon_compaction.c: don't zero ballooned pages MAINTAINERS: copy virtio on balloon_compaction.c mm: fix KSM data corruption mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem mm: make tlb_flush_pending global mm: refactor TLB gathering API Revert "mm: numa: defer TLB flush for THP migration as long as possible" mm: migrate: fix barriers around tlb_flush_pending mm: migrate: prevent racy access to tlb_flush_pending fault-inject: fix wrong should_fail() decision in task context test_kmod: fix small memory leak on filesystem tests test_kmod: fix the lock in register_test_dev_kmod() test_kmod: fix bug which allows negative values on two config options test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY" userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case mm: ratelimit PFNs busy info message ... 10 August 2017, 23:20:52 UTC
e86b298 userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage When the process exit races with outstanding mcopy_atomic, it would be better to return ESRCH error. When such race occurs the process and it's mm are going away and returning "no such process" to the uffd monitor seems better fit than ENOSPC. Link: http://lkml.kernel.org/r/1502111545-32305-1-git-send-email-rppt@linux.vnet.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Pavel Emelyanov <xemul@virtuozzo.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
f357e34 zram: rework copy of compressor name in comp_algorithm_store() comp_algorithm_store() passes the size of the source buffer to strlcpy() instead of the destination buffer size. Make it explicit that the two buffers have the same size and use strcpy() instead of strlcpy(). The latter can be done safely since the function ensures that the string in the source buffer is terminated. Link: http://lkml.kernel.org/r/20170803163350.45245-1-mka@chromium.org Signed-off-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
aac2fea rmap: do not call mmu_notifier_invalidate_page() under ptl MMU notifiers can sleep, but in page_mkclean_one() we call mmu_notifier_invalidate_page() under page table lock. Let's instead use mmu_notifier_invalidate_range() outside page_vma_mapped_walk() loop. [jglisse@redhat.com: try_to_unmap_one() do not call mmu_notifier under ptl] Link: http://lkml.kernel.org/r/20170809204333.27485-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170804134928.l4klfcnqatni7vsc@black.fi.intel.com Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Jérôme Glisse <jglisse@redhat.com> Reported-by: axie <axie@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Writer, Tim" <Tim.Writer@amd.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
d041353 mm: fix list corruptions on shmem shrinklist We saw many list corruption warnings on shmem shrinklist: WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0 list_del corruption. prev->next should be ffff9ae5694b82d8, but was ffff9ae5699ba960 Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1 Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013 Call Trace: dump_stack+0x4d/0x66 __warn+0xcb/0xf0 warn_slowpath_fmt+0x4f/0x60 __list_del_entry+0x9e/0xc0 shmem_unused_huge_shrink+0xfa/0x2e0 shmem_unused_huge_scan+0x20/0x30 super_cache_scan+0x193/0x1a0 shrink_slab.part.41+0x1e3/0x3f0 shrink_slab+0x29/0x30 shrink_node+0xf9/0x2f0 kswapd+0x2d8/0x6c0 kthread+0xd7/0xf0 ret_from_fork+0x22/0x30 WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0 list_add corruption. prev->next should be next (ffff9ae5699ba960), but was ffff9ae5694b82d8. (prev=ffff9ae5694b82d8). Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G W 4.9.34-t3.el7.twitter.x86_64 #1 Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013 Call Trace: dump_stack+0x4d/0x66 __warn+0xcb/0xf0 warn_slowpath_fmt+0x4f/0x60 __list_add+0x89/0xb0 shmem_setattr+0x204/0x230 notify_change+0x2ef/0x440 do_truncate+0x5d/0x90 path_openat+0x331/0x1190 do_filp_open+0x7e/0xe0 do_sys_open+0x123/0x200 SyS_open+0x1e/0x20 do_syscall_64+0x61/0x170 entry_SYSCALL64_slow_path+0x25/0x25 The problem is that shmem_unused_huge_shrink() moves entries from the global sbinfo->shrinklist to its local lists and then releases the spinlock. However, a parallel shmem_setattr() could access one of these entries directly and add it back to the global shrinklist if it is removed, with the spinlock held. The logic itself looks solid since an entry could be either in a local list or the global list, otherwise it is removed from one of them by list_del_init(). So probably the race condition is that, one CPU is in the middle of INIT_LIST_HEAD() but the other CPU calls list_empty() which returns true too early then the following list_add_tail() sees a corrupted entry. list_empty_careful() is designed to fix this situation. [akpm@linux-foundation.org: add comments] Link: http://lkml.kernel.org/r/20170803054630.18775-1-xiyou.wangcong@gmail.com Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure") Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
af54aed mm/balloon_compaction.c: don't zero ballooned pages Revert commit bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page to balloon device")' Zeroing ballon pages is rather time consuming, especially when a lot of pages are in flight. E.g. 7GB worth of ballooned memory takes 2.8s with __GFP_ZERO while it takes ~491ms without it. The original commit argued that zeroing will help ksmd to merge these pages on the host but this argument is assuming that the host actually marks balloon pages for ksm which is not universally true. So we pay performance penalty for something that even might not be used in the end which is wrong. The host can zero out pages on its own when there is a need. [mhocko@kernel.org: new changelog text] Link: http://lkml.kernel.org/r/1501761557-9758-1-git-send-email-wei.w.wang@intel.com Fixes: bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page to balloon device") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: zhenwei.pi <zhenwei.pi@youruncloud.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
c0a6a5a MAINTAINERS: copy virtio on balloon_compaction.c Changes to mm/balloon_compaction.c can easily break virtio, and virtio is the only user of that interface. Add a line to MAINTAINERS so whoever changes that file remembers to copy us. Link: http://lkml.kernel.org/r/1501764010-24456-1-git-send-email-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
b3a81d0 mm: fix KSM data corruption Nadav reported KSM can corrupt the user data by the TLB batching race[1]. That means data user written can be lost. Quote from Nadav Amit: "For this race we need 4 CPUs: CPU0: Caches a writable and dirty PTE entry, and uses the stale value for write later. CPU1: Runs madvise_free on the range that includes the PTE. It would clear the dirty-bit. It batches TLB flushes. CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty. We care about the fact that it clears the PTE write-bit, and of course, batches TLB flushes. CPU3: Runs KSM. Our purpose is to pass the following test in write_protect_page(): if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) || (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte))) Since it will avoid TLB flush. And we want to do it while the PTE is stale. Later, and before replacing the page, we would be able to change the page. Note that all the operations the CPU1-3 perform canhappen in parallel since they only acquire mmap_sem for read. We start with two identical pages. Everything below regards the same page/PTE. CPU0 CPU1 CPU2 CPU3 ---- ---- ---- ---- Write the same value on page [cache PTE as dirty in TLB] MADV_FREE pte_mkclean() 4 > clear_refs pte_wrprotect() write_protect_page() [ success, no flush ] pages_indentical() [ ok ] Write to page different value [Ok, using stale PTE] replace_page() Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0 already wrote on the page, but KSM ignored this write, and it got lost" In above scenario, MADV_FREE is fixed by changing TLB batching API including [set|clear]_tlb_flush_pending. Remained thing is soft-dirty part. This patch changes soft-dirty uses TLB batching API instead of flush_tlb_mm and KSM checks pending TLB flush by using mm_tlb_flush_pending so that it will flush TLB to avoid data lost if there are other parallel threads pending TLB flush. [1] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com Link: http://lkml.kernel.org/r/20170802000818.4760-8-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Tested-by: Nadav Amit <namit@vmware.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Hugh Dickins <hughd@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
99baac2 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem Nadav reported parallel MADV_DONTNEED on same range has a stale TLB problem and Mel fixed it[1] and found same problem on MADV_FREE[2]. Quote from Mel Gorman: "The race in question is CPU 0 running madv_free and updating some PTEs while CPU 1 is also running madv_free and looking at the same PTEs. CPU 1 may have writable TLB entries for a page but fail the pte_dirty check (because CPU 0 has updated it already) and potentially fail to flush. Hence, when madv_free on CPU 1 returns, there are still potentially writable TLB entries and the underlying PTE is still present so that a subsequent write does not necessarily propagate the dirty bit to the underlying PTE any more. Reclaim at some unknown time at the future may then see that the PTE is still clean and discard the page even though a write has happened in the meantime. I think this is possible but I could have missed some protection in madv_free that prevents it happening." This patch aims for solving both problems all at once and is ready for other problem with KSM, MADV_FREE and soft-dirty story[3]. TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can catch there are parallel threads going on. In that case, forcefully, flush TLB to prevent for user to access memory via stale TLB entry although it fail to gather page table entry. I confirmed this patch works with [4] test program Nadav gave so this patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in current mmotm. NOTE: This patch modifies arch-specific TLB gathering interface(x86, ia64, s390, sh, um). It seems most of architecture are straightforward but s390 need to be careful because tlb_flush_mmu works only if mm->context.flush_mm is set to non-zero which happens only a pte entry really is cleared by ptep_get_and_clear and friends. However, this problem never changes the pte entries but need to flush to prevent memory access from stale tlb. [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com [4] https://patchwork.kernel.org/patch/9861621/ [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu] Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Reported-by: Nadav Amit <namit@vmware.com> Reported-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
0a2dd26 mm: make tlb_flush_pending global Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING| COMPACTION] but upcoming patches to solve subtle TLB flush batching problem will use it regardless of compaction/NUMA so this patch doesn't remove the dependency. [akpm@linux-foundation.org: remove more ifdefs from world's ugliest printk statement] Link: http://lkml.kernel.org/r/20170802000818.4760-6-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
56236a5 mm: refactor TLB gathering API This patch is a preparatory patch for solving race problems caused by TLB batch. For that, we will increase/decrease TLB flush pending count of mm_struct whenever tlb_[gather|finish]_mmu is called. Before making it simple, this patch separates architecture specific part and rename it to arch_tlb_[gather|finish]_mmu and generic part just calls it. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Jeff Dike <jdike@addtoit.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
a9b8025 Revert "mm: numa: defer TLB flush for THP migration as long as possible" While deferring TLB flushes is a good practice, the reverted patch caused pending TLB flushes to be checked while the page-table lock is not taken. As a result, in architectures with weak memory model (PPC), Linux may miss a memory-barrier, miss the fact TLB flushes are pending, and cause (in theory) a memory corruption. Since the alternative of using smp_mb__after_unlock_lock() was considered a bit open-coded, and the performance impact is expected to be small, the previous patch is reverted. This reverts b0943d61b8fa ("mm: numa: defer TLB flush for THP migration as long as possible"). Link: http://lkml.kernel.org/r/20170802000818.4760-4-namit@vmware.com Signed-off-by: Nadav Amit <namit@vmware.com> Suggested-by: Mel Gorman <mgorman@suse.de> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
0a2c404 mm: migrate: fix barriers around tlb_flush_pending Reading tlb_flush_pending while the page-table lock is taken does not require a barrier, since the lock/unlock already acts as a barrier. Removing the barrier in mm_tlb_flush_pending() to address this issue. However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending() while the page-table lock is already released, which may present a problem on architectures with weak memory model (PPC). To deal with this case, a new parameter is added to mm_tlb_flush_pending() to indicate if it is read without the page-table lock taken, and calling smp_mb__after_unlock_lock() in this case. Link: http://lkml.kernel.org/r/20170802000818.4760-3-namit@vmware.com Signed-off-by: Nadav Amit <namit@vmware.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 10 August 2017, 22:54:07 UTC
back to top