https://github.com/torvalds/linux

sort by:
Revision Author Date Message Commit Date
4d856f7 Linux 5.3 15 September 2019, 21:19:32 UTC
72dbcf7 Revert "ext4: make __ext4_get_inode_loc plug" This reverts commit b03755ad6f33b7b8cd7312a3596a2dbf496de6e7. This is sad, and done for all the wrong reasons. Because that commit is good, and does exactly what it says: avoids a lot of small disk requests for the inode table read-ahead. However, it turns out that it causes an entirely unrelated problem: the getrandom() system call was introduced back in 2014 by commit c6e9d6f38894 ("random: introduce getrandom(2) system call"), and people use it as a convenient source of good random numbers. But part of the current semantics for getrandom() is that it waits for the entropy pool to fill at least partially (unlike /dev/urandom). And at least ArchLinux apparently has a systemd that uses getrandom() at boot time, and the improvements in IO patterns means that existing installations suddenly start hanging, waiting for entropy that will never happen. It seems to be an unlucky combination of not _quite_ enough entropy, together with a particular systemd version and configuration. Lennart says that the systemd-random-seed process (which is what does this early access) is supposed to not block any other boot activity, but sadly that doesn't actually seem to be the case (possibly due bogus dependencies on cryptsetup for encrypted swapspace). The correct fix is to fix getrandom() to not block when it's not appropriate, but that fix is going to take a lot more discussion. Do we just make it act like /dev/urandom by default, and add a new flag for "wait for entropy"? Do we add a boot-time option? Or do we just limit the amount of time it will wait for entropy? So in the meantime, we do the revert to give us time to discuss the eventual fix for the fundamental problem, at which point we can re-apply the ext4 inode table access optimization. Reported-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Alexander E. Patrakov <patrakov@gmail.com> Cc: Lennart Poettering <mzxreary@0pointer.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 15 September 2019, 19:32:03 UTC
1609d76 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "The main change here is a revert of reverts. We recently simplified some code that was thought unnecessary; however, since then KVM has grown quite a few cond_resched()s and for that reason the simplified code is prone to livelocks---one CPUs tries to empty a list of guest page tables while the others keep adding to them. This adds back the generation-based zapping of guest page tables, which was not unnecessary after all. On top of this, there is a fix for a kernel memory leak and a couple of s390 fixlets as well" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot KVM: x86: work around leak of uninitialized stack contents KVM: nVMX: handle page fault in vmread KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset() 14 September 2019, 23:07:40 UTC
1f9c632 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fix from Michael Tsirkin: "A last minute revert The 32-bit build got broken by the latest defence in depth patch. Revert and we'll try again in the next cycle" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: Revert "vhost: block speculation of translated descriptors" 14 September 2019, 23:02:49 UTC
b03c036 Merge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fix from Paul Walmsley: "Last week, Palmer and I learned that there was an error in the RISC-V kernel image header format that could make it less compatible with the ARM64 kernel image header format. I had missed this error during my original reviews of the patch. The kernel image header format is an interface that impacts bootloaders, QEMU, and other user tools. Those packages must be updated to align with whatever is merged in the kernel. We would like to avoid proliferating these image formats by keeping the RISC-V header as close as possible to the existing ARM64 header. Since the arch/riscv patch that adds support for the image header was merged with our v5.3-rc1 pull request as commit 0f327f2aaad6a ("RISC-V: Add an Image header that boot loader can parse."), we think it wise to try to fix this error before v5.3 is released. The fix itself should be backwards-compatible with any project that has already merged support for premature versions of this interface. It primarily involves ensuring that the RISC-V image header has something useful in the same field as the ARM64 image header" * tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: modify the Image header to improve compatibility with the ARM64 header 14 September 2019, 22:58:02 UTC
0d4a3f2 Revert "vhost: block speculation of translated descriptors" This reverts commit a89db445fbd7f1f8457b03759aa7343fa530ef6b. I was hasty to include this patch, and it breaks the build on 32 bit. Defence in depth is good but let's do it properly. Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> 14 September 2019, 19:21:51 UTC
36024fc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Don't corrupt xfrm_interface parms before validation, from Nicolas Dichtel. 2) Revert use of usb-wakeup in btusb, from Mario Limonciello. 3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from Leonardo Bras. 4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso. 5) Missing ULP check in sock_map, from John Fastabend. 6) Fix receive statistic handling in forcedeth, from Zhu Yanjun. 7) Fix length of SKB allocated in 6pack driver, from Christophe JAILLET. 8) ip6_route_info_create() returns an error pointer, not NULL. From Maciej Żenczykowski. 9) Only add RDS sock to the hashes after rs_transport is set, from Ka-Cheong Poon. 10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets. 11) Presence of transmit IPSEC offload in an SKB is not tested for correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff Kirsher. 12) Need rcu_barrier() when register_netdevice() takes one of the notifier based failure paths, from Subash Abhinov Kasiviswanathan. 13) Fix leak in sctp_do_bind(), from Mao Wenan. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) cdc_ether: fix rndis support for Mediatek based smartphones sctp: destroy bucket if failed to bind addr sctp: remove redundant assignment when call sctp_get_port_local sctp: change return type of sctp_get_port_local ixgbevf: Fix secpath usage for IPsec Tx offload sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' ixgbe: Fix secpath usage for IPsec TX offload. net: qrtr: fix memort leak in qrtr_tun_write_iter net: Fix null de-reference of device refcount ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' tun: fix use-after-free when register netdev failed tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR ixgbe: fix double clean of Tx descriptors with xdp ixgbe: Prevent u8 wrapping of ITR value to something less than 10us mlx4: fix spelling mistake "veify" -> "verify" net: hns3: fix spelling mistake "undeflow" -> "underflow" net: lmc: fix spelling mistake "runnin" -> "running" NFC: st95hf: fix spelling mistake "receieve" -> "receive" net/rds: An rds_sock is added too early to the hash table mac80211: Do not send Layer 2 Update frame before authorization ... 14 September 2019, 19:20:38 UTC
1c4c5e2 Merge tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: - tmio: Fixup runtime PM management during probe and remove - sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC - bcm2835: Prevent lockups when terminating work * tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: tmio: Fixup runtime PM management during remove mmc: tmio: Fixup runtime PM management during probe Revert "mmc: tmio: move runtime PM enablement to the driver implementations" Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller" Revert "mmc: bcm2835: Terminate timeout work synchronously" 14 September 2019, 19:08:19 UTC
592b8d8 Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "From the maintainer summit, just some last minute fixes for final: lima: - fix gem_wait ioctl core: - constify modes list i915: - DP MST high color depth regression - GPU hangs on vulkan compute workloads" * tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm: drm/lima: fix lima_gem_wait() return value drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+ drm/i915: Limit MST to <= 8bpc once again drm/modes: Make the whitelist more const 14 September 2019, 18:54:57 UTC
a9c20bb Merge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fixes for 5.3 - prevent a user triggerable oops in the migration code - do not leak kernel stack content 14 September 2019, 07:25:30 UTC
002c5f7 KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot James Harvey reported a livelock that was introduced by commit d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot""). The livelock occurs because kvm_mmu_zap_all() as it exists today will voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs to add shadow pages. With enough vCPUs, kvm_mmu_zap_all() can get stuck in an infinite loop as it can never zap all pages before observing lock contention or the need to reschedule. The equivalent of kvm_mmu_zap_all() that was in use at the time of the reverted commit (4e103134b8623, "KVM: x86/mmu: Zap only the relevant pages when removing a memslot") employed a fast invalidate mechanism and was not susceptible to the above livelock. There are three ways to fix the livelock: - Reverting the revert (commit d012a06ab1d23) is not a viable option as the revert is needed to fix a regression that occurs when the guest has one or more assigned devices. It's unlikely we'll root cause the device assignment regression soon enough to fix the regression timely. - Remove the conditional reschedule from kvm_mmu_zap_all(). However, although removing the reschedule would be a smaller code change, it's less safe in the sense that the resulting kvm_mmu_zap_all() hasn't been used in the wild for flushing memslots since the fast invalidate mechanism was introduced by commit 6ca18b6950f8d ("KVM: x86: use the fast way to invalidate all pages"), back in 2013. - Reintroduce the fast invalidate mechanism and use it when zapping shadow pages in response to a memslot being deleted/moved, which is what this patch does. For all intents and purposes, this is a revert of commit ea145aacf4ae8 ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of commit 7390de1e99a70 ("Revert "KVM: x86: use the fast way to invalidate all pages""), i.e. restores the behavior of commit 5304b8d37c2a5 ("KVM: MMU: fast invalidate all pages") and commit 6ca18b6950f8d ("KVM: x86: use the fast way to invalidate all pages") respectively. Fixes: d012a06ab1d23 ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"") Reported-by: James Harvey <jamespharvey20@gmail.com> Cc: Alex Willamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 September 2019, 07:25:11 UTC
541ab2a KVM: x86: work around leak of uninitialized stack contents Emulation of VMPTRST can incorrectly inject a page fault when passed an operand that points to an MMIO address. The page fault will use uninitialized kernel stack memory as the CR2 and error code. The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR exit to userspace; however, it is not an easy fix, so for now just ensure that the error code and CR2 are zero. Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Cc: stable@vger.kernel.org [add comment] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 September 2019, 07:25:11 UTC
f7eea63 KVM: nVMX: handle page fault in vmread The implementation of vmread to memory is still incomplete, as it lacks the ability to do vmread to I/O memory just like vmptrst. Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 14 September 2019, 07:25:02 UTC
474efec riscv: modify the Image header to improve compatibility with the ARM64 header Part of the intention during the definition of the RISC-V kernel image header was to lay the groundwork for a future merge with the ARM64 image header. One error during my original review was not noticing that the RISC-V header's "magic" field was at a different size and position than the ARM64's "magic" field. If the existing ARM64 Image header parsing code were to attempt to parse an existing RISC-V kernel image header format, it would see a magic number 0. This is undesirable, since it's our intention to align as closely as possible with the ARM64 header format. Another problem was that the original "res3" field was not being initialized correctly to zero. Address these issues by creating a 32-bit "magic2" field in the RISC-V header which matches the ARM64 "magic" field. RISC-V binaries will store "RSC\x05" in this field. The intention is that the use of the existing 64-bit "magic" field in the RISC-V header will be deprecated over time. Increment the minor version number of the file format to indicate this change, and update the documentation accordingly. Fix the assembler directives in head.S to ensure that reserved fields are properly zero-initialized. Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com> Reported-by: Palmer Dabbelt <palmer@sifive.com> Reviewed-by: Palmer Dabbelt <palmer@sifive.com> Cc: Atish Patra <atish.patra@wdc.com> Cc: Karsten Merker <merker@debian.org> Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t 14 September 2019, 02:03:52 UTC
4d7ffcf cdc_ether: fix rndis support for Mediatek based smartphones A Mediatek based smartphone owner reports problems with USB tethering in Linux. The verbose USB listing shows a rndis_host interface pair (e0/01/03 + 10/00/00), but the driver fails to bind with [ 355.960428] usb 1-4: bad CDC descriptors The problem is a failsafe test intended to filter out ACM serial functions using the same 02/02/ff class/subclass/protocol as RNDIS. The serial functions are recognized by their non-zero bmCapabilities. No RNDIS function with non-zero bmCapabilities were known at the time this failsafe was added. But it turns out that some Wireless class RNDIS functions are using the bmCapabilities field. These functions are uniquely identified as RNDIS by their class/subclass/protocol, so the failing test can safely be disabled. The same applies to the two types of Misc class RNDIS functions. Applying the failsafe to Communication class functions only retains the original functionality, and fixes the problem for the Mediatek based smartphone. Tow examples of CDC functional descriptors with non-zero bmCapabilities from Wireless class RNDIS functions are: 0e8d:000a Mediatek Crosscall Spider X5 3G Phone CDC Header: bcdCDC 1.10 CDC ACM: bmCapabilities 0x0f connection notifications sends break line coding and serial state get/set/clear comm features CDC Union: bMasterInterface 0 bSlaveInterface 1 CDC Call Management: bmCapabilities 0x03 call management use DataInterface bDataInterface 1 and 19d2:1023 ZTE K4201-z CDC Header: bcdCDC 1.10 CDC ACM: bmCapabilities 0x02 line coding and serial state CDC Call Management: bmCapabilities 0x03 call management use DataInterface bDataInterface 1 CDC Union: bMasterInterface 0 bSlaveInterface 1 The Mediatek example is believed to apply to most smartphones with Mediatek firmware. The ZTE example is most likely also part of a larger family of devices/firmwares. Suggested-by: Lars Melin <larsm17@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> 13 September 2019, 20:08:13 UTC
ae3b06e Merge branch 'sctp_do_bind-leak' Mao Wenan says: ==================== fix memory leak for sctp_do_bind First two patches are to do cleanup, remove redundant assignment, and change return type of sctp_get_port_local. Third patch is to fix memory leak for sctp_do_bind if failed to bind address. v2: add one patch to change return type of sctp_get_port_local. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 13 September 2019, 20:06:20 UTC
29b99f5 sctp: destroy bucket if failed to bind addr There is one memory leak bug report: BUG: memory leak unreferenced object 0xffff8881dc4c5ec0 (size 40): comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s) hex dump (first 32 bytes): 02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................ f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=............. backtrace: [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp] [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp] [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp] [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6] [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647 [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline] [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline] [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656 [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296 [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe This is because in sctp_do_bind, if sctp_get_port_local is to create hash bucket successfully, and sctp_add_bind_addr failed to bind address, e.g return -ENOMEM, so memory leak found, it needs to destroy allocated bucket. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 September 2019, 20:06:20 UTC
e0e4b8d sctp: remove redundant assignment when call sctp_get_port_local There are more parentheses in if clause when call sctp_get_port_local in sctp_do_bind, and redundant assignment to 'ret'. This patch is to do cleanup. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 September 2019, 20:06:20 UTC
8e2ef6a sctp: change return type of sctp_get_port_local Currently sctp_get_port_local() returns a long which is either 0,1 or a pointer casted to long. It's neither of the callers use the return value since commit 62208f12451f ("net: sctp: simplify sctp_get_port"). Now two callers are sctp_get_port and sctp_do_bind, they actually assumend a casted to an int was the same as a pointer casted to a long, and they don't save the return value just check whether it is zero or non-zero, so it would better change return type from long to int for sctp_get_port_local. Signed-off-by: Mao Wenan <maowenan@huawei.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 13 September 2019, 20:06:20 UTC
8f6617b ixgbevf: Fix secpath usage for IPsec Tx offload Port the same fix for ixgbe to ixgbevf. The ixgbevf driver currently does IPsec Tx offloading based on an existing secpath. However, the secpath can also come from the Rx side, in this case it is misinterpreted for Tx offload and the packets are dropped with a "bad sa_idx" error. Fix this by using the xfrm_offload() function to test for Tx offload. CC: Shannon Nelson <snelson@pensando.io> Fixes: 7f68d4306701 ("ixgbevf: enable VF IPsec offload operations") Reported-by: Jonathan Tooker <jonathan@reliablehosting.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net> 13 September 2019, 13:52:10 UTC
87b5d60 mmc: tmio: Fixup runtime PM management during remove Accessing the device when it may be runtime suspended is a bug, which is the case in tmio_mmc_host_remove(). Let's fix the behaviour. Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> 13 September 2019, 11:49:09 UTC
aa86f1a mmc: tmio: Fixup runtime PM management during probe The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the runtime PM status of the device, as to make it reflect the current status of the HW. This works fine for most cases, but unfortunate not for all. Especially, there is a generic problem when the device has a genpd attached and that genpd have the ->start|stop() callbacks assigned. More precisely, if the driver calls pm_runtime_set_active() during ->probe(), genpd does not get to invoke the ->start() callback for it, which means the HW isn't really fully powered on. Furthermore, in the next phase, when the device becomes runtime suspended, genpd will invoke the ->stop() callback for it, potentially leading to usage count imbalance problems, depending on what's implemented behind the callbacks of course. To fix this problem, convert to call pm_runtime_get_sync() from tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to avoid bumping usage counters and unnecessary re-initializing the HW the first time the tmio driver's ->runtime_resume() callback is called, introduce a state flag to keeping track of this. Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> 13 September 2019, 11:49:04 UTC
8861474 Revert "mmc: tmio: move runtime PM enablement to the driver implementations" This reverts commit 7ff213193310ef8d0ee5f04f79d791210787ac2c. It turns out that the above commit introduces other problems. For example, calling pm_runtime_set_active() must not be done prior calling pm_runtime_enable() as that makes it fail. This leads to additional problems, such as clock enables being wrongly balanced. Rather than fixing the problem on top, let's start over by doing a revert. Fixes: 7ff213193310 ("mmc: tmio: move runtime PM enablement to the driver implementations") Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> 13 September 2019, 11:48:35 UTC
a7f8961 Merge branch 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Roman found and fixed a bug in the cgroup2 freezer which allows new child cgroup to escape frozen state" * 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: freezer: fix frozen state inheritance kselftests: cgroup: add freezer mkdir test 13 September 2019, 08:52:01 UTC
1b304a1 Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Here are two fixes, one of them urgent fixing a bug introduced in 5.2 and reported by many users. It took time to identify the root cause, catching the 5.3 release is higly desired also to push the fix to 5.2 stable tree. The bug is a mess up of return values after adding proper error handling and honestly the kind of bug that can cause sleeping disorders until it's caught. My appologies to everybody who was affected. Summary of what could happen: 1) either a hang when committing a transaction, if this happens there's no risk of corruption, still the hang is very inconvenient and can't be resolved without a reboot 2) writeback for some btree nodes may never be started and we end up committing a transaction without noticing that, this is really serious and that will lead to the "parent transid verify failed" messages" * tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unwritten extent buffers and hangs on future writeback attempts Btrfs: fix assertion failure during fsync and use of stale transaction 13 September 2019, 08:48:47 UTC
97a6136 cgroup: freezer: fix frozen state inheritance If a new child cgroup is created in the frozen cgroup hierarchy (one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup flag should be set. Otherwise if a process will be attached to the child cgroup, it won't become frozen. The problem can be reproduced with the test_cgfreezer_mkdir test. This is the output before this patch: ~/test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen not ok 4 test_cgfreezer_mkdir ok 5 test_cgfreezer_rmdir ok 6 test_cgfreezer_migrate ok 7 test_cgfreezer_ptrace ok 8 test_cgfreezer_stopped ok 9 test_cgfreezer_ptraced ok 10 test_cgfreezer_vfork And with this patch: ~/test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb ok 4 test_cgfreezer_mkdir ok 5 test_cgfreezer_rmdir ok 6 test_cgfreezer_migrate ok 7 test_cgfreezer_ptrace ok 8 test_cgfreezer_stopped ok 9 test_cgfreezer_ptraced ok 10 test_cgfreezer_vfork Reported-by: Mark Crossen <mcrossen@fb.com> Signed-off-by: Roman Gushchin <guro@fb.com> Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer") Cc: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Tejun Heo <tj@kernel.org> 12 September 2019, 21:04:45 UTC
44e9d30 kselftests: cgroup: add freezer mkdir test Add a new cgroup freezer selftest, which checks that if a cgroup is frozen, their new child cgroups will properly inherit the frozen state. It creates a parent cgroup, freezes it, creates a child cgroup and populates it with a dummy process. Then it checks that both parent and child cgroup are frozen. Signed-off-by: Roman Gushchin <guro@fb.com> Cc: Tejun Heo <tj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> 12 September 2019, 21:04:40 UTC
505a8ec Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()" The userptr put_pages can be called from inside try_to_unmap, and so enters with the page lock held on one of the object's backing pages. We cannot take the page lock ourselves for fear of recursion. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reported-by: Martin Wilck <Martin.Wilck@suse.com> Reported-by: Leo Kraav <leho@kraav.com> Fixes: aa56a292ce62 ("drm/i915/userptr: Acquire the page lock around set_page_dirty()") References: https://bugzilla.kernel.org/show_bug.cgi?id=203317 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 12 September 2019, 13:55:03 UTC
98dcb38 Merge tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull clone3 fix from Christian Brauner: "This is a last-minute bugfix for clone3() that should go in before we release 5.3 with clone3(). clone3() did not verify that the exit_signal argument was set to a valid signal. This can be used to cause a crash by specifying a signal greater than NSIG. e.g. -1. The commit from Eugene adds a check to copy_clone_args_from_user() to verify that the exit signal is limited by CSIGNAL as with legacy clone() and that the signal is valid. With this we don't get the legacy clone behavior were an invalid signal could be handed down and would only be detected and then ignored in do_notify_parent(). Users of clone3() will now get a proper error right when they pass an invalid exit signal. Note, that this is not a change in user-visible behavior since no kernel with clone3() has been released yet" * tag 'for-linus-20190912' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: fork: block invalid exit signals with clone3() 12 September 2019, 13:50:14 UTC
9521778 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A KVM guest fix, and a kdump kernel relocation errors fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/timer: Force PIT initialization when !X86_FEATURE_ARAT x86/purgatory: Change compiler flags from -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors 12 September 2019, 13:47:35 UTC
e6bb711 Merge tag 'drm-misc-fixes-2019-09-12' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.3 final: - Constify modes whitelist harder. - Fix lima driver gem_wait ioctl. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/99e52e7a-d4ce-6a2c-0501-bc559a710955@linux.intel.com 12 September 2019, 13:14:35 UTC
911ad0b Merge tag 'drm-intel-fixes-2019-09-11' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Final drm/i915 fixes for v5.3: - Fox DP MST high color depth regression - Fix GPU hangs on Vulkan compute workloads Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/877e6e27qm.fsf@intel.com 12 September 2019, 13:11:36 UTC
a0eb9ab fork: block invalid exit signals with clone3() Previously, higher 32 bits of exit_signal fields were lost when copied to the kernel args structure (that uses int as a type for the respective field). Moreover, as Oleg has noted, exit_signal is used unchecked, so it has to be checked for sanity before use; for the legacy syscalls, applying CSIGNAL mask guarantees that it is at least non-negative; however, there's no such thing is done in clone3() code path, and that can break at least thread_group_leader. This commit adds a check to copy_clone_args_from_user() to verify that the exit signal is limited by CSIGNAL as with legacy clone() and that the signal is valid. With this we don't get the legacy clone behavior were an invalid signal could be handed down and would only be detected and ignored in do_notify_parent(). Users of clone3() will now get a proper error when they pass an invalid exit signal. Note, that this is not user-visible behavior since no kernel with clone3() has been released yet. The following program will cause a splat on a non-fixed clone3() version and will fail correctly on a fixed version: #define _GNU_SOURCE #include <linux/sched.h> #include <linux/types.h> #include <sched.h> #include <stdio.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> int main(int argc, char *argv[]) { pid_t pid = -1; struct clone_args args = {0}; args.exit_signal = -1; pid = syscall(__NR_clone3, &args, sizeof(struct clone_args)); if (pid < 0) exit(EXIT_FAILURE); if (pid == 0) exit(EXIT_SUCCESS); wait(NULL); exit(EXIT_SUCCESS); } Fixes: 7f192e3cd316 ("fork: add clone3") Reported-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Suggested-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com> Link: https://lore.kernel.org/r/4b38fa4ce420b119a4c6345f42fe3cec2de9b0b5.1568223594.git.esyr@redhat.com [christian.brauner@ubuntu.com: simplify check and rework commit message] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> 12 September 2019, 12:56:33 UTC
53936b5 KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl When the userspace program runs the KVM_S390_INTERRUPT ioctl to inject an interrupt, we convert them from the legacy struct kvm_s390_interrupt to the new struct kvm_s390_irq via the s390int_to_s390irq() function. However, this function does not take care of all types of interrupts that we can inject into the guest later (see do_inject_vcpu()). Since we do not clear out the s390irq values before calling s390int_to_s390irq(), there is a chance that we copy random data from the kernel stack which could be leaked to the userspace later. Specifically, the problem exists with the KVM_S390_INT_PFAULT_INIT interrupt: s390int_to_s390irq() does not handle it, and the function __inject_pfault_init() later copies irq->u.ext which contains the random kernel stack data. This data can then be leaked either to the guest memory in __deliver_pfault_init(), or the userspace might retrieve it directly with the KVM_S390_GET_IRQ_STATE ioctl. Fix it by handling that interrupt type in s390int_to_s390irq(), too, and by making sure that the s390irq struct is properly pre-initialized. And while we're at it, make sure that s390int_to_s390irq() now directly returns -EINVAL for unknown interrupt types, so that we immediately get a proper error code in case we add more interrupt types to do_inject_vcpu() without updating s390int_to_s390irq() sometime in the future. Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/kvm/20190912115438.25761-1-thuth@redhat.com Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> 12 September 2019, 12:12:21 UTC
b456d72 sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' The '.exit' functions from 'pernet_operations' structure should be marked as __net_exit, not __net_init. Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 12 September 2019, 11:55:28 UTC
f39b683 ixgbe: Fix secpath usage for IPsec TX offload. The ixgbe driver currently does IPsec TX offloading based on an existing secpath. However, the secpath can also come from the RX side, in this case it is misinterpreted for TX offload and the packets are dropped with a "bad sa_idx" error. Fix this by using the xfrm_offload() function to test for TX offload. Fixes: 592594704761 ("ixgbe: process the Tx ipsec offload") Reported-by: Michael Marley <michael@michaelmarley.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net> 12 September 2019, 11:43:14 UTC
18dfa71 Btrfs: fix unwritten extent buffers and hangs on future writeback attempts The lock_extent_buffer_io() returns 1 to the caller to tell it everything went fine and the callers needs to start writeback for the extent buffer (submit a bio, etc), 0 to tell the caller everything went fine but it does not need to start writeback for the extent buffer, and a negative value if some error happened. When it's about to return 1 it tries to lock all pages, and if a try lock on a page fails, and we didn't flush any existing bio in our "epd", it calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or an error. The page might have been locked elsewhere, not with the goal of starting writeback of the extent buffer, and even by some code other than btrfs, like page migration for example, so it does not mean the writeback of the extent buffer was already started by some other task, so returning a 0 tells the caller (btree_write_cache_pages()) to not start writeback for the extent buffer. Note that epd might currently have either no bio, so flush_write_bio() returns 0 (success) or it might have a bio for another extent buffer with a lower index (logical address). Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the extent buffer and writeback is never started for the extent buffer, future attempts to writeback the extent buffer will hang forever waiting on that bit to be cleared, since it can only be cleared after writeback completes. Such hang is reported with a trace like the following: [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds. [49887.347059] Not tainted 5.2.13-gentoo #2 [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [49887.347062] btrfs-transacti D 0 1752 2 0x80004000 [49887.347064] Call Trace: [49887.347069] ? __schedule+0x265/0x830 [49887.347071] ? bit_wait+0x50/0x50 [49887.347072] ? bit_wait+0x50/0x50 [49887.347074] schedule+0x24/0x90 [49887.347075] io_schedule+0x3c/0x60 [49887.347077] bit_wait_io+0x8/0x50 [49887.347079] __wait_on_bit+0x6c/0x80 [49887.347081] ? __lock_release.isra.29+0x155/0x2d0 [49887.347083] out_of_line_wait_on_bit+0x7b/0x80 [49887.347084] ? var_wake_function+0x20/0x20 [49887.347087] lock_extent_buffer_for_io+0x28c/0x390 [49887.347089] btree_write_cache_pages+0x18e/0x340 [49887.347091] do_writepages+0x29/0xb0 [49887.347093] ? kmem_cache_free+0x132/0x160 [49887.347095] ? convert_extent_bit+0x544/0x680 [49887.347097] filemap_fdatawrite_range+0x70/0x90 [49887.347099] btrfs_write_marked_extents+0x53/0x120 [49887.347100] btrfs_write_and_wait_transaction.isra.4+0x38/0xa0 [49887.347102] btrfs_commit_transaction+0x6bb/0x990 [49887.347103] ? start_transaction+0x33e/0x500 [49887.347105] transaction_kthread+0x139/0x15c So fix this by not overwriting the return value (ret) with the result from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK bit in case flush_write_bio() returns an error, otherwise it will hang any future attempts to writeback the extent buffer, and undo all work done before (set back EXTENT_BUFFER_DIRTY, etc). This is a regression introduced in the 5.2 kernel. Fixes: 2e3c25136adfb ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()") Fixes: f4340622e0226 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up") Reported-by: Zdenek Sojka <zsojka@seznam.cz> Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag> Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t Reported-by: Drazen Kacar <drazen.kacar@oradian.com> Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377 Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> 12 September 2019, 11:37:25 UTC
410f954 Btrfs: fix assertion failure during fsync and use of stale transaction Sometimes when fsync'ing a file we need to log that other inodes exist and when we need to do that we acquire a reference on the inodes and then drop that reference using iput() after logging them. That generally is not a problem except if we end up doing the final iput() (dropping the last reference) on the inode and that inode has a link count of 0, which can happen in a very short time window if the logging path gets a reference on the inode while it's being unlinked. In that case we end up getting the eviction callback, btrfs_evict_inode(), invoked through the iput() call chain which needs to drop all of the inode's items from its subvolume btree, and in order to do that, it needs to join a transaction at the helper function evict_refill_and_join(). However because the task previously started a transaction at the fsync handler, btrfs_sync_file(), it has current->journal_info already pointing to a transaction handle and therefore evict_refill_and_join() will get that transaction handle from btrfs_join_transaction(). From this point on, two different problems can happen: 1) evict_refill_and_join() will often change the transaction handle's block reserve (->block_rsv) and set its ->bytes_reserved field to a value greater than 0. If evict_refill_and_join() never commits the transaction, the eviction handler ends up decreasing the reference count (->use_count) of the transaction handle through the call to btrfs_end_transaction(), and after that point we have a transaction handle with a NULL ->block_rsv (which is the value prior to the transaction join from evict_refill_and_join()) and a ->bytes_reserved value greater than 0. If after the eviction/iput completes the inode logging path hits an error or it decides that it must fallback to a transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a non-zero value from btrfs_log_dentry_safe(), and because of that non-zero value it tries to commit the transaction using a handle with a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes the transaction commit hit an assertion failure at btrfs_trans_release_metadata() because ->bytes_reserved is not zero but the ->block_rsv is NULL. The produced stack trace for that is like the following: [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816 [192922.917553] ------------[ cut here ]------------ [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532! [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G W 5.1.4-btrfs-next-47 #1 [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs] (...) [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286 [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000 [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838 [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000 [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980 [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8 [192922.923200] FS: 00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000 [192922.923579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0 [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [192922.925105] Call Trace: [192922.925505] btrfs_trans_release_metadata+0x10c/0x170 [btrfs] [192922.925911] btrfs_commit_transaction+0x3e/0xaf0 [btrfs] [192922.926324] btrfs_sync_file+0x44c/0x490 [btrfs] [192922.926731] do_fsync+0x38/0x60 [192922.927138] __x64_sys_fdatasync+0x13/0x20 [192922.927543] do_syscall_64+0x60/0x1c0 [192922.927939] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) [192922.934077] ---[ end trace f00808b12068168f ]--- 2) If evict_refill_and_join() decides to commit the transaction, it will be able to do it, since the nested transaction join only increments the transaction handle's ->use_count reference counter and it does not prevent the transaction from getting committed. This means that after eviction completes, the fsync logging path will be using a transaction handle that refers to an already committed transaction. What happens when using such a stale transaction can be unpredictable, we are at least having a use-after-free on the transaction handle itself, since the transaction commit will call kmem_cache_free() against the handle regardless of its ->use_count value, or we can end up silently losing all the updates to the log tree after that iput() in the logging path, or using a transaction handle that in the meanwhile was allocated to another task for a new transaction, etc, pretty much unpredictable what can happen. In order to fix both of them, instead of using iput() during logging, use btrfs_add_delayed_iput(), so that the logging path of fsync never drops the last reference on an inode, that step is offloaded to a safe context (usually the cleaner kthread). The assertion failure issue was sporadically triggered by the test case generic/475 from fstests, which loads the dm error target while fsstress is running, which lead to fsync failing while logging inodes with -EIO errors and then trying later to commit the transaction, triggering the assertion failure. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> 12 September 2019, 11:37:19 UTC
13a17cc KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset() If userspace doesn't set KVM_MEM_LOG_DIRTY_PAGES on memslot before calling kvm_s390_vm_start_migration(), kernel will oops with: Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 0000000000000000 TEID: 0000000000000483 Fault in home space mode while using kernel ASCE. AS:0000000002a2000b R2:00000001bff8c00b R3:00000001bff88007 S:00000001bff91000 P:000000000000003d Oops: 0004 ilc:2 [#1] SMP ... Call Trace: ([<001fffff804ec552>] kvm_s390_vm_set_attr+0x347a/0x3828 [kvm]) [<001fffff804ecfc0>] kvm_arch_vm_ioctl+0x6c0/0x1998 [kvm] [<001fffff804b67e4>] kvm_vm_ioctl+0x51c/0x11a8 [kvm] [<00000000008ba572>] do_vfs_ioctl+0x1d2/0xe58 [<00000000008bb284>] ksys_ioctl+0x8c/0xb8 [<00000000008bb2e2>] sys_ioctl+0x32/0x40 [<000000000175552c>] system_call+0x2b8/0x2d8 INFO: lockdep is turned off. Last Breaking-Event-Address: [<0000000000dbaf60>] __memset+0xc/0xa0 due to ms->dirty_bitmap being NULL, which might crash the host. Make sure that ms->dirty_bitmap is set before using it or return -EINVAL otherwise. Cc: <stable@vger.kernel.org> Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots") Signed-off-by: Igor Mammedov <imammedo@redhat.com> Link: https://lore.kernel.org/kvm/20190911075218.29153-1-imammedo@redhat.com/ Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> 12 September 2019, 11:09:17 UTC
a21b7f0 net: qrtr: fix memort leak in qrtr_tun_write_iter In qrtr_tun_write_iter the allocated kbuf should be release in case of error or success return. v2 Update: Thanks to David Miller for pointing out the release on success path as well. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 12 September 2019, 10:58:44 UTC
10cc514 net: Fix null de-reference of device refcount In event of failure during register_netdevice, free_netdev is invoked immediately. free_netdev assumes that all the netdevice refcounts have been dropped prior to it being called and as a result frees and clears out the refcount pointer. However, this is not necessarily true as some of the operations in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for invocation after a grace period. The IPv4 callback in_dev_rcu_put tries to access the refcount after free_netdev is called which leads to a null de-reference- 44837.761523: <6> Unable to handle kernel paging request at virtual address 0000004a88287000 44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8 44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8 44837.762393: <2> Call trace: 44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8 44837.762404: <2> in_dev_rcu_put+0x24/0x30 44837.762412: <2> rcu_nocb_kthread+0x43c/0x468 44837.762418: <2> kthread+0x118/0x128 44837.762424: <2> ret_from_fork+0x10/0x1c Fix this by waiting for the completion of the call_rcu() in case of register_netdevice errors. Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.") Cc: Sean Tranchetti <stranche@codeaurora.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net> 12 September 2019, 10:55:34 UTC
d23dbc4 ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' The '.exit' functions from 'pernet_operations' structure should be marked as __net_exit, not __net_init. Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> 12 September 2019, 10:20:33 UTC
77f22f9 tun: fix use-after-free when register netdev failed I got a UAF repport in tun driver when doing fuzzy test: [ 466.269490] ================================================================== [ 466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0 [ 466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699 [ 466.271810] [ 466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427 [ 466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 466.271838] Call Trace: [ 466.271858] dump_stack+0xca/0x13e [ 466.271871] ? tun_chr_read_iter+0x2ca/0x2d0 [ 466.271890] print_address_description+0x79/0x440 [ 466.271906] ? vprintk_func+0x5e/0xf0 [ 466.271920] ? tun_chr_read_iter+0x2ca/0x2d0 [ 466.271935] __kasan_report+0x15c/0x1df [ 466.271958] ? tun_chr_read_iter+0x2ca/0x2d0 [ 466.271976] kasan_report+0xe/0x20 [ 466.271987] tun_chr_read_iter+0x2ca/0x2d0 [ 466.272013] do_iter_readv_writev+0x4b7/0x740 [ 466.272032] ? default_llseek+0x2d0/0x2d0 [ 466.272072] do_iter_read+0x1c5/0x5e0 [ 466.272110] vfs_readv+0x108/0x180 [ 466.299007] ? compat_rw_copy_check_uvector+0x440/0x440 [ 466.299020] ? fsnotify+0x888/0xd50 [ 466.299040] ? __fsnotify_parent+0xd0/0x350 [ 466.299064] ? fsnotify_first_mark+0x1e0/0x1e0 [ 466.304548] ? vfs_write+0x264/0x510 [ 466.304569] ? ksys_write+0x101/0x210 [ 466.304591] ? do_preadv+0x116/0x1a0 [ 466.304609] do_preadv+0x116/0x1a0 [ 466.309829] do_syscall_64+0xc8/0x600 [ 466.309849] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 466.309861] RIP: 0033:0x4560f9 [ 466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 [ 466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127 [ 466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9 [ 466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003 [ 466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000 [ 466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10 [ 466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000 [ 466.323057] [ 466.323064] Allocated by task 2605: [ 466.335165] save_stack+0x19/0x80 [ 466.336240] __kasan_kmalloc.constprop.8+0xa0/0xd0 [ 466.337755] kmem_cache_alloc+0xe8/0x320 [ 466.339050] getname_flags+0xca/0x560 [ 466.340229] user_path_at_empty+0x2c/0x50 [ 466.341508] vfs_statx+0xe6/0x190 [ 466.342619] __do_sys_newstat+0x81/0x100 [ 466.343908] do_syscall_64+0xc8/0x600 [ 466.345303] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 466.347034] [ 466.347517] Freed by task 2605: [ 466.348471] save_stack+0x19/0x80 [ 466.349476] __kasan_slab_free+0x12e/0x180 [ 466.350726] kmem_cache_free+0xc8/0x430 [ 466.351874] putname+0xe2/0x120 [ 466.352921] filename_lookup+0x257/0x3e0 [ 466.354319] vfs_statx+0xe6/0x190 [ 466.355498] __do_sys_newstat+0x81/0x100 [ 466.356889] do_syscall_64+0xc8/0x600 [ 466.358037] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 466.359567] [ 466.360050] The buggy address belongs to the object at ffff888372139100 [ 466.360050] which belongs to the cache names_cache of size 4096 [ 466.363735] The buggy address is located 336 bytes inside of [ 466.363735] 4096-byte region [ffff888372139100, ffff88837213a100) [ 466.367179] The buggy address belongs to the page: [ 466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0 [ 466.371582] flags: 0x2fffff80010200(slab|head) [ 466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00 [ 466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 466.377778] page dumped because: kasan: bad access detected [ 466.379730] [ 466.380288] Memory state around the buggy address: [ 466.381844] ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 466.384009] ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 466.388257] ^ [ 466.390234] ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 466.392512] ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 466.394667] ================================================================== tun_chr_read_iter() accessed the memory which freed by free_netdev() called by tun_set_iff(): CPUA CPUB tun_set_iff() alloc_netdev_mqs() tun_attach() tun_chr_read_iter() tun_get() tun_do_read() tun_ring_recv() register_netdevice() <-- inject error goto err_detach tun_detach_all() <-- set RCV_SHUTDOWN free_netdev() <-- called from err_free_dev path netdev_freemem() <-- free the memory without check refcount (In this path, the refcount cannot prevent freeing the memory of dev, and the memory will be used by dev_put() called by tun_chr_read_iter() on CPUB.) (Break from tun_ring_recv(), because RCV_SHUTDOWN is set) tun_put() dev_put() <-- use the memory freed by netdev_freemem() Put the publishing of tfile->tun after register_netdevice(), so tun_get() won't get the tun pointer that freed by err_detach path if register_netdevice() failed. Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering netdevice") Reported-by: Hulk Robot <hulkci@huawei.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 12 September 2019, 10:17:26 UTC
ad32b48 Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio fixes from Michael Tsirkin: "Last minute bugfixes. A couple of security things. And an error handling bugfix that is never encountered by most people, but that also makes it kind of safe to push at the last minute, and it helps push the fix to stable a bit sooner" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: make sure log_num < in_num vhost: block speculation of translated descriptors virtio_ring: fix unmap of indirect descriptors 12 September 2019, 10:07:31 UTC
6dcf6a4 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Ingo Molnar: "Fix an initialization bug in the hw-breakpoints, which triggered on the ARM platform" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hw_breakpoint: Fix arch_hw_breakpoint use-before-initialization 12 September 2019, 10:04:50 UTC
95779fe Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix a race in the IRQ resend mechanism, which can result in a NULL dereference crash" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent NULL pointer dereference in resend_irqs() 12 September 2019, 10:02:00 UTC
840ce8f Merge tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fix from Linus Walleij: "Hopefully last pin control fix: a single patch for some Aspeed problems. The BMCs are much happier now" * tag 'pinctrl-v5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: aspeed: Fix spurious mux failures on the AST2500 12 September 2019, 09:58:47 UTC
9c09f62 Merge tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "I don't really like to send so many fixes at the very last minute, but the bug-sport activity is unpredictable. Four fixes, three are -stable material that will go everywhere, one is for the current cycle: - An ACPI DSDT error fixup of the type we always see and Hans invariably gets to fix. - A OF quirk fix for the current release (v5.3) - Some consistency checks on the userspace ABI. - A memory leak" * tag 'gpio-v5.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist gpiolib: of: fix fallback quirks handling gpio: fix line flag validation in lineevent_create gpio: fix line flag validation in linehandle_create gpio: mockup: add missing single_release() 12 September 2019, 08:53:38 UTC
c143242 pinctrl: aspeed: Fix spurious mux failures on the AST2500 Commit 674fa8daa8c9 ("pinctrl: aspeed-g5: Delay acquisition of regmaps") was determined to be a partial fix to the problem of acquiring the LPC Host Controller and GFX regmaps: The AST2500 pin controller may need to fetch syscon regmaps during expression evaluation as well as when setting mux state. For example, this case is hit by attempting to export pins exposing the LPC Host Controller as GPIOs. An optional eval() hook is added to the Aspeed pinmux operation struct and called from aspeed_sig_expr_eval() if the pointer is set by the SoC-specific driver. This enables the AST2500 to perform the custom action of acquiring its regmap dependencies as required. John Wang tested the fix on an Inspur FP5280G2 machine (AST2500-based) where the issue was found, and I've booted the fix on Witherspoon (AST2500) and Palmetto (AST2400) machines, and poked at relevant pins under QEMU by forcing mux configurations via devmem before exporting GPIOs to exercise the driver. Fixes: 7d29ed88acbb ("pinctrl: aspeed: Read and write bits in LPC and GFX controllers") Fixes: 674fa8daa8c9 ("pinctrl: aspeed-g5: Delay acquisition of regmaps") Reported-by: John Wang <wangzqbj@inspur.com> Tested-by: John Wang <wangzqbj@inspur.com> Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Link: https://lore.kernel.org/r/20190829071738.2523-1-andrew@aj.id.au Signed-off-by: Linus Walleij <linus.walleij@linaro.org> 11 September 2019, 23:08:27 UTC
13d5231 Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2019-09-11 This series contains fixes to ixgbe. Alex fixes up the adaptive ITR scheme for ixgbe which could result in a value that was either 0 or something less than 10 which was causing issues with hardware features, like RSC, that do not function well with ITR values that low. Ilya Maximets fixes the ixgbe driver to limit the number of transmit descriptors to clean by the number of transmit descriptors used in the transmit ring, so that the driver does not try to "double" clean the same descriptors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 23:05:52 UTC
af38d07 tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR Fix tcp_ecn_withdraw_cwr() to clear the correct bit: TCP_ECN_QUEUE_CWR. Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about the behavior of data receivers, and deciding whether to reflect incoming IP ECN CE marks as outgoing TCP th->ece marks. The TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders, and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function is only called from tcp_undo_cwnd_reduction() by data senders during an undo, so it should zero the sender-side state, TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of incoming CE bits on incoming data packets just because outgoing packets were spuriously retransmitted. The bug has been reproduced with packetdrill to manifest in a scenario with RFC3168 ECN, with an incoming data packet with CE bit set and carrying a TCP timestamp value that causes cwnd undo. Before this fix, the IP CE bit was ignored and not reflected in the TCP ECE header bit, and sender sent a TCP CWR ('W') bit on the next outgoing data packet, even though the cwnd reduction had been undone. After this fix, the sender properly reflects the CE bit and does not set the W bit. Note: the bug actually predates 2005 git history; this Fixes footer is chosen to be the oldest SHA1 I have tested (from Sep 2007) for which the patch applies cleanly (since before this commit the code was in a .h file). Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it") Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 22:53:18 UTC
060423b vhost: make sure log_num < in_num The code assumes log_num < in_num everywhere, and that is true as long as in_num is incremented by descriptor iov count, and log_num by 1. However this breaks if there's a zero sized descriptor. As a result, if a malicious guest creates a vring desc with desc.len = 0, it may cause the host kernel to crash by overflowing the log array. This bug can be triggered during the VM migration. There's no need to log when desc.len = 0, so just don't increment log_num in this case. Fixes: 3a4d5c94e959 ("vhost_net: a kernel-level virtio server") Cc: stable@vger.kernel.org Reviewed-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: ruippan <ruippan@tencent.com> Signed-off-by: yongduan <yongduan@tencent.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> 11 September 2019, 19:15:26 UTC
a89db44 vhost: block speculation of translated descriptors iovec addresses coming from vhost are assumed to be pre-validated, but in fact can be speculated to a value out of range. Userspace address are later validated with array_index_nospec so we can be sure kernel info does not leak through these addresses, but vhost must also not leak userspace info outside the allowed memory table to guests. Following the defence in depth principle, make sure the address is not validated out of node range. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> 11 September 2019, 19:15:07 UTC
bf280c0 ixgbe: fix double clean of Tx descriptors with xdp Tx code doesn't clear the descriptors' status after cleaning. So, if the budget is larger than number of used elems in a ring, some descriptors will be accounted twice and xsk_umem_complete_tx will move prod_tail far beyond the prod_head breaking the completion queue ring. Fix that by limiting the number of descriptors to clean by the number of used descriptors in the Tx ring. 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like 'ixgbe_xsk_clean_tx_ring()' since we're allowed to directly use 'next_to_clean' and 'next_to_use' indexes. CC: stable@vger.kernel.org Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Tested-by: William Tu <u9012063@gmail.com> Tested-by: Eelco Chaudron <echaudro@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> 11 September 2019, 16:42:18 UTC
377228a ixgbe: Prevent u8 wrapping of ITR value to something less than 10us There were a couple cases where the ITR value generated via the adaptive ITR scheme could exceed 126. This resulted in the value becoming either 0 or something less than 10. Switching back and forth between a value less than 10 and a value greater than 10 can cause issues as certain hardware features such as RSC to not function well when the ITR value has dropped that low. CC: stable@vger.kernel.org Fixes: b4ded8327fea ("ixgbe: Update adaptive ITR algorithm") Reported-by: Gregg Leventhal <gleventhal@janestreet.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> 11 September 2019, 16:39:35 UTC
f4b752a mlx4: fix spelling mistake "veify" -> "verify" There is a spelling mistake in a mlx4_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 14:20:04 UTC
c3dc1fa net: hns3: fix spelling mistake "undeflow" -> "underflow" There is a spelling mistake in a .msg literal string. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 14:17:00 UTC
b93fb20 net: lmc: fix spelling mistake "runnin" -> "running" There is a spelling mistake in the lmc_trace message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 14:11:59 UTC
90aa11f NFC: st95hf: fix spelling mistake "receieve" -> "receive" There is a spelling mistake in a dev_err message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 14:07:07 UTC
c5c1a03 net/rds: An rds_sock is added too early to the hash table In rds_bind(), an rds_sock is added to the RDS bind hash table before rs_transport is set. This means that the socket can be found by the receive code path when rs_transport is NULL. And the receive code path de-references rs_transport for congestion update check. This can cause a panic. An rds_sock should not be added to the bind hash table before all the needed fields are set. Reported-by: syzbot+4b4f8163c2e246df3c4c@syzkaller.appspotmail.com Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 14:05:40 UTC
3e49317 mac80211: Do not send Layer 2 Update frame before authorization The Layer 2 Update frame is used to update bridges when a station roams to another AP even if that STA does not transmit any frames after the reassociation. This behavior was described in IEEE Std 802.11F-2003 as something that would happen based on MLME-ASSOCIATE.indication, i.e., before completing 4-way handshake. However, this IEEE trial-use recommended practice document was published before RSN (IEEE Std 802.11i-2004) and as such, did not consider RSN use cases. Furthermore, IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been maintained amd should not be used anymore. Sending out the Layer 2 Update frame immediately after association is fine for open networks (and also when using SAE, FT protocol, or FILS authentication when the station is actually authenticated by the time association completes). However, it is not appropriate for cases where RSN is used with PSK or EAP authentication since the station is actually fully authenticated only once the 4-way handshake completes after authentication and attackers might be able to use the unauthenticated triggering of Layer 2 Update frame transmission to disrupt bridge behavior. Fix this by postponing transmission of the Layer 2 Update frame from station entry addition to the point when the station entry is marked authorized. Similarly, send out the VLAN binding update only if the STA entry has already been authorized. Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 13:59:26 UTC
49baa01 Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller" This reverts commit 414126f9e5abf1973c661d24229543a9458fa8ce. This commit broke eMMC storage access on a new consumer MiniPC based on AMD SoC, which has eMMC connected to: 02:00.0 SD Host controller: O2 Micro, Inc. Device 8620 (rev 01) (prog-if 01) Subsystem: O2 Micro, Inc. Device 0002 During probe, several errors are seen including: mmc1: Got data interrupt 0x02000000 even though no data operation was in progress. mmc1: Timeout waiting for hardware interrupt. mmc1: error -110 whilst initialising MMC card Reverting this commit allows the eMMC storage to be detected & usable again. Signed-off-by: Daniel Drake <drake@endlessm.com> Fixes: 414126f9e5ab ("mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> 11 September 2019, 13:57:21 UTC
aea64b5 Revert "mmc: bcm2835: Terminate timeout work synchronously" The commit 37fefadee8bb ("mmc: bcm2835: Terminate timeout work synchronously") causes lockups in case of hardware timeouts due the timeout work also calling cancel_delayed_work_sync() on its own. So revert it. Fixes: 37fefadee8bb ("mmc: bcm2835: Terminate timeout work synchronously") Cc: stable@vger.kernel.org Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> 11 September 2019, 13:57:21 UTC
61f7f7c gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist Another day; another DSDT bug we need to workaround... Since commit ca876c7483b6 ("gpiolib-acpi: make sure we trigger edge events at least once on boot") we call _AEI edge handlers at boot. In some rare cases this causes problems. One example of this is the Minix Neo Z83-4 mini PC, this device has a clear DSDT bug where it has some copy and pasted code for dealing with Micro USB-B connector host/device role switching, while the mini PC does not even have a micro-USB connector. This code, which should not be there, messes with the DDC data pin from the HDMI connector (switching it to GPIO mode) breaking HDMI support. To avoid problems like this, this commit adds a new gpiolib_acpi.run_edge_events_on_boot kernel commandline option, which allows disabling the running of _AEI edge event handlers at boot. The default value is -1/auto which uses a DMI based blacklist, the initial version of this blacklist contains the Neo Z83-4 fixing the HDMI breakage. Cc: stable@vger.kernel.org Cc: Daniel Drake <drake@endlessm.com> Cc: Ian W MORRISON <ianwmorrison@gmail.com> Reported-by: Ian W MORRISON <ianwmorrison@gmail.com> Suggested-by: Ian W MORRISON <ianwmorrison@gmail.com> Fixes: ca876c7483b6 ("gpiolib-acpi: make sure we trigger edge events at least once on boot") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20190827202835.213456-1-hdegoede@redhat.com Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Ian W MORRISON <ianwmorrison@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> 11 September 2019, 09:46:54 UTC
3dfdecc lib/Kconfig: fix OBJAGG in lib/ menu structure Keep the "Library routines" menu intact by moving OBJAGG into it. Otherwise OBJAGG is displayed/presented as an orphan in the various config menus. Fixes: 0a020d416d0a ("lib: introduce initial implementation of object aggregation manager") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Ido Schimmel <idosch@mellanox.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 08:30:10 UTC
49f6c90 net: sonic: replace dev_kfree_skb in sonic_send_packet sonic_send_packet will be processed in irq or non-irq context, so it would better use dev_kfree_skb_any instead of dev_kfree_skb. Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor drivers") Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 08:14:01 UTC
2507e6a wimax: i2400: fix memory leak In i2400m_op_rfkill_sw_toggle cmd buffer should be released along with skb response. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 11 September 2019, 08:10:13 UTC
f794dc2 sctp: fix the missing put_user when dumping transport thresholds This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump a transport thresholds info. Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds. Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> 10 September 2019, 17:32:28 UTC
d4d6ec6 sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero In case of TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero, it would make no progress inside the loop in hhf_dequeue() thus kernel would get stuck. Fix this by checking this corner case in hhf_change(). Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Reported-by: syzbot+bc6297c11f19ee807dc2@syzkaller.appspotmail.com Reported-by: syzbot+041483004a7f45f1f20a@syzkaller.appspotmail.com Reported-by: syzbot+55be5f513bed37fc4367@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Terry Lam <vtlam@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 10 September 2019, 17:31:00 UTC
8b142a0 net_sched: check cops->tcf_block in tc_bind_tclass() At least sch_red and sch_tbf don't implement ->tcf_block() while still have a non-zero tc "class". Instead of adding nop implementations to each of such qdisc's, we can just relax the check of cops->tcf_block() in tc_bind_tclass(). They don't support TC filter anyway. Reported-by: syzbot+21b29db13c065852f64b@syzkaller.appspotmail.com Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 10 September 2019, 17:28:56 UTC
3120b9a Merge tag 'ipc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull ipc regression fixes from Arnd Bergmann: "Fix ipc regressions from y2038 patches These are two regression fixes for bugs that got introduced during the system call rework that went into linux-5.1 but only bisected and fixed now: - One patch affects semtimedop() on many of the less common 32-bit architectures, this just needs a single-line bugfix. - The other affects only sparc64 and has a slightly more invasive workaround to apply the same change to sparc64 that was done to the generic code used everywhere else" * tag 'ipc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: ipc: fix sparc64 ipc() wrapper ipc: fix semtimedop for generic 32-bit architectures 10 September 2019, 11:34:13 UTC
1dea33e gpiolib: of: fix fallback quirks handling We should only try to execute fallback quirks handling when previous call returned -ENOENT, and not when we did not get -EPROBE_DEFER. The other errors should be treated as hard errors: we did find the GPIO description, but for some reason we failed to handle it properly. The fallbacks should only be executed when previous handlers returned -ENOENT, which means the mapping/description was not found. Also let's remove the explicit deferral handling when iterating through GPIO suffixes: it is not needed anymore as we will not be calling fallbacks for anything but -ENOENT. Fixes: df451f83e1fc ("gpio: of: fix Freescale SPI CS quirk handling") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Link: https://lore.kernel.org/r/20190903231856.GA165165@dtor-ws Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> 10 September 2019, 10:31:35 UTC
aefde29 Merge tag 'gpio-v5.4-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into fixes gpio: fixes for v5.4 - fix a memory leak in gpio-mockup - fix two flag validation bugs in gpiolib's character device ioctl()'s 10 September 2019, 10:12:04 UTC
94a72b3 bridge/mdb: remove wrong use of NLM_F_MULTI NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end. In fact, NLMSG_DONE is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del") CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> 10 September 2019, 08:10:53 UTC
c8dc559 net/ibmvnic: Fix missing { in __ibmvnic_reset Commit 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue") adds a } without corresponding { causing build break. Fixes: 1c2977c09499 ("net/ibmvnic: free reset work of removed device from queue") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Reviewed-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> 10 September 2019, 07:44:49 UTC
21670bd drm/lima: fix lima_gem_wait() return value drm_gem_reservation_object_wait() returns 0 if it succeeds and -ETIME if it timeouts, but lima driver assumed that 0 is error. Cc: stable@vger.kernel.org Fixes: a1d2a6339961e ("drm/lima: driver for ARM Mali4xx GPUs") Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com> Signed-off-by: Qiang Yu <yuq825@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190908024800.23229-1-anarsoul@gmail.com 10 September 2019, 02:09:00 UTC
56037ca Merge tag 'regulator-fix-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "This is obviouly very late, containing three small and simple driver specific fixes. The main one is the TWL fix, this fixes issues with cpufreq on the PMICs used with BeagleBoard generation OMAP SoCs which had been broken due to changes in the generic OPP code exposing a bug in the regulator driver for these devices causing them to think that OPPs weren't supported on the system. Sorry about sending this so late, I hadn't registered that the TWL issue manifested in cpufreq" * tag 'regulator-fix-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: twl: voltage lists for vdd1/2 on twl4030 regulator: act8945a-regulator: fix ldo register addresses in set_mode hook regulator: slg51000: Fix a couple NULL vs IS_ERR() checks 09 September 2019, 17:58:57 UTC
cf8f169 virtio_ring: fix unmap of indirect descriptors The function virtqueue_add_split() DMA-maps the scatterlist buffers. In case a mapping error occurs the already mapped buffers must be unmapped. This happens by jumping to the 'unmap_release' label. In case of indirect descriptors the release is wrong and may leak kernel memory. Because the implementation assumes that the head descriptor is already mapped it starts iterating over the descriptor list starting from the head descriptor. However for indirect descriptors the head descriptor is never mapped in case of an error. The fix is to initialize the start index with zero in case of indirect descriptors and use the 'desc' pointer directly for iterating over the descriptor chain. Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> 09 September 2019, 14:43:15 UTC
2eb0964 drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+ This bit was fliped on for "syncing dependencies between camera and graphics". BSpec has no recollection why, and it is causing unrecoverable GPU hangs with Vulkan compute workloads. From BSpec, setting bit5 to 0 enables relaxed padding requirements for buffers, 1D and 2D non-array, non-MSAA, non-mip-mapped linear surfaces; and *must* be set to 0h on skl+ to ensure "Out of Bounds" case is suppressed. Reported-by: Jason Ekstrand <jason@jlekstrand.net> Suggested-by: Jason Ekstrand <jason@jlekstrand.net> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110998 Fixes: 8424171e135c ("drm/i915/gen9: h/w w/a: syncing dependencies between camera and graphics") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: denys.kostin@globallogic.com Cc: Jason Ekstrand <jason@jlekstrand.net> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.1+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190904100707.7377-1-chris@chris-wilson.co.uk (cherry picked from commit 9d7b01e93526efe79dbf75b69cc5972b5a4f7b37) Signed-off-by: Jani Nikula <jani.nikula@intel.com> 09 September 2019, 13:10:28 UTC
bb1a71f drm/i915: Limit MST to <= 8bpc once again My attempt at allowing MST to use the higher color depths has regressed some configurations. Apparently people have setups where all MST streams will fit into the DP link with 8bpc but won't fit with higher color depths. What we really should be doing is reducing the bpc for all the streams on the same link until they start to fit. But that requires a bit more work, so in the meantime let's revert back closer to the old behavior and limit MST to at most 8bpc. Cc: stable@vger.kernel.org Cc: Lyude Paul <lyude@redhat.com> Tested-by: Geoffrey Bennett <gmux22@gmail.com> Fixes: f1477219869c ("drm/i915: Remove the 8bpc shackles from DP MST") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111505 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190828102059.2512-1-ville.syrjala@linux.intel.com Reviewed-by: Lyude Paul <lyude@redhat.com> (cherry picked from commit 75427b2a2bffc083d51dec389c235722a9c69b05) Signed-off-by: Jani Nikula <jani.nikula@intel.com> 09 September 2019, 13:07:50 UTC
5ca2f54 gpio: fix line flag validation in lineevent_create lineevent_create should not allow any of GPIOHANDLE_REQUEST_OUTPUT, GPIOHANDLE_REQUEST_OPEN_DRAIN or GPIOHANDLE_REQUEST_OPEN_SOURCE to be set. Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines") Cc: stable <stable@vger.kernel.org> Signed-off-by: Kent Gibson <warthog618@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> 09 September 2019, 08:04:53 UTC
e95fbc1 gpio: fix line flag validation in linehandle_create linehandle_create should not allow both GPIOHANDLE_REQUEST_INPUT and GPIOHANDLE_REQUEST_OUTPUT to be set. Fixes: d7c51b47ac11 ("gpio: userspace ABI for reading/writing GPIO lines") Cc: stable <stable@vger.kernel.org> Signed-off-by: Kent Gibson <warthog618@gmail.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> 09 September 2019, 08:01:55 UTC
59929d3 gpio: mockup: add missing single_release() When using single_open() for opening, single_release() should be used instead of seq_release(), otherwise there is a memory leak. Fixes: 2a9e27408e12 ("gpio: mockup: rework debugfs interface") Cc: stable <stable@vger.kernel.org> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> 09 September 2019, 07:55:27 UTC
f74c2bb Linux 5.3-rc8 08 September 2019, 20:33:15 UTC
983f700 Merge tag 'compiler-attributes-for-linus-v5.3-rc8' of git://github.com/ojeda/linux Pull section attribute fix from Miguel Ojeda: "Fix Oops in Clang-compiled kernels (Nick Desaulniers)" * tag 'compiler-attributes-for-linus-v5.3-rc8' of git://github.com/ojeda/linux: include/linux/compiler.h: fix Oops for Clang-compiled kernels 08 September 2019, 16:34:55 UTC
def8b72 Merge tag 'gpio-v5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "All related to the PCA953x driver when handling chips with more than 8 ports, now that works again" * tag 'gpio-v5.3-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: pca953x: use pca953x_read_regs instead of regmap_bulk_read gpio: pca953x: correct type of reg_direction 08 September 2019, 16:30:31 UTC
bfafddd include/linux/compiler.h: fix Oops for Clang-compiled kernels GCC unescapes escaped string section names while Clang does not. Because __section uses the `#` stringification operator for the section name, it doesn't need to be escaped. This fixes an Oops observed in distro's that use systemd and not net.core.bpf_jit_enable=1, when their kernels are compiled with Clang. Link: https://github.com/ClangBuiltLinux/linux/issues/619 Link: https://bugs.llvm.org/show_bug.cgi?id=42950 Link: https://marc.info/?l=linux-netdev&m=156412960619946&w=2 Link: https://lore.kernel.org/lkml/20190904181740.GA19688@gmail.com/ Acked-by: Will Deacon <will@kernel.org> Reported-by: Sedat Dilek <sedat.dilek@gmail.com> Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> [Cherry-picked from the __section cleanup series for 5.3] [Adjusted commit message] Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> 08 September 2019, 12:53:58 UTC
afa8b47 x86/timer: Force PIT initialization when !X86_FEATURE_ARAT KVM guests with commit c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") applied to guest kernel have been observed to have unusually higher CPU usage with symptoms of increase in vm exits for HLT and MSW_WRITE (MSR_IA32_TSCDEADLINE). This is caused by older QEMUs lacking support for X86_FEATURE_ARAT. lapic clock retains CLOCK_EVT_FEAT_C3STOP and nohz stays inactive. There's no usable broadcast device either. Do the PIT initialization if guest CPU lacks X86_FEATURE_ARAT. On real hardware it shouldn't matter as ARAT and DEADLINE come together. Fixes: c8c4076723da ("x86/timer: Skip PIT initialization on modern chipsets") Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> 08 September 2019, 07:01:15 UTC
950b07c Revert "x86/apic: Include the LDR when clearing out APIC registers" This reverts commit 558682b5291937a70748d36fd9ba757fb25b99ae. Chris Wilson reports that it breaks his CPU hotplug test scripts. In particular, it breaks offlining and then re-onlining the boot CPU, which we treat specially (and the BIOS does too). The symptoms are that we can offline the CPU, but it then does not come back online again: smpboot: CPU 0 is now offline smpboot: Booting Node 0 Processor 0 APIC 0x0 smpboot: do_boot_cpu failed(-1) to wakeup CPU#0 Thomas says he knows why it's broken (my personal suspicion: our magic handling of the "cpu0_logical_apicid" thing), but for 5.3 the right fix is to just revert it, since we've never touched the LDR bits before, and it's not worth the risk to do anything else at this stage. [ Hotpluging of the boot CPU is special anyway, and should be off by default. See the "BOOTPARAM_HOTPLUG_CPU0" config option and the cpu0_hotplug kernel parameter. In general you should not do it, and it has various known limitations (hibernate and suspend require the boot CPU, for example). But it should work, even if the boot CPU is special and needs careful treatment - Linus ] Link: https://lore.kernel.org/lkml/156785100521.13300.14461504732265570003@skylake-alporthouse-com/ Reported-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bandan Das <bsd@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 07 September 2019, 21:25:54 UTC
fb377eb ipc: fix sparc64 ipc() wrapper Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl to a commit from my y2038 series in linux-5.1, as I missed the custom sys_ipc() wrapper that sparc64 uses in place of the generic version that I patched. The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel now do not allow being called with the IPC_64 flag any more, resulting in a -EINVAL error when they don't recognize the command. Instead, the correct way to do this now is to call the internal ksys_old_{sem,shm,msg}ctl() functions to select the API version. As we generally move towards these functions anyway, change all of sparc_ipc() to consistently use those in place of the sys_*() versions, and move the required ksys_*() declarations into linux/syscalls.h The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link errors when ipc is disabled. Reported-by: Matt Turner <mattst88@gmail.com> Fixes: 275f22148e87 ("ipc: rename old-style shmctl/semctl/msgctl syscalls") Cc: stable@vger.kernel.org Tested-by: Matt Turner <mattst88@gmail.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> 07 September 2019, 19:42:25 UTC
b3a9964 Merge tag 'char-misc-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull Documentation updates from Greg KH: "A few small patches for the documenation file that came in through the char-misc tree in -rc7 for your tree. They fix the mistake in the .rst format that kept the table of companies from showing up in the html output, and most importantly, add people's names to the list showing support for our process" * tag 'char-misc-5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/process: Add Qualcomm process ambassador for hardware security issues Documentation/process/embargoed-hardware-issues: Microsoft ambassador Documentation/process: Add Google contact for embargoed hardware issues Documentation/process: Volunteer as the ambassador for Xen 07 September 2019, 18:48:28 UTC
a8e0aba Documentation/process: Add Qualcomm process ambassador for hardware security issues Add Trilok Soni as process ambassador for hardware security issues from Qualcomm. Signed-off-by: Trilok Soni <tsoni@codeaurora.org> Link: https://lore.kernel.org/r/1567796517-8964-1-git-send-email-tsoni@codeaurora.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 07 September 2019, 17:30:54 UTC
d3464cc Merge tag 'dmaengine-fix-5.3' of git://git.infradead.org/users/vkoul/slave-dma Pull dmaengine fixes from Vinod Koul: "Some late fixes for drivers: - memory leak in ti crossbar dma driver - cleanup of omap dma probe - Fix for link list configuration in sprd dma driver - Handling fixed for DMACHCLR if iommu is mapped in rcar dma" * tag 'dmaengine-fix-5.3' of git://git.infradead.org/users/vkoul/slave-dma: dmaengine: rcar-dmac: Fix DMACHCLR handling if iommu is mapped dmaengine: sprd: Fix the DMA link-list configuration dmaengine: ti: omap-dma: Add cleanup in omap_dma_probe() dmaengine: ti: dma-crossbar: Fix a memory leak bug 07 September 2019, 17:00:34 UTC
28abe57 nfp: flower: cmsg rtnl locks can timeout reify messages Flower control message replies are handled in different locations. The truly high priority replies are handled in the BH (tasklet) context, while the remaining replies are handled in a predefined Linux work queue. The work queue handler orders replies into high and low priority groups, and always start servicing the high priority replies within the received batch first. Reply Type: Rtnl Lock: Handler: CMSG_TYPE_PORT_MOD no BH tasklet (mtu) CMSG_TYPE_TUN_NEIGH no BH tasklet CMSG_TYPE_FLOW_STATS no BH tasklet CMSG_TYPE_PORT_REIFY no WQ high CMSG_TYPE_PORT_MOD yes WQ high (link/mtu) CMSG_TYPE_MERGE_HINT yes WQ low CMSG_TYPE_NO_NEIGH no WQ low CMSG_TYPE_ACTIVE_TUNS no WQ low CMSG_TYPE_QOS_STATS no WQ low CMSG_TYPE_LAG_CONFIG no WQ low A subset of control messages can block waiting for an rtnl lock (from both work queue priority groups). The rtnl lock is heavily contended for by external processes such as systemd-udevd, systemd-network and libvirtd, especially during netdev creation, such as when flower VFs and representors are instantiated. Kernel netlink instrumentation shows that external processes (such as systemd-udevd) often use successive rtnl_trylock() sequences, which can result in an rtnl_lock() blocked control message to starve for longer periods of time during rtnl lock contention, i.e. netdev creation. In the current design a single blocked control message will block the entire work queue (both priorities), and introduce a latency which is nondeterministic and dependent on system wide rtnl lock usage. In some extreme cases, one blocked control message at exactly the wrong time, just before the maximum number of VFs are instantiated, can block the work queue for long enough to prevent VF representor REIFY replies from getting handled in time for the 40ms timeout. The firmware will deliver the total maximum number of REIFY message replies in around 300us. Only REIFY and MTU update messages require replies within a timeout period (of 40ms). The MTU-only updates are already done directly in the BH (tasklet) handler. Move the REIFY handler down into the BH (tasklet) in order to resolve timeouts caused by a blocked work queue waiting on rtnl locks. Signed-off-by: Fred Lotter <frederik.lotter@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 16:05:50 UTC
3dcbdb1 net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list Historically, support for frag_list packets entering skb_segment() was limited to frag_list members terminating on exact same gso_size boundaries. This is verified with a BUG_ON since commit 89319d3801d1 ("net: Add frag_list support to skb_segment"), quote: As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain. However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"), the "exact MSS boundaries" assumption no longer holds: An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but leaves the frag_list members as originally merged by GRO with the original 'gso_size'. Example of such programs are bpf-based NAT46 or NAT64. This lead to a kernel BUG_ON for flows involving: - GRO generating a frag_list skb - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room() - skb_segment() of the skb See example BUG_ON reports in [0]. In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"), skb_segment() was modified to support the "gso_size mangling" case of a frag_list GRO'ed skb, but *only* for frag_list members having head_frag==true (having a page-fragment head). Alas, GRO packets having frag_list members with a linear kmalloced head (head_frag==false) still hit the BUG_ON. This commit adds support to skb_segment() for a 'head_skb' packet having a frag_list whose members are *non* head_frag, with gso_size mangled, by disabling SG and thus falling-back to copying the data from the given 'head_skb' into the generated segmented skbs - as suggested by Willem de Bruijn [1]. Since this approach involves the penalty of skb_copy_and_csum_bits() when building the segments, care was taken in order to enable this solution only when required: - untrusted gso_size, by testing SKB_GSO_DODGY is set (SKB_GSO_DODGY is set by any gso_size mangling functions in net/core/filter.c) - the frag_list is non empty, its item is a non head_frag, *and* the headlen of the given 'head_skb' does not match the gso_size. [0] https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/ https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/ [1] https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/ Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 15:58:48 UTC
8652f17 ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR() Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: d55a2e374a94 ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <dsahern@gmail.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 15:46:44 UTC
fe163e5 isdn/capi: check message length in capi_write() syzbot reported: BUG: KMSAN: uninit-value in capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313 capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700 do_loop_readv_writev fs/read_write.c:703 [inline] do_iter_write+0x83e/0xd80 fs/read_write.c:961 vfs_writev fs/read_write.c:1004 [inline] do_writev+0x397/0x840 fs/read_write.c:1039 __do_sys_writev fs/read_write.c:1112 [inline] __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109 __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 [...] The problem is that capi_write() is reading past the end of the message. Fix it by checking the message's length in the needed places. Reported-and-tested-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 15:44:25 UTC
1c2977c net/ibmvnic: free reset work of removed device from queue Commit 36f1031c51a2 ("ibmvnic: Do not process reset during or after device removal") made the change to exit reset if the driver has been removed, but does not free reset work items of the adapter from queue. Ensure all reset work items are freed when breaking out of the loop early. Fixes: 36f1031c51a2 ("ibmnvic: Do not process reset during or after device removal”) Signed-off-by: Juliet Kim <julietk@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 15:36:14 UTC
63b2ed4 net: phylink: Fix flow control resolution Regarding to IEEE 802.3-2015 standard section 2 28B.3 Priority resolution - Table 28-3 - Pause resolution In case of Local device Pause=1 AsymDir=0, Link partner Pause=1 AsymDir=1, Local device resolution should be enable PAUSE transmit, disable PAUSE receive. And in case of Local device Pause=1 AsymDir=1, Link partner Pause=1 AsymDir=0, Local device resolution should be enable PAUSE receive, disable PAUSE transmit. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Stefan Chulski <stefanc@marvell.com> Reported-by: Shaul Ben-Mayor <shaulb@marvell.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 15:26:13 UTC
b82573f net/hamradio/6pack: Fix the size of a sk_buff used in 'sp_bump()' We 'allocate' 'count' bytes here. In fact, 'dev_alloc_skb' already add some extra space for padding, so a bit more is allocated. However, we use 1 byte for the KISS command, then copy 'count' bytes, so count+1 bytes. Explicitly allocate and use 1 more byte to be safe. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> 07 September 2019, 13:46:28 UTC
back to top