https://github.com/torvalds/linux

sort by:
Revision Author Date Message Commit Date
6422251 Merge tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm Pull drm fixes from Dave Airlie: "Nothing too crazy at the end of the cycle, the kmb modesetting fixes are probably a bit large but it's not a major driver, and its fixing monitor doesn't turn on type problems. Otherwise it's just a few minor patches, one ast regression revert, an msm power stability fix. ast: - fix regression with connector detect msm: - fix power stability issue msxfb: - fix crash on unload panel: - sync fix kmb: - modesetting fixes" * tag 'drm-fixes-2021-10-22' of git://anongit.freedesktop.org/drm/drm: Revert "drm/ast: Add detect function support" drm/kmb: Enable ADV bridge after modeset drm/kmb: Corrected typo in handle_lcd_irq drm/kmb: Disable change of plane parameters drm/kmb: Remove clearing DPHY regs drm/kmb: Limit supported mode to 1080p drm/kmb: Work around for higher system clock drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel drm: mxsfb: Fix NULL pointer dereference crash on unload drm/msm/devfreq: Restrict idle clamping to a618 for now 22 October 2021, 05:06 UTC
658aafc memblock: exclude MEMBLOCK_NOMAP regions from kmemleak Vladimir Zapolskiy reports: Commit a7259df76702 ("memblock: make memblock_find_in_range method private") invokes a kernel panic while running kmemleak on OF platforms with nomaped regions: Unable to handle kernel paging request at virtual address fff000021e00000 [...] scan_block+0x64/0x170 scan_gray_list+0xe8/0x17c kmemleak_scan+0x270/0x514 kmemleak_write+0x34c/0x4ac The memory allocated from memblock is registered with kmemleak, but if it is marked MEMBLOCK_NOMAP it won't have linear map entries so an attempt to scan such areas will fault. Ideally, memblock_mark_nomap() would inform kmemleak to ignore MEMBLOCK_NOMAP memory, but it can be called before kmemleak interfaces operating on physical addresses can use __va() conversion. Make sure that functions that mark allocated memory as MEMBLOCK_NOMAP take care of informing kmemleak to ignore such memory. Link: https://lore.kernel.org/all/8ade5174-b143-d621-8c8e-dc6a1898c6fb@linaro.org Link: https://lore.kernel.org/all/c30ff0a2-d196-c50d-22f0-bd50696b1205@quicinc.com Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private") Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org> Tested-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 22 October 2021, 04:30 UTC
6c9a545 Revert "memblock: exclude NOMAP regions from kmemleak" Commit 6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak") breaks boot on EFI systems with kmemleak and VM_DEBUG enabled: efi: Processing EFI memory map: efi: 0x000090000000-0x000091ffffff [Conventional| | | | | | | | | | |WB|WT|WC|UC] efi: 0x000092000000-0x0000928fffff [Runtime Data|RUN| | | | | | | | | |WB|WT|WC|UC] ------------[ cut here ]------------ kernel BUG at mm/kmemleak.c:1140! Internal error: Oops - BUG: 0 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc6-next-20211019+ #104 pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : kmemleak_free_part_phys+0x64/0x8c lr : kmemleak_free_part_phys+0x38/0x8c sp : ffff800011eafbc0 x29: ffff800011eafbc0 x28: 1fffff7fffb41c0d x27: fffffbfffda0e068 x26: 0000000092000000 x25: 1ffff000023d5f94 x24: ffff800011ed84d0 x23: ffff800011ed84c0 x22: ffff800011ed83d8 x21: 0000000000900000 x20: ffff800011782000 x19: 0000000092000000 x18: ffff800011ee0730 x17: 0000000000000000 x16: 0000000000000000 x15: 1ffff0000233252c x14: ffff800019a905a0 x13: 0000000000000001 x12: ffff7000023d5ed7 x11: 1ffff000023d5ed6 x10: ffff7000023d5ed6 x9 : dfff800000000000 x8 : ffff800011eaf6b7 x7 : 0000000000000001 x6 : ffff800011eaf6b0 x5 : 00008ffffdc2a12a x4 : ffff7000023d5ed7 x3 : 1ffff000023dbf99 x2 : 1ffff000022f0463 x1 : 0000000000000000 x0 : ffffffffffffffff Call trace: kmemleak_free_part_phys+0x64/0x8c memblock_mark_nomap+0x5c/0x78 reserve_regions+0x294/0x33c efi_init+0x2d0/0x490 setup_arch+0x80/0x138 start_kernel+0xa0/0x3ec __primary_switched+0xc0/0xc8 Code: 34000041 97d526e7 f9418e80 36000040 (d4210000) random: get_random_bytes called from print_oops_end_marker+0x34/0x80 with crng_init=0 ---[ end trace 0000000000000000 ]--- The crash happens because kmemleak_free_part_phys() tries to use __va() before memstart_addr is initialized and this triggers a VM_BUG_ON() in arch/arm64/include/asm/memory.h: Revert 6e44bd6d34d6 ("memblock: exclude NOMAP regions from kmemleak"), the issue it is fixing will be fixed differently. Reported-by: Qian Cai <quic_qiancai@quicinc.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 22 October 2021, 04:30 UTC
9d235ac Merge branch 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucounts fixes from Eric Biederman: "There has been one very hard to track down bug in the ucount code that we have been tracking since roughly v5.14 was released. Alex managed to find a reliable reproducer a few days ago and then I was able to instrument the code and figure out what the issue was. It turns out the sigqueue_alloc single atomic operation optimization did not play nicely with ucounts multiple level rlimits. It turned out that either sigqueue_alloc or sigqueue_free could be operating on multiple levels and trigger the conditions for the optimization on more than one level at the same time. To deal with that situation I have introduced inc_rlimit_get_ucounts and dec_rlimit_put_ucounts that just focuses on the optimization and the rlimit and ucount changes. While looking into the big bug I found I couple of other little issues so I am including those fixes here as well. When I have time I would very much like to dig into process ownership of the shared signal queue and see if we could pick a single owner for the entire queue so that all of the rlimits can count to that owner. That should entirely remove the need to call get_ucounts and put_ucounts in sigqueue_alloc and sigqueue_free. It is difficult because Linux unlike POSIX supports setuid that works on a single thread" * 'ucount-fixes-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring ucounts: Proper error handling in set_cred_ucounts ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds ucounts: Fix signal ucount refcounting 22 October 2021, 03:27 UTC
6c2c712 Merge tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, and can. We'll have one more fix for a socket accounting regression, it's still getting polished. Otherwise things look fine. Current release - regressions: - revert "vrf: reset skb conntrack connection on VRF rcv", there are valid uses for previous behavior - can: m_can: fix iomap_read_fifo() and iomap_write_fifo() Current release - new code bugs: - mlx5: e-switch, return correct error code on group creation failure Previous releases - regressions: - sctp: fix transport encap_port update in sctp_vtag_verify - stmmac: fix E2E delay mechanism (in PTP timestamping) Previous releases - always broken: - netfilter: ip6t_rt: fix out-of-bounds read of ipv6_rt_hdr - netfilter: xt_IDLETIMER: fix out-of-bound read caused by lack of init - netfilter: ipvs: make global sysctl read-only in non-init netns - tcp: md5: fix selection between vrf and non-vrf keys - ipv6: count rx stats on the orig netdev when forwarding - bridge: mcast: use multicast_membership_interval for IGMPv3 - can: - j1939: fix UAF for rx_kref of j1939_priv abort sessions on receiving bad messages - isotp: fix TX buffer concurrent access in isotp_sendmsg() fix return error on FC timeout on TX path - ice: fix re-init of RDMA Tx queues and crash if RDMA was not inited - hns3: schedule the polling again when allocation fails, prevent stalls - drivers: add missing of_node_put() when aborting for_each_available_child_of_node() - ptp: fix possible memory leak and UAF in ptp_clock_register() - e1000e: fix packet loss in burst mode on Tiger Lake and later - mlx5e: ipsec: fix more checksum offload issues" * tag 'net-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits) usbnet: sanity check for maxpacket net: enetc: make sure all traffic classes can send large frames net: enetc: fix ethtool counter name for PM0_TERR ptp: free 'vclock_index' in ptp_clock_release() sfc: Don't use netif_info before net_device setup sfc: Export fibre-specific supported link modes net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags net/mlx5e: IPsec: Fix a misuse of the software parser's fields net/mlx5e: Fix vlan data lost during suspend flow net/mlx5: E-switch, Return correct error code on group creation failure net/mlx5: Lag, change multipath and bonding to be mutually exclusive ice: Add missing E810 device ids igc: Update I226_K device ID e1000e: Fix packet loss on Tiger Lake and later e1000e: Separate TGP board type from SPT ptp: Fix possible memory leak in ptp_clock_register() net: stmmac: Fix E2E delay mechanism nfc: st95hf: Make spi remove() callback return zero net: hns3: disable sriov before unload hclge layer net: hns3: fix vf reset workqueue cannot exit ... 22 October 2021, 01:36 UTC
0a3221b Merge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bug exposed by a previous fix, where running guests with certain SMT topologies could crash the host on Power8. - Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is enabled. Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider. * tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/smp: do not decrement idle task preempt count in CPU offline powerpc/idle: Don't corrupt back chain when going idle 22 October 2021, 01:30 UTC
595cb5e Revert "drm/ast: Add detect function support" This reverts commit aae74ff9caa8de9a45ae2e46068c417817392a26, since it prevents my AMD Milan system from booting, with: [ 27.189558] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 27.197506] #PF: supervisor write access in kernel mode [ 27.203333] #PF: error_code(0x0002) - not-present page [ 27.209064] PGD 0 P4D 0 [ 27.211885] Oops: 0002 [#1] PREEMPT SMP NOPTI [ 27.216744] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6+ #15 [ 27.223928] Hardware name: AMD Corporation ETHANOL_X/ETHANOL_X, BIOS RXM1006B 08/20/2021 [ 27.232955] RIP: 0010:run_timer_softirq+0x38b/0x4a0 [ 27.238397] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c [ 27.259350] RSP: 0018:ffffc42d00003ee8 EFLAGS: 00010086 [ 27.265176] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000101 [ 27.273134] RDX: 0000000000000000 RSI: 0000000000000087 RDI: 0000000000000001 [ 27.281084] RBP: ffffc42d00003f70 R08: 0000000000000000 R09: 00000000000003eb [ 27.289043] R10: ffffa0860cb300d0 R11: ffffa0c44de290b0 R12: ffffc42d00003ef8 [ 27.297002] R13: 00000000fffef200 R14: ffffa0c44de18dc0 R15: ffffa0867a882350 [ 27.304961] FS: 0000000000000000(0000) GS:ffffa0c44de00000(0000) knlGS:0000000000000000 [ 27.313988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.320396] CR2: 0000000000000000 CR3: 000000014569c001 CR4: 0000000000770ef0 [ 27.328346] PKRU: 55555554 [ 27.331359] Call Trace: [ 27.334073] <IRQ> [ 27.336314] ? __queue_work+0x420/0x420 [ 27.340589] ? lapic_next_event+0x21/0x30 [ 27.345060] ? clockevents_program_event+0x8f/0xe0 [ 27.350402] __do_softirq+0xfb/0x2db [ 27.354388] irq_exit_rcu+0x98/0xd0 [ 27.358275] sysvec_apic_timer_interrupt+0xac/0xd0 [ 27.363620] </IRQ> [ 27.365955] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 27.371685] RIP: 0010:cpuidle_enter_state+0xcc/0x390 [ 27.377292] Code: 3d 01 79 0a 50 e8 44 ed 77 ff 49 89 c6 0f 1f 44 00 00 31 ff e8 f5 f8 77 ff 80 7d d7 00 0f 85 e6 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 17 01 00 00 49 63 c7 4c 2b 75 c8 48 8d 14 40 48 8d [ 27.398243] RSP: 0018:ffffffffb0e03dc8 EFLAGS: 00000246 [ 27.404069] RAX: ffffa0c44de00000 RBX: 0000000000000001 RCX: 000000000000001f [ 27.412028] RDX: 0000000000000000 RSI: ffffffffb0bafc1f RDI: ffffffffb0bbdb81 [ 27.419986] RBP: ffffffffb0e03e00 R08: 00000006549f8f3f R09: ffffffffb1065200 [ 27.427935] R10: ffffa0c44de27ae4 R11: ffffa0c44de27ac4 R12: ffffa0c5634cb000 [ 27.435894] R13: ffffffffb1065200 R14: 00000006549f8f3f R15: 0000000000000001 [ 27.443854] ? cpuidle_enter_state+0xbb/0x390 [ 27.448712] cpuidle_enter+0x2e/0x40 [ 27.452695] call_cpuidle+0x23/0x40 [ 27.456584] do_idle+0x1f0/0x270 [ 27.460181] cpu_startup_entry+0x20/0x30 [ 27.464553] rest_init+0xd4/0xe0 [ 27.468149] arch_call_rest_init+0xe/0x1b [ 27.472619] start_kernel+0x6bc/0x6e2 [ 27.476764] x86_64_start_reservations+0x24/0x26 [ 27.481912] x86_64_start_kernel+0x75/0x79 [ 27.486477] secondary_startup_64_no_verify+0xb0/0xbb [ 27.492111] Modules linked in: kvm_amd(+) kvm ipmi_si(+) ipmi_devintf rapl wmi_bmof ipmi_msghandler input_leds ccp k10temp mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ast i2c_algo_bit drm_vram_helper drm_ttm_helper ttm drm_kms_helper crct10dif_pclmul crc32_pclmul ghash_clmulni_intel syscopyarea aesni_intel sysfillrect crypto_simd sysimgblt fb_sys_fops cryptd hid_generic cec nvme ahci usbhid drm e1000e nvme_core hid libahci i2c_piix4 wmi [ 27.551789] CR2: 0000000000000000 [ 27.555482] ---[ end trace 897987dfe93dccc6 ]--- [ 27.560630] RIP: 0010:run_timer_softirq+0x38b/0x4a0 [ 27.566069] Code: 4c 89 f7 e8 37 27 ac 00 49 c7 46 08 00 00 00 00 49 8b 04 24 48 85 c0 74 71 4d 8b 3c 24 4d 89 7e 08 66 90 49 8b 07 49 8b 57 08 <48> 89 02 48 85 c0 74 04 48 89 50 08 49 8b 77 18 41 f6 47 22 20 4c [ 27.587021] RSP: 0018:ffffc42d00003ee8 EFLAGS: 00010086 [ 27.592848] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000101 [ 27.600808] RDX: 0000000000000000 RSI: 0000000000000087 RDI: 0000000000000001 [ 27.608765] RBP: ffffc42d00003f70 R08: 0000000000000000 R09: 00000000000003eb [ 27.616716] R10: ffffa0860cb300d0 R11: ffffa0c44de290b0 R12: ffffc42d00003ef8 [ 27.624673] R13: 00000000fffef200 R14: ffffa0c44de18dc0 R15: ffffa0867a882350 [ 27.632624] FS: 0000000000000000(0000) GS:ffffa0c44de00000(0000) knlGS:0000000000000000 [ 27.641650] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 27.648159] CR2: 0000000000000000 CR3: 000000014569c001 CR4: 0000000000770ef0 [ 27.656119] PKRU: 55555554 [ 27.659133] Kernel panic - not syncing: Fatal exception in interrupt [ 29.030411] Shutting down cpus with NMI [ 29.034699] Kernel Offset: 0x2e600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 29.046790] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Since unreliable, found by bisecting for KASAN's use-after-free in enqueue_timer+0x4f/0x1e0, where the timer callback is called. Reported-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Fixes: aae74ff9caa8 ("drm/ast: Add detect function support") Link: https://lore.kernel.org/lkml/0f7871be-9ca6-5ae4-3a40-5db9a8fb2365@amd.com/ Cc: Ainux <ainux.wang@gmail.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: sterlingteng@gmail.com Cc: chenhuacai@kernel.org Cc: Chuck Lever III <chuck.lever@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Jon Grimm <jon.grimm@amd.com> Cc: dri-devel <dri-devel@lists.freedesktop.org> Cc: linux-kernel <linux-kernel@vger.kernel.org> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211021153006.92983-1-kim.phillips@amd.com 21 October 2021, 19:52 UTC
7e1c544 Merge tag 'drm-misc-fixes-2021-10-21-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.15-rc7: - Rebased, to remove vc4 patches. - Fix mxsfb crash on unload. - Use correct sync parameters for Feixin K101-IM2BYL02. - Assorted kmb modeset/atomic fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/e66eaf89-b9b9-41f5-d0d2-dad7e59fabb5@linux.intel.com 21 October 2021, 19:35 UTC
730b64d Merge tag 'drm-msm-fixes-2021-10-18' of https://gitlab.freedesktop.org/drm/msm into drm-fixes One more fix for v5.15, to work around a power stability issue on a630 (and possibly others) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGs1WPLthmd=ToDcEHm=u-7O38RAVJ2XwRoS8xPmC520vg@mail.gmail.com 21 October 2021, 19:22 UTC
397430b usbnet: sanity check for maxpacket maxpacket of 0 makes no sense and oopses as we need to divide by it. Give up. V2: fixed typo in log and stylistic issues Signed-off-by: Oliver Neukum <oneukum@suse.com> Reported-by: syzbot+76bb1d34ffa0adc03baa@syzkaller.appspotmail.com Reviewed-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20211021122944.21816-1-oneukum@suse.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 21 October 2021, 13:44 UTC
e378f49 net: enetc: make sure all traffic classes can send large frames The enetc driver does not implement .ndo_change_mtu, instead it configures the MAC register field PTC{Traffic Class}MSDUR[MAXSDU] statically to a large value during probe time. The driver used to configure only the max SDU for traffic class 0, and that was fine while the driver could only use traffic class 0. But with the introduction of mqprio, sending a large frame into any other TC than 0 is broken. This patch fixes that by replicating per traffic class the static configuration done in enetc_configure_port_mac(). Fixes: cbe9e835946f ("enetc: Enable TC offloading with mqprio") Reported-by: Richie Pearn <richard.pearn@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20211020173340.1089992-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 21 October 2021, 13:44 UTC
fb8dc5f net: enetc: fix ethtool counter name for PM0_TERR There are two counters named "MAC tx frames", one of them is actually incorrect. The correct name for that counter should be "MAC tx error frames", which is symmetric to the existing "MAC rx error frames". Fixes: 16eb4c85c964 ("enetc: Add ethtool statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: <Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://lore.kernel.org/r/20211020165206.1069889-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 21 October 2021, 13:44 UTC
b6b19a7 ptp: free 'vclock_index' in ptp_clock_release() 'vclock_index' is accessed from sysfs, it shouled be freed in release function, so move it from ptp_clock_unregister() to ptp_clock_release(). Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 21 October 2021, 11:50 UTC
bf6abf3 sfc: Don't use netif_info before net_device setup Use pci_info instead to avoid unnamed/uninitialized noise: [197088.688729] sfc 0000:01:00.0: Solarflare NIC detected [197088.690333] sfc 0000:01:00.0: Part Number : SFN5122F [197088.729061] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no SR-IOV VFs probed [197088.729071] sfc 0000:01:00.0 (unnamed net_device) (uninitialized): no PTP support Inspired by fa44821a4ddd ("sfc: don't use netif_info et al before net_device is registered") from Heiner Kallweit. Signed-off-by: Erik Ekman <erik@kryo.se> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 21 October 2021, 11:39 UTC
c62041c sfc: Export fibre-specific supported link modes The 1/10GbaseT modes were set up for cards with SFP+ cages in 3497ed8c852a5 ("sfc: report supported link speeds on SFP connections"). 10GbaseT was likely used since no 10G fibre mode existed. The missing fibre modes for 1/10G were added to ethtool.h in 5711a9822144 ("net: ethtool: add support for 1000BaseX and missing 10G link modes") shortly thereafter. The user guide available at https://support-nic.xilinx.com/wp/drivers lists support for the following cable and transceiver types in section 2.9: - QSFP28 100G Direct Attach Cables - QSFP28 100G SR Optical Transceivers (with SR4 modules listed) - SFP28 25G Direct Attach Cables - SFP28 25G SR Optical Transceivers - QSFP+ 40G Direct Attach Cables - QSFP+ 40G Active Optical Cables - QSFP+ 40G SR4 Optical Transceivers - QSFP+ to SFP+ Breakout Direct Attach Cables - QSFP+ to SFP+ Breakout Active Optical Cables - SFP+ 10G Direct Attach Cables - SFP+ 10G SR Optical Transceivers - SFP+ 10G LR Optical Transceivers - SFP 1000BASE‐T Transceivers - 1G Optical Transceivers (From user guide issue 28. Issue 16 which also includes older cards like SFN5xxx/SFN6xxx has matching lists for 1/10/40G transceiver types.) Regarding SFP+ 10GBASE‐T transceivers the latest guide says: "Solarflare adapters do not support 10GBASE‐T transceiver modules." Tested using SFN5122F-R7 (with 2 SFP+ ports). Supported link modes do not change depending on module used (tested with 1000BASE-T, 1000BASE-BX10, 10GBASE-LR). Before: $ ethtool ext Settings for ext: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full 10000baseT/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Link partner advertised link modes: Not reported Link partner advertised pause frame use: No Link partner advertised auto-negotiation: No Link partner advertised FEC modes: Not reported Speed: 1000Mb/s Duplex: Full Auto-negotiation: off Port: FIBRE PHYAD: 255 Transceiver: internal Current message level: 0x000020f7 (8439) drv probe link ifdown ifup rx_err tx_err hw Link detected: yes After: $ ethtool ext Settings for ext: Supported ports: [ FIBRE ] Supported link modes: 1000baseT/Full 1000baseX/Full 10000baseCR/Full 10000baseSR/Full 10000baseLR/Full Supported pause frame use: Symmetric Receive-only Supports auto-negotiation: No Supported FEC modes: Not reported Advertised link modes: Not reported Advertised pause frame use: No Advertised auto-negotiation: No Advertised FEC modes: Not reported Link partner advertised link modes: Not reported Link partner advertised pause frame use: No Link partner advertised auto-negotiation: No Link partner advertised FEC modes: Not reported Speed: 1000Mb/s Duplex: Full Auto-negotiation: off Port: FIBRE PHYAD: 255 Transceiver: internal Supports Wake-on: g Wake-on: d Current message level: 0x000020f7 (8439) drv probe link ifdown ifup rx_err tx_err hw Link detected: yes Signed-off-by: Erik Ekman <erik@kryo.se> Acked-by: Martin Habets <habetsm.xilinx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 21 October 2021, 11:38 UTC
1439caa Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter fixes for net: 1) Crash due to missing initialization of timer data in xt_IDLETIMER, from Juhee Kang. 2) NF_CONNTRACK_SECMARK should be bool in Kconfig, from Vegard Nossum. 3) Skip netdev events on netns removal, from Florian Westphal. 4) Add testcase to show port shadowing via UDP, also from Florian. 5) Remove pr_debug() code in ip6t_rt, this fixes a crash due to unsafe access to non-linear skbuff, from Xin Long. 6) Make net/ipv4/vs/debug_level read-only from non-init netns, from Antoine Tenart. 7) Remove bogus invocation to bash in selftests/netfilter/nft_flowtable.sh also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 21 October 2021, 11:32 UTC
e0bfcf9 Merge tag 'mlx5-fixes-2021-10-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2021-10-20 ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 21 October 2021, 11:11 UTC
a689702 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-10-20 This series contains updates to e1000e, igc, and ice drivers. Sasha fixes an issue with dropped packets on Tiger Lake platforms for e1000e and corrects a device ID for igc. Tony adds missing E810 device IDs for ice. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 21 October 2021, 11:10 UTC
7405609 drm/kmb: Enable ADV bridge after modeset On KMB, ADV bridge must be programmed and powered on prior to MIPI DSI HW initialization. v2: changed to atomic_bridge_chain_enable (Sam) Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver") Co-developed-by: Edmund Dea <edmund.j.dea@intel.com> Signed-off-by: Edmund Dea <edmund.j.dea@intel.com> Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211019230719.789958-1-anitha.chrisanthus@intel.com Acked-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
004d271 drm/kmb: Corrected typo in handle_lcd_irq Check for Overflow bits for layer3 in the irq handler. Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display") Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-5-anitha.chrisanthus@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
982f8ad drm/kmb: Disable change of plane parameters Due to HW limitations, KMB cannot change height, width, or pixel format after initial plane configuration. v2: removed memset disp_cfg as it is already zero. Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display") Signed-off-by: Edmund Dea <edmund.j.dea@intel.com> Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-4-anitha.chrisanthus@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
13047a0 drm/kmb: Remove clearing DPHY regs Don't clear the shared DPHY registers common to MIPI Rx and MIPI Tx during DSI initialization since this was causing MIPI Rx reset. Rest of the writes are bitwise, so will not affect Mipi Rx side. Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver") Signed-off-by: Edmund Dea <edmund.j.dea@intel.com> Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-3-anitha.chrisanthus@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
a79f40c drm/kmb: Limit supported mode to 1080p KMB only supports single resolution(1080p), this commit checks for 1920x1080x60 or 1920x1080x59 in crtc_mode_valid. Also, modes with vfp < 4 are not supported in KMB display. This change prunes display modes with vfp < 4. v2: added vfp check Fixes: 7f7b96a8a0a1 ("drm/kmb: Add support for KeemBay Display") Co-developed-by: Edmund Dea <edmund.j.dea@intel.com> Signed-off-by: Edmund Dea <edmund.j.dea@intel.com> Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link:https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-2-anitha.chrisanthus@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
3e4c31e drm/kmb: Work around for higher system clock Use a different value for system clock offset in the ppl/llp ratio calculations for clocks higher than 500 Mhz. Fixes: 98521f4d4b4c ("drm/kmb: Mipi DSI part of the display driver") Signed-off-by: Anitha Chrisanthus <anitha.chrisanthus@intel.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211013233632.471892-1-anitha.chrisanthus@intel.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
7729706 drm/panel: ilitek-ili9881c: Fix sync for Feixin K101-IM2BYL02 panel This adjusts sync values according to the datasheet Fixes: 1c243751c095 ("drm/panel: ilitek-ili9881c: add support for Feixin K101-IM2BYL02 panel") Co-developed-by: Marius Gripsgard <marius@ubports.com> Signed-off-by: Dan Johansen <strit@manjaro.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210818214818.298089-1-strit@manjaro.org Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
3cfc183 drm: mxsfb: Fix NULL pointer dereference crash on unload The mxsfb->crtc.funcs may already be NULL when unloading the driver, in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from mxsfb_unload() leads to NULL pointer dereference. Since all we care about is masking the IRQ and mxsfb->base is still valid, just use that to clear and mask the IRQ. Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper") Signed-off-by: Marek Vasut <marex@denx.de> Cc: Daniel Abrecht <public@danielabrecht.ch> Cc: Emil Velikov <emil.l.velikov@gmail.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stefan Agner <stefan@agner.ch> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20211016210446.171616-1-marex@denx.de Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> 21 October 2021, 09:08 UTC
2f111a6 Merge tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client Pull ceph fixes from Ilya Dryomov: "Two important filesystem fixes, marked for stable. The blocklisted superblocks issue was particularly annoying because for unexperienced users it essentially exacted a reboot to establish a new functional mount in that scenario" * tag 'ceph-for-5.15-rc7' of git://github.com/ceph/ceph-client: ceph: fix handling of "meta" errors ceph: skip existing superblocks that are blocklisted or shut down when mounting 20 October 2021, 20:23 UTC
515dcc2 Merge tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix more dma-debug fallout (Gerald Schaefer, Hamza Mahfooz) - fix a kerneldoc warning (Logan Gunthorpe) * tag 'dma-mapping-5.15-2' of git://git.infradead.org/users/hch/dma-mapping: dma-debug: teach add_dma_entry() about DMA_ATTR_SKIP_CPU_SYNC dma-debug: fix sg checks in debug_dma_map_sg() dma-mapping: fix the kerneldoc for dma_map_sgtable() 20 October 2021, 20:16 UTC
1d00032 net/mlx5e: IPsec: Fix work queue entry ethernet segment checksum flags Current Work Queue Entry (WQE) checksum (csum) flags in the ethernet segment (eseg) in case of IPsec crypto offload datapath are not aligned with PRM/HW expectations. Currently the driver always sets the l3_inner_csum flag in case of IPsec because of the wrong usage of skb->encapsulation as indicator for inner IPsec header since skb->encapsulation is always ON for IPsec packets since IPsec itself is an encapsulation protocol. The above forced a failing attempts of calculating csum of non-existing segments (like in the IP|ESP|TCP packet case which does not have an l3_inner) which led to lots of packet drops hence the low throughput. Fix by using xo->inner_ipproto as indicator for inner IPsec header instead of skb->encapsulation in addition to setting the csum flags as following: * Tunnel Mode: * Pkt: MAC IP ESP IP L4 * CSUM: l3_cs | l3_inner_cs | l4_inner_cs * * Transport Mode: * Pkt: MAC IP ESP L4 * CSUM: l3_cs [ | l4_cs (checksum partial case)] * * Tunnel(VXLAN TCP/UDP) over Transport Mode * Pkt: MAC IP ESP UDP VXLAN IP L4 * CSUM: l3_cs | l3_inner_cs | l4_inner_cs Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 20 October 2021, 17:42 UTC
d10457f net/mlx5e: IPsec: Fix a misuse of the software parser's fields IPsec crypto offload current Software Parser (SWP) fields settings in the ethernet segment (eseg) are not aligned with PRM/HW expectations. Among others in case of IP|ESP|TCP packet, current driver sets the offsets for inner_l3 and inner_l4 although there is no inner l3/l4 headers relative to ESP header in such packets. SWP provides the offsets for HW ,so it can be used to find csum fields to offload the checksum, however these are not necessarily used by HW and are used as fallback in case HW fails to parse the packet, e.g when performing IPSec Transport Aware (IP | ESP | TCP) there is no need to add SW parse on inner packet. So in some cases packets csum was calculated correctly , whereas in other cases it failed. The later faced csum errors (caused by wrong packet length calculations) which led to lots of packet drops hence the low throughput. Fix by setting the SWP fields as expected in a IP|ESP|TCP packet. the following describe the expected SWP offsets: * Tunnel Mode: * SWP: OutL3 InL3 InL4 * Pkt: MAC IP ESP IP L4 * * Transport Mode: * SWP: OutL3 OutL4 * Pkt: MAC IP ESP L4 * * Tunnel(VXLAN TCP/UDP) over Transport Mode * SWP: OutL3 InL3 InL4 * Pkt: MAC IP ESP UDP VXLAN IP L4 Fixes: f1267798c980 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload") Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 20 October 2021, 17:42 UTC
68e66e1 net/mlx5e: Fix vlan data lost during suspend flow During suspend flow the driver calls mlx5e_destroy_vlan_table() which does not only delete the vlans steering flow rules, but also frees the data on currently active vlans, thus it is not restored during resume flow. This fix keeps the vlan data on suspend flow and frees it only on driver remove flow. Fixes: 6783f0a21a3c ("net/mlx5e: Dynamic alloc vlan table for netdev when needed") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 20 October 2021, 17:42 UTC
a6f7433 net/mlx5: E-switch, Return correct error code on group creation failure Dan Carpenter report: The patch f47e04eb96e0: "net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups" from May 31, 2021, leads to the following Smatch static checker warning: drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c:483 esw_qos_create_rate_group() warn: passing zero to 'ERR_PTR' If min rate normalization failed then error code may be overwritten to 0 if scheduling element destruction succeed. Ignore this value and always return initial one. Fixes: f47e04eb96e0 ("net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 20 October 2021, 17:42 UTC
14fe247 net/mlx5: Lag, change multipath and bonding to be mutually exclusive Both multipath and bonding events are changing the HW LAG state independently. Handling one of the features events while the other is already enabled can cause unwanted behavior, for example handling bonding event while multipath enabled will disable the lag and cause multipath to stop working. Fix it by ignoring bonding event while in multipath and ignoring FIB events while in bonding mode. Fixes: 544fe7c2e654 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 20 October 2021, 17:42 UTC
8e37395 Merge tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Again it became bigger than wished, unfortunately, as this contains quite a few ASoC fixes that came up a bit late. It also includes yet more HD- and USB-audio quirks: I decided to merge them now, as those are for stable, and we'll need them sooner or later. Although the volumes are a bit high, all changes are device-specific (and reasonably small) fixes, so it should be safe for the late rc" * tag 'sound-5.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Fix microphone sound on Jieli webcam. ALSA: hda/realtek: Fixes HP Spectre x360 15-eb1xxx speakers ALSA: usb-audio: Provide quirk for Sennheiser GSP670 Headset ALSA: hda/realtek: Add quirk for Clevo PC50HS ALSA: usb-audio: add Schiit Hel device to quirk table ASoC: wm8960: Fix clock configuration on slave mode ASoC: cs42l42: Ensure 0dB full scale volume is used for headsets ASoC: soc-core: fix null-ptr-deref in snd_soc_del_component_unlocked() ASoC: codec: wcd938x: Add irq config support ASoC: DAPM: Fix missing kctl change notifications ASoC: Intel: bytcht_es8316: Utilize dev_err_probe() to avoid log saturation ASoC: Intel: bytcht_es8316: Switch to use gpiod_get_optional() ASoC: Intel: bytcht_es8316: Use temporary variable for struct device ASoC: Intel: bytcht_es8316: Get platform data via dev_get_platdata() ASoC: wcd938x: Fix jack detection issue ASoC: nau8824: Fix headphone vs headset, button-press detection no longer working ASoC: cs4341: Add SPI device ID table ASoC: pcm179x: Add missing entries SPI to device ID table ASoC: fsl_xcvr: Fix channel swap issue with ARC ASoC: pcm512x: Mend accesses to the I2S_1 and I2S_2 registers 20 October 2021, 16:13 UTC
6da52de Merge tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "One small audit patch to add a pointer NULL check" * tag 'audit-pr-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: fix possible null-pointer dereference in audit_filter_rules 20 October 2021, 16:11 UTC
7dcf78b ice: Add missing E810 device ids As part of support for E810 XXV devices, some device ids were inadvertently left out. Add those missing ids. Fixes: 195fb97766da ("ice: add additional E810 device id") Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Paul Menzel <pmenzel@molgen.mpg.de> 20 October 2021, 16:07 UTC
79cc832 igc: Update I226_K device ID The device ID for I226_K was incorrectly assigned, update the device ID to the correct one. Fixes: bfa5e98c9de4 ("igc: Add new device ID") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 20 October 2021, 16:07 UTC
639e298 e1000e: Fix packet loss on Tiger Lake and later Update the HW MAC initialization flow. Do not gate DMA clock from the modPHY block. Keeping this clock will prevent dropped packets sent in burst mode on the Kumeran interface. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213651 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213377 Fixes: fb776f5d57ee ("e1000e: Add support for Tiger Lake") Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Mark Pearson <markpearson@lenovo.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 20 October 2021, 16:06 UTC
fc9b289 Merge tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Recursion fix for tracing. While cleaning up some of the tracing recursion protection logic, I discovered a scenario that the current design would miss, and would allow an infinite recursion. Removing an optimization trick that opened the hole fixes the issue and cleans up the code as well" * tag 'trace-v5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have all levels of checks prevent recursion 20 October 2021, 16:02 UTC
1e59977 Merge tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull nios2 fix from Dinh Nguyen: - Renamed CTL_STATUS to CTL_FSTATUS to fix a redefined warning * tag 'nios2_fixes_for_v5.15_part2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: NIOS2: irqflags: rename a redefined register name 20 October 2021, 15:56 UTC
0afe64b Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Tools: - kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic x86: - clean ups and fixes for bus lock vmexit and lazy allocation of rmaps - two fixes for SEV-ES (one more coming as soon as I get reviews) - fix for static_key underflow ARM: - Properly refcount pages used as a concatenated stage-2 PGD - Fix missing unlock when detecting the use of MTE+VM_SHARED" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SEV-ES: reduce ghcb_sa_len to 32 bits KVM: VMX: Remove redundant handling of bus lock vmexit KVM: kvm_stat: do not show halt_wait_ns KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET" KVM: SEV-ES: Set guest_state_protected after VMSA update KVM: X86: fix lazy allocation of rmaps KVM: SEV-ES: fix length of string I/O KVM: arm64: Release mmap_lock when using VM_SHARED with MTE KVM: arm64: Report corrupted refcount at EL2 KVM: arm64: Fix host stage-2 PGD refcount KVM: s390: Function documentation fixes 20 October 2021, 15:52 UTC
280db5d e1000e: Separate TGP board type from SPT We have the same LAN controller on different PCHs. Separate TGP board type from SPT which will allow for specific fixes to be applied for TGP platforms. Suggested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Tested-by: Mark Pearson <markpearson@lenovo.com> Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 20 October 2021, 15:51 UTC
5ebcbe3 ucounts: Move get_ucounts from cred_alloc_blank to key_change_session_keyring Setting cred->ucounts in cred_alloc_blank does not make sense. The uid and user_ns are deliberately not set in cred_alloc_blank but instead the setting is delayed until key_change_session_keyring. So move dealing with ucounts into key_change_session_keyring as well. Unfortunately that movement of get_ucounts adds a new failure mode to key_change_session_keyring. I do not see anything stopping the parent process from calling setuid and changing the relevant part of it's cred while keyctl_session_to_parent is running making it fundamentally necessary to call get_ucounts in key_change_session_keyring. Which means that the new failure mode cannot be avoided. A failure of key_change_session_keyring results in a single threaded parent keeping it's existing credentials. Which results in the parent process not being able to access the session keyring and whichever keys are in the new keyring. Further get_ucounts is only expected to fail if the number of bits in the refernece count for the structure is too few. Since the code has no other way to report the failure of get_ucounts and because such failures are not expected to be common add a WARN_ONCE to report this problem to userspace. Between the WARN_ONCE and the parent process not having access to the keys in the new session keyring I expect any failure of get_ucounts will be noticed and reported and we can find another way to handle this condition. (Possibly by just making ucounts->count an atomic_long_t). Cc: stable@vger.kernel.org Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133 Tested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> 20 October 2021, 15:34 UTC
4225fea ptp: Fix possible memory leak in ptp_clock_register() I got memory leak as follows when doing fault injection test: unreferenced object 0xffff88800906c618 (size 8): comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s) hex dump (first 8 bytes): 70 74 70 30 00 00 00 00 ptp0.... backtrace: [<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0 [<0000000079f6e2ff>] kvasprintf+0xb5/0x150 [<0000000026aae54f>] kvasprintf_const+0x60/0x190 [<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150 [<000000004e35abdd>] dev_set_name+0xc0/0x100 [<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp] [<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33] When posix_clock_register() returns an error, the name allocated in dev_set_name() will be leaked, the put_device() should be used to give up the device reference, then the name will be freed in kobject_cleanup() and other memory will be freed in ptp_clock_release(). Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 13:44 UTC
3cb9580 net: stmmac: Fix E2E delay mechanism When utilizing End to End delay mechanism, the following error messages show up: |root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m |ptp4l[950.573]: selected /dev/ptp3 as PTP clock |ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE |ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE |ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1 |ptp4l[956.879]: selected best master clock 001395.fffe.4897b4 |ptp4l[956.879]: port 1: assuming the grand master role |ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER |ptp4l[962.017]: port 1: received DELAY_REQ without timestamp |ptp4l[962.273]: port 1: received DELAY_REQ without timestamp |ptp4l[963.090]: port 1: received DELAY_REQ without timestamp Commit f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a") already addresses this problem for the dwmac v5.10. However, same holds true for all dwmacs above version v4.10. Correct the check accordingly. Afterwards everything works as expected. Tested on Intel Atom(R) x6414RE Processor. Fixes: 14f347334bf2 ("net: stmmac: Correctly take timestamp for PTPv2") Fixes: f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a") Suggested-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 13:43 UTC
641e3fd nfc: st95hf: Make spi remove() callback return zero If something goes wrong in the remove callback, returning an error code just results in an error message. The device still disappears. So don't skip disabling the regulator in st95hf_remove() if resetting the controller via spi fails. Also don't return an error code which just results in two error messages. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 13:41 UTC
323e9a9 Merge branch 'hns3-fixes' Guangbin Huang says: ==================== net: hns3: add some fixes for -net This series adds some fixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
0dd8a25 net: hns3: disable sriov before unload hclge layer HNS3 driver includes hns3.ko, hnae3.ko and hclge.ko. hns3.ko includes network stack and pci_driver, hclge.ko includes HW device action, algo_ops and timer task, hnae3.ko includes some register function. When SRIOV is enable and hclge.ko is removed, HW device is unloaded but VF still exists, PF will not reply VF mbx messages, and cause errors. This patch fix it by disable SRIOV before remove hclge.ko. Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
1385cc8 net: hns3: fix vf reset workqueue cannot exit The task of VF reset is performed through the workqueue. It checks the value of hdev->reset_pending to determine whether to exit the loop. However, the value of hdev->reset_pending may also be assigned by the interrupt function hclgevf_misc_irq_handle(), which may cause the loop fail to exit and keep occupying the workqueue. This loop is not necessary, so remove it and the workqueue will be rescheduled if the reset needs to be retried or a new reset occurs. Fixes: 1cc9bc6e5867 ("net: hns3: split hclgevf_reset() into preparing and rebuilding part") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
68752b2 net: hns3: schedule the polling again when allocation fails Currently when there is a rx page allocation failure, it is possible that polling may be stopped if there is no more packet to be reveiced, which may cause queue stall problem under memory pressure. This patch makes sure polling is scheduled again when there is any rx page allocation failure, and polling will try to allocate receive buffers until it succeeds. Now the allocation retry is added, it is unnecessary to do the rx page allocation at the end of rx cleaning, so remove it. And reset the unused_count to zero after calling hns3_nic_alloc_rx_buffers() to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under memory pressure. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
9f9f0f1 net: hns3: fix for miscalculation of rx unused desc rx unused desc is the desc that need attatching new buffer before refilling to hw to receive new packet, the number of desc need attatching new buffer is calculated using next_to_use and next_to_clean. when next_to_use == next_to_clean, currently hns3 driver assumes that all the desc has the buffer attatched, but 'next_to_use == next_to_clean' also means all the desc need attatching new buffer if hw has comsumed all the desc and the driver has not attatched any buffer to the desc yet. This patch adds 'refill' in desc_cb to indicate whether a new buffer has been refilled to a desc. Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
adfb7b4 net: hns3: fix the max tx size according to user manual Currently the max tx size supported by the hw is calculated by using the max BD num supported by the hw. According to the hw user manual, the max tx size is fixed value for both non-TSO and TSO skb. This patch updates the max tx size according to the manual. Fixes: 8ae10cfb5089("net: hns3: support tx-scatter-gather-fraglist feature") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
731797f net: hns3: add limit ets dwrr bandwidth cannot be 0 If ets dwrr bandwidth of tc is set to 0, the hardware will switch to SP mode. In this case, this tc may occupy all the tx bandwidth if it has huge traffic, so it violates the purpose of the user setting. To fix this problem, limit the ets dwrr bandwidth must greater than 0. Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
b63fcaa net: hns3: reset DWRR of unused tc to zero Currently, DWRR of tc will be initialized to a fixed value when this tc is enabled, but it is not been reset to 0 when this tc is disabled. It cause a problem that the DWRR of unused tc is not 0 after using tc tool to add and delete multi-tc parameters. For examples, after enabling 4 TCs and restoring to 1 TC by follow tc commands: $ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \ 8@0 8@8 8@16 8@24 hw 1 mode channel $ tc qdisc del dev eth0 root Now there is just one TC is enabled for eth0, but the tc info querying by debugfs is shown as follow: $ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info enabled tc number: 1 weight_offset: 14 TC MODE WEIGHT 0 dwrr 100 1 dwrr 100 2 dwrr 100 3 dwrr 100 4 dwrr 0 5 dwrr 0 6 dwrr 0 7 dwrr 0 This patch fixes it by resetting DWRR of tc to 0 when tc is disabled. Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
6048410 net: hns3: Add configuration of TM QCN error event Add configuration of interrupt type and fifo interrupt enable of TM QCN error event if enabled, otherwise this event will not be reported when there is error. Fixes: d914971df022 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()") Signed-off-by: Jiaran Zhang <zhangjiaran@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:38 UTC
787252a powerpc/smp: do not decrement idle task preempt count in CPU offline With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we get: BUG: scheduling while atomic: swapper/1/0/0x00000000 no locks held by swapper/1/0. CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100 Call Trace: dump_stack_lvl+0xac/0x108 __schedule_bug+0xac/0xe0 __schedule+0xcf8/0x10d0 schedule_idle+0x3c/0x70 do_idle+0x2d8/0x4a0 cpu_startup_entry+0x38/0x40 start_secondary+0x2ec/0x3a0 start_secondary_prolog+0x10/0x14 This is because powerpc's arch_cpu_idle_dead() decrements the idle task's preempt count, for reasons explained in commit a7c2bb8279d2 ("powerpc: Re-enable preemption before cpu_die()"), specifically "start_secondary() expects a preempt_count() of 0." However, since commit 2c669ef6979c ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug") and commit f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled"), that justification no longer holds. The idle task isn't supposed to re-enable preemption, so remove the vestigial preempt_enable() from the CPU offline path. Tested with pseries and powernv in qemu, and pseries on PowerVM. Fixes: 2c669ef6979c ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com 20 October 2021, 10:38 UTC
496c5fe powerpc/idle: Don't corrupt back chain when going idle In isa206_idle_insn_mayloss() we store various registers into the stack red zone, which is allowed. However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again, to 0(r1), which corrupts the stack back chain. We used to do the same in isa206_idle_insn_mayloss() itself, but we fixed that in 73287caa9210 ("powerpc64/idle: Fix SP offsets when saving GPRs"), however we missed that the macro also corrupts the back chain. Corrupting the back chain is bad for debuggability but doesn't necessarily cause a bug. However we recently changed the stack handling in some KVM code, and it now relies on the stack back chain being valid when it returns. The corruption causes that code to return with r1 pointing somewhere in kernel data, at some point LR is restored from the stack and we branch to NULL or somewhere else invalid. Only affects Power8 hosts running KVM guests, with dynamic_mt_modes enabled (which it is by default). The fixes tag below points to the commit that changed the KVM stack handling, exposing this bug. The actual corruption of the back chain has always existed since 948cf67c4726 ("powerpc: Add NAP mode support on Power7 in HV mode"). Fixes: 9b4416c5095c ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au 20 October 2021, 10:37 UTC
55161e6 vrf: Revert "Reset skb conntrack connection..." This reverts commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1. When an interface is enslaved in a VRF, prerouting conntrack hook is called twice: once in the context of the original input interface, and once in the context of the VRF interface. If no special precausions are taken, this leads to creation of two conntrack entries instead of one, and breaks SNAT. Commit above was intended to avoid creation of extra conntrack entries when input interface is enslaved in a VRF. It did so by resetting conntrack related data associated with the skb when it enters VRF context. However it breaks netfilter operation. Imagine a use case when conntrack zone must be assigned based on the original input interface, rather than VRF interface (that would make original interfaces indistinguishable). One could create netfilter rules similar to these: chain rawprerouting { type filter hook prerouting priority raw; iif realiface1 ct zone set 1 return iif realiface2 ct zone set 2 return } This works before the mentioned commit, but not after: zone assignment is "forgotten", and any subsequent NAT or filtering that is dependent on the conntrack zone does not work. Here is a reproducer script that demonstrates the difference in behaviour. ========== #!/bin/sh # This script demonstrates unexpected change of nftables behaviour # caused by commit 09e856d54bda5f28 ""vrf: Reset skb conntrack # connection on VRF rcv" # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1 # # Before the commit, it was possible to assign conntrack zone to a # packet (or mark it for `notracking`) in the prerouting chanin, raw # priority, based on the `iif` (interface from which the packet # arrived). # After the change, # if the interface is enslaved in a VRF, such # assignment is lost. Instead, assignment based on the `iif` matching # the VRF master interface is honored. Thus it is impossible to # distinguish packets based on the original interface. # # This script demonstrates this change of behaviour: conntrack zone 1 # or 2 is assigned depending on the match with the original interface # or the vrf master interface. It can be observed that conntrack entry # appears in different zone in the kernel versions before and after # the commit. IPIN=172.30.30.1 IPOUT=172.30.30.2 PFXL=30 ip li sh vein >/dev/null 2>&1 && ip li del vein ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf nft list table testct >/dev/null 2>&1 && nft delete table testct ip li add vein type veth peer veout ip li add tvrf type vrf table 9876 ip li set veout master tvrf ip li set vein up ip li set veout up ip li set tvrf up /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1 /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0 ip addr add $IPIN/$PFXL dev vein ip addr add $IPOUT/$PFXL dev veout nft -f - <<__END__ table testct { chain rawpre { type filter hook prerouting priority raw; iif { veout, tvrf } meta nftrace set 1 iif veout ct zone set 1 return iif tvrf ct zone set 2 return notrack } chain rawout { type filter hook output priority raw; notrack } } __END__ uname -rv conntrack -F ping -W 1 -c 1 -I vein $IPOUT conntrack -L Signed-off-by: Eugene Crosser <crosser@average.org> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> 20 October 2021, 10:27 UTC
ba69fd9 net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()' If we return before the end of the 'for_each_child_of_node()' iterator, the reference taken on 'port' must be released. Add the missing 'of_node_put()' calls. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org> 19 October 2021, 22:41 UTC
34dc2fd ucounts: Proper error handling in set_cred_ucounts Instead of leaking the ucounts in new if alloc_ucounts fails, store the result of alloc_ucounts into a temporary variable, which is later assigned to new->ucounts. Cc: stable@vger.kernel.org Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Link: https://lkml.kernel.org/r/87pms2s0v8.fsf_-_@disp2133 Tested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> 19 October 2021, 16:04 UTC
629715a ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds The purpose of inc_rlimit_ucounts and dec_rlimit_ucounts in commit_creds is to change which rlimit counter is used to track a process when the credentials changes. Use the same test for both to guarantee the tracking is correct. Cc: stable@vger.kernel.org Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Link: https://lkml.kernel.org/r/87v91us0w4.fsf_-_@disp2133 Tested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Alexey Gladkov <legion@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> 19 October 2021, 16:01 UTC
d9abdee Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "19 patches. Subsystems affected by this patch series: mm (userfaultfd, migration, memblock, mempolicy, slub, secretmem, and thp), ocfs2, binfmt, vfs, and misc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: add Andrej Shadura mm/thp: decrease nr_thps in file's mapping on THP split mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem() vfs: check fd has read access in kernel_read_file_from_fd() elfcore: correct reference to CONFIG_UML mm, slub: fix incorrect memcg slab count for bulk free mm, slub: fix potential use-after-free in slab_debugfs_fops mm, slub: fix potential memoryleak in kmem_cache_open() mm, slub: fix mismatch between reconstructed freelist depth and cnt mm, slub: fix two bugs in slab_debug_trace_open() mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind() memblock: check memory total_size ocfs2: mount fails with buffer overflow in strlen ocfs2: fix data corruption after conversion from inline format mm/migrate: fix CPUHP state to update node demotion order mm/migrate: add CPU hotplug to demotion #ifdef mm/migrate: optimize hotplug-time demotion order updates userfaultfd: fix a race between writeprotect and exit_mmap() mm/userfaultfd: selftests: fix memory corruption with thp enabled 19 October 2021, 15:41 UTC
04ee275 Merge tag 'linux-can-fixes-for-5.15-20211019' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-10-19 this is a pull request of a single patch for net/master. The patch is by me and fixes the error handling in case of a FC timeout in the TX path of the ISOTOP CAN protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 19 October 2021, 12:14 UTC
c69b2f4 cavium: Fix return values of the probe function During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 October 2021, 12:09 UTC
e211210 mISDN: Fix return values of the probe function During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 October 2021, 12:09 UTC
1bd85aa ceph: fix handling of "meta" errors Currently, we check the wb_err too early for directories, before all of the unsafe child requests have been waited on. In order to fix that we need to check the mapping->wb_err later nearer to the end of ceph_fsync. We also have an overly-complex method for tracking errors after blocklisting. The errors recorded in cleanup_session_requests go to a completely separate field in the inode, but we end up reporting them the same way we would for any other error (in fsync). There's no real benefit to tracking these errors in two different places, since the only reporting mechanism for them is in fsync, and we'd need to advance them both every time. Given that, we can just remove i_meta_err, and convert the places that used it to instead just use mapping->wb_err instead. That also fixes the original problem by ensuring that we do a check_and_advance of the wb_err at the end of the fsync op. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52864 Reported-by: Patrick Donnelly <pdonnell@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> 19 October 2021, 07:36 UTC
98d0a6f ceph: skip existing superblocks that are blocklisted or shut down when mounting Currently when mounting, we may end up finding an existing superblock that corresponds to a blocklisted MDS client. This means that the new mount ends up being unusable. If we've found an existing superblock with a client that is already blocklisted, and the client is not configured to recover on its own, fail the match. Ditto if the superblock has been forcibly unmounted. While we're in here, also rename "other" to the more conventional "fsc". Cc: stable@vger.kernel.org URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499 Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> 19 October 2021, 07:36 UTC
d674a8f can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path When the a large chunk of data send and the receiver does not send a Flow Control frame back in time, the sendmsg() does not return a error code, but the number of bytes sent corresponding to the size of the packet. If a timeout occurs the isotp_tx_timer_handler() is fired, sets sk->sk_err and calls the sk->sk_error_report() function. It was wrongly expected that the error would be propagated to user space in every case. For isotp_sendmsg() blocking on wait_event_interruptible() this is not the case. This patch fixes the problem by checking if sk->sk_err is set and returning the error to user space. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/hartkopp/can-isotp/issues/42 Link: https://github.com/hartkopp/can-isotp/pull/43 Link: https://lore.kernel.org/all/20210507091839.1366379-1-mkl@pengutronix.de Cc: stable@vger.kernel.org Reported-by: Sottas Guillaume (LMB) <Guillaume.Sottas@liebherr.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> 19 October 2021, 07:10 UTC
362d5df mailmap: add Andrej Shadura Add a mapping for my old work email for BelDisplayTech to the personal email, and make sure the Collabora email has the correct spelling of the first name. Link: https://lkml.kernel.org/r/20210917091016.30232-1-andrew.shadura@collabora.co.uk Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
1ca7554 mm/thp: decrease nr_thps in file's mapping on THP split Decrease nr_thps counter in file's mapping to ensure that the page cache won't be dropped excessively on file write access if page has been already split. I've tried a test scenario running a big binary, kernel remaps it with THPs, then force a THP split with /sys/kernel/debug/split_huge_pages. During any further open of that binary with O_RDWR or O_WRITEONLY kernel drops page cache for it, because of non-zero thps counter. Link: https://lkml.kernel.org/r/20211012120237.2600-1-m.szyprowski@samsung.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache") Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()") Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: <sfoon.kim@samsung.com> Cc: Song Liu <songliubraving@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
79f9bc5 mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem() Check for a NULL page->mapping before dereferencing the mapping in page_is_secretmem(), as the page's mapping can be nullified while gup() is running, e.g. by reclaim or truncation. BUG: kernel NULL pointer dereference, address: 0000000000000068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 4173897 Comm: CPU 3/KVM Tainted: G W RIP: 0010:internal_get_user_pages_fast+0x621/0x9d0 Code: <48> 81 7a 68 80 08 04 bc 0f 85 21 ff ff 8 89 c7 be RSP: 0018:ffffaa90087679b0 EFLAGS: 00010046 RAX: ffffe3f37905b900 RBX: 00007f2dd561e000 RCX: ffffe3f37905b934 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe3f37905b900 ... CR2: 0000000000000068 CR3: 00000004c5898003 CR4: 00000000001726e0 Call Trace: get_user_pages_fast_only+0x13/0x20 hva_to_pfn+0xa9/0x3e0 try_async_pf+0xa1/0x270 direct_page_fault+0x113/0xad0 kvm_mmu_page_fault+0x69/0x680 vmx_handle_exit+0xe1/0x5d0 kvm_arch_vcpu_ioctl_run+0xd81/0x1c70 kvm_vcpu_ioctl+0x267/0x670 __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/20211007231502.3552715-1-seanjc@google.com Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Sean Christopherson <seanjc@google.com> Reported-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Stephen <stephenackerman16@gmail.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
032146c vfs: check fd has read access in kernel_read_file_from_fd() If we open a file without read access and then pass the fd to a syscall whose implementation calls kernel_read_file_from_fd(), we get a warning from __kernel_read(): if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ))) This currently affects both finit_module() and kexec_file_load(), but it could affect other syscalls in the future. Link: https://lkml.kernel.org/r/20211007220110.600005-1-willy@infradead.org Fixes: b844f0ecbc56 ("vfs: define kernel_copy_file_from_fd()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Hao Sun <sunhao.th@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
b0e9012 elfcore: correct reference to CONFIG_UML Commit 6e7b64b9dd6d ("elfcore: fix building with clang") introduces special handling for two architectures, ia64 and User Mode Linux. However, the wrong name, i.e., CONFIG_UM, for the intended Kconfig symbol for User-Mode Linux was used. Although the directory for User Mode Linux is ./arch/um; the Kconfig symbol for this architecture is called CONFIG_UML. Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs: UM Referencing files: include/linux/elfcore.h Similar symbols: UML, NUMA Correct the name of the config to the intended one. [akpm@linux-foundation.org: fix um/x86_64, per Catalin] Link: https://lkml.kernel.org/r/20211006181119.2851441-1-catalin.marinas@arm.com Link: https://lkml.kernel.org/r/YV6pejGzLy5ppEpt@arm.com Link: https://lkml.kernel.org/r/20211006082209.417-1-lukas.bulwahn@gmail.com Fixes: 6e7b64b9dd6d ("elfcore: fix building with clang") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Barret Rhoden <brho@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
3ddd602 mm, slub: fix incorrect memcg slab count for bulk free kmem_cache_free_bulk() will call memcg_slab_free_hook() for all objects when doing bulk free. So we shouldn't call memcg_slab_free_hook() again for bulk free to avoid incorrect memcg slab count. Link: https://lkml.kernel.org/r/20210916123920.48704-6-linmiaohe@huawei.com Fixes: d1b2cf6cb84a ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
67823a5 mm, slub: fix potential use-after-free in slab_debugfs_fops When sysfs_slab_add failed, we shouldn't call debugfs_slab_add() for s because s will be freed soon. And slab_debugfs_fops will use s later leading to a use-after-free. Link: https://lkml.kernel.org/r/20210916123920.48704-5-linmiaohe@huawei.com Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
9037c57 mm, slub: fix potential memoryleak in kmem_cache_open() In error path, the random_seq of slub cache might be leaked. Fix this by using __kmem_cache_release() to release all the relevant resources. Link: https://lkml.kernel.org/r/20210916123920.48704-4-linmiaohe@huawei.com Fixes: 210e7a43fa90 ("mm: SLUB freelist randomization") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
899447f mm, slub: fix mismatch between reconstructed freelist depth and cnt If object's reuse is delayed, it will be excluded from the reconstructed freelist. But we forgot to adjust the cnt accordingly. So there will be a mismatch between reconstructed freelist depth and cnt. This will lead to free_debug_processing() complaining about freelist count or a incorrect slub inuse count. Link: https://lkml.kernel.org/r/20210916123920.48704-3-linmiaohe@huawei.com Fixes: c3895391df38 ("kasan, slub: fix handling of kasan_slab_free hook") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
2127d22 mm, slub: fix two bugs in slab_debug_trace_open() Patch series "Fixups for slub". This series contains various bug fixes for slub. We fix memoryleak, use-afer-free, NULL pointer dereferencing and so on in slub. More details can be found in the respective changelogs. This patch (of 5): It's possible that __seq_open_private() will return NULL. So we should check it before using lest dereferencing NULL pointer. And in error paths, we forgot to release private buffer via seq_release_private(). Memory will leak in these paths. Link: https://lkml.kernel.org/r/20210916123920.48704-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210916123920.48704-2-linmiaohe@huawei.com Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Faiyaz Mohammed <faiyazm@codeaurora.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Bharata B Rao <bharata@linux.ibm.com> Cc: Roman Gushchin <guro@fb.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
6d2aec9 mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING | MPOL_LOCAL in mbind() syzbot reported access to unitialized memory in mbind() [1] Issue came with commit bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") This commit added a new bit in MPOL_MODE_FLAGS, but only checked valid combination (MPOL_F_NUMA_BALANCING can only be used with MPOL_BIND) in do_set_mempolicy() This patch moves the check in sanitize_mpol_flags() so that it is also used by mbind() [1] BUG: KMSAN: uninit-value in __mpol_equal+0x567/0x590 mm/mempolicy.c:2260 __mpol_equal+0x567/0x590 mm/mempolicy.c:2260 mpol_equal include/linux/mempolicy.h:105 [inline] vma_merge+0x4a1/0x1e60 mm/mmap.c:1190 mbind_range+0xcc8/0x1e80 mm/mempolicy.c:811 do_mbind+0xf42/0x15f0 mm/mempolicy.c:1333 kernel_mbind mm/mempolicy.c:1483 [inline] __do_sys_mbind mm/mempolicy.c:1490 [inline] __se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486 __x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_alloc_node mm/slub.c:3221 [inline] slab_alloc mm/slub.c:3230 [inline] kmem_cache_alloc+0x751/0xff0 mm/slub.c:3235 mpol_new mm/mempolicy.c:293 [inline] do_mbind+0x912/0x15f0 mm/mempolicy.c:1289 kernel_mbind mm/mempolicy.c:1483 [inline] __do_sys_mbind mm/mempolicy.c:1490 [inline] __se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486 __x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae ===================================================== Kernel panic - not syncing: panic_on_kmsan set ... CPU: 0 PID: 15049 Comm: syz-executor.0 Tainted: G B 5.15.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1ff/0x28e lib/dump_stack.c:106 dump_stack+0x25/0x28 lib/dump_stack.c:113 panic+0x44f/0xdeb kernel/panic.c:232 kmsan_report+0x2ee/0x300 mm/kmsan/report.c:186 __msan_warning+0xd7/0x150 mm/kmsan/instrumentation.c:208 __mpol_equal+0x567/0x590 mm/mempolicy.c:2260 mpol_equal include/linux/mempolicy.h:105 [inline] vma_merge+0x4a1/0x1e60 mm/mmap.c:1190 mbind_range+0xcc8/0x1e80 mm/mempolicy.c:811 do_mbind+0xf42/0x15f0 mm/mempolicy.c:1333 kernel_mbind mm/mempolicy.c:1483 [inline] __do_sys_mbind mm/mempolicy.c:1490 [inline] __se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486 __x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/20211001215630.810592-1-eric.dumazet@gmail.com Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
5173ed7 memblock: check memory total_size mem=[X][G|M] is broken on ARM64 platform, there are cases that even type.cnt is 1, but total_size is not 0 because regions are merged into 1. So only check 'cnt' is not enough, total_size should be used, othersize bootargs 'mem=[X][G|B]' not work anymore. Link: https://lkml.kernel.org/r/20210930024437.32598-1-peng.fan@oss.nxp.com Fixes: e888fa7bb882 ("memblock: Check memory add/cap ordering") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
b15fa92 ocfs2: mount fails with buffer overflow in strlen Starting with kernel 5.11 built with CONFIG_FORTIFY_SOURCE mouting an ocfs2 filesystem with either o2cb or pcmk cluster stack fails with the trace below. Problem seems to be that strings for cluster stack and cluster name are not guaranteed to be null terminated in the disk representation, while strlcpy assumes that the source string is always null terminated. This causes a read outside of the source string triggering the buffer overflow detection. detected buffer overflow in strlen ------------[ cut here ]------------ kernel BUG at lib/string.c:1149! invalid opcode: 0000 [#1] SMP PTI CPU: 1 PID: 910 Comm: mount.ocfs2 Not tainted 5.14.0-1-amd64 #1 Debian 5.14.6-2 RIP: 0010:fortify_panic+0xf/0x11 ... Call Trace: ocfs2_initialize_super.isra.0.cold+0xc/0x18 [ocfs2] ocfs2_fill_super+0x359/0x19b0 [ocfs2] mount_bdev+0x185/0x1b0 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 path_mount+0x454/0xa20 __x64_sys_mount+0x103/0x140 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/20210929180654.32460-1-vvidic@valentin-vidic.from.hr Signed-off-by: Valentin Vidic <vvidic@valentin-vidic.from.hr> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
5314454 ocfs2: fix data corruption after conversion from inline format Commit 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()") uncovered a latent bug in ocfs2 conversion from inline inode format to a normal inode format. The code in ocfs2_convert_inline_data_to_extents() attempts to zero out the whole cluster allocated for file data by grabbing, zeroing, and dirtying all pages covering this cluster. However these pages are beyond i_size, thus writeback code generally ignores these dirty pages and no blocks were ever actually zeroed on the disk. This oversight was fixed by commit 693c241a5f6a ("ocfs2: No need to zero pages past i_size.") for standard ocfs2 write path, inline conversion path was apparently forgotten; the commit log also has a reasoning why the zeroing actually is not needed. After commit 6dbf7bb55598, things became worse as writeback code stopped invalidating buffers on pages beyond i_size and thus these pages end up with clean PageDirty bit but with buffers attached to these pages being still dirty. So when a file is converted from inline format, then writeback triggers, and then the file is grown so that these pages become valid, the invalid dirtiness state is preserved, mark_buffer_dirty() does nothing on these pages (buffers are already dirty) but page is never written back because it is clean. So data written to these pages is lost once pages are reclaimed. Simple reproducer for the problem is: xfs_io -f -c "pwrite 0 2000" -c "pwrite 2000 2000" -c "fsync" \ -c "pwrite 4000 2000" ocfs2_file After unmounting and mounting the fs again, you can observe that end of 'ocfs2_file' has lost its contents. Fix the problem by not doing the pointless zeroing during conversion from inline format similarly as in the standard write path. [akpm@linux-foundation.org: fix whitespace, per Joseph] Link: https://lkml.kernel.org/r/20210930095405.21433-1-jack@suse.cz Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()") Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: "Markov, Andrey" <Markov.Andrey@Dell.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
a6a0251 mm/migrate: fix CPUHP state to update node demotion order The node demotion order needs to be updated during CPU hotplug. Because whether a NUMA node has CPU may influence the demotion order. The update function should be called during CPU online/offline after the node_states[N_CPU] has been updated. That is done in CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during CPU offline. But in commit 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events"), the function to update node demotion order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline. This doesn't satisfy the order requirement. For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0 and P2, P3 in S1), the demotion order is - S0 -> NUMA_NO_NODE - S1 -> NUMA_NO_NODE After P2 and P3 is offlined, because S1 has no CPU now, the demotion order should have been changed to - S0 -> S1 - S1 -> NO_NODE but it isn't changed, because the order updating callback for CPU hotplug doesn't see the new nodemask. After that, if P1 is offlined, the demotion order is changed to the expected order as above. So in this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the update function on them. Link: https://lkml.kernel.org/r/20210929060351.7293-1-ying.huang@intel.com Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: "Huang, Ying" <ying.huang@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
76af6a0 mm/migrate: add CPU hotplug to demotion #ifdef Once upon a time, the node demotion updates were driven solely by memory hotplug events. But now, there are handlers for both CPU and memory hotplug. However, the #ifdef around the code checks only memory hotplug. A system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU hotplug events. Update the #ifdef around the common code. Add memory and CPU-specific #ifdefs for their handlers. These memory/CPU #ifdefs avoid unused function warnings when their Kconfig option is off. [arnd@arndb.de: rework hotplug_memory_notifier() stub] Link: https://lkml.kernel.org/r/20211013144029.2154629-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20210924161255.E5FE8F7E@davehans-spike.ostc.intel.com Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
295be91 mm/migrate: optimize hotplug-time demotion order updates Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2. This contains two fixes for the "automatic demotion" code which was merged into 5.15: * Fix memory hotplug performance regression by watching suppressing any real action on irrelevant hotplug events. * Ensure CPU hotplug handler is registered when memory hotplug is disabled. This patch (of 2): == tl;dr == Automatic demotion opted for a simple, lazy approach to handling hotplug events. This noticeably slows down memory hotplug[1]. Optimize away updates to the demotion order when memory hotplug events should have no effect. This has no effect on CPU hotplug. There is no known problem on the CPU side and any work there will be in a separate series. == Background == Automatic demotion is a memory migration strategy to ensure that new allocations have room in faster memory tiers on tiered memory systems. The kernel maintains an array (node_demotion[]) to drive these migrations. The node_demotion[] path is calculated by starting at nodes with CPUs and then "walking" to nodes with memory. Only hotplug events which online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will actually affect the migration order. == Problem == However, the current code is lazy. It completely regenerates the migration order on *any* CPU or memory hotplug event. The logic was that these events are extremely rare and that the overhead from indiscriminate order regeneration is minimal. Part of the update logic involves a synchronize_rcu(), which is a pretty big hammer. Its overhead was large enough to be detected by some 0day tests that watch memory hotplug performance[1]. == Solution == Add a new helper (node_demotion_topo_changed()) which can differentiate between superfluous and impactful hotplug events. Skip the expensive update operation for superfluous events. == Aside: Locking == It took me a few moments to declare the locking to be safe enough for node_demotion_topo_changed() to work. It all hinges on the memory hotplug lock: During memory hotplug events, 'mem_hotplug_lock' is held for write. This ensures that two memory hotplug events can not be called simultaneously. CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides mutual exclusion between CPU hotplug events. In addition, the demotion code acquire and hold the mem_hotplug_lock for read during its CPU hotplug handlers. This provides mutual exclusion between the demotion memory hotplug callbacks and the CPU hotplug callbacks. This effectively allows treating the migration target generation code to act as if it is single-threaded. 1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20210924161251.093CCD06@davehans-spike.ostc.intel.com Link: https://lkml.kernel.org/r/20210924161253.D7673E31@davehans-spike.ostc.intel.com Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-by: kernel test robot <oliver.sang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Xu <weixugc@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
cb185d5 userfaultfd: fix a race between writeprotect and exit_mmap() A race is possible when a process exits, its VMAs are removed by exit_mmap() and at the same time userfaultfd_writeprotect() is called. The race was detected by KASAN on a development kernel, but it appears to be possible on vanilla kernels as well. Use mmget_not_zero() to prevent the race as done in other userfaultfd operations. Link: https://lkml.kernel.org/r/20210921200247.25749-1-namit@vmware.com Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl") Signed-off-by: Nadav Amit <namit@vmware.com> Tested-by: Li Wang <liwang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
8913970 mm/userfaultfd: selftests: fix memory corruption with thp enabled In RHEL's gating selftests we've encountered memory corruption in the uffd event test even with upstream kernel: # ./userfaultfd anon 128 4 nr_pages: 32768, nr_pages_per_cpu: 32768 bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729) bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877) bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699) bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196) testing uffd-wp with pagemap (pgsize=4096): done testing uffd-wp with pagemap (pgsize=2097152): done testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963) ERROR: faulting process failed (errno=0, line=1117) It can be easily reproduced when global thp enabled, which is the default for RHEL. It's also known as a side effect of commit 0db282ba2c12 ("selftest: use mmap instead of posix_memalign to allocate memory", 2021-07-23), which is imho right itself on using mmap() to make sure the addresses will be untagged even on arm. The problem is, for each test we allocate buffers using two allocate_area() calls. We assumed these two buffers won't affect each other, however they could, because mmap() could have found that the two buffers are near each other and having the same VMA flags, so they got merged into one VMA. It won't be a big problem if thp is not enabled, but when thp is agressively enabled it means when initializing the src buffer it could accidentally setup part of the dest buffer too when there's a shared THP that overlaps the two regions. Then some of the dest buffer won't be able to be trapped by userfaultfd missing mode, then it'll cause memory corruption as described. To fix it, do release_pages() after initializing the src buffer. Since the previous two release_pages() calls are after uffd_test_ctx_clear() which will unmap all the buffers anyway (which is stronger than release pages; as unmap() also tear town pgtables), drop them as they shouldn't really be anything useful. We can mark the Fixes tag upon 0db282ba2c12 as it's reported to only happen there, however the real "Fixes" IMHO should be 8ba6e8640844, as before that commit we'll always do explicit release_pages() before registration of uffd, and 8ba6e8640844 changed that logic by adding extra unmap/map and we didn't release the pages at the right place. Meanwhile I don't have a solid glue anyway on whether posix_memalign() could always avoid triggering this bug, hence it's safer to attach this fix to commit 8ba6e8640844. Link: https://lkml.kernel.org/r/20210923232512.210092-1-peterx@redhat.com Fixes: 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931 Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Li Wang <liwan@redhat.com> Tested-by: Li Wang <liwang@redhat.com> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 19 October 2021, 06:22 UTC
2966492 ALSA: usb-audio: Fix microphone sound on Jieli webcam. When a Jieli Technology USB Webcam is connected, the video part works well, but the mic sound is speeded up. On dmesg there are messages about different rates from the runtime rates, warnings about volume resolution and lastly, the log is filled, every 5 seconds, with retire_capture_urb error messages. The mic works only when ep packet size is set to wMaxPacketSize (normal sound and no more retire_capture_urb error messages). Skipping reading sample rate, fixes the messages about different rates and forcing a volume resolution, fixes warnings about volume range. I have arbitrarily choosed the value (16): I read in a comment that there should be no more than 255 levels, so 4096 (max volume) / 16 = 0-255. Signed-off-by: Marco Giunta <giun7a@gmail.com> Link: https://lore.kernel.org/r/20211018162552.12082-1-giun7a@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> 19 October 2021, 06:07 UTC
6e3ee99 audit: fix possible null-pointer dereference in audit_filter_rules Fix possible null-pointer dereference in audit_filter_rules. audit_filter_rules() error: we previously assumed 'ctx' could be null Cc: stable@vger.kernel.org Fixes: bf361231c295 ("audit: add saddr_fam filter field") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Paul Moore <paul@paul-moore.com> 18 October 2021, 22:27 UTC
ed65df6 tracing: Have all levels of checks prevent recursion While writing an email explaining the "bit = 0" logic for a discussion on making ftrace_test_recursion_trylock() disable preemption, I discovered a path that makes the "not do the logic if bit is zero" unsafe. The recursion logic is done in hot paths like the function tracer. Thus, any code executed causes noticeable overhead. Thus, tricks are done to try to limit the amount of code executed. This included the recursion testing logic. Having recursion testing is important, as there are many paths that can end up in an infinite recursion cycle when tracing every function in the kernel. Thus protection is needed to prevent that from happening. Because it is OK to recurse due to different running context levels (e.g. an interrupt preempts a trace, and then a trace occurs in the interrupt handler), a set of bits are used to know which context one is in (normal, softirq, irq and NMI). If a recursion occurs in the same level, it is prevented*. Then there are infrastructure levels of recursion as well. When more than one callback is attached to the same function to trace, it calls a loop function to iterate over all the callbacks. Both the callbacks and the loop function have recursion protection. The callbacks use the "ftrace_test_recursion_trylock()" which has a "function" set of context bits to test, and the loop function calls the internal trace_test_and_set_recursion() directly, with an "internal" set of bits. If an architecture does not implement all the features supported by ftrace then the callbacks are never called directly, and the loop function is called instead, which will implement the features of ftrace. Since both the loop function and the callbacks do recursion protection, it was seemed unnecessary to do it in both locations. Thus, a trick was made to have the internal set of recursion bits at a more significant bit location than the function bits. Then, if any of the higher bits were set, the logic of the function bits could be skipped, as any new recursion would first have to go through the loop function. This is true for architectures that do not support all the ftrace features, because all functions being traced must first go through the loop function before going to the callbacks. But this is not true for architectures that support all the ftrace features. That's because the loop function could be called due to two callbacks attached to the same function, but then a recursion function inside the callback could be called that does not share any other callback, and it will be called directly. i.e. traced_function_1: [ more than one callback tracing it ] call loop_func loop_func: trace_recursion set internal bit call callback callback: trace_recursion [ skipped because internal bit is set, return 0 ] call traced_function_2 traced_function_2: [ only traced by above callback ] call callback callback: trace_recursion [ skipped because internal bit is set, return 0 ] call traced_function_2 [ wash, rinse, repeat, BOOM! out of shampoo! ] Thus, the "bit == 0 skip" trick is not safe, unless the loop function is call for all functions. Since we want to encourage architectures to implement all ftrace features, having them slow down due to this extra logic may encourage the maintainers to update to the latest ftrace features. And because this logic is only safe for them, remove it completely. [*] There is on layer of recursion that is allowed, and that is to allow for the transition between interrupt context (normal -> softirq -> irq -> NMI), because a trace may occur before the context update is visible to the trace recursion logic. Link: https://lore.kernel.org/all/609b565a-ed6e-a1da-f025-166691b5d994@linux.alibaba.com/ Link: https://lkml.kernel.org/r/20211018154412.09fcad3c@gandalf.local.home Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@hansenpartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Jisheng Zhang <jszhang@kernel.org> Cc: =?utf-8?b?546L6LSH?= <yun.wang@linux.alibaba.com> Cc: Guo Ren <guoren@kernel.org> Cc: stable@vger.kernel.org Fixes: edc15cafcbfa3 ("tracing: Avoid unnecessary multiple recursion checks") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> 18 October 2021, 22:12 UTC
8a64ef0 nfp: bpf: silence bitwise vs. logical OR warning A new warning in clang points out two places in this driver where boolean expressions are being used with a bitwise OR instead of a logical one: drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical] reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || drivers/net/ethernet/netronome/nfp/nfp_asm.c:199:20: note: cast one or both operands to int to silence this warning drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical] reg->src_lmextn = swreg_lmextn(lreg) | swreg_lmextn(rreg); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ || drivers/net/ethernet/netronome/nfp/nfp_asm.c:280:20: note: cast one or both operands to int to silence this warning 2 errors generated. The motivation for the warning is that logical operations short circuit while bitwise operations do not. In this case, it does not seem like short circuiting is harmful so implement the suggested fix of changing to a logical operation to fix the warning. Link: https://github.com/ClangBuiltLinux/linux/issues/1479 Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20211018193101.2340261-1-nathan@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> 18 October 2021, 21:50 UTC
5ca6779 drm/msm/devfreq: Restrict idle clamping to a618 for now Until we better understand the stability issues caused by frequent frequency changes, lets limit them to a618. Signed-off-by: Rob Clark <robdclark@chromium.org> Tested-by: John Stultz <john.stultz@linaro.org> Tested-by: Caleb Connolly <caleb.connolly@linaro.org> Link: https://lore.kernel.org/r/20211018153627.2787882-1-robdclark@gmail.com Signed-off-by: Rob Clark <robdclark@chromium.org> 18 October 2021, 21:31 UTC
15bc01e ucounts: Fix signal ucount refcounting In commit fda31c50292a ("signal: avoid double atomic counter increments for user accounting") Linus made a clever optimization to how rlimits and the struct user_struct. Unfortunately that optimization does not work in the obvious way when moved to nested rlimits. The problem is that the last decrement of the per user namespace per user sigpending counter might also be the last decrement of the sigpending counter in the parent user namespace as well. Which means that simply freeing the leaf ucount in __free_sigqueue is not enough. Maintain the optimization and handle the tricky cases by introducing inc_rlimit_get_ucounts and dec_rlimit_put_ucounts. By moving the entire optimization into functions that perform all of the work it becomes possible to ensure that every level is handled properly. The new function inc_rlimit_get_ucounts returns 0 on failure to increment the ucount. This is different than inc_rlimit_ucounts which increments the ucounts and returns LONG_MAX if the ucount counter has exceeded it's maximum or it wrapped (to indicate the counter needs to decremented). I wish we had a single user to account all pending signals to across all of the threads of a process so this complexity was not necessary Cc: stable@vger.kernel.org Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") v1: https://lkml.kernel.org/r/87mtnavszx.fsf_-_@disp2133 Link: https://lkml.kernel.org/r/87fssytizw.fsf_-_@disp2133 Reviewed-by: Alexey Gladkov <legion@kernel.org> Tested-by: Rune Kleveland <rune.kleveland@infomedia.dk> Tested-by: Yu Zhao <yuzhao@google.com> Tested-by: Jordan Glover <Golden_Miller83@protonmail.ch> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> 18 October 2021, 21:02 UTC
9f1ee7b KVM: SEV-ES: reduce ghcb_sa_len to 32 bits The size of the GHCB scratch area is limited to 16 KiB (GHCB_SCRATCH_AREA_LIMIT), so there is no need for it to be a u64. This fixes a build error on 32-bit systems: i686-linux-gnu-ld: arch/x86/kvm/svm/sev.o: in function `sev_es_string_io: sev.c:(.text+0x110f): undefined reference to `__udivdi3' Cc: stable@vger.kernel.org Fixes: 019057bd73d1 ("KVM: SEV-ES: fix length of string I/O") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
d61863c KVM: VMX: Remove redundant handling of bus lock vmexit Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK. We can remove redundant handling of bus lock vmexit. Unconditionally Set exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit(). Signed-off-by: Hao Xiang <hao.xiang@linux.alibaba.com> Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
01c7d26 KVM: kvm_stat: do not show halt_wait_ns Similar to commit 111d0bda8eeb ("tools/kvm_stat: Exempt time-based counters"), we should not show timer values in kvm_stat. Remove the new halt_wait_ns. Fixes: 87bcc5fa092f ("KVM: stats: Add halt_wait_ns stats for all architectures") Cc: Jing Zhang <jingzhangos@google.com> Cc: Stefan Raspl <raspl@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Stefan Raspl <raspl@linux.ibm.com> Message-Id: <20211006121724.4154-1-borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
9139a7a KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload WARN if the static keys used to track if any vCPU has disabled its APIC are left elevated at module exit. Unlike the underflow case, nothing in the static key infrastructure will complain if a key is left elevated, and because an elevated key only affects performance, nothing in KVM will fail if either key is improperly incremented. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211013003554.47705-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
f7d8a19 Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET" Revert a change to open code bits of kvm_lapic_set_base() when emulating APIC RESET to fix an apic_hw_disabled underflow bug due to arch.apic_base and apic_hw_disabled being unsyncrhonized when the APIC is created. If kvm_arch_vcpu_create() fails after creating the APIC, kvm_free_lapic() will see the initialized-to-zero vcpu->arch.apic_base and decrement apic_hw_disabled without KVM ever having incremented apic_hw_disabled. Using kvm_lapic_set_base() in kvm_lapic_reset() is also desirable for a potential future where KVM supports RESET outside of vCPU creation, in which case all the side effects of kvm_lapic_set_base() are needed, e.g. to handle the transition from x2APIC => xAPIC. Alternatively, KVM could temporarily increment apic_hw_disabled (and call kvm_lapic_set_base() at RESET), but that's a waste of cycles and would impact the performance of other vCPUs and VMs. The other subtle side effect is that updating the xAPIC ID needs to be done at RESET regardless of whether the APIC was previously enabled, i.e. kvm_lapic_reset() needs an explicit call to kvm_apic_set_xapic_id() regardless of whether or not kvm_lapic_set_base() also performs the update. That makes stuffing the enable bit at vCPU creation slightly more palatable, as doing so affects only the apic_hw_disabled key. Opportunistically tweak the comment to explicitly call out the connection between vcpu->arch.apic_base and apic_hw_disabled, and add a comment to call out the need to always do kvm_apic_set_xapic_id() at RESET. Underflow scenario: kvm_vm_ioctl() { kvm_vm_ioctl_create_vcpu() { kvm_arch_vcpu_create() { if (something_went_wrong) goto fail_free_lapic; /* vcpu->arch.apic_base is initialized when something_went_wrong is false. */ kvm_vcpu_reset() { kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event) { vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE; } } return 0; fail_free_lapic: kvm_free_lapic() { /* vcpu->arch.apic_base is not yet initialized when something_went_wrong is true. */ if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE)) static_branch_slow_dec_deferred(&apic_hw_disabled); // <= underflow bug. } return r; } } } This (mostly) reverts commit 421221234ada41b4a9f0beeb08e30b07388bd4bd. Fixes: 421221234ada ("KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET") Reported-by: syzbot+9fc046ab2b0cf295a063@syzkaller.appspotmail.com Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211013003554.47705-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
baa1e5c KVM: SEV-ES: Set guest_state_protected after VMSA update The refactoring in commit bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA") left behind the assignment to svm->vcpu.arch.guest_state_protected; add it back. Signed-off-by: Peter Gonda <pgonda@google.com> [Delta between v2 and v3 of Peter's patch, which had already been committed; the commit message is my own. - Paolo] Fixes: bb18a6777465 ("KVM: SEV: Acquire vcpu mutex when updating VMSA") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
fa13843 KVM: X86: fix lazy allocation of rmaps If allocation of rmaps fails, but some of the pointers have already been written, those pointers can be cleaned up when the memslot is freed, or even reused later for another attempt at allocating the rmaps. Therefore there is no need to WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be skipped lest KVM will overwrite the previous pointer and will indeed leak memory. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> 18 October 2021, 18:07 UTC
back to top