7ff57e9 net/smc: Use a mutex for locking "struct smc_pnettable" smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib() which, in turn, calls mutex_lock(&smc_ib_devices.mutex). read_lock() disables preemption. Therefore, the code acquires a mutex while in atomic context, which leads to a sleep-in-atomic-context (SAC) bug. Fix this bug by replacing the rwlock with a mutex. Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com Fixes: 64e28b52c7a6 ("net/smc: add pnet table namespace support") Confirmed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 24 February 2022, 17:09:33 UTC
e13ad14 bnx2x: fix driver load from initrd Commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0") added new firmware support in the driver with maintaining older firmware compatibility. However, older firmware was not added in MODULE_FIRMWARE() which caused missing firmware files in initrd image leading to driver load failure from initrd. This patch adds MODULE_FIRMWARE() for older firmware version to have firmware files included in initrd. Fixes: b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627 Signed-off-by: Manish Chopra <manishc@marvell.com> Signed-off-by: Alok Prasad <palok@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 24 February 2022, 17:06:18 UTC
e8240ad Revert "xen-netback: Check for hotplug-status existence before watching" This reverts commit 2afeec08ab5c86ae21952151f726bfe184f6b23d. The reasoning in the commit was wrong - the code expected to set up the watch even if 'hotplug-status' didn't exist. In fact, it relied on the watch being fired the first time - to check if maybe 'hotplug-status' is already set to 'connected'. Not registering a watch for a non-existing path (which is the case if the hotplug script hasn't been executed yet) made the backend not wait for the hotplug script to execute. This, in turn, made the netfront think the interface is fully operational, while in fact it was not (the vif interface on the xen-netback side might not be configured yet). This was a workaround for 'hotplug-status' erroneously being removed. But since that is reverted now, the workaround is not necessary either. More discussion at https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Michael Brown <mbrown@fensystems.co.uk> Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 24 February 2022, 16:58:37 UTC
0f4558a Revert "xen-netback: remove 'hotplug-status' once it has served its purpose" This reverts commit 1f2565780e9b7218cf92c7630130e82dcc0fe9c2. The 'hotplug-status' node should not be removed as long as the vif device remains configured. Otherwise the xen-netback would wait for re-running the network script even if it was already called (in case of the frontend re-connecting). But also, it _should_ be removed when the vif device is destroyed (for example when unbinding the driver) - otherwise the hotplug script would not configure the device whenever it reappears. Moving removal of the 'hotplug-status' node was a workaround for nothing calling the network script after the xen-netback module is reloaded. But when the vif interface is re-created (on xen-netback unbind/bind for example), the script should be called, regardless of who does that - currently this case is not handled by the toolstack, and requires a manual script call. Keeping hotplug-status=connected to skip the call is wrong and leads to an unconfigured interface. More discussion at https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#u Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Paul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 24 February 2022, 16:58:37 UTC
5facf49 Merge tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-02-22 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. * tag 'mlx5-fixes-2022-02-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Fix VF min/max rate parameters interchange mistake net/mlx5e: Add missing increment of count net/mlx5e: MPLSoUDP decap, fix check for unsupported matches net/mlx5e: Fix MPLSoUDP encap to use MPLS action information net/mlx5e: Add feature check for set fec counters net/mlx5e: TC, Skip redundant ct clear actions net/mlx5e: TC, Reject rules with forward and drop actions net/mlx5e: TC, Reject rules with drop and modify hdr action net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets net/mlx5e: Fix wrong return value on ioctl EEPROM query failure net/mlx5: Fix possible deadlock on rule deletion net/mlx5: Fix tc max supported prio for nic mode net/mlx5: Fix wrong limitation of metadata match on ecpf net/mlx5: Update log_max_qp value to be 17 at most net/mlx5: DR, Fix the threshold that defines when pool sync is initiated net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte net/mlx5: DR, Cache STE shadow memory net/mlx5: Update the list of the PCI supported devices ==================== Link: https://lore.kernel.org/r/20220224001123.365265-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> 24 February 2022, 04:30:01 UTC
ca49df9 net/mlx5e: Fix VF min/max rate parameters interchange mistake The VF min and max rate were passed incorrectly and resulted in wrongly interchanging them. Fix the order of parameters in mlx5_esw_qos_set_vport_rate(). Fixes: d7df09f5e7b4 ("net/mlx5: E-switch, Enable vport QoS on demand") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:19 UTC
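This class of bug appears whenever two adjacent parameters share a type, so the compiler cannot catch a swap. A minimal userspace sketch, assuming a hypothetical setter set_vport_rate() that takes the maximum before the minimum (the real argument order of mlx5_esw_qos_set_vport_rate() is not given in this log); the caller must translate its own (min, max) order explicitly.

```c
#include <stdint.h>

/* Hypothetical per-vport rate limits. */
struct vport_rate {
    uint32_t max_rate;
    uint32_t min_rate;
};

/* Hypothetical setter: note the (max, min) parameter order. */
static void set_vport_rate(struct vport_rate *v, uint32_t max_rate, uint32_t min_rate)
{
    v->max_rate = max_rate;
    v->min_rate = min_rate;
}

/* Caller that receives (min, max) and must forward them in the callee's
 * (max, min) order. Passing min_tx_rate first here would silently
 * interchange the two limits. */
static void apply_vf_rate(struct vport_rate *v, uint32_t min_tx_rate, uint32_t max_tx_rate)
{
    set_vport_rate(v, max_tx_rate, min_tx_rate);
}
```

Wrapping the two values in a struct with named fields at the call site is one common way to make such a swap visible in review.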
5ee02b7 net/mlx5e: Add missing increment of count Add mistakenly missing increment of count variable when looping over output buffer in mlx5e_self_test(). This resolves the issue of garbage values output when querying with self test via ethtool. before: $ ethtool -t eth2 The test result is PASS The test extra info: Link Test 0 Speed Test 1768697188 Health Test 758528120 Loopback Test 3288687 after: $ ethtool -t eth2 The test result is PASS The test extra info: Link Test 0 Speed Test 0 Health Test 0 Loopback Test 0 Fixes: 7990b1b5e8bd ("net/mlx5e: loopback test is not supported in switchdev mode") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:19 UTC
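A minimal userspace sketch of the loop shape the fix restores, assuming a hypothetical table of test callbacks (the real mlx5e_self_test() loop is not reproduced in this log). Without the count++ every result is written to buf[0], and the remaining slots keep whatever the caller left there, which matches the garbage values in the "before" output.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical self-test callbacks; each returns 0 on success. */
typedef int (*self_test_fn)(void);

static int link_test(void)     { return 0; }
static int speed_test(void)    { return 0; }
static int health_test(void)   { return 0; }
static int loopback_test(void) { return 0; }

static const self_test_fn tests[] = { link_test, speed_test, health_test, loopback_test };
#define NUM_TESTS (sizeof(tests) / sizeof(tests[0]))

/* Writes one result per test into buf and returns how many were written. */
static size_t run_self_tests(uint64_t *buf, size_t buf_len)
{
    size_t count = 0;

    for (size_t i = 0; i < NUM_TESTS && count < buf_len; i++) {
        buf[count] = (uint64_t)tests[i]();
        count++; /* the increment that was missing */
    }
    return count;
}
```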
fdc18e4 net/mlx5e: MPLSoUDP decap, fix check for unsupported matches Currently, offloading a rule on a bareudp device requires a tunnel key in order to match on MPLS fields, and without it the MPLS fields are ignored. This is incorrect because a UDP tunnel doesn't have a key to match on. Fix by returning an error in case the flow is matching on the tunnel key. Fixes: 72046a91d134 ("net/mlx5e: Allow to match on mpls parameters") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:19 UTC
c63741b net/mlx5e: Fix MPLSoUDP encap to use MPLS action information Currently the MPLSoUDP encap builds the MPLS header using encap action information (tunnel id, ttl and tos) instead of the MPLS action information (label, ttl, tc and bos) which is wrong. Fix by storing the MPLS action information during the flow action parse and later using it to create the encap MPLS header. Fixes: f828ca6a2fb6 ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:18 UTC
7fac052 net/mlx5e: Add feature check for set fec counters Fec counters support is checked via the PCAM feature_cap_mask, bit 0: PPCNT_counter_group_Phy_statistical_counter_group. Add feature check to avoid faulty behavior. Fixes: 0a1498ebfa55 ("net/mlx5e: Expose FEC counters via ethtool") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:18 UTC
fb7e76e net/mlx5e: TC, Skip redundant ct clear actions Offload of ct clear action is just resetting the reg_c register. It's done by allocating modify hdr resources which is limited. Doing it multiple times is redundant and wasting modify hdr resources and if resources depleted the driver will fail offloading the rule. Ignore redundant ct clear actions after the first one. Fixes: 806401c20a0f ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:18 UTC
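The idea of the fix is that resetting the register once is enough, so only the first ct clear should consume a modify-header resource. A minimal userspace sketch, assuming a hypothetical action list and a per-rule parse state (names are illustrative, not the driver's):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical action identifiers. */
enum act_id { ACT_CT_CLEAR, ACT_FORWARD, ACT_OTHER };

struct parse_state {
    bool ct_clear_seen;         /* set once the first ct clear is handled */
    unsigned int modhdr_allocs; /* modify-header resources consumed */
};

/* Only the first ct clear allocates a modify-header resource; later
 * ones are ignored because the register has already been reset. */
static void parse_actions(struct parse_state *st, const enum act_id *acts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (acts[i] != ACT_CT_CLEAR)
            continue;
        if (st->ct_clear_seen)
            continue; /* redundant ct clear: skip */
        st->ct_clear_seen = true;
        st->modhdr_allocs++;
    }
}
```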
3d65492 net/mlx5e: TC, Reject rules with forward and drop actions Such rules are redundant but allowed and passed to the driver. The driver does not support offloading such rules so return an error. Fixes: 03a9d11e6eeb ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:17 UTC
23216d3 net/mlx5e: TC, Reject rules with drop and modify hdr action This kind of action is not supported by firmware and generates a syndrome. kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3) Fixes: d7e75a325cb2 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:17 UTC
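Both this fix and the previous one reject a drop action combined with another action the hardware cannot honor alongside it. A minimal userspace sketch of such a validation step, assuming hypothetical flag names; returning -EOPNOTSUPP is an assumption about the error code, not taken from the commits.

```c
#include <errno.h>

/* Hypothetical action flags collected while parsing a TC rule. */
#define FLOW_ACT_DROP    (1u << 0)
#define FLOW_ACT_FWD     (1u << 1)
#define FLOW_ACT_MOD_HDR (1u << 2)

/* Reject combinations the device cannot offload. */
static int validate_action_flags(unsigned int flags)
{
    if ((flags & FLOW_ACT_DROP) && (flags & FLOW_ACT_FWD))
        return -EOPNOTSUPP; /* drop + forward: redundant, unsupported */
    if ((flags & FLOW_ACT_DROP) && (flags & FLOW_ACT_MOD_HDR))
        return -EOPNOTSUPP; /* drop + header rewrite: firmware syndrome */
    return 0;
}
```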
7eaf1f3 net/mlx5e: kTLS, Use CHECKSUM_UNNECESSARY for device-offloaded packets For RX TLS device-offloaded packets, the HW spec guarantees checksum validation for the offloaded packets, but does not define whether the CQE.checksum field matches the original packet (ciphertext) or the decrypted one (plaintext). This latitude allows architectural improvements between generations of chips, resulting in different decisions regarding the value type of CQE.checksum. Hence, for these packets, the device driver should not make use of this CQE field. Here we block CHECKSUM_COMPLETE usage for RX TLS device-offloaded packets, and use CHECKSUM_UNNECESSARY instead. Value of the packet's tcp_hdr.csum is not modified by the HW, and it always matches the original ciphertext. Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:16 UTC
0b89429 net/mlx5e: Fix wrong return value on ioctl EEPROM query failure The ioctl EEPROM query wrongly returns success on read failures, fix that by returning the appropriate error code. Fixes: bb64143eee8c ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:15 UTC
b645e57 net/mlx5: Fix possible deadlock on rule deletion Add missing call to up_write_ref_node() which releases the semaphore in case the FTE doesn't have destinations, such in drop rule case. Fixes: 465e7baab6d9 ("net/mlx5: Fix deletion of duplicate rules") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:14 UTC
be7f4b0 net/mlx5: Fix tc max supported prio for nic mode Only prio 1 is supported if firmware doesn't support ignore flow level for nic mode. The offending commit removed the check wrongly. Add it back. Fixes: 9a99c8f1253a ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:13 UTC
07666c7 net/mlx5: Fix wrong limitation of metadata match on ecpf The match metadata support check returns false for the ecpf device. However, this support does exist for ecpf, and therefore this limitation should be removed to allow features such as stacked devices and internal port offload to be supported. Fixes: 92ab1eb392c6 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:13 UTC
7f83996 net/mlx5: Update log_max_qp value to be 17 at most Currently, log_max_qp value is dependent on what FW reports as its max capability. In reality, due to a bug, some FWs report a value greater than 17, even though they don't support log_max_qp > 17. This FW issue led the driver to exhaust memory on startup. Thus, log_max_qp value is set to be no more than 17 regardless of what FW reports, as it was before the cited commit. Fixes: f79a609ea6bf ("net/mlx5: Update log_max_qp value to FW max capability") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:12 UTC
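The fix amounts to clamping the firmware-reported capability. A minimal sketch, assuming a hypothetical helper name; the cap of 17 is taken from the commit message.

```c
/* Upper bound from the commit message. */
#define LOG_MAX_QP_CAP 17u

/* Hypothetical helper: trust the firmware-reported value only up to the cap. */
static unsigned int effective_log_max_qp(unsigned int fw_log_max_qp)
{
    return fw_log_max_qp < LOG_MAX_QP_CAP ? fw_log_max_qp : LOG_MAX_QP_CAP;
}
```

Because the field is a base-2 logarithm, a cap of 17 bounds the QP count at 2^17 = 131072, however large a value the buggy firmware reports.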
ecd9c5c net/mlx5: DR, Fix the threshold that defines when pool sync is initiated When deciding whether to start syncing and actually free all the "hot" ICM chunks, we need to consider the type of the ICM chunks that we're dealing with. For instance, the amount of available ICM for MODIFY_ACTION is significantly lower than the usual STE ICM, so the threshold should account for that - otherwise we can deplete MODIFY_ACTION memory just by creating and deleting the same modify header action in a continuous loop. This patch replaces the hard-coded threshold with a dynamic value. Fixes: 1c58651412bb ("net/mlx5: DR, ICM memory pools sync optimization") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:11 UTC
ffb0753 net/mlx5: DR, Don't allow match on IP w/o matching on full ethertype/ip_version Currently SMFS allows adding rule with matching on src/dst IP w/o matching on full ethertype or ip_version, which is not supported by HW. This patch fixes this issue and adds the check as it is done in DMFS. Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:10 UTC
0aec12d net/mlx5: DR, Fix slab-out-of-bounds in mlx5_cmd_dr_create_fte When adding a rule with 32 destinations, we hit the following out-of-bounds access issue: BUG: KASAN: slab-out-of-bounds in mlx5_cmd_dr_create_fte+0x18ee/0x1e70 This patch fixes the issue both by increasing the allocated buffers to accommodate the needed actions and by checking the number of actions to prevent this issue when a rule with too many actions is provided. Fixes: 1ffd498901c1 ("net/mlx5: DR, Increase supported num of actions to 32") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:10 UTC
e5b2bc3 net/mlx5: DR, Cache STE shadow memory During rule insertion on each ICM memory chunk we also allocate shadow memory used for management. This includes the hw_ste, dr_ste and miss list per entry. Since the scale of these allocations is large, we noticed a performance hiccup that happens once malloc and free are stressed. In extreme use cases, when ~1M chunks are freed at once, it might take up to 40 seconds to complete, to the point that the kernel sees this as a self-detected stall on CPU: rcu: INFO: rcu_sched self-detected stall on CPU To resolve this we increase the reuse of shadow memory. With this change, the time in the aforementioned use case dropped from ~40 seconds to ~8-10 seconds. Fixes: 29cf8febd185 ("net/mlx5: DR, ICM pool memory allocator") Signed-off-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:09 UTC
f908a35 net/mlx5: Update the list of the PCI supported devices Add the upcoming BlueField-4 and ConnectX-8 device IDs. Fixes: 2e9d3e83ab82 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: Meir Lichtinger <meirl@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 24 February 2022, 00:08:08 UTC
0228d37 Merge branch 'ftgmac100-fixes' Heyi Guo says: ==================== drivers/net/ftgmac100: fix occasional DHCP failure This patch set fixes the issues discussed in the mail thread: https://lore.kernel.org/netdev/51f5b7a7-330f-6b3c-253d-10e45cdb6805@linux.alibaba.com/ and follows the advice from Andrew Lunn. The first 2 patches refactor the code to enable adjust_link to call the reset function directly. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:50:19 UTC
1baf2e5 drivers/net/ftgmac100: fix DHCP potential failure with systemd DHCP failures were observed with systemd 247.6. The issue could be reproduced by rebooting an Aspeed 2600 and then running ifconfig ethX down/up. It is caused by the following sequence in the driver: 1. ftgmac100_open() enables the net interface and calls phy_start() 2. When the PHY link comes up, it calls netif_carrier_on() and then the adjust_link callback 3. ftgmac100_adjust_link() schedules the reset task 4. ftgmac100_reset_task() then resets the MAC in another schedule After step 2, systemd is notified to send a DHCP discover packet, and that packet might be corrupted by the MAC reset operation in step 4. Call ftgmac100_reset() directly instead of scheduling a task to fix the issue. Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:50:19 UTC
3c773db drivers/net/ftgmac100: adjust code place for function call dependency This is to prepare for ftgmac100_adjust_link() to call ftgmac100_reset() directly. Only code places are changed. Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:50:19 UTC
4f1e728 drivers/net/ftgmac100: refactor ftgmac100_reset_task to enable direct function call This is to prepare for ftgmac100_adjust_link() to call reset function directly, instead of task schedule. Signed-off-by: Heyi Guo <guoheyi@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:50:19 UTC
ecf4a24 net: sched: avoid newline at end of message in NL_SET_ERR_MSG_MOD Fix following coccicheck warning: ./net/sched/act_api.c:277:7-49: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:45:44 UTC
404ba13 MAINTAINERS: add myself as co-maintainer for Realtek DSA switch drivers Adding myself (Alvin Šipraga) as another maintainer for the Realtek DSA switch drivers. I intend to help Linus out with reviewing and testing changes to these drivers, particularly the rtl8365mb driver which I authored and have hardware access to. Cc: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:36:21 UTC
a1f8fec tipc: Fix end of loop tests for list_for_each_entry() These tests are supposed to check if the loop exited via a break or not. However the tests are wrong because if we did not exit via a break then "p" is not a valid pointer. In that case, it's the equivalent of "if (*(u32 *)sr == *last_key) {". That's going to work most of the time, but there is a potential for those to be equal. Fixes: 1593123a6a49 ("tipc: add name table dump to new netlink api") Fixes: 1a1a143daf84 ("tipc: add publication dump to new netlink api") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:35:40 UTC
de7b2ef udp_tunnel: Fix end of loop test in udp_tunnel_nic_unregister() This test is checking if we exited the list via break or not. However if it did not exit via a break then "node" does not point to a valid udp_tunnel_nic_shared_node struct. It will work because of the way the structs are laid out it's the equivalent of "if (info->shared->udp_tunnel_nic_info != dev)" which will always be true, but it's not the right way to test. Fixes: 74cc6d182d03 ("udp_tunnel: add the ability to share port tables") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:35:00 UTC
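Both fixes above concern the same rule: after a list walk finishes without break, the cursor must not be treated as a valid entry. One common way to avoid the problem is to record the match in a separate pointer. The sketch below assumes a plain NULL-terminated singly linked list rather than the kernel's circular list_head (where, after a full traversal, the cursor is container_of() the list head, not a real entry); find_key() is a hypothetical name.

```c
#include <stddef.h>

struct node {
    int key;
    struct node *next;
};

/* Record the match in "found" instead of testing the loop cursor after
 * the loop; NULL unambiguously means the loop ran to completion. */
static struct node *find_key(struct node *head, int key)
{
    struct node *found = NULL;

    for (struct node *p = head; p; p = p->next) {
        if (p->key == key) {
            found = p;
            break;
        }
    }
    return found;
}
```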
a58da53 vhost/vsock: don't check owner in vhost_vsock_stop() while releasing vhost_vsock_stop() calls vhost_dev_check_owner() to check the device ownership. It expects current->mm to be valid. vhost_vsock_stop() is also called by vhost_vsock_dev_release() when the user has not done close(), so when we are in do_exit(). In this case current->mm is invalid and we're releasing the device, so we should clean it anyway. Let's check the owner only when vhost_vsock_stop() is called by an ioctl. When invoked from release we can not fail so we don't check return code of vhost_vsock_stop(). We need to stop vsock even if it's not the owner. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 23 February 2022, 12:32:33 UTC
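The control flow described above can be modeled with a flag that makes the ownership check optional, so the ioctl path still checks while the release path always stops the device. A minimal userspace sketch, assuming hypothetical names (vsock_dev, vsock_stop, check_owner) and using a plain pointer comparison as a stand-in for comparing against current->mm; -EPERM is an assumed error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state. */
struct vsock_dev {
    const void *owner_mm; /* stand-in for the owning process's mm */
    bool running;
};

/* The ownership check is skipped on the release path, where the
 * caller's mm may already be gone and the device must be stopped anyway. */
static int vsock_stop(struct vsock_dev *dev, const void *caller_mm, bool check_owner)
{
    if (check_owner && dev->owner_mm != caller_mm)
        return -EPERM;
    dev->running = false;
    return 0;
}
```

The ioctl handler would pass check_owner = true and propagate the error; the release handler would pass false and ignore the return value, as the commit describes.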
277f2bb ibmvnic: schedule failover only if vioctl fails If client is unable to initiate a failover reset via H_VIOCTL hcall, then it should schedule a failover reset as a last resort. Otherwise, there is no need to do a last resort. Fixes: 334c42414729 ("ibmvnic: improve failover sysfs entry") Reported-by: Cris Forno <cforno12@outlook.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Signed-off-by: Dany Madden <drt@linux.ibm.com> Link: https://lore.kernel.org/r/20220221210545.115283-1-drt@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 23 February 2022, 01:06:27 UTC
342b641 net: dsa: fix panic when removing unoffloaded port from bridge If a bridged port is not offloaded to the hardware - either because the underlying driver does not implement the port_bridge_{join,leave} ops, or because the operation failed - then its dp->bridge pointer will be NULL when dsa_port_bridge_leave() is called. Avoid dereferencing NULL. This fixes the following splat when removing a port from a bridge: Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000 Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP CPU: 3 PID: 1119 Comm: brctl Tainted: G O 5.17.0-rc4-rt4 #1 Call trace: dsa_port_bridge_leave+0x8c/0x1e4 dsa_slave_changeupper+0x40/0x170 dsa_slave_netdevice_event+0x494/0x4d4 notifier_call_chain+0x80/0xe0 raw_notifier_call_chain+0x1c/0x24 call_netdevice_notifiers_info+0x5c/0xac __netdev_upper_dev_unlink+0xa4/0x200 netdev_upper_dev_unlink+0x38/0x60 del_nbp+0x1b0/0x300 br_del_if+0x38/0x114 add_del_if+0x60/0xa0 br_ioctl_stub+0x128/0x2dc br_ioctl_call+0x68/0xb0 dev_ifsioc+0x390/0x554 dev_ioctl+0x128/0x400 sock_do_ioctl+0xb4/0xf4 sock_ioctl+0x12c/0x4e0 __arm64_sys_ioctl+0xa8/0xf0 invoke_syscall+0x4c/0x110 el0_svc_common.constprop.0+0x48/0xf0 do_el0_svc+0x28/0x84 el0_svc+0x1c/0x50 el0t_64_sync_handler+0xa8/0xb0 el0t_64_sync+0x17c/0x180 Code: f9402f00 f0002261 f9401302 913cc021 (a9401404) ---[ end trace 0000000000000000 ]--- Fixes: d3eed0e57d5d ("net: dsa: keep the bridge_dev and bridge_num as part of the same structure") Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220221203539.310690-1-alvin@pqrs.dk Signed-off-by: Jakub Kicinski <kuba@kernel.org> 23 February 2022, 01:03:02 UTC
ef527f9 net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends Whenever one of these functions pull all data from an skb in a frag_list, use consume_skb() instead of kfree_skb() to avoid polluting drop monitoring. Fixes: 6fa01ccd8830 ("skbuff: Add pskb_extract() helper function") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220220154052.1308469-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 23 February 2022, 00:32:35 UTC
5663b85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net This is fixing up the use without proper initialization in patch 5/5 -o- Hi, The following patchset contains Netfilter fixes for net: 1) Missing #ifdef CONFIG_IP6_NF_IPTABLES in recent xt_socket fix. 2) Fix incorrect flow action array size in nf_tables. 3) Unregister flowtable hooks from netns exit path. 4) Fix missing limit object release, from Florian Westphal. 5) Memleak in nf_tables object update path, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 22 February 2022, 11:00:51 UTC
dad3bde netfilter: nf_tables: fix memory leak during stateful obj update stateful objects can be updated from the control plane. The transaction logic allocates a temporary object for this purpose. The ->init function was called for this object, so plain kfree() leaks resources. We must call ->destroy function of the object. nft_obj_destroy does this, but it also decrements the module refcount, but the update path doesn't increment it. To avoid special-casing the update object release, do module_get for the update case too and release it via nft_obj_destroy(). Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Cc: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 22 February 2022, 07:28:04 UTC
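The fix restores symmetry: every path that allocates an object takes the module reference, so a single destroy helper can always release both the object's resources and the reference. A minimal userspace sketch of that invariant, assuming a hypothetical global counter standing in for the module refcount and hypothetical obj_alloc()/obj_destroy() helpers (not the nf_tables API):

```c
#include <stdlib.h>

/* Hypothetical stand-in for the module reference count. */
static int module_refs;

struct obj {
    int *state; /* resource set up by the object's ->init */
};

/* Used by both the create path and the update path, so both take the
 * same reference. */
static struct obj *obj_alloc(void)
{
    struct obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    o->state = malloc(sizeof(*o->state));
    module_refs++;
    return o;
}

/* Single release helper: frees what ->init set up (a plain free of the
 * outer object alone would leak it) and drops the reference. */
static void obj_destroy(struct obj *o)
{
    free(o->state);
    free(o);
    module_refs--;
}
```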
1a58f84 netfilter: nft_limit: fix stateful object memory leak We need to provide a destroy callback to release the extra fields. Fixes: 3b9e2ea6c11b ("netfilter: nft_limit: move stateful fields out of expression data") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 21 February 2022, 14:52:14 UTC
6069da4 netfilter: nf_tables: unregister flowtable hooks on netns exit Unregister flowtable hooks before they are released via nf_tables_flowtable_destroy(); otherwise the hook core reports a UAF. BUG: KASAN: use-after-free in nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 Read of size 4 at addr ffff8880736f7438 by task syz-executor579/3666 CPU: 0 PID: 3666 Comm: syz-executor579 Not tainted 5.16.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106 print_address_description+0x65/0x380 mm/kasan/report.c:247 __kasan_report mm/kasan/report.c:433 [inline] kasan_report+0x19a/0x1f0 mm/kasan/report.c:450 nf_hook_entries_grow+0x5a7/0x700 net/netfilter/core.c:142 __nf_register_net_hook+0x27e/0x8d0 net/netfilter/core.c:429 nf_register_net_hook+0xaa/0x180 net/netfilter/core.c:571 nft_register_flowtable_net_hooks+0x3c5/0x730 net/netfilter/nf_tables_api.c:7232 nf_tables_newflowtable+0x2022/0x2cf0 net/netfilter/nf_tables_api.c:7430 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0x10e6/0x2550 net/netfilter/nfnetlink.c:652 __nft_release_hook() calls nft_unregister_flowtable_net_hooks(), which only unregisters the hooks. After an RCU grace period, it is guaranteed that no packets add new entries to the flowtable (no flow offload rules and flowtable hooks are reachable from the packet path), so it is safe to call nf_flow_table_free(), which cleans up the remaining entries from the flowtable (both software and hardware) and unbinds the flow_block. Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()") Reported-by: syzbot+e918523f77e62790d6d9@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 21 February 2022, 14:51:55 UTC
b6ad626 net: mdio-ipq4019: add delay after clock enable Experimentation shows that PHY detect might fail when the code attempts an MDIO bus read immediately after clock enable. Add a delay to stabilize the clock before bus access. PHY detect failure started to show after commit 7590fc6f80ac ("net: mdio: Demote probed message to debug print"), which removed the coincidental delay between clock enable and bus access. 10ms is meant to match the time it takes to send the probed message over UART at 115200 bps. This might be a far overshoot. Fixes: 23a890d493e3 ("net: mdio: Add the reset function for IPQ MDIO driver") Signed-off-by: Baruch Siach <baruch.siach@siklu.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> 21 February 2022, 13:04:53 UTC
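To put the 10ms figure in perspective, the number of characters a UART can send in a given time follows from the baud rate and the bits per character. A small helper, assuming 8N1 framing (1 start + 8 data + 1 stop = 10 bits per character); the length of the actual probed message is not given in this log.

```c
/* Characters a UART can transmit in `ms` milliseconds at `baud` bits/s,
 * given the number of line bits per character (10 for 8N1). */
static double uart_chars_in_ms(double ms, unsigned int baud, unsigned int bits_per_char)
{
    return (double)baud / bits_per_char * ms / 1000.0;
}
```

At 115200 bps with 8N1 framing this gives 11520 characters per second, so 10ms covers about 115 characters, roughly one or two console lines.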
cc20cce gso: do not skip outer ip header in case of ipip and net_failover We encountered a TCP drop issue in our cloud environment. A packet GROed on the host is forwarded to a VM's virtio_net NIC with net_failover enabled. The VM acts as an IPVS LB with ipip encapsulation. The full path is: host gro -> vm virtio_net rx -> net_failover rx -> ipvs fullnat -> ipip encap -> net_failover tx -> virtio_net tx When net_failover transmits an ipip packet (gso_type = 0x0103, which means SKB_GSO_TCPV4, SKB_GSO_DODGY and SKB_GSO_IPXIP4), no GSO is done because it supports TSO and GSO_IPXIP4. But network_header points to the inner IP header. Call Trace: tcp4_gso_segment ------> return NULL inet_gso_segment ------> inner iph, network_header points to ipip_gso_segment inet_gso_segment ------> outer iph skb_mac_gso_segment Afterwards, when virtio_net transmits the packet, only the inner IP header is modified and the outer one stays unchanged, so the packet is dropped by the remote host. Call Trace: inet_gso_segment ------> inner iph, outer iph is skipped skb_mac_gso_segment __skb_gso_segment validate_xmit_skb validate_xmit_skb_list sch_direct_xmit __qdisc_run __dev_queue_xmit ------> virtio_net dev_hard_start_xmit __dev_queue_xmit ------> net_failover ip_finish_output2 ip_output iptunnel_xmit ip_tunnel_xmit ipip_tunnel_xmit ------> ipip dev_hard_start_xmit __dev_queue_xmit ip_finish_output2 ip_output ip_forward ip_rcv __netif_receive_skb_one_core netif_receive_skb_internal napi_gro_receive receive_buf virtnet_poll net_rx_action The root cause of this issue is specific to the rare combination of SKB_GSO_DODGY and a tunnel device that adds an SKB_GSO_ tunnel option. SKB_GSO_DODGY is set by the external virtio_net. We need to reset the network header when callbacks.gso_segment() returns NULL. This patch also covers ipv6_gso_segment(), considering SIT, etc.
Fixes: cb32f511a70b ("ipip: add GSO/TSO support") Signed-off-by: Tao Liu <thomas.liu@ucloud.cn> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> 21 February 2022, 11:41:30 UTC
5a34497 Merge branch 'bnxt_en-fixes' Michael Chan says: ==================== bnxt_en: Bug fixes This series contains bug fixes for FEC reporting, ethtool self test, multicast setup, devlink health reporting and live patching, and a firmware response timeout. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:15 UTC
1278d17 bnxt_en: Fix devlink fw_activate To install a livepatch, first flash the package to NVM, and then activate the patch through the "HWRM_FW_LIVEPATCH" fw command. To uninstall a patch from NVM, flash the removal package and then activate it through the "HWRM_FW_LIVEPATCH" fw command. The "HWRM_FW_LIVEPATCH" fw command has to consider the following scenarios: 1. no patch in NVM and no patch active. Do nothing. 2. patch in NVM, but not active. Activate the patch currently in NVM. 3. patch is not in NVM, but active. Deactivate the patch. 4. patch in NVM and the patch active. Do nothing. Fix the code to handle these scenarios during devlink "fw_activate". To install and activate a live patch: devlink dev flash pci/0000:c1:00.0 file thor_patch.pkg devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset To remove and deactivate a live patch: devlink dev flash pci/0000:c1:00.0 file thor_patch_rem.pkg devlink -f dev reload pci/0000:c1:00.0 action fw_activate limit no_reset Fixes: 3c4153394e2c ("bnxt_en: implement firmware live patching") Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:15 UTC
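The four livepatch scenarios above reduce to a decision over two booleans. The sketch below encodes that table. The enum and function names are hypothetical illustrations, not the bnxt_en driver's identifiers.

```c
#include <stdbool.h>

/* Hypothetical action codes; not the bnxt_en driver's real identifiers. */
enum livepatch_action {
	LIVEPATCH_NOOP,
	LIVEPATCH_ACTIVATE,
	LIVEPATCH_DEACTIVATE,
};

/* Map (patch present in NVM, patch currently active) to the action
 * described by the four scenarios in the commit message. */
static enum livepatch_action livepatch_decide(bool in_nvm, bool active)
{
	if (in_nvm && !active)
		return LIVEPATCH_ACTIVATE;   /* scenario 2 */
	if (!in_nvm && active)
		return LIVEPATCH_DEACTIVATE; /* scenario 3 */
	return LIVEPATCH_NOOP;           /* scenarios 1 and 4 */
}
```

Flashing the removal package is what moves the state from scenario 4 to scenario 3. That is why the same `fw_activate` reload handles both installing and removing a patch.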
b891106 bnxt_en: Increase firmware message response DMA wait time When polling for the firmware message response, we first poll for the response message header. Once the valid length is detected in the header, we poll for the valid bit at the end of the message which signals DMA completion. Normally, this poll time for DMA completion is extremely short (0 to a few usec). But on some devices under some rare conditions, it can be up to about 20 msec. Increase this delay to 50 msec and use udelay() for the first 10 usec for the common case, and usleep_range() beyond that. Also, change the error message to include the above delay time when printing the timeout value. Fixes: 3c8c20db769c ("bnxt_en: move HWRM API implementation into separate file") Reviewed-by: Vladimir Olovyannikov <vladimir.olovyannikov@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:15 UTC
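The wait strategy above spins briefly for the common case and sleeps for the rare slow case, all under a 50 msec budget. The sketch below models that schedule in simulated time. The 20 usec sleep step and all names are assumptions for illustration. In the driver, the busy step would be udelay() and the sleep step usleep_range().

```c
#define DMA_WAIT_MAX_US 50000U /* 50 msec budget, per the commit message */
#define BUSY_WAIT_US    10U    /* spin (udelay) for the first 10 usec */
#define SLEEP_STEP_US   20U    /* assumed usleep_range() granularity */

enum wait_kind { WAIT_BUSY, WAIT_SLEEP, WAIT_TIMEOUT };

/* Pick the next wait step given the time already spent polling. */
static enum wait_kind next_wait(unsigned int waited_us, unsigned int *step_us)
{
	if (waited_us >= DMA_WAIT_MAX_US)
		return WAIT_TIMEOUT;
	if (waited_us < BUSY_WAIT_US) {
		*step_us = 1;             /* cheap spin for the common case */
		return WAIT_BUSY;
	}
	*step_us = SLEEP_STEP_US;     /* yield the CPU for the rare case */
	return WAIT_SLEEP;
}

/* Simulate polling a valid bit that becomes set at ready_at_us.
 * Returns the simulated time waited, or -1 on timeout. */
static long simulate_poll(unsigned int ready_at_us)
{
	unsigned int waited = 0, step = 0;

	while (waited < ready_at_us) {
		if (next_wait(waited, &step) == WAIT_TIMEOUT)
			return -1;
		waited += step;
	}
	return (long)waited;
}
```

A completion of about 20 msec, the rare case mentioned above, finishes well inside the budget. A completion that never arrives is reported as a timeout once roughly 50 msec have elapsed.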
0e0e3c5 bnxt_en: Restore the resets_reliable flag in bnxt_open() During ifdown, we call bnxt_inv_fw_health_reg(), which will clear both the status_reliable and resets_reliable flags if these registers are mapped. This is correct because a FW reset during ifdown will clear these register mappings. If we detect that FW has gone through a reset during the next ifup, we will remap these registers. But during a normal ifup with no FW reset, we need to restore the resets_reliable flag; otherwise, we will not show the reset counter during devlink diagnose. Fixes: 8cc95ceb7087 ("bnxt_en: improve fw diagnose devlink health messages") Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:15 UTC
8cdb159 bnxt_en: Fix incorrect multicast rx mask setting when not requested We should set up multicast only when the net_device flags explicitly have IFF_MULTICAST set. Otherwise, we will incorrectly turn it on even when not asked. Fix it by passing the multicast table to the firmware only if IFF_MULTICAST is set. Fixes: 7d2837dd7a32 ("bnxt_en: Setup multicast properly after resetting device.") Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:14 UTC
cfcab3b bnxt_en: Fix occasional ethtool -t loopback test failures In the current code, we set up the port in PHY or MAC loopback mode and then transmit a test broadcast packet for the loopback test. This scheme sometimes fails if the port is shared with management firmware that can also send packets. The driver may receive the management firmware's packet, and the test will fail when the contents don't match the test packet. Change the test packet to use its own MAC address as the destination and set up the port to receive only its own MAC address. This should filter out other packets sent by management firmware. Fixes: 91725d89b97a ("bnxt_en: Add PHY loopback to ethtool self-test.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:14 UTC
6758f93 bnxt_en: Fix offline ethtool selftest with RDMA enabled For offline (destructive) self tests, we need to stop the RDMA driver first. Otherwise, the RDMA driver will run into unrecoverable errors when destructive firmware tests are being performed. The irq_re_init parameter used in the half close and half open sequence when preparing the NIC for offline tests should be set to true because the RDMA driver will free all IRQs before the offline tests begin. Fixes: 55fd0cf320c3 ("bnxt_en: Add external loopback test to ethtool selftest.") Reviewed-by: Edwin Peer <edwin.peer@broadcom.com> Reviewed-by: Ben Li <ben.li@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:14 UTC
84d3c83 bnxt_en: Fix active FEC reporting to ethtool ethtool --show-fec <interface> does not show anything when the Active FEC setting in the chip is set to None. Fix it to properly return ETHTOOL_FEC_OFF in that case. Fixes: 8b2775890ad8 ("bnxt_en: Report FEC settings to ethtool.") Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> 20 February 2022, 13:47:14 UTC
b1a5983 netfilter: nf_tables_offload: incorrect flow offload action array size The immediate verdict expression needs to allocate one slot in the flow offload action array; however, the immediate data expression does not need to do so. The fwd and dup expressions also need to allocate one slot, but this is missing. Add a new offload_action interface to report whether an expression needs to allocate one slot in the flow offload action array. Fixes: be2861dc36d7 ("netfilter: nft_{fwd,dup}_netdev: add offload support") Reported-and-tested-by: Nick Gregory <Nick.Gregory@Sophos.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 20 February 2022, 00:22:20 UTC
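The sizing rule above asks each expression whether it needs an action slot and sums the answers. The sketch below is a simplified model of that per-expression hook. The enum values, including `EXPR_CMP` as a stand-in for any other expression, are hypothetical. They are not the nf_tables API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression kinds; EXPR_CMP stands in for "anything else". */
enum expr_kind {
	EXPR_IMMEDIATE_VERDICT,
	EXPR_IMMEDIATE_DATA,
	EXPR_FWD,
	EXPR_DUP,
	EXPR_CMP,
};

/* Plays the role of the per-expression offload_action report. */
static bool expr_needs_action_slot(enum expr_kind k)
{
	switch (k) {
	case EXPR_IMMEDIATE_VERDICT:
	case EXPR_FWD:
	case EXPR_DUP:
		return true;
	default: /* immediate data, matches, ... */
		return false;
	}
}

/* Size the flow offload action array for a rule's expressions. */
static size_t count_action_slots(const enum expr_kind *exprs, size_t n)
{
	size_t slots = 0;

	for (size_t i = 0; i < n; i++)
		if (expr_needs_action_slot(exprs[i]))
			slots++;
	return slots;
}
```

Asking each expression, rather than special-casing immediate expressions, is what keeps fwd and dup from being undercounted.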
8940e6b net: dsa: avoid call to __dev_set_promiscuity() while rtnl_mutex isn't held If the DSA master doesn't support IFF_UNICAST_FLT, then the following call path is possible: dsa_slave_switchdev_event_work -> dsa_port_host_fdb_add -> dev_uc_add -> __dev_set_rx_mode -> __dev_set_promiscuity Since the blamed commit, dsa_slave_switchdev_event_work() no longer holds rtnl_lock(), which triggers the ASSERT_RTNL() from __dev_set_promiscuity(). Taking rtnl_lock() around dev_uc_add() is impossible, because all the code paths that call dsa_flush_workqueue() do so from contexts where the rtnl_mutex is already held - so this would lead to an instant deadlock. dev_uc_add() in itself doesn't require the rtnl_mutex for protection. There is this comment in __dev_set_rx_mode() which assumes so: /* Unicast addresses changes may only happen under the rtnl, * therefore calling __dev_set_promiscuity here is safe. */ but it is from commit 4417da668c00 ("[NET]: dev: secondary unicast address support") dated June 2007, and in the meantime, commit f1f28aa3510d ("netdev: Add addr_list_lock to struct net_device."), dated July 2008, has added &dev->addr_list_lock to protect this instead of the global rtnl_mutex. Nonetheless, __dev_set_promiscuity() does assume rtnl_mutex protection, but it is the uncommon path of what we typically expect dev_uc_add() to do. So since only the uncommon path requires rtnl_lock(), just check ahead of time whether dev_uc_add() would result in a call to __dev_set_promiscuity(), and handle that condition separately. DSA already configures the master interface to be promiscuous if the tagger requires this. 
We can extend this to also cover the case where the master doesn't handle dev_uc_add() (doesn't support IFF_UNICAST_FLT), and on the premise that we'd end up making it promiscuous during operation anyway, either if a DSA slave has a non-inherited MAC address, or if the bridge notifies local FDB entries for its own MAC address, the address of a station learned on a foreign port, etc. Fixes: 0faf890fc519 ("net: dsa: drop rtnl_lock from dsa_slave_switchdev_event_work") Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 21:18:33 UTC
3d00827 net: dsa: microchip: fix bridging with more than two member ports Commit b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support") plugged a packet leak between ports that were members of different bridges. Unfortunately, this broke another use case, namely that of more than two ports that are members of the same bridge. After that commit, when a port is added to a bridge, hardware bridging between other member ports of that bridge will be cleared, preventing packet exchange between them. Fix by ensuring that the Port VLAN Membership bitmap includes any existing ports in the bridge, not just the port being added. Fixes: b3612ccdf284 ("net: dsa: microchip: implement multi-bridge support") Signed-off-by: Svenning Sørensen <sss@secomea.com> Tested-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 16:22:46 UTC
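The fix above is about computing a port's membership bitmap from every port already in its bridge, not just the port being added. The sketch below is a simplified model under stated assumptions: a 5-port switch, the CPU port at index 4, and a `bridge_of[]` array mapping each port to a bridge id. None of these names come from the ksz driver.

```c
#include <stdint.h>

#define NUM_PORTS 5
#define CPU_PORT  4    /* assumption: CPU port index */
#define NO_BRIDGE (-1)

/* bridge_of[p] is the bridge id of port p, or NO_BRIDGE. Returns the
 * membership bitmap for @port: every other port in the same bridge,
 * plus the CPU port. Hypothetical model, not the driver's code. */
static uint32_t port_member_mask(const int bridge_of[NUM_PORTS], int port)
{
	uint32_t mask = 1u << CPU_PORT;

	if (bridge_of[port] == NO_BRIDGE)
		return mask;
	for (int p = 0; p < NUM_PORTS; p++)
		if (p != port && p != CPU_PORT &&
		    bridge_of[p] == bridge_of[port])
			mask |= 1u << p;
	return mask;
}
```

When a port joins, recomputing this mask for every member of that bridge keeps existing members forwarding to each other. That is the three-or-more-port case the earlier commit broke.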
5486f5b net: Force inlining of checksum functions in net/checksum.h All functions defined as static inline in net/checksum.h are meant to be inlined for performance reasons. But since commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly"), the compiler is allowed to uninline functions when it wants. Fair enough in the general case, but for tiny performance-critical checksum helpers that's counter-productive. The problem mainly arises when selecting CONFIG_CC_OPTIMISE_FOR_SIZE: those helpers being 'static inline' in header files, you suddenly find them duplicated many times in the resulting vmlinux. Here is a typical example when building powerpc pmac32_defconfig with CONFIG_CC_OPTIMISE_FOR_SIZE. csum_sub() appears 4 times: c04a23cc <csum_sub>: c04a23cc: 7c 84 20 f8 not r4,r4 c04a23d0: 7c 63 20 14 addc r3,r3,r4 c04a23d4: 7c 63 01 94 addze r3,r3 c04a23d8: 4e 80 00 20 blr ... c04a2ce8: 4b ff f6 e5 bl c04a23cc <csum_sub> ... c04a2d2c: 4b ff f6 a1 bl c04a23cc <csum_sub> ... c04a2d54: 4b ff f6 79 bl c04a23cc <csum_sub> ... c04a754c <csum_sub>: c04a754c: 7c 84 20 f8 not r4,r4 c04a7550: 7c 63 20 14 addc r3,r3,r4 c04a7554: 7c 63 01 94 addze r3,r3 c04a7558: 4e 80 00 20 blr ... c04ac930: 4b ff ac 1d bl c04a754c <csum_sub> ... c04ad264: 4b ff a2 e9 bl c04a754c <csum_sub> ... c04e3b08 <csum_sub>: c04e3b08: 7c 84 20 f8 not r4,r4 c04e3b0c: 7c 63 20 14 addc r3,r3,r4 c04e3b10: 7c 63 01 94 addze r3,r3 c04e3b14: 4e 80 00 20 blr ... c04e5788: 4b ff e3 81 bl c04e3b08 <csum_sub> ... c04e65c8: 4b ff d5 41 bl c04e3b08 <csum_sub> ... c0512d34 <csum_sub>: c0512d34: 7c 84 20 f8 not r4,r4 c0512d38: 7c 63 20 14 addc r3,r3,r4 c0512d3c: 7c 63 01 94 addze r3,r3 c0512d40: 4e 80 00 20 blr ... c0512dfc: 4b ff ff 39 bl c0512d34 <csum_sub> ... c05138bc: 4b ff f4 79 bl c0512d34 <csum_sub> ... 
Restore the expected behaviour by using __always_inline for all functions defined in net/checksum.h vmlinux size is even reduced by 256 bytes with this patch: text data bss dec hex filename 6980022 2515362 194384 9689768 93daa8 vmlinux.before 6979862 2515266 194384 9689512 93d9a8 vmlinux.now Fixes: ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly") Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 16:07:12 UTC
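The duplicated `csum_sub()` bodies above are a one's-complement subtract: `not` followed by an add with end-around carry (`addc`/`addze`). The sketch below is a userspace analogue of forcing such helpers inline. `sketch_always_inline` is a hypothetical stand-in for the kernel's `__always_inline`, and the helpers are modelled on, not copied from, net/checksum.h.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's __always_inline (GCC/Clang). */
#define sketch_always_inline inline __attribute__((__always_inline__))

/* One's-complement add with end-around carry, modelled on csum_add(). */
static sketch_always_inline uint32_t csum_add32(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;   /* may wrap past 32 bits */

	return res + (res < addend);    /* fold the carry back in */
}

/* csum_sub() is csum_add() of the one's complement of the addend. */
static sketch_always_inline uint32_t csum_sub32(uint32_t csum, uint32_t addend)
{
	return csum_add32(csum, ~addend);
}
```

With the attribute, the compiler inlines these few instructions at each call site instead of emitting a separate out-of-line copy per translation unit.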
0033fce Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-02-18 This series contains updates to ice driver only. Wojciech fixes protocol matching for slow-path switchdev so that all packets are correctly redirected. Michal removes accidental unconditional setting of l4 port filtering flag. Jake adds locking to protect VF reset and removal to fix various issues that can be encountered when they race with each other. Tom Rix propagates an error and initializes a struct to resolve reported Clang issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:35:20 UTC
90141ed Merge branch 'mptcp-fixes' Mat Martineau says: ==================== mptcp: Fix address advertisement races and stabilize tests Patches 1, 2, and 7 modify two self tests to give consistent, accurate results by fixing timing issues and accounting for syncookie behavior. Patches 3-6 fix two races in overlapping address advertisement send and receive. Associated self tests are updated, including addition of two MIBs to enable testing and tracking dropped address events. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:01 UTC
e35f885 selftests: mptcp: be more conservative with cookie MPJ limits Since commit 2843ff6f36db ("mptcp: remote addresses fullmesh"), an MPTCP client can attempt to create multiple MPJ subflows simultaneously. In such a scenario the server, when syncookies are enabled, could end up accepting incoming MPJ SYNs even above the configured subflow limit, as such a limit can be enforced reliably only after the subflow creation; with syncookies, only after the 3rd ACK is received. As a consequence, the related self-test case sporadically fails, as it verifies that the server always accepts the expected number of MPJ SYNs. Address the issue by relaxing the MPJ SYN number constraint. Note that the check on the accepted number of MPJ 3rd ACKs still remains intact. Fixes: 2843ff6f36db ("mptcp: remote addresses fullmesh") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:01 UTC
6ef84b1 selftests: mptcp: more robust signal race test The in-kernel MPTCP PM implementation can process only a single incoming add address option at any given time. In the mentioned test, the server can exceed that limit. Let the setup cope with that by allowing a faster add_addr retransmission. Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors") Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/254 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:00 UTC
f73c119 mptcp: add mibs counter for ignored incoming options The MPTCP in-kernel path manager has some constraints on processing incoming address announcements, so in edge scenarios it can end up dropping (ignoring) some of those announcements. The above is not very limiting in practice, since such scenarios are very uncommon and MPTCP will recover thanks to ADD_ADDR retransmissions. This patch adds a few MIB counters to account for such drop events, allowing easier introspection of the critical scenarios. Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:00 UTC
837cf45 mptcp: fix race in incoming ADD_ADDR option processing If an MPTCP endpoint receives multiple consecutive incoming ADD_ADDR options, mptcp_pm_add_addr_received() can overwrite the current remote address value after the PM lock is released in mptcp_pm_nl_add_addr_received() and before that address is echoed. Fix the issue by caching the remote address value a little earlier and always using the cached value after releasing the PM lock. Fixes: f7efc7771eac ("mptcp: drop argument port from mptcp_pm_announce_addr") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:00 UTC
98247bc mptcp: fix race in overlapping signal events After commit a88c9e496937 ("mptcp: do not block subflows creation on errors"), if a signal address races with a failing subflow creation, the subflow creation failure control path can trigger the selection of the next address to be announced while the current announce is still pending. This causes the unintended suppression of the ADD_ADDR announce. Fix the issue by skipping the to-be-suppressed announce before it marks an endpoint as already used. The relevant announce will be triggered again when the current one completes. Fixes: a88c9e496937 ("mptcp: do not block subflows creation on errors") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:00 UTC
5b31dda selftests: mptcp: improve 'fair usage on close' stability The mentioned test has to wait for a subflow creation failure. The current code looks for TCP sockets in TW state and sometimes misses the relevant event. Switch to a more stable check, looking for the associated mib counter. Fixes: 46e967d187ed ("selftests: mptcp: add tests for subflow creation failure") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/257 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:00 UTC
0cd33c5 selftests: mptcp: fix diag instability Instead of waiting for an arbitrary amount of time for the MPTCP MP_CAPABLE handshake to complete, explicitly wait for the relevant socket to enter the established state. Additionally, let the data transfer application use the slowest transfer mode available (-r), to cope with very slow hosts or high jitter caused by hosting VMs. Fixes: df62f2ec3df6 ("selftests/mptcp: add diag interface tests") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/258 Reported-and-tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> 19 February 2022, 12:28:00 UTC
3a14d08 nfp: flower: Fix a potential leak in nfp_tunnel_add_shared_mac() ida_simple_get() returns an id between min (0) and max (NFP_MAX_MAC_INDEX) inclusive. So NFP_MAX_MAC_INDEX (0xff) is a valid id. In order for the error handling path to work correctly, the 'invalid' value for 'ida_idx' should not be in the 0..NFP_MAX_MAC_INDEX range, inclusive. So set it to -1. Fixes: 20cce8865098 ("nfp: flower: enable MAC address sharing for offloadable devs") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220218131535.100258-1-simon.horman@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 19 February 2022, 05:08:14 UTC
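The bug above comes from using a valid id (0xff) as the "nothing allocated" marker. The sketch below shows the corrected sentinel pattern. `fake_ida_remove()` and `released_id` are hypothetical stand-ins for ida_simple_remove() and its side effect. They exist only so the behaviour can be observed.

```c
#define NFP_MAX_MAC_INDEX 0xff

/* Hypothetical stand-in for ida_simple_remove(); records what was freed. */
static int released_id = -2;
static void fake_ida_remove(int id) { released_id = id; }

/* Error path: release the id only if one was actually allocated.
 * -1 lies outside the valid range [0, NFP_MAX_MAC_INDEX], so a real
 * allocation of 0xff is no longer mistaken for "nothing allocated". */
static void cleanup_mac_index(int ida_idx)
{
	if (ida_idx != -1)
		fake_ida_remove(ida_idx);
}
```

With the old sentinel of NFP_MAX_MAC_INDEX, an id of 0xff returned by the allocator would have been skipped on the error path and leaked.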
ba88b55 MAINTAINERS: rmnet: Update email addresses Switch to the quicinc.com ids. Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Link: https://lore.kernel.org/r/1645174218-32632-1-git-send-email-quic_subashab@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 19 February 2022, 04:42:09 UTC
5a2aba7 net: mvpp2: always set port pcs ops Booting a MACCHIATObin with 5.17, the system OOPs with a null pointer deref when the network is started. This is caused by the pcs->ops structure being null in mvpp2_acpi_start() when it tries to call pcs_config(). Hoisting the code which sets pcs_gmac.ops and pcs_xlg.ops, ensuring they are always set, fixes the problem. The OOPs looks like: [ 18.687760] Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000010 [ 18.698561] Mem abort info: [ 18.698564] ESR = 0x96000004 [ 18.698567] EC = 0x25: DABT (current EL), IL = 32 bits [ 18.709821] SET = 0, FnV = 0 [ 18.714292] EA = 0, S1PTW = 0 [ 18.718833] FSC = 0x04: level 0 translation fault [ 18.725126] Data abort info: [ 18.729408] ISV = 0, ISS = 0x00000004 [ 18.734655] CM = 0, WnR = 0 [ 18.738933] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000111bbf000 [ 18.745409] [0000000000000010] pgd=0000000000000000, p4d=0000000000000000 [ 18.752235] Internal error: Oops: 96000004 [#1] SMP [ 18.757134] Modules linked in: rfkill ip_set nf_tables nfnetlink qrtr sunrpc vfat fat omap_rng fuse zram xfs crct10dif_ce mvpp2 ghash_ce sbsa_gwdt phylink xhci_plat_hcd ahci_plam [ 18.773481] CPU: 0 PID: 681 Comm: NetworkManager Not tainted 5.17.0-0.rc3.89.fc36.aarch64 #1 [ 18.781954] Hardware name: Marvell Armada 7k/8k Family Board /Armada 7k/8k Family Board , BIOS EDK II Jun 4 2019 [ 18.795222] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 18.802213] pc : mvpp2_start_dev+0x2b0/0x300 [mvpp2] [ 18.807208] lr : mvpp2_start_dev+0x298/0x300 [mvpp2] [ 18.812197] sp : ffff80000b4732c0 [ 18.815522] x29: ffff80000b4732c0 x28: 0000000000000000 x27: ffffccab38ae57f8 [ 18.822689] x26: ffff6eeb03065a10 x25: ffff80000b473a30 x24: ffff80000b4735b8 [ 18.829855] x23: 0000000000000000 x22: 00000000000001e0 x21: ffff6eeb07b6ab68 [ 18.837021] x20: ffff6eeb07b6ab30 x19: ffff6eeb07b6a9c0 x18: 0000000000000014 [ 18.844187] x17: 00000000f6232bfe x16: 
ffffccab899b1dc0 x15: 000000006a30f9fa [ 18.851353] x14: 000000003b77bd50 x13: 000006dc896f0e8e x12: 001bbbfccfd0d3a2 [ 18.858519] x11: 0000000000001528 x10: 0000000000001548 x9 : ffffccab38ad0fb0 [ 18.865685] x8 : ffff80000b473330 x7 : 0000000000000000 x6 : 0000000000000000 [ 18.872851] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000b4732f8 [ 18.880017] x2 : 000000000000001a x1 : 0000000000000002 x0 : ffff6eeb07b6ab68 [ 18.887183] Call trace: [ 18.889637] mvpp2_start_dev+0x2b0/0x300 [mvpp2] [ 18.894279] mvpp2_open+0x134/0x2b4 [mvpp2] [ 18.898483] __dev_open+0x128/0x1e4 [ 18.901988] __dev_change_flags+0x17c/0x1d0 [ 18.906187] dev_change_flags+0x30/0x70 [ 18.910038] do_setlink+0x278/0xa7c [ 18.913540] __rtnl_newlink+0x44c/0x7d0 [ 18.917391] rtnl_newlink+0x5c/0x8c [ 18.920892] rtnetlink_rcv_msg+0x254/0x314 [ 18.925006] netlink_rcv_skb+0x48/0x10c [ 18.928858] rtnetlink_rcv+0x24/0x30 [ 18.932449] netlink_unicast+0x290/0x2f4 [ 18.936386] netlink_sendmsg+0x1d0/0x41c [ 18.940323] sock_sendmsg+0x60/0x70 [ 18.943825] ____sys_sendmsg+0x248/0x260 [ 18.947762] ___sys_sendmsg+0x74/0xa0 [ 18.951438] __sys_sendmsg+0x64/0xcc [ 18.955027] __arm64_sys_sendmsg+0x30/0x40 [ 18.959140] invoke_syscall+0x50/0x120 [ 18.962906] el0_svc_common.constprop.0+0x4c/0xf4 [ 18.967629] do_el0_svc+0x30/0x9c [ 18.970958] el0_svc+0x28/0xb0 [ 18.974025] el0t_64_sync_handler+0x10c/0x140 [ 18.978400] el0t_64_sync+0x1a4/0x1a8 [ 18.982078] Code: 52800004 b9416262 aa1503e0 52800041 (f94008a5) [ 18.988196] ---[ end trace 0000000000000000 ]--- Fixes: cff056322372 ("net: mvpp2: use .mac_select_pcs() interface") Suggested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Marcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/r/20220214231852.3331430-1-jeremy.linton@arm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 19 February 2022, 04:14:16 UTC
5950bdc ice: initialize local variable 'tlv' Clang static analysis reports this issue ice_common.c:5008:21: warning: The left expression of the compound assignment is an uninitialized value. The computed value will also be garbage ldo->phy_type_low |= ((u64)buf << (i * 16)); ~~~~~~~~~~~~~~~~~ ^ When called from ice_cfg_phy_fec(), ldo is the uninitialized local variable tlv. So initialize it. Fixes: ea78ce4dab05 ("ice: add link lenient and default override support") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 18 February 2022, 21:28:39 UTC
ed22d9c ice: check the return of ice_ptp_gettimex64 Clang static analysis reports this issue time64.h:69:50: warning: The left operand of '+' is a garbage value set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec, ~~~~~~~~~~ ^ In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now' is set by ice_ptp_gettimex64(). This function can fail with -EBUSY, so 'now' can have a garbage value. So check the return value. Fixes: 06c16d89d2cb ("ice: register 1588 PTP clock device object for E810 devices") Signed-off-by: Tom Rix <trix@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 18 February 2022, 21:28:39 UTC
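The pattern above is to check the clock read and return its error before touching the output it was supposed to fill. The sketch below illustrates it with hypothetical types and functions. `read_clock()` merely imitates a getter that can fail with -EBUSY and is not the ice driver's code.

```c
#include <errno.h>

struct sketch_ts { long long tv_sec; long tv_nsec; };

/* Hypothetical clock read that can fail with -EBUSY, leaving *now unset. */
static int read_clock(struct sketch_ts *now, int busy)
{
	if (busy)
		return -EBUSY;
	now->tv_sec = 100;
	now->tv_nsec = 0;
	return 0;
}

/* Adjust by delta seconds; propagate the error instead of computing
 * with an uninitialized 'now'. *out is untouched on failure. */
static int adjtime_nonatomic(long long delta, int busy, struct sketch_ts *out)
{
	struct sketch_ts now;
	int err = read_clock(&now, busy);

	if (err)
		return err;
	out->tv_sec = now.tv_sec + delta;
	out->tv_nsec = now.tv_nsec;
	return 0;
}
```

Returning early means the caller sees -EBUSY and can retry, rather than silently setting the clock from stack garbage.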
fadead8 ice: fix concurrent reset and removal of VFs Commit c503e63200c6 ("ice: Stop processing VF messages during teardown") introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is intended to prevent some issues with concurrently handling messages from VFs while tearing down the VFs. This change was motivated by crashes caused while tearing down and bringing up VFs in rapid succession. It turns out that the fix actually introduces issues with the VF driver, because the PF no longer responds to any messages sent by the VF during its .remove routine. This results in the VF potentially removing its DMA memory before the PF has shut down the device queues. Additionally, the fix doesn't actually resolve concurrency issues within the ice driver. It is possible for a VF to initiate a reset just prior to the ice driver removing VFs. This can result in the remove task operating concurrently while the VF is being reset, which leads to the same kind of memory corruption and panics that commit purportedly fixed. Fix this concurrency at its root by protecting both the reset and removal flows using the existing VF cfg_lock. This ensures that we cannot remove the VF while any outstanding critical tasks such as a virtchnl message or a reset are occurring. This locking change also fixes the root cause originally fixed by commit c503e63200c6 ("ice: Stop processing VF messages during teardown"), so we can simply revert it. Note that I kept these two changes together because simply reverting the original commit alone would leave the driver vulnerable to worse race conditions. Fixes: c503e63200c6 ("ice: Stop processing VF messages during teardown") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 18 February 2022, 21:28:38 UTC
932645c ice: fix setting l4 port flag when adding filter The filter flag for the non-encapsulated L4 port field is accidentally always set, even if the user wants to add an encapsulated L4 port field. Remove this unnecessary flag setting. Fixes: 9e300987d4a81 ("ice: VXLAN and Geneve TC support") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 18 February 2022, 21:28:18 UTC
b70bc06 ice: Match on all profiles in slow-path In switchdev mode, slow-path rules need to match all protocols, in order to correctly redirect unfiltered or missed packets to the uplink. To set this up for the virtual function to uplink flow, the rule that redirects packets to the control VSI must have the tunnel type set to ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set, ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all profiles would be selected for this rule, resulting in the desired behavior. Without this change slow-path would not work with tunnel protocols. Fixes: 8b032a55c1bd ("ice: low level support for tunnels") Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 18 February 2022, 21:22:06 UTC
b352c34 net: ll_temac: check the return value of devm_kmalloc() devm_kmalloc() returns a pointer to allocated memory on success and NULL on failure. However, lp->indirect_lock is allocated by devm_kmalloc() without a proper check. Check its value to prevent a potential invalid memory access. Fixes: f14f5c11f051 ("net: ll_temac: Support indirect_mutex share within TEMAC IP") Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 18 February 2022, 12:00:44 UTC
a1cdec5 net-timestamp: convert sk->sk_tskey to atomic_t UDP sendmsg() can be lockless, this is causing all kinds of data races. This patch converts sk->sk_tskey to remove one of these races. BUG: KCSAN: data-race in __ip_append_data / __ip_append_data read to 0xffff8881035d4b6c of 4 bytes by task 8877 on cpu 1: __ip_append_data+0x1c1/0x1de0 net/ipv4/ip_output.c:994 ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636 udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae write to 0xffff8881035d4b6c of 4 bytes by task 8880 on cpu 0: __ip_append_data+0x1d8/0x1de0 net/ipv4/ip_output.c:994 ip_make_skb+0x13f/0x2d0 net/ipv4/ip_output.c:1636 udp_sendmsg+0x12bd/0x14c0 net/ipv4/udp.c:1249 inet_sendmsg+0x5f/0x80 net/ipv4/af_inet.c:819 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmmsg+0x267/0x4c0 net/socket.c:2553 __do_sys_sendmmsg net/socket.c:2582 [inline] __se_sys_sendmmsg net/socket.c:2579 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x0000054d -> 0x0000054e Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 8880 Comm: syz-executor.5 Not tainted 5.17.0-rc2-syzkaller-00167-gdcb85f85fa6f-dirty #0 Hardware name: Google Google Compute Engine/Google 
Compute Engine, BIOS Google 01/01/2011 Fixes: 09c2d251b707 ("net-timestamp: add key to disambiguate concurrent datagrams") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> 18 February 2022, 11:14:52 UTC
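A minimal user-space sketch of the idea behind the sk_tskey conversion, assuming C11 <stdatomic.h> and POSIX threads: every sender reserves its key with a single atomic read-modify-write, so concurrent lockless senders cannot lose an increment. The kernel fix uses its own atomic_t helpers instead; struct fake_sock, reserve_tskey() and run_senders() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define SENDERS    4
#define PER_SENDER 100000

/* Hypothetical stand-in for a socket's timestamp key (not kernel code). */
struct fake_sock {
    atomic_uint tskey;
};

/* Reserve one key with a single atomic read-modify-write, so two
 * lockless senders can never read and store back the same value. */
static unsigned int reserve_tskey(struct fake_sock *sk)
{
    return atomic_fetch_add_explicit(&sk->tskey, 1, memory_order_relaxed);
}

static void *sender(void *arg)
{
    struct fake_sock *sk = arg;

    for (int i = 0; i < PER_SENDER; i++)
        reserve_tskey(sk);
    return NULL;
}

/* Run SENDERS threads concurrently and return the final key value. */
static unsigned int run_senders(void)
{
    struct fake_sock sk;
    pthread_t threads[SENDERS];

    atomic_init(&sk.tskey, 0);
    for (int i = 0; i < SENDERS; i++) {
        int rc = pthread_create(&threads[i], NULL, sender, &sk);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < SENDERS; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&sk.tskey);
}
```

With a plain unsigned int and tskey++ in reserve_tskey(), the final value could come out lower than SENDERS * PER_SENDER, which is the kind of lost update KCSAN reports above.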
e9da0b5 sr9700: sanity check for packet length A malicious device can leak heap data to user space by providing bogus frame lengths. Introduce a sanity check. Signed-off-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Grant Grundler <grundler@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net> 18 February 2022, 11:05:08 UTC
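The sr9700 fix comes down to not trusting a length field supplied by the device. A minimal sketch, assuming a hypothetical rx layout (one status byte, a 16-bit little-endian length, then the payload) rather than the real sr9700 format; copy_rx_frame(), HDR_LEN and MAX_FRAME_LEN are illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN       3     /* hypothetical: status byte + 16-bit LE length */
#define MAX_FRAME_LEN 1514  /* largest Ethernet frame without FCS */

/*
 * Copy one received frame out of a device buffer.
 * Returns the payload length, or -1 if the device-reported length
 * cannot be trusted.
 */
static long copy_rx_frame(const uint8_t *buf, size_t buf_len,
                          uint8_t *out, size_t out_len)
{
    assert(buf != NULL && out != NULL);
    if (buf_len < HDR_LEN)
        return -1;

    size_t len = (size_t)buf[1] | ((size_t)buf[2] << 8);

    /* The sanity check: the length must fit a frame, the bytes that were
     * actually received, and the destination buffer. Without it, memcpy()
     * could read past buf and copy adjacent heap memory to the caller. */
    if (len > MAX_FRAME_LEN || len > buf_len - HDR_LEN || len > out_len)
        return -1;

    memcpy(out, buf + HDR_LEN, len);
    return (long)len;
}
```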
2f131de net/sched: act_ct: Fix flow table lookup after ct clear or switching zones Flow table lookup is skipped if the packet either went through a ct clear action (which sets the IP_CT_UNTRACKED flag on the packet), or is switching zones while a connection is already associated with the packet. This results in no SW offload of the connection, and in the connection not being removed from the flow table on TCP teardown (fin/rst packet). To fix the above, remove these unnecessary checks in flow table lookup. Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows") Signed-off-by: Paul Blakey <paulb@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 18 February 2022, 11:02:48 UTC
4224cfd net-sysfs: add check for netdevice being present to speed_show When the netdevice is brought down or the system shuts down, a panic can be triggered while accessing the sysfs path because the device has already been removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 18 February 2022, 10:59:11 UTC
efe4186 drivers: hamradio: 6pack: fix UAF bug caused by mod_timer() When a 6pack device is detaching, sixpack_close() cleans up the necessary resources. Although del_timer_sync() in sixpack_close() won't return while there is an active timer, mod_timer() in sp_xmit_on_air() can re-arm the timer when userspace reaches ax25_sendmsg(), ax25_connect() or ax25_ioctl() through a syscall. The unexpectedly woken handler, sp_xmit_on_air(), knows nothing about the ongoing cleanup and may still call pty_write() to use driver-layer resources that have already been released. One of the possible race conditions is shown below: (USE) | (FREE) ax25_sendmsg() | ax25_queue_xmit() | ... | sp_xmit() | sp_encaps() | sixpack_close() sp_xmit_on_air() | del_timer_sync(&sp->tx_t) mod_timer(&sp->tx_t,...) | ... | unregister_netdev() | ... (wait a while) | tty_release() | tty_release_struct() | release_tty() sp_xmit_on_air() | tty_kref_put(tty_struct) //FREE pty_write(tty_struct) //USE | ... The corresponding fail log is shown below: =============================================================== BUG: KASAN: use-after-free in __run_timers.part.0+0x170/0x470 Write of size 8 at addr ffff88800a652ab8 by task swapper/2/0 ... Call Trace: ... queue_work_on+0x3f/0x50 pty_write+0xcd/0xe0 sp_xmit_on_air+0xb2/0x1f0 call_timer_fn+0x28/0x150 __run_timers.part.0+0x3c2/0x470 run_timer_softirq+0x3b/0x80 __do_softirq+0xf1/0x380 ... This patch moves del_timer_sync() after unregister_netdev() to avoid UAF bugs. Because unregister_netdev() is well synchronized, it flushes out any pending queues, waits for the refcount of the net_device to drop to zero and removes the net_device from the kernel. No routines are still running after unregister_netdev() returns, so the timer can no longer be re-armed from userspace. Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net> 18 February 2022, 10:58:17 UTC
7a2fb91 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2022-02-17 We've added 8 non-merge commits during the last 7 day(s) which contain a total of 8 files changed, 119 insertions(+), 15 deletions(-). The main changes are: 1) Add schedule points in map batch ops, from Eric. 2) Fix bpf_msg_push_data with len 0, from Felix. 3) Fix crash due to incorrect copy_map_value, from Kumar. 4) Fix crash due to out of bounds access into reg2btf_ids, from Kumar. 5) Fix a bpf_timer initialization issue with clang, from Yonghong. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Add schedule points in batch ops bpf: Fix crash due to out of bounds access into reg2btf_ids. selftests: bpf: Check bpf_msg_push_data return value bpf: Fix a bpf_timer initialization issue bpf: Emit bpf_timer in vmlinux BTF selftests/bpf: Add test for bpf_timer overwriting crash bpf: Fix crash due to incorrect copy_map_value bpf: Do not try bpf_msg_push_data with len 0 ==================== Link: https://lore.kernel.org/r/20220217190000.37925-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 20:01:55 UTC
8b97cae Merge tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. Current release - regressions: - dsa: lantiq_gswip: fix use after free in gswip_remove() - smc: avoid overwriting the copies of clcsock callback functions Current release - new code bugs: - iwlwifi: - fix use-after-free when no FW is present - mei: fix the pskb_may_pull check in ipv4 - mei: retry mapping the shared area - mvm: don't feed the hardware RFKILL into iwlmei Previous releases - regressions: - ipv6: mcast: use rcu-safe version of ipv6_get_lladdr() - tipc: fix wrong publisher node address in link publications - iwlwifi: mvm: don't send SAR GEO command for 3160 devices, avoid FW assertion - bgmac: make idm and nicpm resource optional again - atl1c: fix tx timeout after link flap Previous releases - always broken: - vsock: remove vsock from connected table when connect is interrupted by a signal - ping: change destination interface checks to match raw sockets - crypto: af_alg - get rid of alg_memory_allocated to avoid confusing semantics (and null-deref) after SO_RESERVE_MEM was added - ipv6: make exclusive flowlabel checks per-netns - bonding: force carrier update when releasing slave - sched: limit TC_ACT_REPEAT loops - bridge: multicast: notify switchdev driver whenever MC processing gets disabled because of max entries reached - wifi: brcmfmac: fix crash in brcm_alt_fw_path when WLAN not found - iwlwifi: fix locking when "HW not ready" - phy: mediatek: remove PHY mode check on MT7531 - dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLAN - dsa: lan9303: - fix polarity of reset during probe - fix accelerated VLAN handling" * tag 'net-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (65 commits) bonding: force carrier update when releasing slave nfp: flower: netdev offload check for ip6gretap ipv6: fix data-race in fib6_info_hw_flags_set / 
fib6_purge_rt ipv4: fix data races in fib_alias_hw_flags_set net: dsa: lan9303: add VLAN IDs to master device net: dsa: lan9303: handle hwaccel VLAN tags vsock: remove vsock from connected table when connect is interrupted by a signal Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname" ping: fix the dif and sdif check in ping_lookup net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 net: sched: limit TC_ACT_REPEAT loops tipc: fix wrong notification node addresses net: dsa: lantiq_gswip: fix use after free in gswip_remove() ipv6: per-netns exclusive flowlabel checks net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled CDC-NCM: avoid overflow in sanity checking mctp: fix use after free net: mscc: ocelot: fix use-after-free in ocelot_vlan_del() bonding: fix data-races around agg_select_timer dpaa2-eth: Initialize mutex used in one step timestamping path ... 17 February 2022, 19:33:59 UTC
a6ab75c bonding: force carrier update when releasing slave In __bond_release_one(), bond_set_carrier() is only called when the bond device has no slaves left. Therefore, if we remove the up slave from a master with two slaves and keep the down slave, the master will remain up. Fix this by moving bond_set_carrier() out of the if (!bond_has_slaves(bond)) statement. Reproducer: $ insmod bonding.ko mode=0 miimon=100 max_bonds=2 $ ifconfig bond0 up $ ifenslave bond0 eth0 eth1 $ ifconfig eth0 down $ ifenslave -d bond0 eth1 $ cat /proc/net/bonding/bond0 Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 18:55:21 UTC
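The bonding fix is about when the master's carrier is recomputed. A simplified sketch, assuming hypothetical struct bond and struct slave types in which the master has carrier whenever at least one slave has link (roughly the non-802.3ad behaviour of bond_set_carrier()); the point is that the recomputation runs after every release, not only when the last slave is gone:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLAVES 8

/* Hypothetical, heavily simplified bonding structures. */
struct slave {
    bool link_up;
};

struct bond {
    struct slave *slaves[MAX_SLAVES];
    size_t n;
    bool carrier;
};

/* The master has carrier if at least one remaining slave has link. */
static void bond_update_carrier(struct bond *b)
{
    bool up = false;

    for (size_t i = 0; i < b->n; i++) {
        if (b->slaves[i]->link_up) {
            up = true;
            break;
        }
    }
    b->carrier = up;
}

static void bond_release(struct bond *b, size_t idx)
{
    assert(idx < b->n);
    b->slaves[idx] = b->slaves[--b->n];  /* drop the slave, keep array dense */

    /* Recompute carrier after every release. Doing it only when
     * b->n == 0 leaves the master up with just a down slave attached. */
    bond_update_carrier(b);
}
```

This mirrors the reproducer: eth0 is down, eth1 is up, and releasing eth1 should take the master's carrier down.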
75134f1 bpf: Add schedule points in batch ops syzbot reported various soft lockups caused by bpf batch operations. INFO: task kworker/1:1:27 blocked for more than 140 seconds. INFO: task hung in rcu_barrier Nothing prevents batch ops from processing huge amounts of data, so we need to add schedule points to them. Note that the maybe_wait_bpf_programs(map) calls from generic_map_delete_batch() can be factorized by moving the call after the loop. This will be done later in the -next tree once this fix is merged, unless there is a strong opinion on doing this optimization sooner. Fixes: aa2e93b8e58e ("bpf: Add generic support for update and delete batch ops") Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: Brian Vazquez <brianvv@google.com> Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com 17 February 2022, 18:48:26 UTC
a3580ac fs/file_table: fix adding missing kmemleak_not_leak() Commit b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file"") fixed a regression; however, it failed to add a kmemleak_not_leak() call. Fixes: b42bc9a3c511 ("Fix regression due to "fs: move binfmt_misc sysctl to its own file"") Reported-by: Tong Zhang <ztong0001@gmail.com> Cc: Tong Zhang <ztong0001@gmail.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 17 February 2022, 18:23:19 UTC
2dd3a8a Merge tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix corrupt inject files when only the last branch option is enabled with ARM CoreSight ETM - Fix use-after-free for realloc(..., 0) in libsubcmd, found by gcc 12 - Defer freeing string after possible strlen() on it in the BPF loader, found by gcc 12 - Avoid early exit in 'perf trace' due to SIGCHLD from non-workload processes - Fix arm64 perf_event_attr 'perf test's wrt --call-graph initialization - Fix libperf 32-bit build for 'perf test' wrt uint64_t printf - Fix perf_cpu_map__for_each_cpu macro in libperf, providing access to the CPU iterator - Sync linux/perf_event.h UAPI with the kernel sources - Update Jiri Olsa's email address in MAINTAINERS * tag 'perf-tools-fixes-for-v5.17-2022-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf bpf: Defer freeing string after possible strlen() on it perf test: Fix arm64 perf_event_attr tests wrt --call-graph initialization libsubcmd: Fix use-after-free for realloc(..., 0) libperf: Fix perf_cpu_map__for_each_cpu macro perf cs-etm: Fix corrupt inject files when only last branch option is enabled perf cs-etm: No-op refactor of synth opt usage libperf: Fix 32-bit build for tests uint64_t printf tools headers UAPI: Sync linux/perf_event.h with the kernel sources perf trace: Avoid early exit due SIGCHLD from non-workload processes MAINTAINERS: Update Jiri's email address 17 February 2022, 18:06:09 UTC
edbd6c6 Merge tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module fix from Luis Chamberlain: "Fixes module decompression when CONFIG_SYSFS=n The only fix that has trickled down for the v5.17-rc cycle so far is the fix for module decompression when CONFIG_SYSFS=n. This was reported through 0-day" * tag 'modules-5.17-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: module: fix building with sysfs disabled 17 February 2022, 17:54:00 UTC
7dbcda5 nfp: flower: netdev offload check for ip6gretap IPv6 GRE tunnels are not being offloaded because of a missing netdev offload check. IPv6 GRE tunnel offloading was added previously, but this check was not included. Adding the ip6gretap check allows IPv6 GRE tunnels to be offloaded correctly. Fixes: f7536ffb0986 ("nfp: flower: Allow ipv6gretap interface for offloading") Signed-off-by: Danie du Toit <danie.dutoit@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20220217124820.40436-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 17:50:45 UTC
d95d632 ipv6: fix data-race in fib6_info_hw_flags_set / fib6_purge_rt Because fib6_info_hw_flags_set() is called without any synchronization, all accesses to fi->offload, fi->trap and fi->offload_failed need some basic protection like READ_ONCE()/WRITE_ONCE(). BUG: KCSAN: data-race in fib6_info_hw_flags_set / fib6_purge_rt read to 0xffff8881087d5886 of 1 bytes by task 13953 on cpu 0: fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1007 [inline] fib6_purge_rt+0x4f/0x580 net/ipv6/ip6_fib.c:1033 fib6_del_route net/ipv6/ip6_fib.c:1983 [inline] fib6_del+0x696/0x890 net/ipv6/ip6_fib.c:2028 __ip6_del_rt net/ipv6/route.c:3876 [inline] ip6_del_rt+0x83/0x140 net/ipv6/route.c:3891 __ipv6_dev_ac_dec+0x2b5/0x370 net/ipv6/anycast.c:374 ipv6_dev_ac_dec net/ipv6/anycast.c:387 [inline] __ipv6_sock_ac_close+0x141/0x200 net/ipv6/anycast.c:207 ipv6_sock_ac_close+0x79/0x90 net/ipv6/anycast.c:220 inet6_release+0x32/0x50 net/ipv6/af_inet6.c:476 __sock_release net/socket.c:650 [inline] sock_close+0x6c/0x150 net/socket.c:1318 __fput+0x295/0x520 fs/file_table.c:280 ____fput+0x11/0x20 fs/file_table.c:313 task_work_run+0x8e/0x110 kernel/task_work.c:164 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:175 [inline] exit_to_user_mode_prepare+0x160/0x190 kernel/entry/common.c:207 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline] syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86 entry_SYSCALL_64_after_hwframe+0x44/0xae write to 0xffff8881087d5886 of 1 bytes by task 1912 on cpu 1: fib6_info_hw_flags_set+0x155/0x3b0 net/ipv6/route.c:6230 nsim_fib6_rt_hw_flags_set drivers/net/netdevsim/fib.c:668 [inline] nsim_fib6_rt_add drivers/net/netdevsim/fib.c:691 [inline] nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:756 [inline] nsim_fib6_event drivers/net/netdevsim/fib.c:853 [inline] nsim_fib_event drivers/net/netdevsim/fib.c:886 [inline] nsim_fib_event_work+0x284f/0x2cf0
drivers/net/netdevsim/fib.c:1477 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 worker_thread+0x616/0xa70 kernel/workqueue.c:2454 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x22 -> 0x2a Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 1912 Comm: kworker/1:3 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events nsim_fib_event_work Fixes: 0c5fcf9e249e ("IPv6: Add "offload failed" indication to routes") Fixes: bb3c4ab93e44 ("ipv6: Add "offload" and "trap" indications to routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amit Cohen <amcohen@nvidia.com> Cc: Ido Schimmel <idosch@nvidia.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220216173217.3792411-2-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 17:48:24 UTC
9fcf986 ipv4: fix data races in fib_alias_hw_flags_set fib_alias_hw_flags_set() can be used by concurrent threads, and is only RCU protected. We need to annotate accesses to the following fields of struct fib_alias: offload, trap, offload_failed Because of READ_ONCE()/WRITE_ONCE() limitations, make these fields u8. BUG: KCSAN: data-race in fib_alias_hw_flags_set / fib_alias_hw_flags_set read to 0xffff888134224a6a of 1 bytes by task 2013 on cpu 1: fib_alias_hw_flags_set+0x28a/0x470 net/ipv4/fib_trie.c:1050 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline] nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline] nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline] nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline] nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline] nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 process_scheduled_works kernel/workqueue.c:2370 [inline] worker_thread+0x7df/0xa70 kernel/workqueue.c:2456 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 write to 0xffff888134224a6a of 1 bytes by task 4872 on cpu 0: fib_alias_hw_flags_set+0x2d5/0x470 net/ipv4/fib_trie.c:1054 nsim_fib4_rt_hw_flags_set drivers/net/netdevsim/fib.c:350 [inline] nsim_fib4_rt_add drivers/net/netdevsim/fib.c:367 [inline] nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:429 [inline] nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline] nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline] nsim_fib_event_work+0x1852/0x2cf0 drivers/net/netdevsim/fib.c:1477 process_one_work+0x3f6/0x960 kernel/workqueue.c:2307 process_scheduled_works kernel/workqueue.c:2370 [inline] worker_thread+0x7df/0xa70 kernel/workqueue.c:2456 kthread+0x1bf/0x1e0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 value changed: 0x00 -> 0x02 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 4872 Comm: kworker/0:0 Not tainted 5.17.0-rc3-syzkaller-00188-g1d41d2e82623-dirty #0 Hardware name:
Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events nsim_fib_event_work Fixes: 90b93f1b31f8 ("ipv4: Add "offload" and "trap" indications to routes") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220216173217.3792411-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 17:48:24 UTC
430065e net: dsa: lan9303: add VLAN IDs to master device If the master device does VLAN filtering, the IDs used by the switch must be added for any frames to be received. Do this in the port_enable() function, and remove them in port_disable(). Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 17:32:13 UTC
017b355 net: dsa: lan9303: handle hwaccel VLAN tags Check for a hwaccel VLAN tag on rx and use it if present. Otherwise, use __skb_vlan_pop() like the other tag parsers do. This fixes the case where the VLAN tag has already been consumed by the master. Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303") Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 17:32:06 UTC
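The rx logic described in this commit (use an out-of-band hwaccel tag if the master already stripped it, otherwise pop the in-band 802.1Q header) can be sketched in plain C. struct pkt is a hypothetical, much-simplified stand-in for an sk_buff, and rx_vid() is an illustration, not the lan9303 tagger's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6
#define VLAN_HLEN   4
#define ETH_P_8021Q 0x8100

/* Hypothetical, much-simplified stand-in for an sk_buff. */
struct pkt {
    bool hwaccel_tag;   /* master already stripped the tag */
    uint16_t tci;       /* tag control info when hwaccel_tag is set */
    uint8_t *data;      /* starts at the destination MAC */
    size_t len;
};

/* Returns the VLAN ID, or -1 if the frame carries no 802.1Q tag. */
static int rx_vid(struct pkt *p)
{
    assert(p != NULL);

    /* Case the fix adds: the tag was consumed by the master. */
    if (p->hwaccel_tag) {
        p->hwaccel_tag = false;
        return p->tci & 0x0fff;
    }

    if (p->len < 2 * ETH_ALEN + VLAN_HLEN)
        return -1;
    uint16_t tpid = (uint16_t)(p->data[12] << 8 | p->data[13]);
    if (tpid != ETH_P_8021Q)
        return -1;
    uint16_t tci = (uint16_t)(p->data[14] << 8 | p->data[15]);

    /* Pop the in-band tag: slide both MAC addresses forward 4 bytes. */
    memmove(p->data + VLAN_HLEN, p->data, 2 * ETH_ALEN);
    p->data += VLAN_HLEN;
    p->len -= VLAN_HLEN;
    return tci & 0x0fff;
}
```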
80d47f5 mm: don't try to NUMA-migrate COW pages that have other uses Oded Gabbay reports that enabling NUMA balancing causes corruption with his Gaudi accelerator test load: "All the details are in the bug, but the bottom line is that somehow, this patch causes corruption when the numa balancing feature is enabled AND we don't use process affinity AND we use GUP to pin pages so our accelerator can DMA to/from system memory. Either disabling numa balancing, using process affinity to bind to specific numa-node or reverting this patch causes the bug to disappear" and Oded bisected the issue to commit 09854ba94c6a ("mm: do_wp_page() simplification"). Now, the NUMA balancing shouldn't actually be changing the writability of a page, and as such shouldn't matter for COW. But it appears it does. Suspicious. However, regardless of that, the condition for enabling NUMA faults in change_pte_range() is nonsensical. It uses "page_mapcount(page)" to decide if a COW page should be NUMA-protected or not, and that makes absolutely no sense. The number of mappings a page has is irrelevant: not only does GUP get a reference to a page as in Oded's case, but the other mappings might be paged out and the only reference to them would be in the page count. Since we should never try to NUMA-balance a page that we can't move anyway due to other references, just fix the code to use 'page_count()'. Oded confirms that that fixes his issue. Now, this does imply that something in NUMA balancing ends up changing page protections (other than the obvious one of making the page inaccessible to get the NUMA faulting information). Otherwise the COW simplification wouldn't matter - since doing the GUP on the page would make sure it's writable. The cause of that permission change would be good to figure out too, since it clearly results in spurious COW events - but fixing the nonsensical test that just happened to work before is obviously the CorrectThing(tm) to do regardless.
Fixes: 09854ba94c6a ("mm: do_wp_page() simplification") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215616 Link: https://lore.kernel.org/all/CAFCwf10eNmwq2wD71xjUhqkvv5+_pJMR1nPug2RqNDcFT4H86Q@mail.gmail.com/ Reported-and-tested-by: Oded Gabbay <oded.gabbay@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> 17 February 2022, 16:57:47 UTC
b920849 vsock: remove vsock from connected table when connect is interrupted by a signal vsock_connect() expects that the socket could already be in the TCP_ESTABLISHED state when the connecting task wakes up with a signal pending. If this happens the socket will be in the connected table, and it is not removed when the socket state is reset. In this situation it's common for the process to retry connect(), and if the connection is successful the socket will be added to the connected table a second time, corrupting the list. Prevent this by calling vsock_remove_connected() if a signal is received while waiting for a connection. This is harmless if the socket is not in the connected table, and if it is in the table then removing it will prevent list corruption from a double add. Note for backporting: this patch requires d5afa82c977e ("vsock: correct removal of socket from the list"), which is in all current stable trees except 4.9.y. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Seth Forshee <sforshee@digitalocean.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20220217141312.2297547-1-sforshee@digitalocean.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 16:56:02 UTC
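Why an unconditional removal on signal is safe can be shown with a minimal circular doubly linked list in the style of the kernel's list_head, assuming that an unlinked node points at itself (as after list_del_init()). remove_connected() is a hypothetical analogue of vsock_remove_connected(), not its real implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
    n->prev = n->next = n;
}

/* An unlinked (or freshly initialized) node points at itself. */
static bool list_linked(const struct list_node *n)
{
    return n->next != n;
}

static void list_add(struct list_node *n, struct list_node *head)
{
    /* Adding a node that is already linked corrupts the list;
     * this is the double add the vsock fix prevents. */
    assert(!list_linked(n));
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void list_del_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical analogue of vsock_remove_connected(): a no-op when the
 * node is not on the list, so it can be called on every signal. */
static void remove_connected(struct list_node *n)
{
    if (list_linked(n))
        list_del_init(n);
}

static int list_count(const struct list_node *head)
{
    int count = 0;

    for (const struct list_node *p = head->next; p != head; p = p->next)
        count++;
    return count;
}
```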
6aba04e Revert "net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname" This reverts commit 3710e80952cf2dc48257ac9f145b117b5f74e0a5. Since idm_base and nicpm_base are still optional resources not present on all platforms, this breaks the driver for everything except Northstar 2 (which has both). The same change was already reverted once with 755f5738ff98 ("net: broadcom: fix a mistake about ioremap resource"). So let's do it again. Fixes: 3710e80952cf ("net: ethernet: bgmac: Use devm_platform_ioremap_resource_byname") Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> [florian: Added comments to explain the resources are optional] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220216184634.2032460-1-f.fainelli@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 16:45:34 UTC
35a79e6 ping: fix the dif and sdif check in ping_lookup When 'ping' changes to use a PING socket instead of a RAW socket by: # sysctl -w net.ipv4.ping_group_range="0 100" there is another regression when matching sk_bound_dev_if against dif: RAW sockets use inet_iif() while the PING socket lookup uses skb->dev->ifindex, so the commands below fail: # ip link add dummy0 type dummy # ip link set dummy0 up # ip addr add 192.168.111.1/24 dev dummy0 # ping -I dummy0 192.168.111.1 -c1 The issue was also reported on: https://github.com/iputils/iputils/issues/104 It was fixed in iputils in the wrong way, by not binding to the device when the destination IP is on that device, which causes some kselftests to fail, as Jianlin noticed. This patch uses inet(6)_iif and inet(6)_sdif to get dif and sdif for PING sockets, consistent with RAW sockets. Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 17 February 2022, 15:01:20 UTC
21e8a96 net: usb: cdc_mbim: avoid altsetting toggling for Telit FN990 Add quirk CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE for Telit FN990 0x1071 composition in order to avoid bind error. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 17 February 2022, 14:50:43 UTC
31ded15 perf bpf: Defer freeing string after possible strlen() on it This was detected by gcc in Fedora Rawhide: 50 11.01 fedora:rawhide : FAIL gcc version 12.0.1 20220205 (Red Hat 12.0.1-0) (GCC) inlined from 'bpf__config_obj' at util/bpf-loader.c:1242:9: util/bpf-loader.c:1225:34: error: pointer 'map_opt' may be used after 'free' [-Werror=use-after-free] 1225 | *key_scan_pos += strlen(map_opt); | ^~~~~~~~~~~~~~~ util/bpf-loader.c:1223:9: note: call to 'free' here 1223 | free(map_name); | ^~~~~~~~~~~~~~ cc1: all warnings being treated as errors So do the calculations on the pointer before freeing it. Fixes: 04f9bf2bac72480c ("perf bpf-loader: Add missing '*' for key_scan_pos") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang ShaoBo <bobo.shaobowang@huawei.com> Link: https://lore.kernel.org/lkml/Yg1VtQxKrPpS3uNA@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> 17 February 2022, 10:15:09 UTC
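The ordering fix in a self-contained form, assuming, as the gcc warning suggests, that map_opt points into the buffer freed as map_name. scan_map_key() is a hypothetical helper modelled on that pattern, not the actual bpf-loader code:

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: `opt` points into the heap copy `name`, so it
 * must not be read after free(name). Returns 0 on success, -1 if the
 * key has no ".option" part or allocation fails. */
static int scan_map_key(const char *key, size_t *key_scan_pos)
{
    assert(key != NULL && key_scan_pos != NULL);

    char *name = strdup(key);
    if (!name)
        return -1;

    char *opt = strchr(name, '.');
    if (!opt) {
        free(name);
        return -1;
    }
    *opt++ = '\0';                 /* split "map.option" in place */

    *key_scan_pos += strlen(opt);  /* use the pointer first ... */
    free(name);                    /* ... and only then free its buffer */
    return 0;
}
```

Swapping the last two statements reproduces the use-after-free that gcc 12 flagged with -Wuse-after-free.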
5740d06 net: sched: limit TC_ACT_REPEAT loops We have been living dangerously, at the mercy of malicious users, abusing TC_ACT_REPEAT, as shown by this syzbot report [1]. Add an arbitrary limit (32) to the number of times an action can return TC_ACT_REPEAT. v2: switch the limit to 32 instead of 10. Use net_warn_ratelimited() instead of pr_err_once(). [1] (C repro available on demand) rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 1-...!: (10500 ticks this GP) idle=021/1/0x4000000000000000 softirq=5592/5592 fqs=0 (t=10502 jiffies g=5305 q=190) rcu: rcu_preempt kthread timer wakeup didn't happen for 10502 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=0 timer-softirq=3527 rcu: rcu_preempt kthread starved for 10505 jiffies! g5305 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack:29344 pid: 14 ppid: 2 flags:0x00004000 Call Trace: <TASK> context_switch kernel/sched/core.c:4986 [inline] __schedule+0xab2/0x4db0 kernel/sched/core.c:6295 schedule+0xd2/0x260 kernel/sched/core.c:6368 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1963 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2136 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 1 to CPUs 0: NMI backtrace for cpu 0 CPU: 0 PID: 3646 Comm: syz-executor358 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:rep_nop arch/x86/include/asm/vdso/processor.h:13 [inline] RIP: 0010:cpu_relax arch/x86/include/asm/vdso/processor.h:18 [inline] RIP: 0010:pv_wait_head_or_lock kernel/locking/qspinlock_paravirt.h:437 [inline] RIP:
0010:__pv_queued_spin_lock_slowpath+0x3b8/0xb40 kernel/locking/qspinlock.c:508 Code: 48 89 eb c6 45 01 01 41 bc 00 80 00 00 48 c1 e9 03 83 e3 07 41 be 01 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 2c 01 eb 0c <f3> 90 41 83 ec 01 0f 84 72 04 00 00 41 0f b6 45 00 38 d8 7f 08 84 RSP: 0018:ffffc9000283f1b0 EFLAGS: 00000206 RAX: 0000000000000003 RBX: 0000000000000000 RCX: 1ffff1100fc0071e RDX: 0000000000000001 RSI: 0000000000000201 RDI: 0000000000000000 RBP: ffff88807e0038f0 R08: 0000000000000001 R09: ffffffff8ffbf9ff R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000004c1e R13: ffffed100fc0071e R14: 0000000000000001 R15: ffff8880b9c3aa80 FS: 00005555562bf300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffdbfef12b8 CR3: 00000000723c2000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:591 [inline] queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:51 [inline] queued_spin_lock include/asm-generic/qspinlock.h:85 [inline] do_raw_spin_lock+0x200/0x2b0 kernel/locking/spinlock_debug.c:115 spin_lock_bh include/linux/spinlock.h:354 [inline] sch_tree_lock include/net/sch_generic.h:610 [inline] sch_tree_lock include/net/sch_generic.h:605 [inline] prio_tune+0x3b9/0xb50 net/sched/sch_prio.c:211 prio_init+0x5c/0x80 net/sched/sch_prio.c:244 qdisc_create.constprop.0+0x44a/0x10f0 net/sched/sch_api.c:1253 tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 
net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7ee98aae99 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffdbfef12d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffdbfef1300 RCX: 00007f7ee98aae99 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d R10: 000000000000000d R11: 0000000000000246 R12: 00007ffdbfef12f0 R13: 00000000000f4240 R14: 000000000004ca47 R15: 00007ffdbfef12e4 </TASK> INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 2.293 msecs NMI backtrace for cpu 1 CPU: 1 PID: 3260 Comm: kworker/1:3 Not tainted 5.17.0-rc3-syzkaller-00149-gbf8e59fd315f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: mld mld_ifc_work Call Trace: <IRQ> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 print_cpu_stall kernel/rcu/tree_stall.h:604 [inline] check_cpu_stall kernel/rcu/tree_stall.h:688 [inline] rcu_pending kernel/rcu/tree.c:3919 [inline] rcu_sched_clock_irq.cold+0x5c/0x759 kernel/rcu/tree.c:2617 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 __run_hrtimer 
kernel/time/hrtimer.c:1685 [inline] __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 RIP: 0010:__sanitizer_cov_trace_const_cmp4+0xc/0x70 kernel/kcov.c:286 Code: 00 00 00 48 89 7c 30 e8 48 89 4c 30 f0 4c 89 54 d8 20 48 89 10 5b c3 0f 1f 80 00 00 00 00 41 89 f8 bf 03 00 00 00 4c 8b 14 24 <89> f1 65 48 8b 34 25 00 70 02 00 e8 14 f9 ff ff 84 c0 74 4b 48 8b RSP: 0018:ffffc90002c5eea8 EFLAGS: 00000246 RAX: 0000000000000007 RBX: ffff88801c625800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: ffff8880137d3100 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff874fcd88 R11: 0000000000000000 R12: ffff88801d692dc0 R13: ffff8880137d3104 R14: 0000000000000000 R15: ffff88801d692de8 tcf_police_act+0x358/0x11d0 net/sched/act_police.c:256 tcf_action_exec net/sched/act_api.c:1049 [inline] tcf_action_exec+0x1a6/0x530 net/sched/act_api.c:1026 tcf_exts_exec include/net/pkt_cls.h:326 [inline] route4_classify+0xef0/0x1400 net/sched/cls_route.c:179 __tcf_classify net/sched/cls_api.c:1549 [inline] tcf_classify+0x3e8/0x9d0 net/sched/cls_api.c:1615 prio_classify net/sched/sch_prio.c:42 [inline] prio_enqueue+0x3a7/0x790 net/sched/sch_prio.c:75 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3668 __dev_xmit_skb net/core/dev.c:3756 [inline] __dev_queue_xmit+0x1f61/0x3660 net/core/dev.c:4081 neigh_hh_output include/net/neighbour.h:533 [inline] neigh_output include/net/neighbour.h:547 [inline] ip_finish_output2+0x14dc/0x2170 net/ipv4/ip_output.c:228 __ip_finish_output net/ipv4/ip_output.c:306 [inline] __ip_finish_output+0x396/0x650 net/ipv4/ip_output.c:288 
ip_finish_output+0x32/0x200 net/ipv4/ip_output.c:316 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip_output+0x196/0x310 net/ipv4/ip_output.c:430 dst_output include/net/dst.h:451 [inline] ip_local_out+0xaf/0x1a0 net/ipv4/ip_output.c:126 iptunnel_xmit+0x628/0xa50 net/ipv4/ip_tunnel_core.c:82 geneve_xmit_skb drivers/net/geneve.c:966 [inline] geneve_xmit+0x10c8/0x3530 drivers/net/geneve.c:1077 __netdev_start_xmit include/linux/netdevice.h:4683 [inline] netdev_start_xmit include/linux/netdevice.h:4697 [inline] xmit_one net/core/dev.c:3473 [inline] dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3489 __dev_queue_xmit+0x2985/0x3660 net/core/dev.c:4116 neigh_hh_output include/net/neighbour.h:533 [inline] neigh_output include/net/neighbour.h:547 [inline] ip6_finish_output2+0xf7a/0x14f0 net/ipv6/ip6_output.c:126 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline] __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170 ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201 NF_HOOK_COND include/linux/netfilter.h:296 [inline] ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224 dst_output include/net/dst.h:451 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] mld_sendpack+0x9a3/0xe40 net/ipv6/mcast.c:1826 mld_send_cr net/ipv6/mcast.c:2127 [inline] mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2659 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307 worker_thread+0x657/0x1110 kernel/workqueue.c:2454 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> ---------------- Code disassembly (best guess): 0: 48 89 eb mov %rbp,%rbx 3: c6 45 01 01 movb $0x1,0x1(%rbp) 7: 41 bc 00 80 00 00 mov $0x8000,%r12d d: 48 c1 e9 03 shr $0x3,%rcx 11: 83 e3 07 and $0x7,%ebx 14: 41 be 01 00 00 00 mov $0x1,%r14d 1a: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax 21: fc ff df 24: 4c 8d 2c 01 lea (%rcx,%rax,1),%r13 28: eb 0c jmp 0x36 * 2a: f3 90 pause <-- trapping instruction 2c: 41 83 ec 01 
sub $0x1,%r12d 30: 0f 84 72 04 00 00 je 0x4a8 36: 41 0f b6 45 00 movzbl 0x0(%r13),%eax 3b: 38 d8 cmp %bl,%al 3d: 7f 08 jg 0x47 3f: 84 .byte 0x84 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220215235305.3272331-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 04:45:55 UTC
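The commit message states only the policy: an action may return TC_ACT_REPEAT at most 32 times, after which a rate-limited warning is printed. Below is a minimal userspace sketch of such a bounded repeat loop, not the kernel's tcf_action_exec(). The verdict enum, `run_action()`, the two callbacks, and the choice to stop with `ACT_OK` once the budget is spent are all assumptions for illustration; `fprintf()` stands in for net_warn_ratelimited().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical verdict codes, loosely modelled on TC_ACT_* (illustrative only). */
enum verdict { ACT_OK = 0, ACT_SHOT = 2, ACT_REPEAT = 6 };

#define REPEAT_LIMIT 32 /* the limit chosen by the commit */

typedef enum verdict (*action_fn)(void *ctx);

/* Run one action, re-invoking it while it returns ACT_REPEAT, but allow at
 * most REPEAT_LIMIT repeats. Stores the total number of invocations in *calls. */
static enum verdict run_action(action_fn act, void *ctx, int *calls)
{
    int repeat_ttl = REPEAT_LIMIT;
    enum verdict ret;

    assert(act != NULL && calls != NULL);
    *calls = 0;
    for (;;) {
        ret = act(ctx);
        (*calls)++;
        if (ret != ACT_REPEAT)
            return ret;
        if (repeat_ttl == 0) {
            /* Stand-in for net_warn_ratelimited(); assumption: give up with ACT_OK. */
            fprintf(stderr, "ACT_REPEAT abuse?\n");
            return ACT_OK;
        }
        repeat_ttl--;
    }
}

/* Hypothetical malicious action: always asks to be repeated. */
static enum verdict always_repeat(void *ctx)
{
    (void)ctx;
    return ACT_REPEAT;
}

/* Hypothetical well-behaved action: repeats *ctx times, then drops the packet. */
static enum verdict repeat_n(void *ctx)
{
    int *left = ctx;

    if (*left > 0) {
        (*left)--;
        return ACT_REPEAT;
    }
    return ACT_SHOT;
}
```

With the budget of 32, a misbehaving action is invoked once plus 32 repeats (33 calls) before the loop gives up, while an action that repeats only a few times is unaffected and its final verdict is returned unchanged.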
c08e584 tipc: fix wrong notification node addresses The previous bug fix had an unfortunate side effect that broke distribution of binding table entries between nodes. The updated tipc_sock_addr struct is also used further down in the same function, and there the old value is still the correct one. Fixes: 032062f363b4 ("tipc: fix wrong publisher node address in link publications") Signed-off-by: Jon Maloy <jmaloy@redhat.com> Link: https://lore.kernel.org/r/20220216020009.3404578-1-jmaloy@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 04:44:40 UTC
8c6ae46 net: dsa: lantiq_gswip: fix use after free in gswip_remove() of_node_put(priv->ds->slave_mii_bus->dev.of_node) should be done before mdiobus_free(priv->ds->slave_mii_bus). Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Fixes: 0d120dfb5d67 ("net: dsa: lantiq_gswip: don't use devres for mdiobus") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/1644921768-26477-1-git-send-email-khoroshilov@ispras.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 04:39:13 UTC
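The fix is purely about teardown order: the `of_node` pointer is reached through the bus object, so the reference must be dropped before the bus is freed. A minimal userspace sketch of that ordering follows; `struct node`, `struct bus`, `node_put()`, `bus_alloc()` and `bus_remove()` are hypothetical stand-ins for the device-tree node, the MDIO bus, of_node_put() and mdiobus_free(), not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted node, standing in for a device-tree node. */
struct node {
    int refcount;
};

/* Hypothetical bus object, standing in for the MDIO bus; it holds a node pointer. */
struct bus {
    struct node *of_node;
};

static void node_put(struct node *n)
{
    if (n)
        n->refcount--;
}

static struct bus *bus_alloc(struct node *n)
{
    struct bus *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    n->refcount++; /* the bus takes a reference on the node */
    b->of_node = n;
    return b;
}

/* Correct order: drop the node reference through the bus, then free the bus.
 * Swapping the two calls would read b->of_node from freed memory, which is
 * the kind of use-after-free the commit fixes. */
static void bus_remove(struct bus *b)
{
    assert(b != NULL);
    node_put(b->of_node);
    free(b);
}
```

An equivalent alternative is to copy `b->of_node` into a local variable before freeing the bus and release it afterwards; either way, nothing may dereference the bus after it has been freed.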
0b0dff5 ipv6: per-netns exclusive flowlabel checks IPv6 flowlabels historically require a reservation before use, optionally in exclusive mode (e.g., user-private). Commit 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist") introduced a fast path that avoids this check when no exclusive leases exist in the system, so that any flowlabel use is granted. That allows skipping the control operation to reserve a flowlabel entirely, though with a warning for when the fast path fails: "This is an optimization. Robust applications still have to revert to requesting leases if the fast path fails due to an exclusive lease." Still, this is subtle. Better to isolate network namespaces from each other: flowlabels are per-netns, so also record per-netns whether exclusive leases are in use. Then behavior does not change based on activity in other netns. Changes v2: wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if IPv6 is disabled. Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist") Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWHPR2201MB1072.namprd22.prod.outlook.com/ Reported-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Congyu Liu <liu3101@purdue.edu> Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 04:37:47 UTC
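The core idea is that the "are exclusive leases in use?" fast-path check consults state owned by the caller's namespace instead of a system-wide flag. A minimal userspace sketch under stated assumptions: `struct netns`, its `exclusive_leases` counter, and the helper functions are hypothetical, and "the socket holds a lease" is reduced to a boolean rather than the kernel's per-socket label lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace state: number of exclusive leases held in this netns. */
struct netns {
    int exclusive_leases;
};

static void take_exclusive_lease(struct netns *net)
{
    net->exclusive_leases++;
}

static void release_exclusive_lease(struct netns *net)
{
    assert(net->exclusive_leases > 0);
    net->exclusive_leases--;
}

/* Fast path: if this namespace has no exclusive leases, any flowlabel is
 * granted without a reservation. Otherwise the socket must hold a lease. */
static bool flowlabel_use_allowed(const struct netns *net, bool sock_holds_lease)
{
    if (net->exclusive_leases == 0)
        return true;
    return sock_holds_lease;
}
```

With a single global counter, an exclusive lease taken in namespace `a` would also force sockets in an unrelated namespace `b` off the fast path; keeping the counter per namespace confines that effect to `a`.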
c832962 net: bridge: multicast: notify switchdev driver whenever MC processing gets disabled Whenever the bridge driver hits the maximum capacity of MDBs, it disables MC processing (by setting the corresponding bridge option) but never notifies switchdev about the change (the notifiers are called only when this option is set explicitly through the registered netlink interface). This could lead to a situation where software MDB processing is disabled but the event is never offloaded to the underlying hardware. Fix this by adding a notification in that case. Fixes: 147c1e9b902c ("switchdev: bridge: Offload multicast disabled") Signed-off-by: Oleksandr Mazur <oleksandr.mazur@plvision.eu> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu Signed-off-by: Jakub Kicinski <kuba@kernel.org> 17 February 2022, 04:35:00 UTC
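The bug is that one code path (the capacity overflow) changed the option without calling the notifier that the netlink path calls. A minimal userspace sketch of one way to avoid that class of bug, assuming hypothetical names throughout (`struct bridge`, `mc_notify_fn`, `bridge_set_mc_disabled()`, `bridge_mdb_add()`, `count_notify()`): every path flips the option through a single setter that also notifies the offloading driver. This is a design illustration, not necessarily how the patch is structured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback standing in for the switchdev notification. */
typedef void (*mc_notify_fn)(bool mc_disabled, void *priv);

struct bridge {
    size_t mdb_count;
    size_t mdb_max;
    bool mc_disabled;
    mc_notify_fn notify; /* hypothetical hook for the offloading driver */
    void *notify_priv;
};

/* Single place that changes the option, so every caller (explicit netlink
 * configuration or capacity overflow) also notifies the driver. */
static void bridge_set_mc_disabled(struct bridge *br, bool disabled)
{
    if (br->mc_disabled == disabled)
        return;
    br->mc_disabled = disabled;
    if (br->notify)
        br->notify(disabled, br->notify_priv);
}

/* Add one MDB entry; when the table is full, disable MC processing instead. */
static bool bridge_mdb_add(struct bridge *br)
{
    assert(br != NULL);
    if (br->mdb_count >= br->mdb_max) {
        bridge_set_mc_disabled(br, true);
        return false;
    }
    br->mdb_count++;
    return true;
}

/* Example driver hook that counts "MC disabled" notifications. */
static void count_notify(bool mc_disabled, void *priv)
{
    int *seen = priv;

    if (mc_disabled)
        (*seen)++;
}
```

Because the setter returns early when the value is unchanged, repeated overflows after the first one do not generate duplicate notifications.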