sort by:
Revision Author Date Message Commit Date
c189716 net/mlx5: Consider RoCE cap before init RDMA resources Check if RoCE is supported by the device before enable it in the vport context and create all the RDMA steering objects. Fixes: 80f09dfc237f ("net/mlx5: Eswitch, enable RoCE loopback traffic") Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 10 June 2021, 00:20:04 UTC
a3e5fd9 net/mlx5e: Fix page reclaim for dead peer hairpin When adding a hairpin flow, a firmware-side send queue is created for the peer net device, which claims some host memory pages for its internal ring buffer. If the peer net device is removed/unbound before the hairpin flow is deleted, then the send queue is not destroyed which leads to a stack trace on pci device remove: [ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource [ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110 [ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0 [ 748.002171] ------------[ cut here ]------------ [ 748.001177] FW pages counter is 4 after reclaiming all pages [ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core] [ +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core] [ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1 [ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core] [ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9 [ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286 [ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000 [ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51 [ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8 [ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30 [ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000 [ 748.001429] FS: 00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000 [ 748.001695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0 [ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 748.001654] Call Trace: [ 748.000576] ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core] [ 748.001416] ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core] [ 748.001354] ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core] [ 748.001203] mlx5_function_teardown+0x30/0x60 [mlx5_core] [ 748.001275] mlx5_uninit_one+0xa7/0xc0 [mlx5_core] [ 748.001200] remove_one+0x5f/0xc0 [mlx5_core] [ 748.001075] pci_device_remove+0x9f/0x1d0 [ 748.000833] device_release_driver_internal+0x1e0/0x490 [ 748.001207] unbind_store+0x19f/0x200 [ 748.000942] ? sysfs_file_ops+0x170/0x170 [ 748.001000] kernfs_fop_write_iter+0x2bc/0x450 [ 748.000970] new_sync_write+0x373/0x610 [ 748.001124] ? new_sync_read+0x600/0x600 [ 748.001057] ? lock_acquire+0x4d6/0x700 [ 748.000908] ? lockdep_hardirqs_on_prepare+0x400/0x400 [ 748.001126] ? fd_install+0x1c9/0x4d0 [ 748.000951] vfs_write+0x4d0/0x800 [ 748.000804] ksys_write+0xf9/0x1d0 [ 748.000868] ? __x64_sys_read+0xb0/0xb0 [ 748.000811] ? filp_open+0x50/0x50 [ 748.000919] ? syscall_enter_from_user_mode+0x1d/0x50 [ 748.001223] do_syscall_64+0x3f/0x80 [ 748.000892] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 748.001026] RIP: 0033:0x7f58bcfb22f7 [ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7 [ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003 [ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001 [ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d [ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700 [ 748.001564] irq event stamp: 0 [ 748.000787] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0 [ 748.001854] softirqs last enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0 [ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 748.001492] ---[ end trace a6fabd773d1c51ae ]--- Fix by destroying the send queue of a hairpin peer net device that is being removed/unbound, which returns the allocated ring buffer pages to the host. Fixes: 4d8fcf216c90 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 10 June 2021, 00:20:03 UTC
8ad893e net/mlx5e: Remove dependency in IPsec initialization flows Currently, IPsec feature is disabled because mlx5e_build_nic_netdev is required to be called after mlx5e_ipsec_init. This requirement is invalid as mlx5e_build_nic_netdev and mlx5e_ipsec_init initialize independent resources. Remove ipsec pointer check in mlx5e_build_nic_netdev so that the two functions can be called at any order. Fixes: 547eede070eb ("net/mlx5e: IPSec, Innova IPSec offload infrastructure") Signed-off-by: Huy Nguyen <huyn@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 10 June 2021, 00:20:03 UTC
fb1a313 net/mlx5e: Fix use-after-free of encap entry in neigh update handler Function mlx5e_rep_neigh_update() wasn't updated to accommodate rtnl lock removal from TC filter update path and properly handle concurrent encap entry insertion/deletion which can lead to following use-after-free: [23827.464923] ================================================================== [23827.469446] BUG: KASAN: use-after-free in mlx5e_encap_take+0x72/0x140 [mlx5_core] [23827.470971] Read of size 4 at addr ffff8881d132228c by task kworker/u20:6/21635 [23827.472251] [23827.472615] CPU: 9 PID: 21635 Comm: kworker/u20:6 Not tainted 5.13.0-rc3+ #5 [23827.473788] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [23827.475639] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core] [23827.476731] Call Trace: [23827.477260] dump_stack+0xbb/0x107 [23827.477906] print_address_description.constprop.0+0x18/0x140 [23827.478896] ? mlx5e_encap_take+0x72/0x140 [mlx5_core] [23827.479879] ? mlx5e_encap_take+0x72/0x140 [mlx5_core] [23827.480905] kasan_report.cold+0x7c/0xd8 [23827.481701] ? mlx5e_encap_take+0x72/0x140 [mlx5_core] [23827.482744] kasan_check_range+0x145/0x1a0 [23827.493112] mlx5e_encap_take+0x72/0x140 [mlx5_core] [23827.494054] ? mlx5e_tc_tun_encap_info_equal_generic+0x140/0x140 [mlx5_core] [23827.495296] mlx5e_rep_neigh_update+0x41e/0x5e0 [mlx5_core] [23827.496338] ? mlx5e_rep_neigh_entry_release+0xb80/0xb80 [mlx5_core] [23827.497486] ? read_word_at_a_time+0xe/0x20 [23827.498250] ? strscpy+0xa0/0x2a0 [23827.498889] process_one_work+0x8ac/0x14e0 [23827.499638] ? lockdep_hardirqs_on_prepare+0x400/0x400 [23827.500537] ? pwq_dec_nr_in_flight+0x2c0/0x2c0 [23827.501359] ? rwlock_bug.part.0+0x90/0x90 [23827.502116] worker_thread+0x53b/0x1220 [23827.502831] ? process_one_work+0x14e0/0x14e0 [23827.503627] kthread+0x328/0x3f0 [23827.504254] ? _raw_spin_unlock_irq+0x24/0x40 [23827.505065] ? __kthread_bind_mask+0x90/0x90 [23827.505912] ret_from_fork+0x1f/0x30 [23827.506621] [23827.506987] Allocated by task 28248: [23827.507694] kasan_save_stack+0x1b/0x40 [23827.508476] __kasan_kmalloc+0x7c/0x90 [23827.509197] mlx5e_attach_encap+0xde1/0x1d40 [mlx5_core] [23827.510194] mlx5e_tc_add_fdb_flow+0x397/0xc40 [mlx5_core] [23827.511218] __mlx5e_add_fdb_flow+0x519/0xb30 [mlx5_core] [23827.512234] mlx5e_configure_flower+0x191c/0x4870 [mlx5_core] [23827.513298] tc_setup_cb_add+0x1d5/0x420 [23827.514023] fl_hw_replace_filter+0x382/0x6a0 [cls_flower] [23827.514975] fl_change+0x2ceb/0x4a51 [cls_flower] [23827.515821] tc_new_tfilter+0x89a/0x2070 [23827.516548] rtnetlink_rcv_msg+0x644/0x8c0 [23827.517300] netlink_rcv_skb+0x11d/0x340 [23827.518021] netlink_unicast+0x42b/0x700 [23827.518742] netlink_sendmsg+0x743/0xc20 [23827.519467] sock_sendmsg+0xb2/0xe0 [23827.520131] ____sys_sendmsg+0x590/0x770 [23827.520851] ___sys_sendmsg+0xd8/0x160 [23827.521552] __sys_sendmsg+0xb7/0x140 [23827.522238] do_syscall_64+0x3a/0x70 [23827.522907] entry_SYSCALL_64_after_hwframe+0x44/0xae [23827.523797] [23827.524163] Freed by task 25948: [23827.524780] kasan_save_stack+0x1b/0x40 [23827.525488] kasan_set_track+0x1c/0x30 [23827.526187] kasan_set_free_info+0x20/0x30 [23827.526968] __kasan_slab_free+0xed/0x130 [23827.527709] slab_free_freelist_hook+0xcf/0x1d0 [23827.528528] kmem_cache_free_bulk+0x33a/0x6e0 [23827.529317] kfree_rcu_work+0x55f/0xb70 [23827.530024] process_one_work+0x8ac/0x14e0 [23827.530770] worker_thread+0x53b/0x1220 [23827.531480] kthread+0x328/0x3f0 [23827.532114] ret_from_fork+0x1f/0x30 [23827.532785] [23827.533147] Last potentially related work creation: [23827.534007] kasan_save_stack+0x1b/0x40 [23827.534710] kasan_record_aux_stack+0xab/0xc0 [23827.535492] kvfree_call_rcu+0x31/0x7b0 [23827.536206] mlx5e_tc_del_fdb_flow+0x577/0xef0 [mlx5_core] [23827.537305] mlx5e_flow_put+0x49/0x80 [mlx5_core] [23827.538290] mlx5e_delete_flower+0x6d1/0xe60 [mlx5_core] [23827.539300] tc_setup_cb_destroy+0x18e/0x2f0 [23827.540144] fl_hw_destroy_filter+0x1d2/0x310 [cls_flower] [23827.541148] __fl_delete+0x4dc/0x660 [cls_flower] [23827.541985] fl_delete+0x97/0x160 [cls_flower] [23827.542782] tc_del_tfilter+0x7ab/0x13d0 [23827.543503] rtnetlink_rcv_msg+0x644/0x8c0 [23827.544257] netlink_rcv_skb+0x11d/0x340 [23827.544981] netlink_unicast+0x42b/0x700 [23827.545700] netlink_sendmsg+0x743/0xc20 [23827.546424] sock_sendmsg+0xb2/0xe0 [23827.547084] ____sys_sendmsg+0x590/0x770 [23827.547850] ___sys_sendmsg+0xd8/0x160 [23827.548606] __sys_sendmsg+0xb7/0x140 [23827.549303] do_syscall_64+0x3a/0x70 [23827.549969] entry_SYSCALL_64_after_hwframe+0x44/0xae [23827.550853] [23827.551217] The buggy address belongs to the object at ffff8881d1322200 [23827.551217] which belongs to the cache kmalloc-256 of size 256 [23827.553341] The buggy address is located 140 bytes inside of [23827.553341] 256-byte region [ffff8881d1322200, ffff8881d1322300) [23827.555747] The buggy address belongs to the page: [23827.556847] page:00000000898762aa refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d1320 [23827.558651] head:00000000898762aa order:2 compound_mapcount:0 compound_pincount:0 [23827.559961] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff) [23827.561243] raw: 002ffff800010200 dead000000000100 dead000000000122 ffff888100042b40 [23827.562653] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [23827.564112] page dumped because: kasan: bad access detected [23827.565439] [23827.565932] Memory state around the buggy address: [23827.566917] ffff8881d1322180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [23827.568485] ffff8881d1322200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [23827.569818] >ffff8881d1322280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [23827.571143] ^ [23827.571879] ffff8881d1322300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [23827.573283] ffff8881d1322380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [23827.574654] ================================================================== Most of the necessary logic is already correctly implemented by mlx5e_get_next_valid_encap() helper that is used in neigh stats update handler. Make the handler generic by renaming it to mlx5e_get_next_matching_encap() and use callback to test whether flow is matching instead of hardcoded check for 'valid' flag value. Implement mlx5e_get_next_valid_encap() by calling mlx5e_get_next_matching_encap() with callback that tests encap MLX5_ENCAP_ENTRY_VALID flag. Implement new mlx5e_get_next_init_encap() helper by calling mlx5e_get_next_matching_encap() with callback that tests encap completion result to be non-error and use it in mlx5e_rep_neigh_update() to safely iterate over nhe->encap_list. Remove encap completion logic from mlx5e_rep_update_flows() since the encap entries passed to this function are already guaranteed to be properly initialized by similar code in mlx5e_get_next_init_encap(). Fixes: 2a1f1768fa17 ("net/mlx5e: Refactor neigh update for concurrent execution") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 10 June 2021, 00:20:03 UTC
2bf8d2a net/mlx5e: Fix an error code in mlx5e_arfs_create_tables() When the code execute 'if (!priv->fs.arfs->wq)', the value of err is 0. So, we use -ENOMEM to indicate that the function create_singlethread_workqueue() return NULL. Clean up smatch warning: drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c:373 mlx5e_arfs_create_tables() warn: missing error code 'err'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: f6755b80d693 ("net/mlx5e: Dynamic alloc arfs table for netdev when needed") Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 10 June 2021, 00:20:02 UTC
13c62f5 net/sched: act_ct: handle DNAT tuple collision This this the counterpart of 8aa7b526dc0b ("openvswitch: handle DNAT tuple collision") for act_ct. From that commit changelog: """ With multiple DNAT rules it's possible that after destination translation the resulting tuples collide. ... Netfilter handles this case by allocating a null binding for SNAT at egress by default. Perform the same operation in openvswitch for DNAT if no explicit SNAT is requested by the user and allocate a null binding for SNAT for packets in the "original" direction. """ Fixes: 95219afbb980 ("act_ct: support asymmetric conntrack") Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 22:34:51 UTC
d2e381c rtnetlink: Fix regression in bridge VLAN configuration Cited commit started returning errors when notification info is not filled by the bridge driver, resulting in the following regression: # ip link add name br1 type bridge vlan_filtering 1 # bridge vlan add dev br1 vid 555 self pvid untagged RTNETLINK answers: Invalid argument As long as the bridge driver does not fill notification info for the bridge device itself, an empty notification should not be considered as an error. This is explained in commit 59ccaaaa49b5 ("bridge: dont send notification when skb->len == 0 in rtnl_bridge_notify"). Fix by removing the error and add a comment to avoid future bugs. Fixes: a8db57c1d285 ("rtnetlink: Fix missing error code in rtnl_bridge_notify()") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 21:58:26 UTC
93124d4 Merge tag 'mac80211-for-net-2021-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes berg says: ==================== A fair number of fixes: * fix more fallout from RTNL locking changes * fixes for some of the bugs found by syzbot * drop multicast fragments in mac80211 to align with the spec and what drivers are doing now * fix NULL-ptr deref in radiotap injection ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 21:46:21 UTC
a8b897c udp: fix race between close() and udp_abort() Kaustubh reported and diagnosed a panic in udp_lib_lookup(). The root cause is udp_abort() racing with close(). Both racing functions acquire the socket lock, but udp{v6}_destroy_sock() release it before performing destructive actions. We can't easily extend the socket lock scope to avoid the race, instead use the SOCK_DEAD flag to prevent udp_abort from doing any action when the critical race happens. Diagnosed-and-tested-by: Kaustubh Pandey <kapandey@codeaurora.org> Fixes: 5d77dca82839 ("net: diag: support SOCK_DESTROY for UDP sockets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 21:08:41 UTC
dcd01ee inet: annotate data race in inet_send_prepare() and inet_dgram_connect() Both functions are known to be racy when reading inet_num as we do not want to grab locks for the common case the socket has been bound already. The race is resolved in inet_autobind() by reading again inet_num under the socket lock. syzbot reported: BUG: KCSAN: data-race in inet_send_prepare / udp_lib_get_port write to 0xffff88812cba150e of 2 bytes by task 24135 on cpu 0: udp_lib_get_port+0x4b2/0xe20 net/ipv4/udp.c:308 udp_v6_get_port+0x5e/0x70 net/ipv6/udp.c:89 inet_autobind net/ipv4/af_inet.c:183 [inline] inet_send_prepare+0xd0/0x210 net/ipv4/af_inet.c:807 inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88812cba150e of 2 bytes by task 24132 on cpu 1: inet_send_prepare+0x21/0x210 net/ipv4/af_inet.c:806 inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516 do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0x0000 -> 0x9db4 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 24132 Comm: syz-executor.2 Not tainted 5.13.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 20:59:53 UTC
80ec82e net: ethtool: clear heap allocations for ethtool function Several ethtool functions leave heap uncleared (potentially) by drivers. This will leave the unused portion of heap unchanged and might copy the full contents back to userspace. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 20:53:31 UTC
a979954 mac80211: drop multicast fragments These are not permitted by the spec, just drop them. Link: https://lore.kernel.org/r/20210609161305.23def022b750.Ibd6dd3cdce573dae262fcdc47f8ac52b883a9c50@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> 09 June 2021, 14:17:45 UTC
f5baf28 mac80211: move interface shutdown out of wiphy lock When reconfiguration fails, we shut down everything, but we cannot call cfg80211_shutdown_all_interfaces() with the wiphy mutex held. Since cfg80211 now calls it on resume errors, we only need to do likewise for where we call reconfig (whether directly or indirectly), but not under the wiphy lock. Cc: stable@vger.kernel.org Fixes: 2fe8ef106238 ("cfg80211: change netdev registration/unregistration semantics") Link: https://lore.kernel.org/r/20210608113226.78233c80f548.Iecc104aceb89f0568f50e9670a9cb191a1c8887b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> 09 June 2021, 14:09:21 UTC
65bec83 cfg80211: shut down interfaces on failed resume If resume fails, we should shut down all interfaces as the hardware is probably dead. This was/is already done now in mac80211, but we need to change that due to locking issues, so move it here and do it without the wiphy lock held. Cc: stable@vger.kernel.org Fixes: 2fe8ef106238 ("cfg80211: change netdev registration/unregistration semantics") Link: https://lore.kernel.org/r/20210608113226.d564ca69de7c.I2e3c3e5d410b72a4f63bade4fb075df041b3d92f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> 09 June 2021, 14:09:20 UTC
43076c1 cfg80211: fix phy80211 symlink creation When I moved around the code here, I neglected that we could still call register_netdev() or similar without the wiphy mutex held, which then calls cfg80211_register_wdev() - that's also done from cfg80211_register_netdevice(), but the phy80211 symlink creation was only there. Now, the symlink isn't needed for a *pure* wdev, but a netdev not registered via cfg80211_register_wdev() should still have the symlink, so move the creation to the right place. Cc: stable@vger.kernel.org Fixes: 2fe8ef106238 ("cfg80211: change netdev registration/unregistration semantics") Link: https://lore.kernel.org/r/20210608113226.a5dc4c1e488c.Ia42fe663cefe47b0883af78c98f284c5555bbe5d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> 09 June 2021, 14:09:20 UTC
adaed1b mac80211: fix 'reset' debugfs locking cfg80211 now calls suspend/resume with the wiphy lock held, and while there's a problem with that needing to be fixed, we should do the same in debugfs. Cc: stable@vger.kernel.org Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver") Link: https://lore.kernel.org/r/20210608113226.14020430e449.I78e19db0a55a8295a376e15ac4cf77dbb4c6fb51@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> 09 June 2021, 14:09:18 UTC
f2386cf net: lantiq: disable interrupt before sheduling NAPI This patch fixes TX hangs with threaded NAPI enabled. The scheduled NAPI seems to be executed in parallel with the interrupt on second thread. Sometimes it happens that ltq_dma_disable_irq() is executed after xrx200_tx_housekeeping(). The symptom is that TX interrupts are disabled in the DMA controller. As a result, the TX hangs after a few seconds of the iperf test. Scheduling NAPI after disabling interrupts fixes this issue. Tested on Lantiq xRX200 (BT Home Hub 5A). Fixes: 9423361da523 ("net: lantiq: Disable IRQs only if NAPI gets scheduled ") Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Acked-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net> 09 June 2021, 02:16:32 UTC
504fd6a net: ena: fix DMA mapping function issues in XDP This patch fixes several bugs found when (DMA/LLQ) mapping a packet for transmission. The mapping procedure makes the transmitted packet accessible by the device. When using LLQ, this requires copying the packet's header to push header (which would be passed to LLQ) and creating DMA mapping for the payload (if the packet doesn't fit the maximum push length). When not using LLQ, we map the whole packet with DMA. The following bugs are fixed in the code: 1. Add support for non-LLQ machines: The ena_xdp_tx_map_frame() function assumed that LLQ is supported, and never mapped the whole packet using DMA. On some instances, which don't support LLQ, this causes loss of traffic. 2. Wrong DMA buffer length passed to device: When using LLQ, the first 'tx_max_header_size' bytes of the packet would be copied to push header. The rest of the packet would be copied to a DMA'd buffer. 3. Freeing the XDP buffer twice in case of a mapping error: In case a buffer DMA mapping fails, the function uses xdp_return_frame_rx_napi() to free the RX buffer and returns from the function with an error. XDP frames that fail to xmit get freed by the kernel and so there is no need for this call. Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shay Agroskin <shayagr@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 23:41:02 UTC
1650bdb net: dsa: felix: re-enable TX flow control in ocelot_port_flush() Because flow control is set up statically in ocelot_init_port(), and not in phylink_mac_link_up(), what happens is that after the blamed commit, the flow control remains disabled after the port flushing procedure. Fixes: eb4733d7cffc ("net: dsa: felix: implement port flushing on .phylink_mac_link_down") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 23:34:23 UTC
49bfcbf net: rds: fix memory leak in rds_recvmsg Syzbot reported memory leak in rds. The problem was in unputted refcount in case of error. int rds_recvmsg(struct socket *sock, struct msghdr *msg, size_t size, int msg_flags) { ... if (!rds_next_incoming(rs, &inc)) { ... } After this "if" inc refcount incremented and if (rds_cmsg_recv(inc, msg, rs)) { ret = -EFAULT; goto out; } ... out: return ret; } in case of rds_cmsg_recv() fail the refcount won't be decremented. And it's easy to see from ftrace log, that rds_inc_addref() don't have rds_inc_put() pair in rds_recvmsg() after rds_cmsg_recv() 1) | rds_recvmsg() { 1) 3.721 us | rds_inc_addref(); 1) 3.853 us | rds_message_inc_copy_to_user(); 1) + 10.395 us | rds_cmsg_recv(); 1) + 34.260 us | } Fixes: bdbe6fbc6a2f ("RDS: recv.c") Reported-and-tested-by: syzbot+5134cdf021c4ed5aaa5f@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 23:32:17 UTC
df693f1 Merge tag 'batadv-net-pullrequest-20210608' of git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Avoid WARN_ON timing related checks, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 19:11:21 UTC
9bb392f vrf: fix maximum MTU My initial goal was to fix the default MTU, which is set to 65536, ie above the maximum defined in the driver: 65535 (ETH_MAX_MTU). In fact, it's seems more consistent, wrt min_mtu, to set the max_mtu to IP6_MAX_MTU (65535 + sizeof(struct ipv6hdr)) and use it by default. Let's also, for consistency, set the mtu in vrf_setup(). This function calls ether_setup(), which set the mtu to 1500. Thus, the whole mtu config is done in the same function. Before the patch: $ ip link add blue type vrf table 1234 $ ip link list blue 9: blue: <NOARP,MASTER> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether fa:f5:27:70:24:2a brd ff:ff:ff:ff:ff:ff $ ip link set dev blue mtu 65535 $ ip link set dev blue mtu 65536 Error: mtu greater than device maximum. Fixes: 5055376a3b44 ("net: vrf: Fix ping failed when vrf mtu is set to 0") CC: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 18:46:23 UTC
d439aa3 net: appletalk: fix the usage of preposition The preposition "for" should be changed to preposition "of". Signed-off-by: gushengxian <gushengxian@yulong.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 18:37:41 UTC
5ac6b19 net: ipv4: Remove unneed BUG() function When 'nla_parse_nested_deprecated' failed, it's no need to BUG() here, return -EINVAL is ok. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 18:36:48 UTC
d612c3f net: ipv4: fix memory leak in netlbl_cipsov4_add_std Reported by syzkaller: BUG: memory leak unreferenced object 0xffff888105df7000 (size 64): comm "syz-executor842", pid 360, jiffies 4294824824 (age 22.546s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000e67ed558>] kmalloc include/linux/slab.h:590 [inline] [<00000000e67ed558>] kzalloc include/linux/slab.h:720 [inline] [<00000000e67ed558>] netlbl_cipsov4_add_std net/netlabel/netlabel_cipso_v4.c:145 [inline] [<00000000e67ed558>] netlbl_cipsov4_add+0x390/0x2340 net/netlabel/netlabel_cipso_v4.c:416 [<0000000006040154>] genl_family_rcv_msg_doit.isra.0+0x20e/0x320 net/netlink/genetlink.c:739 [<00000000204d7a1c>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] [<00000000204d7a1c>] genl_rcv_msg+0x2bf/0x4f0 net/netlink/genetlink.c:800 [<00000000c0d6a995>] netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2504 [<00000000d78b9d2c>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 [<000000009733081b>] netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] [<000000009733081b>] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1340 [<00000000d5fd43b8>] netlink_sendmsg+0x789/0xc70 net/netlink/af_netlink.c:1929 [<000000000a2d1e40>] sock_sendmsg_nosec net/socket.c:654 [inline] [<000000000a2d1e40>] sock_sendmsg+0x139/0x170 net/socket.c:674 [<00000000321d1969>] ____sys_sendmsg+0x658/0x7d0 net/socket.c:2350 [<00000000964e16bc>] ___sys_sendmsg+0xf8/0x170 net/socket.c:2404 [<000000001615e288>] __sys_sendmsg+0xd3/0x190 net/socket.c:2433 [<000000004ee8b6a5>] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:47 [<00000000171c7cee>] entry_SYSCALL_64_after_hwframe+0x44/0xae The memory of doi_def->map.std pointing is allocated in netlbl_cipsov4_add_std, but no place has freed it. It should be freed in cipso_v4_doi_free which frees the cipso DOI resource. Fixes: 96cb8e3313c7a ("[NetLabel]: CIPSOv4 and Unlabeled packet integration") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Nanyong Sun <sunnanyong@huawei.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net> 08 June 2021, 18:36:04 UTC
d5befb2 mac80211: fix deadlock in AP/VLAN handling Syzbot reports that when you have AP_VLAN interfaces that are up and close the AP interface they belong to, we get a deadlock. No surprise - since we dev_close() them with the wiphy mutex held, which goes back into the netdev notifier in cfg80211 and tries to acquire the wiphy mutex there. To fix this, we need to do two things: 1) prevent changing iftype while AP_VLANs are up, we can't easily fix this case since cfg80211 already calls us with the wiphy mutex held, but change_interface() is relatively rare in drivers anyway, so changing iftype isn't used much (and userspace has to fall back to down/change/up anyway) 2) pull the dev_close() loop over VLANs out of the wiphy mutex section in the normal stop case Cc: stable@vger.kernel.org Reported-by: syzbot+452ea4fbbef700ff0a56@syzkaller.appspotmail.com Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver") Link: https://lore.kernel.org/r/20210517160322.9b8f356c0222.I392cb0e2fa5a1a94cf2e637555d702c7e512c1ff@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com> 08 June 2021, 09:33:07 UTC
7a6b1ab neighbour: allow NUD_NOARP entries to be forced GCed IFF_POINTOPOINT interfaces use NUD_NOARP entries for IPv6. It's possible to fill up the neighbour table with enough entries that it will overflow for valid connections after that. This behaviour is more prevalent after commit 58956317c8de ("neighbor: Improve garbage collection") is applied, as it prevents removal from entries that are not NUD_FAILED, unless they are more than 5s old. Fixes: 58956317c8de (neighbor: Improve garbage collection) Reported-by: Kasper Dupont <kasperd@gjkwv.06.feb.2021.kasperd.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 22:25:47 UTC
a47c397 revert "net: kcm: fix memory leak in kcm_sendmsg" In commit c47cc304990a ("net: kcm: fix memory leak in kcm_sendmsg") I misunderstood the root case of the memory leak and came up with completely broken fix. So, simply revert this commit to avoid GPF reported by syzbot. Im so sorry for this situation. Fixes: c47cc304990a ("net: kcm: fix memory leak in kcm_sendmsg") Reported-by: syzbot+65badd5e74ec62cb67dc@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 20:34:37 UTC
aaab307 Merge branch 'mlxsw-fixes' Merge branch 'mlxsw-fixes' Ido Schimmel says: ==================== mlxsw: Thermal and qdisc fixes Patches #1-#2 fix wrong validation of burst size in qdisc code and a user triggerable WARN_ON(). Patch #3 fixes a regression in thermal monitoring of transceiver modules and gearboxes. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 20:12:08 UTC
2fd8d84 mlxsw: core: Set thermal zone polling delay argument to real value at init Thermal polling delay argument for modules and gearboxes thermal zones used to be initialized with zero value, while actual delay was used to be set by mlxsw_thermal_set_mode() by thermal operation callback set_mode(). After operations set_mode()/get_mode() have been removed by cited commits, modules and gearboxes thermal zones always have polling time set to zero and do not perform temperature monitoring. Set non-zero "polling_delay" in thermal_zone_device_register() routine, thus, the relevant thermal zones will perform thermal monitoring. Cc: Andrzej Pietrasiewicz <andrzej.p@collabora.com> Fixes: 5d7bd8aa7c35 ("thermal: Simplify or eliminate unnecessary set_mode() methods") Fixes: 1ee14820fd8e ("thermal: remove get_mode() operation of drivers") Signed-off-by: Mykola Kostenok <c_mykolak@nvidia.com> Acked-by: Vadim Pasternak <vadimp@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 20:11:41 UTC
d566ed0 mlxsw: spectrum_qdisc: Pass handle, not band number to find_class() In mlxsw Qdisc offload, find_class() is an operation that yields a qdisc offload descriptor given a parental qdisc descriptor and a class handle. In __mlxsw_sp_qdisc_ets_graft() however, a band number is passed to that function instead of a handle. This can lead to a trigger of a WARN_ON with the following splat: WARNING: CPU: 3 PID: 808 at drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:1356 __mlxsw_sp_qdisc_ets_graft+0x115/0x130 [mlxsw_spectrum] [...] Call Trace: mlxsw_sp_setup_tc_prio+0xe3/0x100 [mlxsw_spectrum] qdisc_offload_graft_helper+0x35/0xa0 prio_graft+0x176/0x290 [sch_prio] qdisc_graft+0xb3/0x540 tc_modify_qdisc+0x56a/0x8a0 rtnetlink_rcv_msg+0x12c/0x370 netlink_rcv_skb+0x49/0xf0 netlink_unicast+0x1f6/0x2b0 netlink_sendmsg+0x1fb/0x410 ____sys_sendmsg+0x1f3/0x220 ___sys_sendmsg+0x70/0xb0 __sys_sendmsg+0x54/0xa0 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Since the parent handle is not passed with the offload information, compute it from the band number and qdisc handle. Fixes: 28052e618b04 ("mlxsw: spectrum_qdisc: Track children per qdisc") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 20:11:41 UTC
306b922 mlxsw: reg: Spectrum-3: Enforce lowest max-shaper burst size of 11 A max-shaper is the HW component responsible for delaying egress traffic above a configured transmission rate. Burst size is the amount of traffic that is allowed to pass without accounting. The burst size value needs to be such that it can be expressed as 2^BS * 512 bits, where BS lies in a certain ASIC-dependent range. mlxsw enforces that this holds before attempting to configure the shaper. The assumption for Spectrum-3 was that the lower limit of BS would be 5, like for Spectrum-1. But as of now, the limit is still 11. Therefore fix the driver accordingly, so that incorrect values are rejected early with a proper message. Fixes: 23effa2479ba ("mlxsw: reg: Add max_shaper_bs to QoS ETS Element Configuration") Reported-by: Maksym Yaremchuk <maksymy@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 20:11:41 UTC
51c96a5 ethtool: Fix NULL pointer dereference during module EEPROM dump When get_module_eeprom_by_page() is not implemented by the driver, NULL pointer dereference can occur [1]. Fix by testing if get_module_eeprom_by_page() is implemented instead of get_module_info(). [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] CPU: 0 PID: 251 Comm: ethtool Not tainted 5.13.0-rc3-custom-00940-g3822d0670c9d #989 Call Trace: eeprom_prepare_data+0x101/0x2d0 ethnl_default_doit+0xc2/0x290 genl_family_rcv_msg_doit+0xdc/0x140 genl_rcv_msg+0xd7/0x1d0 netlink_rcv_skb+0x49/0xf0 genl_rcv+0x1f/0x30 netlink_unicast+0x1f6/0x2c0 netlink_sendmsg+0x1f9/0x400 __sys_sendto+0xe1/0x130 __x64_sys_sendto+0x1b/0x20 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c97a31f66ebc ("ethtool: wire in generic SFP module access") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> 07 June 2021, 20:10:34 UTC
3822d06 cxgb4: avoid link re-train during TC-MQPRIO configuration When configuring TC-MQPRIO offload, only turn off netdev carrier and don't bring physical link down in hardware. Otherwise, when the physical link is brought up again after configuration, it gets re-trained and stalls ongoing traffic. Also, when firmware is no longer accessible or crashed, avoid sending FLOWC and waiting for reply that will never come. Fix following hung_task_timeout_secs trace seen in these cases. INFO: task tc:20807 blocked for more than 122 seconds. Tainted: G S 5.13.0-rc3+ #122 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:tc state:D stack:14768 pid:20807 ppid: 19366 flags:0x00000000 Call Trace: __schedule+0x27b/0x6a0 schedule+0x37/0xa0 schedule_preempt_disabled+0x5/0x10 __mutex_lock.isra.14+0x2a0/0x4a0 ? netlink_lookup+0x120/0x1a0 ? rtnl_fill_ifinfo+0x10f0/0x10f0 __netlink_dump_start+0x70/0x250 rtnetlink_rcv_msg+0x28b/0x380 ? rtnl_fill_ifinfo+0x10f0/0x10f0 ? rtnl_calcit.isra.42+0x120/0x120 netlink_rcv_skb+0x4b/0xf0 netlink_unicast+0x1a0/0x280 netlink_sendmsg+0x216/0x440 sock_sendmsg+0x56/0x60 __sys_sendto+0xe9/0x150 ? handle_mm_fault+0x6d/0x1b0 ? do_user_addr_fault+0x1c5/0x620 __x64_sys_sendto+0x1f/0x30 do_syscall_64+0x3c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f7f73218321 RSP: 002b:00007ffd19626208 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 000055b7c0a8b240 RCX: 00007f7f73218321 RDX: 0000000000000028 RSI: 00007ffd19626210 RDI: 0000000000000003 RBP: 000055b7c08680ff R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000055b7c085f5f6 R13: 000055b7c085f60a R14: 00007ffd19636470 R15: 00007ffd196262a0 Fixes: b1396c2bd675 ("cxgb4: parse and configure TC-MQPRIO offload") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:45:13 UTC
944d671 sch_htb: fix refcount leak in htb_parent_to_leaf_offload The commit ae81feb7338c ("sch_htb: fix null pointer dereference on a null new_q") fixes a NULL pointer dereference bug, but it is not correct. Because htb_graft_helper properly handles the case when new_q is NULL, and after the previous patch by skipping this call which creates an inconsistency : dev_queue->qdisc will still point to the old qdisc, but cl->parent->leaf.q will point to the new one (which will be noop_qdisc, because new_q was NULL). The code is based on an assumption that these two pointers are the same, so it can lead to refcount leaks. The correct fix is to add a NULL pointer check to protect qdisc_refcount_inc inside htb_parent_to_leaf_offload. Fixes: ae81feb7338c ("sch_htb: fix null pointer dereference on a null new_q") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Suggested-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:44:18 UTC
26821ec Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2021-06-04 This series contains updates to virtchnl header file and ice driver. Brett fixes VF being unable to request a different number of queues then allocated and adds clearing of VF_MBX_ATQLEN register for VF reset. Haiyue handles error of rebuilding VF VSI during reset. Paul fixes reporting of autoneg to use the PHY capabilities. Dave allows LLDP packets without priority of TC_PRIO_CONTROL to be transmitted. Geert Uytterhoeven adds explicit padding to virtchnl_proto_hdrs structure in the virtchnl header file. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:27:07 UTC
6fd815b Merge branch 'wireguard-fixes' Jason A. Donenfeld says: ==================== wireguard fixes for 5.13-rc5 Here are bug fixes to WireGuard for 5.13-rc5: 1-2,6) These are small, trivial tweaks to our test harness. 3) Linus thinks -O3 is still dangerous to enable. The code gen wasn't so much different with -O2 either. 4) We were accidentally calling synchronize_rcu instead of synchronize_net while holding the rtnl_lock, resulting in some rather large stalls that hit production machines. 5) Peer allocation was wasting literally hundreds of megabytes on real world deployments, due to oddly sized large objects not fitting nicely into a kmalloc slab. 7-9) We move from an insanely expensive O(n) algorithm to a fast O(1) algorithm, and cleanup a massive memory leak in the process, in which allowed ips churn would leave danging nodes hanging around without cleanup until the interface was removed. The O(1) algorithm eliminates packet stalls and high latency issues, in addition to bringing operations that took as much as 10 minutes down to less than a second. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
bf7b042 wireguard: allowedips: free empty intermediate nodes when removing single node When removing single nodes, it's possible that that node's parent is an empty intermediate node, in which case, it too should be removed. Otherwise the trie fills up and never is fully emptied, leading to gradual memory leaks over time for tries that are modified often. There was originally code to do this, but was removed during refactoring in 2016 and never reworked. Now that we have proper parent pointers from the previous commits, we can implement this properly. In order to reduce branching and expensive comparisons, we want to keep the double pointer for parent assignment (which lets us easily chain up to the root), but we still need to actually get the parent's base address. So encode the bit number into the last two bits of the pointer, and pack and unpack it as needed. This is a little bit clumsy but is the fastest and less memory wasteful of the compromises. Note that we align the root struct here to a minimum of 4, because it's embedded into a larger struct, and we're relying on having the bottom two bits for our flag, which would only be 16-bit aligned on m68k. The existing macro-based helpers were a bit unwieldy for adding the bit packing to, so this commit replaces them with safer and clearer ordinary functions. We add a test to the randomized/fuzzer part of the selftests, to free the randomized tries by-peer, refuzz it, and repeat, until it's supposed to be empty, and then then see if that actually resulted in the whole thing being emptied. That combined with kmemcheck should hopefully make sure this commit is doing what it should. Along the way this resulted in various other cleanups of the tests and fixes for recent graphviz. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
dc680de wireguard: allowedips: allocate nodes in kmem_cache The previous commit moved from O(n) to O(1) for removal, but in the process introduced an additional pointer member to a struct that increased the size from 60 to 68 bytes, putting nodes in the 128-byte slab. With deployed systems having as many as 2 million nodes, this represents a significant doubling in memory usage (128 MiB -> 256 MiB). Fix this by using our own kmem_cache, that's sized exactly right. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Matthew Wilcox <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
f634f41 wireguard: allowedips: remove nodes in O(1) Previously, deleting peers would require traversing the entire trie in order to rebalance nodes and safely free them. This meant that removing 1000 peers from a trie with a half million nodes would take an extremely long time, during which we're holding the rtnl lock. Large-scale users were reporting 200ms latencies added to the networking stack as a whole every time their userspace software would queue up significant removals. That's a serious situation. This commit fixes that by maintaining a double pointer to the parent's bit pointer for each node, and then using the already existing node list belonging to each peer to go directly to the node, fix up its pointers, and free it with RCU. This means removal is O(1) instead of O(n), and we don't use gobs of stack. The removal algorithm has the same downside as the code that it fixes: it won't collapse needlessly long runs of fillers. We can enhance that in the future if it ever becomes a problem. This commit documents that limitation with a TODO comment in code, a small but meaningful improvement over the prior situation. Currently the biggest flaw, which the next commit addresses, is that because this increases the node size on 64-bit machines from 60 bytes to 68 bytes. 60 rounds up to 64, but 68 rounds up to 128. So we wind up using twice as much memory per node, because of power-of-two allocations, which is a big bummer. We'll need to figure something out there. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
46cfe8e wireguard: allowedips: initialize list head in selftest The randomized trie tests weren't initializing the dummy peer list head, resulting in a NULL pointer dereference when used. Fix this by initializing it in the randomized trie test, just like we do for the static unit test. While we're at it, all of the other strings like this have the word "self-test", so add it to the missing place here. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
a4e9f8e wireguard: peer: allocate in kmem_cache With deployments having upwards of 600k peers now, this somewhat heavy structure could benefit from more fine-grained allocations. Specifically, instead of using a 2048-byte slab for a 1544-byte object, we can now use 1544-byte objects directly, thus saving almost 25% per-peer, or with 600k peers, that's a savings of 303 MiB. This also makes wireguard's memory usage more transparent in tools like slabtop and /proc/slabinfo. Fixes: 8b5553ace83c ("wireguard: queueing: get rid of per-peer ring buffers") Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Matthew Wilcox <willy@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
24b70ee wireguard: use synchronize_net rather than synchronize_rcu Many of the synchronization points are sometimes called under the rtnl lock, which means we should use synchronize_net rather than synchronize_rcu. Under the hood, this expands to using the expedited flavor of function in the event that rtnl is held, in order to not stall other concurrent changes. This fixes some very, very long delays when removing multiple peers at once, which would cause some operations to take several minutes. Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
cc5060c wireguard: do not use -O3 Apparently, various versions of gcc have O3-related miscompiles. Looking at the difference between -O2 and -O3 for gcc 11 doesn't indicate miscompiles, but the difference also doesn't seem so significant for performance that it's worth risking. Link: https://lore.kernel.org/lkml/CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAHmME9otB5Wwxp7H8bR_i2uH2esEMvoBMC8uEXBMH9p0q1s6Bw@mail.gmail.com/ Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
f8873d1 wireguard: selftests: make sure rp_filter is disabled on vethc Some distros may enable strict rp_filter by default, which will prevent vethc from receiving the packets with an unrouteable reverse path address. Reported-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
acf2492 wireguard: selftests: remove old conntrack kconfig value On recent kernels, this config symbol is no longer used. Reported-by: Rui Salvaterra <rsalvaterra@gmail.com> Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net> 04 June 2021, 21:25:14 UTC
519d8ab virtchnl: Add missing padding to virtchnl_proto_hdrs On m68k (Coldfire M547x): CC drivers/net/ethernet/intel/i40e/i40e_main.o In file included from drivers/net/ethernet/intel/i40e/i40e_prototype.h:9, from drivers/net/ethernet/intel/i40e/i40e.h:41, from drivers/net/ethernet/intel/i40e/i40e_main.c:12: include/linux/avf/virtchnl.h:153:36: warning: division by zero [-Wdiv-by-zero] 153 | { virtchnl_static_assert_##X = (n)/((sizeof(struct X) == (n)) ? 1 : 0) } | ^ include/linux/avf/virtchnl.h:844:1: note: in expansion of macro ‘VIRTCHNL_CHECK_STRUCT_LEN’ 844 | VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs); | ^~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/avf/virtchnl.h:844:33: error: enumerator value for ‘virtchnl_static_assert_virtchnl_proto_hdrs’ is not an integer constant 844 | VIRTCHNL_CHECK_STRUCT_LEN(2312, virtchnl_proto_hdrs); | ^~~~~~~~~~~~~~~~~~~ On m68k, integers are aligned on addresses that are multiples of two, not four, bytes. Hence the size of a structure containing integers may not be divisible by 4. Fix this by adding explicit padding. Fixes: 1f7ea1cd6a374842 ("ice: Enable FDIR Configure for AVF") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 04 June 2021, 14:37:49 UTC
f9f8320 ice: Allow all LLDP packets from PF to Tx Currently in the ice driver, the check whether to allow a LLDP packet to egress the interface from the PF_VSI is being based on the SKB's priority field. It checks to see if the packets priority is equal to TC_PRIO_CONTROL. Injected LLDP packets do not always meet this condition. SCAPY defaults to a sk_buff->protocol value of ETH_P_ALL (0x0003) and does not set the priority field. There will be other injection methods (even ones used by end users) that will not correctly configure the socket so that SKB fields are correctly populated. Then ethernet header has to have to correct value for the protocol though. Add a check to also allow packets whose ethhdr->h_proto matches ETH_P_LLDP (0x88CC). Fixes: 0c3a6101ff2d ("ice: Allow egress control packets from PF_VSI") Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 04 June 2021, 14:37:48 UTC
5cd349c ice: report supported and advertised autoneg using PHY capabilities Ethtool incorrectly reported supported and advertised auto-negotiation settings for a backplane PHY image which did not support auto-negotiation. This can occur when using media or PHY type for reporting ethtool supported and advertised auto-negotiation settings. Remove setting supported and advertised auto-negotiation settings based on PHY type in ice_phy_type_to_ethtool(), and MAC type in ice_get_link_ksettings(). Ethtool supported and advertised auto-negotiation settings should be based on the PHY image using the AQ command get PHY capabilities with media. Add setting supported and advertised auto-negotiation settings based get PHY capabilities with media in ice_get_link_ksettings(). Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 04 June 2021, 14:37:48 UTC
c7ee6ce ice: handle the VF VSI rebuild failure VSI rebuild can be failed for LAN queue config, then the VF's VSI will be NULL, the VF reset should be stopped with the VF entering into the disable state. Fixes: 12bb018c538c ("ice: Refactor VF reset") Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 04 June 2021, 14:37:48 UTC
8679f07 ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared Some AVF drivers expect the VF_MBX_ATQLEN register to be cleared for any type of VFR/VFLR. Fix this by clearing the VF_MBX_ATQLEN register at the same time as VF_MBX_ARQLEN. Fixes: 82ba01282cf8 ("ice: clear VF ARQLEN register on reset") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 04 June 2021, 14:37:48 UTC
f045769 ice: Fix allowing VF to request more/less queues via virtchnl Commit 12bb018c538c ("ice: Refactor VF reset") caused a regression that removes the ability for a VF to request a different amount of queues via VIRTCHNL_OP_REQUEST_QUEUES. This prevents VF drivers to either increase or decrease the number of queue pairs they are allocated. Fix this by using the variable vf->num_req_qs when determining the vf->num_vf_qs during VF VSI creation. Fixes: 12bb018c538c ("ice: Refactor VF reset") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 04 June 2021, 14:37:48 UTC
579028d Merge tag 'for-net-2021-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth bluetooth pull request for net: - Fixes UAF and CVE-2021-3564 - Fix VIRTIO_ID_BT to use an unassigned ID - Fix firmware loading on some Intel Controllers Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:32:21 UTC
1a80242 virtio-net: fix for skb_over_panic inside big mode In virtio-net's large packet mode, there is a hole in the space behind buf. hdr_padded_len - hdr_len We must take this into account when calculating tailroom. [ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1)) [ 44.544864] page_to_skb (drivers/net/virtio_net.c:485) [ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131) [ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714) [ 44.546628] ? dev_gro_receive (net/core/dev.c:6103) [ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565) [ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525) [ 44.548251] __napi_poll (net/core/dev.c:6985) [ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139) [ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560) [ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649) [ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13)) [ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) [ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638) Fixes: fb32856b16ad ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reported-by: Corentin Noël <corentin.noel@collabora.com> Tested-by: Corentin Noël <corentin.noel@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:29:04 UTC
e31d57c Merge tag 'ieee802154-for-davem-2021-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan Stefan Schmidt says: ==================== An update from ieee802154 for your *net* tree. This time we have fixes for the ieee802154 netlink code, as well as a driver fix. Zhen Lei, Wei Yongjun and Yang Li each had a patch to cleanup some return code handling ensuring we actually get a real error code when things fails. Dan Robertson fixed a potential null dereference in our netlink handling. Andy Shevchenko removed of_match_ptr()usage in the mrf24j40 driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:21:58 UTC
821bbf7 ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions Reported by syzbot: HEAD commit: 90c911ad Merge tag 'fixes' of git://git.kernel.org/pub/scm.. git tree: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master dashboard link: https://syzkaller.appspot.com/bug?extid=123aa35098fd3c000eb7 compiler: Debian clang version 11.0.1-2 ================================================================== BUG: KASAN: slab-out-of-bounds in fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline] BUG: KASAN: slab-out-of-bounds in fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732 Read of size 8 at addr ffff8880145c78f8 by task syz-executor.4/17760 CPU: 0 PID: 17760 Comm: syz-executor.4 Not tainted 5.12.0-rc8-syzkaller #0 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x202/0x31e lib/dump_stack.c:120 print_address_description+0x5f/0x3b0 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report+0x15c/0x200 mm/kasan/report.c:416 fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline] fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732 fib6_nh_release+0x9a/0x430 net/ipv6/route.c:3536 fib6_info_destroy_rcu+0xcb/0x1c0 net/ipv6/ip6_fib.c:174 rcu_do_batch kernel/rcu/tree.c:2559 [inline] rcu_core+0x8f6/0x1450 kernel/rcu/tree.c:2794 __do_softirq+0x372/0x7a6 kernel/softirq.c:345 invoke_softirq kernel/softirq.c:221 [inline] __irq_exit_rcu+0x22c/0x260 kernel/softirq.c:422 irq_exit_rcu+0x5/0x20 kernel/softirq.c:434 sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632 RIP: 0010:lock_acquire+0x1f6/0x720 kernel/locking/lockdep.c:5515 Code: f6 84 24 a1 00 00 00 02 0f 85 8d 02 00 00 f7 c3 00 02 00 00 49 bd 00 00 00 00 00 fc ff df 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 3d 00 00 00 00 00 4b c7 44 3d 09 00 00 00 00 43 c7 44 3d RSP: 0018:ffffc90009e06560 EFLAGS: 00000206 RAX: 1ffff920013c0cc0 RBX: 0000000000000246 RCX: dffffc0000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffc90009e066e0 R08: dffffc0000000000 R09: fffffbfff1f992b1 R10: fffffbfff1f992b1 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000000 R15: 1ffff920013c0cb4 rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267 rcu_read_lock include/linux/rcupdate.h:656 [inline] ext4_get_group_info+0xea/0x340 fs/ext4/ext4.h:3231 ext4_mb_prefetch+0x123/0x5d0 fs/ext4/mballoc.c:2212 ext4_mb_regular_allocator+0x8a5/0x28f0 fs/ext4/mballoc.c:2379 ext4_mb_new_blocks+0xc6e/0x24f0 fs/ext4/mballoc.c:4982 ext4_ext_map_blocks+0x2be3/0x7210 fs/ext4/extents.c:4238 ext4_map_blocks+0xab3/0x1cb0 fs/ext4/inode.c:638 ext4_getblk+0x187/0x6c0 fs/ext4/inode.c:848 ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:900 ext4_append+0x1a4/0x360 fs/ext4/namei.c:67 ext4_init_new_dir+0x337/0xa10 fs/ext4/namei.c:2768 ext4_mkdir+0x4b8/0xc00 fs/ext4/namei.c:2814 vfs_mkdir+0x45b/0x640 fs/namei.c:3819 ovl_do_mkdir fs/overlayfs/overlayfs.h:161 [inline] ovl_mkdir_real+0x53/0x1a0 fs/overlayfs/dir.c:146 ovl_create_real+0x280/0x490 fs/overlayfs/dir.c:193 ovl_workdir_create+0x425/0x600 fs/overlayfs/super.c:788 ovl_make_workdir+0xed/0x1140 fs/overlayfs/super.c:1355 ovl_get_workdir fs/overlayfs/super.c:1492 [inline] ovl_fill_super+0x39ee/0x5370 fs/overlayfs/super.c:2035 mount_nodev+0x52/0xe0 fs/super.c:1413 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1497 do_new_mount fs/namespace.c:2903 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3233 do_mount fs/namespace.c:3246 [inline] __do_sys_mount fs/namespace.c:3454 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3431 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f68f2b87188 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9 RDX: 00000000200000c0 RSI: 0000000020000000 RDI: 000000000040000a RBP: 00000000004bfbb9 R08: 0000000020000100 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60 R13: 00007ffe19002dff R14: 00007f68f2b87300 R15: 0000000000022000 Allocated by task 17768: kasan_save_stack mm/kasan/common.c:38 [inline] kasan_set_track mm/kasan/common.c:46 [inline] set_alloc_info mm/kasan/common.c:427 [inline] ____kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506 kasan_kmalloc include/linux/kasan.h:233 [inline] __kmalloc+0xb4/0x380 mm/slub.c:4055 kmalloc include/linux/slab.h:559 [inline] kzalloc include/linux/slab.h:684 [inline] fib6_info_alloc+0x2c/0xd0 net/ipv6/ip6_fib.c:154 ip6_route_info_create+0x55d/0x1a10 net/ipv6/route.c:3638 ip6_route_add+0x22/0x120 net/ipv6/route.c:3728 inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352 rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Last potentially related work creation: kasan_save_stack+0x27/0x50 mm/kasan/common.c:38 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345 __call_rcu kernel/rcu/tree.c:3039 [inline] call_rcu+0x1b1/0xa30 kernel/rcu/tree.c:3114 fib6_info_release include/net/ip6_fib.h:337 [inline] ip6_route_info_create+0x10c4/0x1a10 net/ipv6/route.c:3718 ip6_route_add+0x22/0x120 net/ipv6/route.c:3728 inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352 rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg net/socket.c:674 [inline] ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350 ___sys_sendmsg net/socket.c:2404 [inline] __sys_sendmsg+0x319/0x400 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae Second to last potentially related work creation: kasan_save_stack+0x27/0x50 mm/kasan/common.c:38 kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345 insert_work+0x54/0x400 kernel/workqueue.c:1331 __queue_work+0x981/0xcc0 kernel/workqueue.c:1497 queue_work_on+0x111/0x200 kernel/workqueue.c:1524 queue_work include/linux/workqueue.h:507 [inline] call_usermodehelper_exec+0x283/0x470 kernel/umh.c:433 kobject_uevent_env+0x1349/0x1730 lib/kobject_uevent.c:617 kvm_uevent_notify_change+0x309/0x3b0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4809 kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:877 [inline] kvm_put_kvm+0x9c/0xd10 arch/x86/kvm/../../../virt/kvm/kvm_main.c:920 kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3120 __fput+0x352/0x7b0 fs/file_table.c:280 task_work_run+0x146/0x1c0 kernel/task_work.c:140 tracehook_notify_resume include/linux/tracehook.h:189 [inline] exit_to_user_mode_loop kernel/entry/common.c:174 [inline] exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline] syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff8880145c7800 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 56 bytes to the right of 192-byte region [ffff8880145c7800, ffff8880145c78c0) The buggy address belongs to the page: page:ffffea00005171c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x145c7 flags: 0xfff00000000200(slab) raw: 00fff00000000200 ffffea00006474c0 0000000200000002 ffff888010c41a00 raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880145c7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880145c7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880145c7880: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff8880145c7900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880145c7980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ================================================================== In the ip6_route_info_create function, in the case that the nh pointer is not NULL, the fib6_nh in fib6_info has not been allocated. Therefore, when trying to free fib6_info in this error case using fib6_info_release, the function will call fib6_info_destroy_rcu, which it will access fib6_nh_release(f6i->fib6_nh); However, f6i->fib6_nh doesn't have any refcount yet given the lack of allocation causing the reported memory issue above. Therefore, releasing the empty pointer directly instead would be the solution. Fixes: f88d8ea67fbdb ("ipv6: Plumb support for nexthop object in a fib6_info") Fixes: 706ec91916462 ("ipv6: Fix nexthop refcnt leak when creating ipv6 route info") Signed-off-by: Coco Li <lixiaoyan@google.com> Cc: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:19:49 UTC
5e7a2c6 Merge tag 'wireless-drivers-2021-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for v5.13 We have only mt76 fixes this time, most important being the fix for A-MSDU injection attacks. mt76 * mitigate A-MSDU injection attacks (CVE-2020-24588) * fix possible array out of bound access in mt7921_mcu_tx_rate_report * various aggregation and HE setting fixes * suspend/resume fix for pci devices * mt7615: fix crash when runtime-pm is not supported ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:17:33 UTC
5960786 fib: Return the correct errno code When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:13:56 UTC
49251cd net: Return the correct errno code When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:13:56 UTC
d773695 net/x25: Return the correct errno code When kalloc or kmemdup failed, should return ENOMEM rather than ENOBUF. Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:13:56 UTC
a27fb31 cxgb4: fix regression with HASH tc prio value update commit db43b30cd89c ("cxgb4: add ethtool n-tuple filter deletion") has moved searching for next highest priority HASH filter rule to cxgb4_flow_rule_destroy(), which searches the rhashtable before the the rule is removed from it and hence always finds at least 1 entry. Fix by removing the rule from rhashtable first before calling cxgb4_flow_rule_destroy() and hence avoid fetching stale info. Fixes: db43b30cd89c ("cxgb4: add ethtool n-tuple filter deletion") Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:12:42 UTC
e031018 Merge branch 'caif-fixes' Pavel Skripkin says: ==================== This patch series fix 2 memory leaks in caif interface. Syzbot reported memory leak in cfserl_create(). The problem was in cfcnfg_add_phy_layer() function. This function accepts struct cflayer *link_support and assign it to corresponting structures, but it can fail in some cases. These cases must be handled to prevent leaking allocated struct cflayer *link_support pointer, because if error accured before assigning link_support pointer to somewhere, this pointer must be freed. Fail log: [ 49.051872][ T7010] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6) [ 49.110236][ T7042] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6) [ 49.134936][ T7045] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6) [ 49.163083][ T7043] caif:cfcnfg_add_phy_layer(): Too many CAIF Link Layers (max 6) [ 55.248950][ T6994] kmemleak: 4 new suspected memory leaks (see /sys/kernel/debug/kmemleak) int cfcnfg_add_phy_layer(..., struct cflayer *link_support, ...) { ... /* CAIF protocol allow maximum 6 link-layers */ for (i = 0; i < 7; i++) { phyid = (dev->ifindex + i) & 0x7; if (phyid == 0) continue; if (cfcnfg_get_phyinfo_rcu(cnfg, phyid) == NULL) goto got_phyid; } pr_warn("Too many CAIF Link Layers (max 6)\n"); goto out; ... if (link_support != NULL) { link_support->id = phyid; layer_set_dn(frml, link_support); layer_set_up(link_support, frml); layer_set_dn(link_support, phy_layer); layer_set_up(phy_layer, link_support); } ... } As you can see, if cfcnfg_add_phy_layer fails before layer_set_*, link_support becomes leaked. So, in this series, I made cfcnfg_add_phy_layer() return an int and added error handling code to prevent leaking link_support pointer in caif_device_notify() and cfusbl_device_notify() functions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:05:07 UTC
7f5d866 net: caif: fix memory leak in cfusbl_device_notify In case of caif_enroll_dev() fail, allocated link_support won't be assigned to the corresponding structure. So simply free allocated pointer in case of error. Fixes: 7ad65bf68d70 ("caif: Add support for CAIF over CDC NCM USB interface") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:05:07 UTC
b53558a net: caif: fix memory leak in caif_device_notify In case of caif_enroll_dev() fail, allocated link_support won't be assigned to the corresponding structure. So simply free allocated pointer in case of error Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+7ec324747ce876a29db6@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:05:07 UTC
a2805dc net: caif: add proper error handling caif_enroll_dev() can fail in some cases. Ingnoring these cases can lead to memory leak due to not assigning link_support pointer to anywhere. Fixes: 7c18d2205ea7 ("caif: Restructure how link caif link layer enroll") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:05:06 UTC
bce130e net: caif: added cfserl_release function Added cfserl_release() function. Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:05:06 UTC
4189777 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== This series contains updates to igb, igc, ixgbe, ixgbevf, i40e and ice drivers. Kurt Kanzenbach fixes XDP for igb when PTP is enabled by pulling the timestamp and adjusting appropriate values prior to XDP operations. Magnus adds missing exception tracing for XDP on igb, igc, ixgbe, ixgbevf, i40e and ice drivers. Maciej adds tracking of AF_XDP zero copy enabled queues to resolve an issue with copy mode Tx for the ice driver. Note: Patch 7 will conflict when merged with net-next. Please carry these changes forward. IGC_XDP_TX and IGC_XDP_REDIRECT will need to be changed to return to conform with the net-next changes. Let me know if you have issues. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 22:02:55 UTC
86b8406 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2021-06-02 The following pull-request contains BPF updates for your *net* tree. We've added 2 non-merge commits during the last 7 day(s) which contain a total of 4 files changed, 19 insertions(+), 24 deletions(-). The main changes are: 1) Fix pahole BTF generation when ccache is used, from Javier Martinez Canillas. 2) Fix BPF lockdown hooks in bpf_probe_read_kernel{,_str}() helpers which caused a deadlock from bcc programs, triggered OOM killer from audit side and didn't work generally with SELinux policy rules due to pointing to wrong task struct, from Daniel Borkmann. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 21:17:42 UTC
c47cc30 net: kcm: fix memory leak in kcm_sendmsg Syzbot reported memory leak in kcm_sendmsg()[1]. The problem was in non-freed frag_list in case of error. In the while loop: if (head == skb) skb_shinfo(head)->frag_list = tskb; else skb->next = tskb; frag_list filled with skbs, but nothing was freeing them. backtrace: [<0000000094c02615>] __alloc_skb+0x5e/0x250 net/core/skbuff.c:198 [<00000000e5386cbd>] alloc_skb include/linux/skbuff.h:1083 [inline] [<00000000e5386cbd>] kcm_sendmsg+0x3b6/0xa50 net/kcm/kcmsock.c:967 [1] [<00000000f1613a8a>] sock_sendmsg_nosec net/socket.c:652 [inline] [<00000000f1613a8a>] sock_sendmsg+0x4c/0x60 net/socket.c:672 Reported-and-tested-by: syzbot+b039f5699bd82e1fb011@syzkaller.appspotmail.com Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Cc: stable@vger.kernel.org Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 21:13:26 UTC
1f14a62 Bluetooth: btusb: Fix failing to init controllers with operation firmware Some firmware when operation don't may have broken versions leading to error like the following: [ 6.176482] Bluetooth: hci0: Firmware revision 0.0 build 121 week 7 2021 [ 6.177906] bluetooth hci0: Direct firmware load for intel/ibt-20-0-0.sfi failed with error -2 [ 6.177910] Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-20-0-0.sfi (-2) Since we load the firmware file just to check if its version had changed comparing to the one already loaded we can just skip since the firmware is already operation. Fixes: ac0565462e330 ("Bluetooth: btintel: Check firmware version before download") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> 03 June 2021, 21:02:17 UTC
a83d958 Bluetooth: Fix VIRTIO_ID_BT assigned number It turned out that the VIRTIO_ID_* are not assigned in the virtio_ids.h file in the upstream kernel. Picking the next free one was wrong and there is a process that has been followed now. See https://github.com/oasis-tcs/virtio-spec/issues/108 for details. Fixes: afd2daa26c7a ("Bluetooth: Add support for virtio transport driver") Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> 03 June 2021, 21:01:55 UTC
261ba78 sit: set name of device back to struct parms addrconf_set_sit_dstaddr will use parms->name. Signed-off-by: zhang kai <zhangkaiheb@126.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 20:57:36 UTC
a8db57c rtnetlink: Fix missing error code in rtnl_bridge_notify() The error code is missing in this code scenario, add the error code '-EINVAL' to the return value 'err'. Eliminate the follow smatch warning: net/core/rtnetlink.c:4834 rtnl_bridge_notify() warn: missing error code 'err'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 20:56:27 UTC
59717f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Do not allow to add conntrack helper extension for confirmed conntracks in the nf_tables ct expectation support. 2) Fix bogus EBUSY in nfnetlink_cthelper when NFCTH_PRIV_DATA_LEN is passed on userspace helper updates. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 03 June 2021, 20:49:08 UTC
e102db7 ice: track AF_XDP ZC enabled queues in bitmap Commit c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") silently introduced a regression and broke the Tx side of AF_XDP in copy mode. xsk_pool on ice_ring is set only based on the existence of the XDP prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed. That is not something that should happen for copy mode as it should use the regular data path ice_clean_tx_irq. This results in a following splat when xdpsock is run in txonly or l2fwd scenarios in copy mode: <snip> [ 106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 106.057269] #PF: supervisor read access in kernel mode [ 106.062493] #PF: error_code(0x0000) - not-present page [ 106.067709] PGD 0 P4D 0 [ 106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45 [ 106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [ 106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50 [ 106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00 [ 106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206 [ 106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800 [ 106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800 [ 106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800 [ 106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff [ 106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018 [ 106.157117] FS: 0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000 [ 106.165332] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0 [ 106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.192898] PKRU: 55555554 [ 106.195653] Call Trace: [ 106.198143] <IRQ> [ 106.200196] ice_clean_tx_irq_zc+0x183/0x2a0 [ice] [ 106.205087] ice_napi_poll+0x3e/0x590 [ice] [ 106.209356] __napi_poll+0x2a/0x160 [ 106.212911] net_rx_action+0xd6/0x200 [ 106.216634] __do_softirq+0xbf/0x29b [ 106.220274] irq_exit_rcu+0x88/0xc0 [ 106.223819] common_interrupt+0x7b/0xa0 [ 106.227719] </IRQ> [ 106.229857] asm_common_interrupt+0x1e/0x40 </snip> Fix this by introducing the bitmap of queues that are zero-copy enabled, where each bit, corresponding to a queue id that xsk pool is being configured on, will be set/cleared within ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The latter is a function used for deciding which napi poll routine is executed. Idea is being taken from our other drivers such as i40e and ixgbe. Fixes: c7a219048e45 ("ice: Remove xsk_buff_pool from VSI structure") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
45ce085 igc: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 73f1071c1d29 ("igc: Add support for XDP_TX action") Fixes: 4ff320361092 ("igc: Add support for XDP_REDIRECT action") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Dvora Fuxbrumer <dvorax.fuxbrumer@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
faae814 ixgbevf: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
74431c4 igb: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 9cbc948b5a20 ("igb: add XDP support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
8281356 ixgbe: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action") Fixes: d0bcacd0a130 ("ixgbe: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
89d65df ice: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
f6c10b4 i40e: add correct exception tracing for XDP Add missing exception tracing to XDP when a number of different errors can occur. The support was only partial. Several errors where not logged which would confuse the user quite a lot not knowing where and why the packets disappeared. Fixes: 74608d17fe29 ("i40e: add support for XDP_TX action") Fixes: 0a714186d3c0 ("i40e: add AF_XDP zero-copy Rx support") Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
5379260 igb: Fix XDP with PTP enabled When using native XDP with the igb driver, the XDP frame data doesn't point to the beginning of the packet. It's off by 16 bytes. Everything works as expected with XDP skb mode. Actually these 16 bytes are used to store the packet timestamps. Therefore, pull the timestamp before executing any XDP operations and adjust all other code accordingly. The igc driver does it like that as well. Tested with Intel i210 card and AF_XDP sockets. Fixes: 9cbc948b5a20 ("igb: add XDP support") Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> 03 June 2021, 15:38:37 UTC
373e864 ieee802154: fix error return code in ieee802154_llsec_getparams() Fix to return negative error code -ENOBUFS from the error handling case instead of 0, as done elsewhere in this function. Fixes: 3e9c156e2c21 ("ieee802154: add netlink interfaces for llsec") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Link: https://lore.kernel.org/r/20210519141614.3040055-1-weiyongjun1@huawei.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> 03 June 2021, 08:59:49 UTC
79c6b8e ieee802154: fix error return code in ieee802154_add_iface() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: be51da0f3e34 ("ieee802154: Stop using NLA_PUT*().") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20210508062517.2574-1-thunder.leizhen@huawei.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> 03 June 2021, 08:50:08 UTC
aab53e6 net: ieee802154: mrf24j40: Drop unneeded of_match_ptr() Driver can be used in different environments and moreover, when compiled with !OF, the compiler may issue a warning due to unused mrf24j40_of_match variable. Hence drop unneeded of_match_ptr() call. While at it, update headers block to reflect above changes. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20210531132226.47081-1-andriy.shevchenko@linux.intel.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> 03 June 2021, 08:32:04 UTC
ad6f5cc net/ieee802154: drop unneeded assignment in llsec_iter_devkeys() In order to keep the code style consistency of the whole file, redundant return value ‘rc’ and its assignments should be deleted The clang_analyzer complains as follows: net/ieee802154/nl-mac.c:1203:12: warning: Although the value stored to 'rc' is used in the enclosing expression, the value is never actually read from 'rc' No functional change, only more efficient. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Link: https://lore.kernel.org/r/1619346299-40237-1-git-send-email-yang.lee@linux.alibaba.com Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org> 03 June 2021, 08:09:36 UTC
ab00f3e net: stmmac: fix issue where clk is being unprepared twice In the case of MDIO bus registration failure due to no external PHY devices is connected to the MAC, clk_disable_unprepare() is called in stmmac_bus_clk_config() and intel_eth_pci_probe() respectively. The second call in intel_eth_pci_probe() will caused the following:- [ 16.578605] intel-eth-pci 0000:00:1e.5: No PHY found [ 16.583778] intel-eth-pci 0000:00:1e.5: stmmac_dvr_probe: MDIO bus (id: 2) registration failed [ 16.680181] ------------[ cut here ]------------ [ 16.684861] stmmac-0000:00:1e.5 already disabled [ 16.689547] WARNING: CPU: 13 PID: 2053 at drivers/clk/clk.c:952 clk_core_disable+0x96/0x1b0 [ 16.697963] Modules linked in: dwc3 iTCO_wdt mei_hdcp iTCO_vendor_support udc_core x86_pkg_temp_thermal kvm_intel marvell10g kvm sch_fq_codel nfsd irqbypass dwmac_intel(+) stmmac uio ax88179_178a pcs_xpcs phylink uhid spi_pxa2xx_platform usbnet mei_me pcspkr tpm_crb mii i2c_i801 dw_dmac dwc3_pci thermal dw_dmac_core intel_rapl_msr libphy i2c_smbus mei tpm_tis intel_th_gth tpm_tis_core tpm intel_th_acpi intel_pmc_core intel_th i915 fuse configfs snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_pcm snd_timer snd soundcore [ 16.746785] CPU: 13 PID: 2053 Comm: systemd-udevd Tainted: G U 5.13.0-rc3-intel-lts #76 [ 16.756134] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020 [ 16.769465] RIP: 0010:clk_core_disable+0x96/0x1b0 [ 16.774222] Code: 00 8b 05 45 96 17 01 85 c0 7f 24 48 8b 5b 30 48 85 db 74 a5 8b 43 7c 85 c0 75 93 48 8b 33 48 c7 c7 6e 32 cc b7 e8 b2 5d 52 00 <0f> 0b 5b 5d c3 65 8b 05 76 31 18 49 89 c0 48 0f a3 05 bc 92 1a 01 [ 16.793016] RSP: 0018:ffffa44580523aa0 EFLAGS: 00010086 [ 16.798287] RAX: 0000000000000000 RBX: ffff8d7d0eb70a00 RCX: 0000000000000000 [ 16.805435] RDX: 0000000000000002 RSI: ffffffffb7c62d5f RDI: 00000000ffffffff [ 16.812610] RBP: 0000000000000287 R08: 0000000000000000 R09: ffffa445805238d0 [ 16.819759] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8d7d0eb70a00 [ 16.826904] R13: ffff8d7d027370c8 R14: 0000000000000006 R15: ffffa44580523ad0 [ 16.834047] FS: 00007f9882fa2600(0000) GS:ffff8d80a0940000(0000) knlGS:0000000000000000 [ 16.842177] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.847966] CR2: 00007f9882bea3d8 CR3: 000000010b126001 CR4: 0000000000370ee0 [ 16.855144] Call Trace: [ 16.857614] clk_core_disable_lock+0x1b/0x30 [ 16.861941] intel_eth_pci_probe.cold+0x11d/0x136 [dwmac_intel] [ 16.867913] pci_device_probe+0xcf/0x150 [ 16.871890] really_probe+0xf5/0x3e0 [ 16.875526] driver_probe_device+0x64/0x150 [ 16.879763] device_driver_attach+0x53/0x60 [ 16.883998] __driver_attach+0x9f/0x150 [ 16.887883] ? device_driver_attach+0x60/0x60 [ 16.892288] ? device_driver_attach+0x60/0x60 [ 16.896698] bus_for_each_dev+0x77/0xc0 [ 16.900583] bus_add_driver+0x184/0x1f0 [ 16.904469] driver_register+0x6c/0xc0 [ 16.908268] ? 0xffffffffc07ae000 [ 16.911598] do_one_initcall+0x4a/0x210 [ 16.915489] ? kmem_cache_alloc_trace+0x305/0x4e0 [ 16.920247] do_init_module+0x5c/0x230 [ 16.924057] load_module+0x2894/0x2b70 [ 16.927857] ? __do_sys_finit_module+0xb5/0x120 [ 16.932441] __do_sys_finit_module+0xb5/0x120 [ 16.936845] do_syscall_64+0x42/0x80 [ 16.940476] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.945586] RIP: 0033:0x7f98830e5ccd [ 16.949177] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48 [ 16.967970] RSP: 002b:00007ffc66b60168 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 16.975583] RAX: ffffffffffffffda RBX: 000055885de35ef0 RCX: 00007f98830e5ccd [ 16.982725] RDX: 0000000000000000 RSI: 00007f98832541e3 RDI: 0000000000000012 [ 16.989868] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000 [ 16.997042] R10: 0000000000000012 R11: 0000000000000246 R12: 00007f98832541e3 [ 17.004222] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffc66b60328 [ 17.011369] ---[ end trace df06a3dab26b988c ]--- [ 17.016062] ------------[ cut here ]------------ [ 17.020701] stmmac-0000:00:1e.5 already unprepared Removing the stmmac_bus_clks_config() call in stmmac_dvr_probe and let dwmac-intel to handle the unprepare and disable of the clk device. Fixes: 5ec55823438e ("net: stmmac: add clocks management for gmac driver") Cc: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com> Reviewed-by: Joakim Zhang <qiangqing.zhang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> 02 June 2021, 20:30:05 UTC
b508d5f net: ipconfig: Don't override command-line hostnames or domains If the user specifies a hostname or domain name as part of the ip= command-line option, preserve it and don't overwrite it with one supplied by DHCP/BOOTP. For instance, ip=::::myhostname::dhcp will use "myhostname" rather than ignoring and overwriting it. Fix the comment on ic_bootp_string that suggests it only copies a string "if not already set"; it doesn't have any such logic. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: David S. Miller <davem@davemloft.net> 02 June 2021, 20:27:03 UTC
dd62766 Merge tag 'mlx5-fixes-2021-06-01' of git://git.kernel.org/pub/scm/linu x/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-06-01 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> 02 June 2021, 20:12:00 UTC
ff40e51 bpf, lockdown, audit: Fix buggy SELinux lockdown permission checks Commit 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit: 1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. 2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. Rough call sequence goes like: rq_lock(rq) -> -------------------------+ trace_sched_switch() -> | bpf_prog_xyz() -> +-> deadlock selinux_lockdown() -> | audit_log_end() -> | wake_up_interruptible() -> | try_to_wake_up() -> | rq_lock(rq) --------------+ What's worse is that the intention of 59438b46471a to further restrict lockdown settings for specific applications in respect to the global lockdown policy is completely broken for BPF. The SELinux policy rule for the current lockdown check looks something like this: allow <who> <who> : lockdown { <reason> }; However, this doesn't match with the 'current' task where the security_locked_down() is executed, example: httpd does a syscall. There is a tracing program attached to the syscall which triggers a BPF program to run, which ends up doing a bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does the permission check against 'current', that is, httpd in this example. httpd has literally zero relation to this tracing program, and it would be nonsensical having to write an SELinux policy rule against httpd to let the tracing helper pass. The policy in this case needs to be against the entity that is installing the BPF program. For example, if bpftrace would generate a histogram of syscall counts by user space application: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' bpftrace would then go and generate a BPF program from this internally. One way of doing it [for the sake of the example] could be to call bpf_get_current_task() helper and then access current->comm via one of bpf_probe_read_kernel{,_str}() helpers. So the program itself has nothing to do with httpd or any other random app doing a syscall here. The BPF program _explicitly initiated_ the lockdown check. The allow/deny policy belongs in the context of bpftrace: meaning, you want to grant bpftrace access to use these helpers, but other tracers on the system like my_random_tracer _not_. Therefore fix all three issues at the same time by taking a completely different approach for the security_locked_down() hook, that is, move the check into the program verification phase where we actually retrieve the BPF func proto. This also reliably gets the task (current) that is trying to install the BPF tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helper's fast-path which can be called several millions of times per second. The check is then also in line with other security_locked_down() hooks in the system where the enforcement is performed at open/load time, for example, open_kcore() for /proc/kcore access or module_sig_check() for module signatures just to pick few random ones. What's out of scope in the fix as well as in other security_locked_down() hook locations /outside/ of BPF subsystem is that if the lockdown policy changes on the fly there is no retrospective action. This requires a different discussion, potentially complex infrastructure, and it's also not clear whether this can be solved generically. Either way, it is out of scope for a suitable stable fix which this one is targeting. Note that the breakage is specifically on 59438b46471a where it started to rely on 'current' as UAPI behavior, and _not_ earlier infrastructure such as 9d1f8be5cf42 ("bpf: Restrict bpf when kernel lockdown is in confidentiality mode"). [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says: I starting seeing this with F-34. When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. [...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [...] [1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/, Serhei Makarov says: Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on ppc64le. Example stack trace: [...] [ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [ 730.883375] audit_log_end+0x5b/0x100 [ 730.884104] slow_avc_audit+0x69/0x90 [ 730.884836] avc_has_perm+0x8b/0xb0 [ 730.885532] selinux_lockdown+0xa5/0xd0 [ 730.886297] security_locked_down+0x20/0x40 [ 730.887133] bpf_probe_read_compat+0x66/0xd0 [ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820 [ 730.888917] trace_call_bpf+0xe9/0x240 [ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0 [ 730.890579] perf_trace_sched_switch+0x142/0x180 [ 730.891485] ? __schedule+0x6d8/0xb20 [ 730.892209] __schedule+0x6d8/0xb20 [ 730.892899] schedule+0x5b/0xc0 [ 730.893522] exit_to_user_mode_prepare+0x11d/0x240 [ 730.894457] syscall_exit_to_user_mode+0x27/0x70 [ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown") Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: Jakub Hrozek <jhrozek@redhat.com> Reported-by: Serhei Makarov <smakarov@redhat.com> Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jamorris@linux.microsoft.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Frank Eigler <fche@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net 02 June 2021, 19:59:22 UTC
8971ee8 netfilter: nfnetlink_cthelper: hit EBUSY on updates if size mismatches The private helper data size cannot be updated. However, updates that contain NFCTH_PRIV_DATA_LEN might bogusly hit EBUSY even if the size is the same. Fixes: 12f7a505331e ("netfilter: add user-space connection tracking helper infrastructure") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 02 June 2021, 10:43:50 UTC
1710eb9 netfilter: nft_ct: skip expectations for confirmed conntrack nft_ct_expect_obj_eval() calls nf_ct_ext_add() for a confirmed conntrack entry. However, nf_ct_ext_add() can only be called for !nf_ct_is_confirmed(). [ 1825.349056] WARNING: CPU: 0 PID: 1279 at net/netfilter/nf_conntrack_extend.c:48 nf_ct_xt_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351391] RIP: 0010:nf_ct_ext_add+0x18e/0x1a0 [nf_conntrack] [ 1825.351493] Code: 41 5c 41 5d 41 5e 41 5f c3 41 bc 0a 00 00 00 e9 15 ff ff ff ba 09 00 00 00 31 f6 4c 89 ff e8 69 6c 3d e9 eb 96 45 31 ed eb cd <0f> 0b e9 b1 fe ff ff e8 86 79 14 e9 eb bf 0f 1f 40 00 0f 1f 44 00 [ 1825.351721] RSP: 0018:ffffc90002e1f1e8 EFLAGS: 00010202 [ 1825.351790] RAX: 000000000000000e RBX: ffff88814f5783c0 RCX: ffffffffc0e4f887 [ 1825.351881] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88814f578440 [ 1825.351971] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff88814f578447 [ 1825.352060] R10: ffffed1029eaf088 R11: 0000000000000001 R12: ffff88814f578440 [ 1825.352150] R13: ffff8882053f3a00 R14: 0000000000000000 R15: 0000000000000a20 [ 1825.352240] FS: 00007f992261c900(0000) GS:ffff889faec00000(0000) knlGS:0000000000000000 [ 1825.352343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1825.352417] CR2: 000056070a4d1158 CR3: 000000015efe0000 CR4: 0000000000350ee0 [ 1825.352508] Call Trace: [ 1825.352544] nf_ct_helper_ext_add+0x10/0x60 [nf_conntrack] [ 1825.352641] nft_ct_expect_obj_eval+0x1b8/0x1e0 [nft_ct] [ 1825.352716] nft_do_chain+0x232/0x850 [nf_tables] Add the ct helper extension only for unconfirmed conntrack. Skip rule evaluation if the ct helper extension does not exist. Thus, you can only create expectations from the first packet. It should be possible to remove this limitation by adding a new action to attach a generic ct helper to the first packet. Then, use this ct helper extension from follow up packets to create the ct expectation. While at it, add a missing check to skip the template conntrack too and remove check for IPCT_UNTRACK which is implicit to !ct. Fixes: 857b46027d6f ("netfilter: nft_ct: add ct expectations support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> 02 June 2021, 10:43:34 UTC
216214c net/mlx5: DR, Create multi-destination flow table with level less than 64 Flow table that contains flow pointing to multiple flow tables or multiple TIRs must have a level lower than 64. In our case it applies to muli- destination flow table. Fix the level of the created table to comply with HW Spec definitions, and still make sure that its level lower than SW-owned tables, so that it would be possible to point from the multi-destination FW table to SW tables. Fixes: 34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:21 UTC
5349cbb net/mlx5e: Fix conflict with HW TS and CQE compression When a driver's profile doesn't support a dedicated PTP-RQ, configuration of CQE compression while HW TS is configured should fail. Fixes: 885b8cfb161e ("net/mlx5e: Update ethtool setting of CQE compression") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:21 UTC
256f79d net/mlx5e: Fix HW TS with CQE compression according to profile When the driver's profile doesn't support a dedicated PTP-RQ, the PTP accuracy of HW TS is affected by the CQE compression. In this case, turn off CQE compression. Otherwise, the driver crashes: BUG: kernel NULL pointer dereference, address:0000000000000018 ... ... RIP: 0010:mlx5e_ptp_rx_set_fs+0x25/0x1a0 [mlx5_core] ... ... Call Trace: mlx5e_ptp_activate_channel+0xb2/0xf0 [mlx5_core] mlx5e_activate_priv_channels+0x3b9/0x8c0 [mlx5_core] ? __mutex_unlock_slowpath+0x45/0x2a0 ? mlx5e_refresh_tirs+0x151/0x1e0 [mlx5_core] mlx5e_switch_priv_channels+0x1cd/0x2d0 [mlx5_core] ? mlx5e_xdp_allowed+0x150/0x150 [mlx5_core] mlx5e_safe_switch_params+0x118/0x3c0 [mlx5_core] ? __mutex_lock+0x6e/0x8e0 ? mlx5e_hwstamp_set+0xa9/0x300 [mlx5_core] mlx5e_hwstamp_set+0x194/0x300 [mlx5_core] ? dev_ioctl+0x9b/0x3d0 mlx5i_ioctl+0x37/0x60 [mlx5_core] mlx5i_pkey_ioctl+0x12/0x20 [mlx5_core] dev_ioctl+0xa9/0x3d0 sock_ioctl+0x268/0x420 __x64_sys_ioctl+0x3d8/0x790 ? lockdep_hardirqs_on_prepare+0xe4/0x190 do_syscall_64+0x2d/0x40 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 960fbfe222a4 ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:21 UTC
2a2c84f net/mlx5e: Fix adding encap rules to slow path On some devices the ignore flow level cap is not supported and we shouldn't use it. Setting the dest ft with mlx5_chains_get_tc_end_ft() already gives the correct end ft if ignore flow level cap is supported or not. Fixes: 39ac237ce009 ("net/mlx5: E-Switch, Refactor chains and priorities") Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:20 UTC
afe93f7 net/mlx5e: Check for needed capability for cvlan matching If not supported show an error and return instead of trying to offload to the hardware and fail. Fixes: 699e96ddf47f ("net/mlx5e: Support offloading tc double vlan headers match") Reported-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:20 UTC
5940e64 net/mlx5: Check firmware sync reset requested is set before trying to abort it In case driver sent NACK to firmware on sync reset request, it will get sync reset abort event while it didn't set sync reset requested mode. Thus, on abort sync reset event handler, driver should check reset requested is set before trying to stop sync reset poll. Fixes: 7dd6df329d4c ("net/mlx5: Handle sync reset abort event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:20 UTC
b38742e net/mlx5e: Disable TLS offload for uplink representor TLS offload is not supported in switchdev mode. Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:19 UTC
d8ec920 net/mlx5e: Fix incompatible casting Device supports setting of a single fec mode at a time, enforce this by bitmap_weight == 1. Input from fec command is in u32, avoid cast to unsigned long and use bitmap_from_arr32 to populate bitmap safely. Fixes: 4bd9d5070b92 ("net/mlx5e: Enforce setting of a single FEC mode") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> 02 June 2021, 01:30:19 UTC
back to top