https://github.com/torvalds/linux
Revision 16d42d313350946f4b9a8b74a13c99f0461a6572 authored by Shay Drory on 04 April 2022, 07:47:36 UTC, committed by Saeed Mahameed on 18 May 2022, 06:03:57 UTC
In case fw sync reset is called in parallel to device removal, the device
might get stuck in the following deadlock:
         CPU 0                        CPU 1
         -----                        -----
                                  remove_one
                                   uninit_one (locks intf_state_mutex)
mlx5_sync_reset_now_event()
work in fw_reset->wq.
 mlx5_enter_error_state()
  mutex_lock (intf_state_mutex)
                                   cleanup_once
                                    fw_reset_cleanup()
                                     destroy_workqueue(fw_reset->wq)

Drain the fw_reset WQ, and make sure no new work is being queued, before
entering uninit_one().
The drain is done before devlink_unregister() since fw_reset, in some
flows, uses the devlink API devlink_remote_reload_actions_performed().
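
The ordering described above can be illustrated with a minimal userspace
sketch. This is an analogy built on POSIX threads, not the driver code: the
struct, dev_init(), queue_reset(), drain_fw_reset() and the drop_new_requests
flag are all hypothetical stand-ins, and a joined thread plays the role of
work on fw_reset->wq. The point it shows is that the drain (block new work,
then wait for in-flight work) happens before the removal path takes the mutex
that the reset work also needs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the driver's per-device state. */
struct dev {
    pthread_mutex_t intf_state_mutex; /* plays the role of intf_state_mutex */
    atomic_bool drop_new_requests;    /* once set, no new reset work is accepted */
    pthread_t worker;                 /* plays the role of work on fw_reset->wq */
    bool worker_running;              /* touched only by the controlling thread */
    int resets_handled;               /* protected by intf_state_mutex */
};

static void dev_init(struct dev *d)
{
    pthread_mutex_init(&d->intf_state_mutex, NULL);
    atomic_init(&d->drop_new_requests, false);
    d->worker_running = false;
    d->resets_handled = 0;
}

/* The reset work needs the mutex, like mlx5_enter_error_state() in the diagram. */
static void *reset_work(void *arg)
{
    struct dev *d = arg;

    pthread_mutex_lock(&d->intf_state_mutex);
    d->resets_handled++;
    pthread_mutex_unlock(&d->intf_state_mutex);
    return NULL;
}

/* Try to queue reset work; returns false if the request is dropped. */
static bool queue_reset(struct dev *d)
{
    if (atomic_load(&d->drop_new_requests) || d->worker_running)
        return false;
    if (pthread_create(&d->worker, NULL, reset_work, d) != 0)
        return false;
    d->worker_running = true;
    return true;
}

/* Block new requests first, then wait for any in-flight work to finish. */
static void drain_fw_reset(struct dev *d)
{
    atomic_store(&d->drop_new_requests, true);
    if (d->worker_running) {
        pthread_join(d->worker, NULL);
        d->worker_running = false;
    }
}

/* Removal path: drain BEFORE taking the mutex, so the worker is never
 * left waiting on a lock held by the thread that is waiting for it. */
static void remove_one(struct dev *d)
{
    drain_fw_reset(d);
    pthread_mutex_lock(&d->intf_state_mutex);
    assert(!d->worker_running); /* nothing can contend for the mutex now */
    /* teardown of the worker's resources would go here */
    pthread_mutex_unlock(&d->intf_state_mutex);
}
```

In this sketch, calling dev_init(), then queue_reset(), then remove_one()
always lets the queued work finish before teardown, and any queue_reset()
issued after remove_one() is dropped. Reversing the two steps in remove_one()
(lock first, join second) would reproduce the deadlock shape from the diagram.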

Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
1 parent 04c551b
Tip revision: 16d42d313350946f4b9a8b74a13c99f0461a6572 authored by Shay Drory on 04 April 2022, 07:47:36 UTC
net/mlx5: Drain fw_reset when removing device
File Mode Size
Documentation
LICENSES
arch
block
certs
crypto
drivers
fs
include
init
ipc
kernel
lib
mm
net
samples
scripts
security
sound
tools
usr
virt
.clang-format -rw-r--r-- 16.6 KB
.cocciconfig -rw-r--r-- 59 bytes
.get_maintainer.ignore -rw-r--r-- 71 bytes
.gitattributes -rw-r--r-- 62 bytes
.gitignore -rw-r--r-- 1.9 KB
.mailmap -rw-r--r-- 22.3 KB
COPYING -rw-r--r-- 496 bytes
CREDITS -rw-r--r-- 98.9 KB
Kbuild -rw-r--r-- 1.3 KB
Kconfig -rw-r--r-- 555 bytes
MAINTAINERS -rw-r--r-- 642.0 KB
Makefile -rw-r--r-- 63.6 KB
README -rw-r--r-- 727 bytes
