https://github.com/torvalds/linux
Revision 899d2a05dc14733cfba6224083c6b0dd5a738590 authored by Caleb Sander on 18 November 2022, 23:27:56 UTC, committed by Christoph Hellwig on 30 November 2022, 13:37:46 UTC
Walking the nvme_ns_head siblings list is protected by the head's srcu
in nvme_ns_head_submit_bio() but not nvme_mpath_revalidate_paths().
Removing namespaces from the list also fails to synchronize the srcu.
Concurrent scan work can therefore cause use-after-frees.

Hold the head's srcu lock in nvme_mpath_revalidate_paths() and
synchronize with the srcu, not the global RCU, in nvme_ns_remove().
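The locking pattern described above can be sketched as a small userspace model. This is an illustration only, not the kernel patch: every name here (srcu_model, ns_model, head_model, and the model_* functions) is a hypothetical stand-in, and the reader counter only imitates the idea that removal must wait for readers of the head's own SRCU domain rather than for global RCU readers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an SRCU domain: it only counts readers. */
struct srcu_model {
    int readers;
};

/* Enter a read-side critical section; the returned index mimics the SRCU cookie. */
static int model_srcu_read_lock(struct srcu_model *s)
{
    s->readers++;
    return 0;
}

/* Leave the read-side critical section entered with the matching lock call. */
static void model_srcu_read_unlock(struct srcu_model *s, int idx)
{
    (void)idx;
    assert(s->readers > 0);
    s->readers--;
}

/*
 * A real synchronize_srcu() blocks until pre-existing readers of this
 * domain finish. The model only reports whether it would have to wait.
 */
static bool model_removal_must_wait(const struct srcu_model *s)
{
    return s->readers > 0;
}

/* Hypothetical namespace and head structures, reduced to what the walk needs. */
struct ns_model {
    uint64_t capacity;
    bool ready;
    struct ns_model *next;
};

struct head_model {
    struct srcu_model srcu;
    uint64_t capacity;
    struct ns_model *list;
};

/* Walk the sibling list under the head's read lock, as the commit message describes. */
static void model_revalidate_paths(struct head_model *head)
{
    int idx = model_srcu_read_lock(&head->srcu);

    for (struct ns_model *ns = head->list; ns != NULL; ns = ns->next) {
        if (ns->capacity != head->capacity)
            ns->ready = false; /* path no longer matches the head's capacity */
    }
    model_srcu_read_unlock(&head->srcu, idx);
}
```

In this model, a remover that checks model_removal_must_wait() on the same domain cannot free a namespace while the walk is in progress. A wait on an unrelated domain would say nothing about these readers, which is the mismatch the commit message points out.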

Observed the following panic when making NVMe/RDMA connections
with native multipath on the Rocky Linux 8.6 kernel
(it seems the upstream kernel has the same race condition).
Disassembly shows the faulting instruction is cmp 0x50(%rdx),%rcx,
which evaluates capacity != get_capacity(ns->disk).
Address 0x50 is dereferenced because ns->disk is NULL.
The NULL disk appears to be the result of concurrent scan work
freeing the namespace (note the log line in the middle of the panic).
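The address arithmetic behind that reasoning can be checked with a short sketch. The struct below is a hypothetical stand-in, not the kernel's disk structure: the only assumption is that the field read by the cmp instruction sits at offset 0x50, as the disassembly suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the offset of the compared field matters. */
struct disk_model {
    unsigned char pad[0x50];  /* whatever precedes the field (assumed size) */
    uint64_t capacity_field;  /* assumed to sit at offset 0x50 */
};

static_assert(offsetof(struct disk_model, capacity_field) == 0x50,
              "model field must sit at offset 0x50");

/* Address the CPU would touch when reading the field through `base`. */
static uintptr_t field_address(uintptr_t base)
{
    return base + offsetof(struct disk_model, capacity_field);
}
```

With a NULL base, field_address(0) evaluates to 0x50, which matches both the RDX value of 0 and the CR2 value of 0000000000000050 in the log below.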

[37314.206036] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[37314.206036] nvme0n3: detected capacity change from 0 to 11811160064
[37314.299753] PGD 0 P4D 0
[37314.299756] Oops: 0000 [#1] SMP PTI
[37314.299759] CPU: 29 PID: 322046 Comm: kworker/u98:3 Kdump: loaded Tainted: G        W      X --------- -  - 4.18.0-372.32.1.el8test86.x86_64 #1
[37314.299762] Hardware name: Dell Inc. PowerEdge R720/0JP31P, BIOS 2.7.0 05/23/2018
[37314.299763] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[37314.299783] RIP: 0010:nvme_mpath_revalidate_paths+0x26/0xb0 [nvme_core]
[37314.299790] Code: 1f 44 00 00 66 66 66 66 90 55 53 48 8b 5f 50 48 8b 83 c8 c9 00 00 48 8b 13 48 8b 48 50 48 39 d3 74 20 48 8d 42 d0 48 8b 50 20 <48> 3b 4a 50 74 05 f0 80 60 70 ef 48 8b 50 30 48 8d 42 d0 48 39 d3
[37315.058803] RSP: 0018:ffffabe28f913d10 EFLAGS: 00010202
[37315.121316] RAX: ffff927a077da800 RBX: ffff92991dd70000 RCX: 0000000001600000
[37315.206704] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff92991b719800
[37315.292106] RBP: ffff929a6b70c000 R08: 000000010234cd4a R09: c0000000ffff7fff
[37315.377501] R10: 0000000000000001 R11: ffffabe28f913a30 R12: 0000000000000000
[37315.462889] R13: ffff92992716600c R14: ffff929964e6e030 R15: ffff92991dd70000
[37315.548286] FS:  0000000000000000(0000) GS:ffff92b87fb80000(0000) knlGS:0000000000000000
[37315.645111] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37315.713871] CR2: 0000000000000050 CR3: 0000002208810006 CR4: 00000000000606e0
[37315.799267] Call Trace:
[37315.828515]  nvme_update_ns_info+0x1ac/0x250 [nvme_core]
[37315.892075]  nvme_validate_or_alloc_ns+0x2ff/0xa00 [nvme_core]
[37315.961871]  ? __blk_mq_free_request+0x6b/0x90
[37316.015021]  nvme_scan_work+0x151/0x240 [nvme_core]
[37316.073371]  process_one_work+0x1a7/0x360
[37316.121318]  ? create_worker+0x1a0/0x1a0
[37316.168227]  worker_thread+0x30/0x390
[37316.212024]  ? create_worker+0x1a0/0x1a0
[37316.258939]  kthread+0x10a/0x120
[37316.297557]  ? set_kthread_struct+0x50/0x50
[37316.347590]  ret_from_fork+0x35/0x40
[37316.390360] Modules linked in: nvme_rdma nvme_tcp(X) nvme_fabrics nvme_core netconsole iscsi_tcp libiscsi_tcp dm_queue_length dm_service_time nf_conntrack_netlink br_netfilter bridge stp llc overlay nft_chain_nat ipt_MASQUERADE nf_nat xt_addrtype xt_CT nft_counter xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_multiport nft_compat nf_tables libcrc32c nfnetlink dm_multipath tg3 rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm intel_rapl_msr iTCO_wdt iTCO_vendor_support dcdbas intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel ib_uverbs rapl intel_cstate intel_uncore ib_core ipmi_si joydev mei_me pcspkr ipmi_devintf mei lpc_ich wmi ipmi_msghandler acpi_power_meter ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 mlx5_core drm_kms_helper syscopyarea
[37316.390419]  sysfillrect ahci sysimgblt fb_sys_fops libahci drm crc32c_intel libata mlxfw pci_hyperv_intf tls i2c_algo_bit psample dm_mirror dm_region_hash dm_log dm_mod fuse [last unloaded: nvme_core]
[37317.645908] CR2: 0000000000000050

Fixes: e7d65803e2bb ("nvme-multipath: revalidate paths during rescan")
Signed-off-by: Caleb Sander <csander@purestorage.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
1 parent a56ea61
nvme: fix SRCU protection of nvme_ns_head list
File Mode Size
bpf
cgroup
configs
debug
dma
entry
events
futex
gcov
irq
kcsan
livepatch
locking
module
power
printk
rcu
sched
time
trace
.gitignore -rw-r--r-- 67 bytes
Kconfig.freezer -rw-r--r-- 92 bytes
Kconfig.hz -rw-r--r-- 1.7 KB
Kconfig.locks -rw-r--r-- 4.9 KB
Kconfig.preempt -rw-r--r-- 4.8 KB
Makefile -rw-r--r-- 5.1 KB
acct.c -rw-r--r-- 15.8 KB
async.c -rw-r--r-- 9.2 KB
audit.c -rw-r--r-- 64.8 KB
audit.h -rw-r--r-- 10.7 KB
audit_fsnotify.c -rw-r--r-- 5.3 KB
audit_tree.c -rw-r--r-- 25.6 KB
audit_watch.c -rw-r--r-- 13.7 KB
auditfilter.c -rw-r--r-- 34.4 KB
auditsc.c -rw-r--r-- 80.8 KB
backtracetest.c -rw-r--r-- 1.9 KB
bounds.c -rw-r--r-- 751 bytes
capability.c -rw-r--r-- 14.8 KB
cfi.c -rw-r--r-- 2.2 KB
compat.c -rw-r--r-- 6.8 KB
configs.c -rw-r--r-- 2.0 KB
context_tracking.c -rw-r--r-- 23.2 KB
cpu.c -rw-r--r-- 65.8 KB
cpu_pm.c -rw-r--r-- 6.1 KB
crash_core.c -rw-r--r-- 12.7 KB
crash_dump.c -rw-r--r-- 1.1 KB
cred.c -rw-r--r-- 24.2 KB
delayacct.c -rw-r--r-- 6.8 KB
dma.c -rw-r--r-- 3.3 KB
exec_domain.c -rw-r--r-- 1.1 KB
exit.c -rw-r--r-- 45.3 KB
extable.c -rw-r--r-- 4.2 KB
fail_function.c -rw-r--r-- 7.0 KB
fork.c -rw-r--r-- 79.8 KB
freezer.c -rw-r--r-- 4.4 KB
gen_kheaders.sh -rwxr-xr-x 3.1 KB
groups.c -rw-r--r-- 5.0 KB
hung_task.c -rw-r--r-- 9.4 KB
iomem.c -rw-r--r-- 4.7 KB
irq_work.c -rw-r--r-- 7.5 KB
jump_label.c -rw-r--r-- 20.4 KB
kallsyms.c -rw-r--r-- 23.7 KB
kallsyms_internal.h -rw-r--r-- 858 bytes
kcmp.c -rw-r--r-- 5.4 KB
kcov.c -rw-r--r-- 28.9 KB
kexec.c -rw-r--r-- 7.4 KB
kexec_core.c -rw-r--r-- 31.1 KB
kexec_elf.c -rw-r--r-- 11.4 KB
kexec_file.c -rw-r--r-- 32.4 KB
kexec_internal.h -rw-r--r-- 924 bytes
kheaders.c -rw-r--r-- 1.6 KB
kmod.c -rw-r--r-- 5.0 KB
kprobes.c -rw-r--r-- 73.8 KB
ksysfs.c -rw-r--r-- 6.3 KB
kthread.c -rw-r--r-- 41.6 KB
latencytop.c -rw-r--r-- 7.6 KB
module_signature.c -rw-r--r-- 1.1 KB
notifier.c -rw-r--r-- 17.9 KB
nsproxy.c -rw-r--r-- 12.8 KB
padata.c -rw-r--r-- 27.4 KB
panic.c -rw-r--r-- 18.8 KB
params.c -rw-r--r-- 23.1 KB
pid.c -rw-r--r-- 18.2 KB
pid_namespace.c -rw-r--r-- 11.3 KB
profile.c -rw-r--r-- 13.5 KB
ptrace.c -rw-r--r-- 36.9 KB
range.c -rw-r--r-- 3.0 KB
reboot.c -rw-r--r-- 31.7 KB
regset.c -rw-r--r-- 1.9 KB
relay.c -rw-r--r-- 30.0 KB
resource.c -rw-r--r-- 51.7 KB
resource_kunit.c -rw-r--r-- 4.3 KB
rseq.c -rw-r--r-- 10.0 KB
scftorture.c -rw-r--r-- 20.0 KB
scs.c -rw-r--r-- 2.9 KB
seccomp.c -rw-r--r-- 63.4 KB
signal.c -rw-r--r-- 123.3 KB
smp.c -rw-r--r-- 34.3 KB
smpboot.c -rw-r--r-- 11.9 KB
smpboot.h -rw-r--r-- 640 bytes
softirq.c -rw-r--r-- 24.1 KB
stackleak.c -rw-r--r-- 4.5 KB
stacktrace.c -rw-r--r-- 10.4 KB
static_call.c -rw-r--r-- 158 bytes
static_call_inline.c -rw-r--r-- 12.5 KB
stop_machine.c -rw-r--r-- 18.3 KB
sys.c -rw-r--r-- 65.2 KB
sys_ni.c -rw-r--r-- 10.2 KB
sysctl-test.c -rw-r--r-- 10.7 KB
sysctl.c -rw-r--r-- 59.5 KB
task_work.c -rw-r--r-- 5.0 KB
taskstats.c -rw-r--r-- 15.8 KB
torture.c -rw-r--r-- 25.3 KB
tracepoint.c -rw-r--r-- 20.3 KB
tsacct.c -rw-r--r-- 5.0 KB
ucount.c -rw-r--r-- 9.1 KB
uid16.c -rw-r--r-- 5.1 KB
uid16.h -rw-r--r-- 442 bytes
umh.c -rw-r--r-- 15.1 KB
up.c -rw-r--r-- 1.5 KB
user-return-notifier.c -rw-r--r-- 1.3 KB
user.c -rw-r--r-- 5.9 KB
user_namespace.c -rw-r--r-- 35.7 KB
usermode_driver.c -rw-r--r-- 4.3 KB
utsname.c -rw-r--r-- 3.8 KB
utsname_sysctl.c -rw-r--r-- 3.2 KB
watch_queue.c -rw-r--r-- 17.2 KB
watchdog.c -rw-r--r-- 22.6 KB
watchdog_hld.c -rw-r--r-- 7.7 KB
workqueue.c -rw-r--r-- 168.4 KB
workqueue_internal.h -rw-r--r-- 2.4 KB
