https://github.com/torvalds/linux
Revision da314c9923fed553a007785a901fd395b7eb6c19 authored by Herbert Xu on 22 September 2015, 03:38:56 UTC, committed by David S. Miller on 24 September 2015, 19:07:08 UTC
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
>
> store_release and load_acquire are different from the usual memory
> barriers and can't be paired this way.  You have to pair store_release
> and load_acquire.  Besides, it isn't a particularly good idea to

OK I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimises the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.

> depend on memory barriers embedded in other data structures like the
> above.  Here, especially, rhashtable_insert() would have write barrier
> *before* the entry is hashed not necessarily *after*, which means that
> in the above case, a socket which appears to have set bound to a
> reader might not visible when the reader tries to look up the socket
> on the hashtable.

But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.

> There's no reason to be overly smart here.  This isn't a crazy hot
> path, write barriers tend to be very cheap, store_release more so.
> Please just do smp_store_release() and note what it's paired with.

It's not about being overly smart.  It's about actually understanding
what's going on with the code.  I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.

> > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
> >  		}
> >  	}
> >
> > -	if (!nlk->portid) {
> > +	if (!nlk->bound) {
>
> I don't think you can skip load_acquire here just because this is the
> second deref of the variable.  That doesn't change anything.  Race
> condition could still happen between the first and second tests and
> skipping the second would lead to the same kind of bug.

The reason this one is OK is because we do not use nlk->portid or
try to get nlk from the hash table before we return to user-space.

However, there is a real bug here that none of these acquire/release
helpers discovered.  The two bound tests here used to be a single
one.  Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket.  So we need to
repeat the portid check in order to maintain consistency.

> > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
> >  	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
> >  		return -EPERM;
> >
> > -	if (!nlk->portid)
> > +	if (!nlk->bound)
>
> Don't we need load_acquire here too?  Is this path holding a lock
> which makes that unnecessary?

Ditto.

---8<---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconcsistencies between the
two port IDs.

Tejun is right that a barrier is unavoidable.  Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.

Barriers have been added where necessary to ensure that a valid
portid and the hashed socket is visible.

I have also changed netlink_insert to only return EBUSY if the
socket is bound to a portid different to the requested one.  This
combined with only reading nlk->bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.

Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nacked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 7bbe33f
History
Tip revision: da314c9923fed553a007785a901fd395b7eb6c19 authored by Herbert Xu on 22 September 2015, 03:38:56 UTC
netlink: Replace rhash_portid with bound
Tip revision: da314c9
File Mode Size
ABI
DocBook
EDID
PCI
RCU
accounting
acpi
aoe
arm
arm64
auxdisplay
backlight
blackfin
block
blockdev
bus-devices
cdrom
cgroups
cma
connector
console
cpu-freq
cpuidle
cris
crypto
development-process
device-mapper
devicetree
dmaengine
driver-model
dvb
early-userspace
extcon
fault-injection
fb
features
filesystems
firmware_class
fmc
frv
gpio
hid
hwmon
i2c
ia64
ide
infiniband
input
ioctl
isdn
ja_JP
kbuild
kdump
ko_KR
laptops
leds
locking
m68k
memory-devices
metag
mic
mips
misc-devices
mmc
mn10300
mtd
namespaces
netlabel
networking
nfc
nios2
nvdimm
nvmem
parisc
pcmcia
phy
platform
power
powerpc
pps
prctl
pti
ptp
rapidio
s390
scheduler
scsi
security
serial
sh
sound
spi
sysctl
target
thermal
timers
tpm
trace
usb
vDSO
video4linux
virtual
vm
w1
watchdog
wimax
x86
xtensa
zh_CN
00-INDEX -rw-r--r-- 16.8 KB
BUG-HUNTING -rw-r--r-- 8.1 KB
Changes -rw-r--r-- 11.5 KB
CodeOfConflict -rw-r--r-- 1.4 KB
CodingStyle -rw-r--r-- 34.0 KB
DMA-API-HOWTO.txt -rw-r--r-- 34.9 KB
DMA-API.txt -rw-r--r-- 27.9 KB
DMA-ISA-LPC.txt -rw-r--r-- 5.2 KB
DMA-attributes.txt -rw-r--r-- 4.5 KB
HOWTO -rw-r--r-- 27.1 KB
IPMI.txt -rw-r--r-- 29.2 KB
IRQ-affinity.txt -rw-r--r-- 2.5 KB
IRQ-domain.txt -rw-r--r-- 10.0 KB
IRQ.txt -rw-r--r-- 962 bytes
Intel-IOMMU.txt -rw-r--r-- 3.8 KB
Makefile -rw-r--r-- 181 bytes
ManagementStyle -rw-r--r-- 12.9 KB
SAK.txt -rw-r--r-- 2.8 KB
SM501.txt -rw-r--r-- 2.8 KB
SecurityBugs -rw-r--r-- 1.8 KB
SubmitChecklist -rw-r--r-- 4.4 KB
SubmittingDrivers -rw-r--r-- 6.2 KB
SubmittingPatches -rw-r--r-- 35.1 KB
VGA-softcursor.txt -rw-r--r-- 2.0 KB
adding-syscalls.txt -rw-r--r-- 23.8 KB
applying-patches.txt -rw-r--r-- 19.5 KB
assoc_array.txt -rw-r--r-- 20.0 KB
atomic_ops.txt -rw-r--r-- 21.8 KB
bad_memory.txt -rw-r--r-- 1.1 KB
basic_profiling.txt -rw-r--r-- 1.7 KB
bcache.txt -rw-r--r-- 16.4 KB
binfmt_misc.txt -rw-r--r-- 6.5 KB
braille-console.txt -rw-r--r-- 1.4 KB
bt8xxgpio.txt -rw-r--r-- 4.3 KB
btmrvl.txt -rw-r--r-- 2.9 KB
bus-virt-phys-mapping.txt -rw-r--r-- 7.9 KB
cachetlb.txt -rw-r--r-- 17.1 KB
circular-buffers.txt -rw-r--r-- 8.4 KB
clk.txt -rw-r--r-- 10.3 KB
coccinelle.txt -rw-r--r-- 9.0 KB
cpu-hotplug.txt -rw-r--r-- 16.8 KB
cpu-load.txt -rw-r--r-- 3.0 KB
cputopology.txt -rw-r--r-- 4.5 KB
crc32.txt -rw-r--r-- 8.5 KB
dcdbas.txt -rw-r--r-- 3.6 KB
debugging-modules.txt -rw-r--r-- 954 bytes
debugging-via-ohci1394.txt -rw-r--r-- 7.4 KB
dell_rbu.txt -rw-r--r-- 4.9 KB
devices.txt -rw-r--r-- 116.2 KB
digsig.txt -rw-r--r-- 2.8 KB
dma-buf-sharing.txt -rw-r--r-- 20.9 KB
dontdiff -rw-r--r-- 2.5 KB
dynamic-debug-howto.txt -rw-r--r-- 12.6 KB
edac.txt -rw-r--r-- 23.0 KB
efi-stub.txt -rw-r--r-- 3.2 KB
eisa.txt -rw-r--r-- 7.1 KB
email-clients.txt -rw-r--r-- 9.6 KB
flexible-arrays.txt -rw-r--r-- 5.5 KB
futex-requeue-pi.txt -rw-r--r-- 5.0 KB
gcov.txt -rw-r--r-- 7.6 KB
gdb-kernel-debugging.txt -rw-r--r-- 5.9 KB
highuid.txt -rw-r--r-- 2.4 KB
hsi.txt -rw-r--r-- 2.9 KB
hw_random.txt -rw-r--r-- 3.5 KB
hwspinlock.txt -rw-r--r-- 12.7 KB
init.txt -rw-r--r-- 2.5 KB
initrd.txt -rw-r--r-- 14.1 KB
intel_txt.txt -rw-r--r-- 10.2 KB
io-mapping.txt -rw-r--r-- 3.2 KB
io_ordering.txt -rw-r--r-- 1.9 KB
iostats.txt -rw-r--r-- 8.0 KB
irqflags-tracing.txt -rw-r--r-- 2.3 KB
isapnp.txt -rw-r--r-- 433 bytes
java.txt -rw-r--r-- 10.9 KB
kasan.txt -rw-r--r-- 8.1 KB
kernel-doc-nano-HOWTO.txt -rw-r--r-- 11.7 KB
kernel-docs.txt -rw-r--r-- 33.1 KB
kernel-parameters.txt -rw-r--r-- 144.6 KB
kernel-per-CPU-kthreads.txt -rw-r--r-- 13.2 KB
kmemcheck.txt -rw-r--r-- 29.9 KB
kmemleak.txt -rw-r--r-- 8.5 KB
kobject.txt -rw-r--r-- 18.0 KB
kprobes.txt -rw-r--r-- 30.3 KB
kref.txt -rw-r--r-- 8.4 KB
kselftest.txt -rw-r--r-- 2.0 KB
ldm.txt -rw-r--r-- 3.8 KB
local_ops.txt -rw-r--r-- 6.5 KB
lockup-watchdogs.txt -rw-r--r-- 4.1 KB
logo.gif -rw-r--r-- 16.0 KB
logo.txt -rw-r--r-- 563 bytes
lzo.txt -rw-r--r-- 7.8 KB
magic-number.txt -rw-r--r-- 8.7 KB
mailbox.txt -rw-r--r-- 4.1 KB
md-cluster.txt -rw-r--r-- 7.0 KB
md.txt -rw-r--r-- 25.3 KB
media-framework.txt -rw-r--r-- 14.7 KB
memory-barriers.txt -rw-r--r-- 108.5 KB
memory-hotplug.txt -rw-r--r-- 17.1 KB
men-chameleon-bus.txt -rw-r--r-- 6.1 KB
module-signing.txt -rw-r--r-- 10.3 KB
mono.txt -rw-r--r-- 2.5 KB
nommu-mmap.txt -rw-r--r-- 12.7 KB
ntb.txt -rw-r--r-- 6.3 KB
numastat.txt -rw-r--r-- 836 bytes
oops-tracing.txt -rw-r--r-- 12.8 KB
padata.txt -rw-r--r-- 7.3 KB
parport-lowlevel.txt -rw-r--r-- 32.2 KB
parport.txt -rw-r--r-- 8.8 KB
percpu-rw-semaphore.txt -rw-r--r-- 1.1 KB
phy.txt -rw-r--r-- 6.9 KB
pi-futex.txt -rw-r--r-- 5.7 KB
pinctrl.txt -rw-r--r-- 50.4 KB
pnp.txt -rw-r--r-- 6.8 KB
preempt-locking.txt -rw-r--r-- 5.2 KB
printk-formats.txt -rw-r--r-- 9.0 KB
pwm.txt -rw-r--r-- 4.7 KB
ramoops.txt -rw-r--r-- 5.2 KB
rbtree.txt -rw-r--r-- 13.3 KB
remoteproc.txt -rw-r--r-- 12.7 KB
rfkill.txt -rw-r--r-- 4.8 KB
robust-futex-ABI.txt -rw-r--r-- 8.7 KB
robust-futexes.txt -rw-r--r-- 9.4 KB
rpmsg.txt -rw-r--r-- 13.5 KB
rtc.txt -rw-r--r-- 9.9 KB
serial-console.txt -rw-r--r-- 4.0 KB
sgi-ioc4.txt -rw-r--r-- 2.0 KB
smsc_ece1099.txt -rw-r--r-- 2.4 KB
sparse.txt -rw-r--r-- 3.8 KB
stable_api_nonsense.txt -rw-r--r-- 9.2 KB
stable_kernel_rules.txt -rw-r--r-- 6.0 KB
static-keys.txt -rw-r--r-- 11.5 KB
svga.txt -rw-r--r-- 14.1 KB
sysfs-rules.txt -rw-r--r-- 9.0 KB
sysrq.txt -rw-r--r-- 11.8 KB
this_cpu_ops.txt -rw-r--r-- 11.1 KB
unaligned-memory-access.txt -rw-r--r-- 10.4 KB
unicode.txt -rw-r--r-- 6.5 KB
unshare.txt -rw-r--r-- 13.1 KB
vfio.txt -rw-r--r-- 21.3 KB
vgaarbiter.txt -rw-r--r-- 8.1 KB
video-output.txt -rw-r--r-- 1.1 KB
vme_api.txt -rw-r--r-- 13.4 KB
volatile-considered-harmful.txt -rw-r--r-- 5.6 KB
workqueue.txt -rw-r--r-- 14.7 KB
xillybus.txt -rw-r--r-- 17.7 KB
xz.txt -rw-r--r-- 5.7 KB
zorro.txt -rw-r--r-- 2.9 KB

back to top