https://github.com/torvalds/linux
Revision da314c9923fed553a007785a901fd395b7eb6c19 authored by Herbert Xu on 22 September 2015, 03:38:56 UTC, committed by David S. Miller on 24 September 2015, 19:07:08 UTC
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote:
>
> store_release and load_acquire are different from the usual memory
> barriers and can't be paired this way.  You have to pair store_release
> and load_acquire.  Besides, it isn't a particularly good idea to

OK, I've decided to drop the acquire/release helpers as they don't
help us at all and simply pessimise the code by using full memory
barriers (on some architectures) where only a write or read barrier
is needed.

> depend on memory barriers embedded in other data structures like the
> above.  Here, especially, rhashtable_insert() would have write barrier
> *before* the entry is hashed not necessarily *after*, which means that
> in the above case, a socket which appears to have set bound to a
> reader might not be visible when the reader tries to look up the socket
> on the hashtable.

But you are right we do need an explicit write barrier here to
ensure that the hashing is visible.

> There's no reason to be overly smart here.  This isn't a crazy hot
> path, write barriers tend to be very cheap, store_release more so.
> Please just do smp_store_release() and note what it's paired with.

It's not about being overly smart.  It's about actually understanding
what's going on with the code.  I've seen too many instances of
people simply sprinkling synchronisation primitives around without
any knowledge of what is happening underneath, which is just a recipe
for creating hard-to-debug races.

> > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
> >  		}
> >  	}
> >
> > -	if (!nlk->portid) {
> > +	if (!nlk->bound) {
>
> I don't think you can skip load_acquire here just because this is the
> second deref of the variable.  That doesn't change anything.  Race
> condition could still happen between the first and second tests and
> skipping the second would lead to the same kind of bug.

This one is OK because we do not use nlk->portid or try to get nlk
from the hash table before we return to user-space.

However, there is a real bug here that none of these acquire/release
helpers discovered.  The two bound tests here used to be a single
one.  Now that they are separate it is entirely possible for another
thread to come in the middle and bind the socket.  So we need to
repeat the portid check in order to maintain consistency.

> > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
> >  	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
> >  		return -EPERM;
> >
> > -	if (!nlk->portid)
> > +	if (!nlk->bound)
>
> Don't we need load_acquire here too?  Is this path holding a lock
> which makes that unnecessary?

Ditto.

---8<---
The commit 1f770c0a09da855a2b51af6d19de97fb955eca85 ("netlink:
Fix autobind race condition that leads to zero port ID") created
some new races that can occur due to inconsistencies between the
two port IDs.

Tejun is right that a barrier is unavoidable.  Therefore I am
reverting to the original patch that used a boolean to indicate
that a user netlink socket has been bound.

Barriers have been added where necessary to ensure that a valid
portid and the hashed socket are visible.
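
As an illustration of the ordering described above, here is a minimal
userspace sketch using C11 atomics. It is not the kernel code: struct
fake_sock, fake_publish() and fake_lookup() are hypothetical names, and
the C11 release/acquire fences are only a rough analogue of the kernel's
write and read barriers. The writer stores the port ID before it sets
the bound flag; a reader that sees the flag set may then rely on the
port ID.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the socket state discussed above. */
struct fake_sock {
	uint32_t portid;	/* plain data published by the writer */
	atomic_bool bound;	/* flag telling readers that portid is valid */
};

/* Writer side: fill in the data first, then raise the flag. */
void fake_publish(struct fake_sock *sk, uint32_t portid)
{
	assert(sk != NULL);

	sk->portid = portid;
	/* Order the portid store before the flag store below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&sk->bound, true, memory_order_relaxed);
}

/* Reader side: returns true and fills *portid only if bound was seen. */
bool fake_lookup(struct fake_sock *sk, uint32_t *portid)
{
	assert(sk != NULL && portid != NULL);

	if (!atomic_load_explicit(&sk->bound, memory_order_relaxed))
		return false;
	/* Order the flag load above before the portid load below. */
	atomic_thread_fence(memory_order_acquire);
	*portid = sk->portid;
	return true;
}
```

In this sketch one thread would call fake_publish() while others call
fake_lookup(). The two fences have to be used as a pair; dropping
either one removes the ordering guarantee.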

I have also changed netlink_insert to only return EBUSY if the
socket is bound to a portid different to the requested one.  This
combined with only reading nlk->bound once in netlink_bind fixes
a race where two threads that bind the socket at the same time
with different port IDs may both succeed.
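
The EBUSY rule above can be sketched as follows. This is a simplified
illustration under stated assumptions, not the actual netlink_insert:
struct demo_sock and demo_insert() are hypothetical names, the caller is
assumed to hold the socket lock, and the hashing step and write barrier
are reduced to a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified socket state. */
struct demo_sock {
	uint32_t portid;
	bool bound;
};

/*
 * A socket that is already bound reports success only when the caller
 * asked for the port ID it already has; any other port ID gets -EBUSY.
 * The caller is assumed to hold the socket lock.
 */
int demo_insert(struct demo_sock *sk, uint32_t portid)
{
	assert(sk != NULL);

	if (sk->bound)
		return sk->portid == portid ? 0 : -EBUSY;

	sk->portid = portid;
	/* Real code would hash the socket and issue a write barrier here. */
	sk->bound = true;
	return 0;
}
```

With this rule, two callers that race to bind the same socket to
different port IDs cannot both get 0 back: whichever runs second finds
the socket bound to the other port ID and receives -EBUSY.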

Fixes: 1f770c0a09da ("netlink: Fix autobind race condition that leads to zero port ID")
Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Nacked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 7bbe33f
netlink: Replace rhash_portid with bound
File Mode Size
9p
adfs
affs
afs
autofs4
befs
bfs
btrfs
cachefiles
ceph
cifs
coda
configfs
cramfs
debugfs
devpts
dlm
ecryptfs
efivarfs
efs
exofs
exportfs
ext2
ext4
f2fs
fat
freevxfs
fscache
fuse
gfs2
hfs
hfsplus
hostfs
hpfs
hugetlbfs
isofs
jbd2
jffs2
jfs
kernfs
lockd
logfs
minix
ncpfs
nfs
nfs_common
nfsd
nilfs2
nls
notify
ntfs
ocfs2
omfs
openpromfs
overlayfs
proc
pstore
qnx4
qnx6
quota
ramfs
reiserfs
romfs
squashfs
sysfs
sysv
tracefs
ubifs
udf
ufs
xfs
Kconfig -rw-r--r-- 6.4 KB
Kconfig.binfmt -rw-r--r-- 7.0 KB
Makefile -rw-r--r-- 4.1 KB
aio.c -rw-r--r-- 43.0 KB
anon_inodes.c -rw-r--r-- 4.9 KB
attr.c -rw-r--r-- 7.9 KB
bad_inode.c -rw-r--r-- 4.7 KB
binfmt_aout.c -rw-r--r-- 10.8 KB
binfmt_elf.c -rw-r--r-- 60.4 KB
binfmt_elf_fdpic.c -rw-r--r-- 46.9 KB
binfmt_em86.c -rw-r--r-- 2.8 KB
binfmt_flat.c -rw-r--r-- 26.4 KB
binfmt_misc.c -rw-r--r-- 17.5 KB
binfmt_script.c -rw-r--r-- 3.0 KB
block_dev.c -rw-r--r-- 45.5 KB
buffer.c -rw-r--r-- 89.4 KB
char_dev.c -rw-r--r-- 13.3 KB
compat.c -rw-r--r-- 37.2 KB
compat_binfmt_elf.c -rw-r--r-- 3.7 KB
compat_ioctl.c -rw-r--r-- 45.5 KB
coredump.c -rw-r--r-- 19.2 KB
dax.c -rw-r--r-- 21.6 KB
dcache.c -rw-r--r-- 89.4 KB
dcookies.c -rw-r--r-- 6.9 KB
direct-io.c -rw-r--r-- 37.7 KB
drop_caches.c -rw-r--r-- 1.6 KB
eventfd.c -rw-r--r-- 11.2 KB
eventpoll.c -rw-r--r-- 59.0 KB
exec.c -rw-r--r-- 40.7 KB
fcntl.c -rw-r--r-- 16.6 KB
fhandle.c -rw-r--r-- 6.5 KB
file.c -rw-r--r-- 22.4 KB
file_table.c -rw-r--r-- 8.5 KB
filesystems.c -rw-r--r-- 6.4 KB
fs-writeback.c -rw-r--r-- 66.4 KB
fs_pin.c -rw-r--r-- 2.0 KB
fs_struct.c -rw-r--r-- 3.3 KB
inode.c -rw-r--r-- 52.8 KB
internal.h -rw-r--r-- 3.6 KB
ioctl.c -rw-r--r-- 15.7 KB
libfs.c -rw-r--r-- 30.4 KB
locks.c -rw-r--r-- 69.6 KB
mbcache.c -rw-r--r-- 24.1 KB
mount.h -rw-r--r-- 3.5 KB
mpage.c -rw-r--r-- 20.0 KB
namei.c -rw-r--r-- 114.6 KB
namespace.c -rw-r--r-- 81.5 KB
no-block.c -rw-r--r-- 688 bytes
nsfs.c -rw-r--r-- 3.7 KB
open.c -rw-r--r-- 26.9 KB
pipe.c -rw-r--r-- 25.0 KB
pnode.c -rw-r--r-- 11.2 KB
pnode.h -rw-r--r-- 1.8 KB
posix_acl.c -rw-r--r-- 19.9 KB
proc_namespace.c -rw-r--r-- 7.7 KB
read_write.c -rw-r--r-- 28.9 KB
readdir.c -rw-r--r-- 6.9 KB
select.c -rw-r--r-- 25.4 KB
seq_file.c -rw-r--r-- 22.6 KB
signalfd.c -rw-r--r-- 9.2 KB
splice.c -rw-r--r-- 46.2 KB
stack.c -rw-r--r-- 2.5 KB
stat.c -rw-r--r-- 12.0 KB
statfs.c -rw-r--r-- 5.3 KB
super.c -rw-r--r-- 35.0 KB
sync.c -rw-r--r-- 9.7 KB
timerfd.c -rw-r--r-- 13.0 KB
userfaultfd.c -rw-r--r-- 34.9 KB
utimes.c -rw-r--r-- 5.9 KB
xattr.c -rw-r--r-- 22.7 KB