https://github.com/torvalds/linux
Revision 65550098c1c4db528400c73acf3e46bfa78d9264 authored by David Howells on 28 July 2020, 23:03:56 UTC, committed by David S. Miller on 30 July 2020, 23:50:20 UTC
There's a race between rxrpc_sendmsg setting up a call, but then failing to send anything on it due to an error, and recvmsg() seeing the call completion occur and trying to return the state to the user. An assertion fails in rxrpc_recvmsg() because the call has already been released from the socket and is about to be released again as recvmsg deals with it. (The recvmsg_q queue on the socket holds a ref, so there's no problem with use-after-free.) We also have to be careful not to end up reporting an error twice, in such a way that both returns indicate to userspace that the user ID supplied with the call is no longer in use - which could cause the client to malfunction if it recycles the user ID fast enough. Fix this by the following means: (1) When sendmsg() creates a call after the point that the call has been successfully added to the socket, don't return any errors through sendmsg(), but rather complete the call and let recvmsg() retrieve them. Make sendmsg() return 0 at this point. Further calls to sendmsg() for that call will fail with ESHUTDOWN. Note that at this point, we haven't send any packets yet, so the server doesn't yet know about the call. (2) If sendmsg() returns an error when it was expected to create a new call, it means that the user ID wasn't used. (3) Mark the call disconnected before marking it completed to prevent an oops in rxrpc_release_call(). (4) recvmsg() will then retrieve the error and set MSG_EOR to indicate that the user ID is no longer known by the kernel. An oops like the following is produced: kernel BUG at net/rxrpc/recvmsg.c:605! ... RIP: 0010:rxrpc_recvmsg+0x256/0x5ae ... Call Trace: ? __init_waitqueue_head+0x2f/0x2f ____sys_recvmsg+0x8a/0x148 ? import_iovec+0x69/0x9c ? copy_msghdr_from_user+0x5c/0x86 ___sys_recvmsg+0x72/0xaa ? __fget_files+0x22/0x57 ? __fget_light+0x46/0x51 ? fdget+0x9/0x1b do_recvmmsg+0x15e/0x232 ? _raw_spin_unlock+0xa/0xb ? vtime_delta+0xf/0x25 __x64_sys_recvmmsg+0x2c/0x2f do_syscall_64+0x4c/0x78 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 357f5ef64628 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()") Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 591eee6
Tip revision: 65550098c1c4db528400c73acf3e46bfa78d9264 authored by David Howells on 28 July 2020, 23:03:56 UTC
rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
rxrpc: Fix race between recvmsg and sendmsg on immediate call failure
Tip revision: 6555009
atomic_bitops.txt
=============
Atomic bitops
=============
While our bitmap_{}() functions are non-atomic, we have a number of operations
operating on single bits in a bitmap that are atomic.
API
---
The single bit operations are:
Non-RMW ops:
test_bit()
RMW atomic operations without return value:
{set,clear,change}_bit()
clear_bit_unlock()
RMW atomic operations with return value:
test_and_{set,clear,change}_bit()
test_and_set_bit_lock()
Barriers:
smp_mb__{before,after}_atomic()
All RMW atomic operations have a '__' prefixed variant which is non-atomic.
SEMANTICS
---------
Non-atomic ops:
In particular __clear_bit_unlock() suffers the same issue as atomic_set(),
which is why the generic version maps to clear_bit_unlock(), see atomic_t.txt.
RMW ops:
The test_and_{}_bit() operations return the original value of the bit.
ORDERING
--------
Like with atomic_t, the rule of thumb is:
- non-RMW operations are unordered;
- RMW operations that have no return value are unordered;
- RMW operations that have a return value are fully ordered.
- RMW operations that are conditional are unordered on FAILURE,
otherwise the above rules apply. In the case of test_and_{}_bit() operations,
if the bit in memory is unchanged by the operation then it is deemed to have
failed.
Except for a successful test_and_set_bit_lock() which has ACQUIRE semantics and
clear_bit_unlock() which has RELEASE semantics.
Since a platform only has a single means of achieving atomic operations
the same barriers as for atomic_t are used, see atomic_t.txt.
![swh spinner](/static/img/swh-spinner.gif)
Computing file changes ...