https://github.com/facebook/rocksdb
Revision 3ebe8658d06dbb3958036887d8601a5290c04ba1 authored by Giuseppe Ottaviano on 12 October 2021, 07:14:41 UTC, committed by Andrew Kryczka on 14 October 2021, 17:44:46 UTC
Summary:
`EndWriteStall()` has a data race: `queue_.empty()` is checked outside of the
mutex, so by the time we enter the critical section another thread may already
have cleared the list, and calling `front()` is then undefined behavior (and
causes interesting crashes under high concurrency).
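
A minimal sketch of the pattern, not the actual RocksDB code: `StallQueue`,
`Waiter`, `Enqueue` and `PopFront` are hypothetical names used only for
illustration. The point is that the emptiness check and the `front()` access
happen under the same lock acquisition.

```cpp
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical stand-in for a stalled writer; not RocksDB's StallInterface.
struct Waiter {
  bool signaled = false;
};

// Hypothetical queue of stalled writers guarded by a single mutex.
class StallQueue {
 public:
  void Enqueue(Waiter* w) {
    assert(w != nullptr);
    std::lock_guard<std::mutex> lock(mu_);
    queue_.push_back(w);
  }

  // Racy shape (shown only as a comment):
  //   if (queue_.empty()) return nullptr;   // checked without mu_
  //   std::lock_guard<std::mutex> lock(mu_);
  //   return queue_.front();                // list may be empty by now
  //
  // Safe shape: the check and the access share one critical section.
  Waiter* PopFront() {
    std::lock_guard<std::mutex> lock(mu_);
    if (queue_.empty()) return nullptr;
    Waiter* w = queue_.front();
    queue_.pop_front();
    return w;
  }

 private:
  std::mutex mu_;
  std::list<Waiter*> queue_;
};
```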

This PR fixes the bug, and also rewrites the logic to make it easier to reason
about. It also fixes another subtle bug: if some writers are stalled and
`SetBufferSize(0)` is called, which disables the WBM, the writers are not
unblocked because of an early `enabled()` check in `EndWriteStall()`.
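
The second bug can be illustrated with a toy model. This is a sketch under
stated assumptions, not the WriteBufferManager implementation:
`ToyBufferManager` and all of its members are hypothetical, and stalled
writers are represented by a plain counter. The end-of-stall check here only
asks whether the stall threshold is still exceeded, so a manager that has
just been disabled (size 0) still releases its waiters.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical toy model; a buffer size of 0 means "disabled".
class ToyBufferManager {
 public:
  explicit ToyBufferManager(std::size_t buffer_size)
      : buffer_size_(buffer_size) {}

  void Reserve(std::size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    usage_ += n;
  }

  void Free(std::size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    assert(n <= usage_);
    usage_ -= n;
    MaybeEndStallLocked();
  }

  void SetBufferSize(std::size_t n) {
    std::lock_guard<std::mutex> lock(mu_);
    buffer_size_ = n;
    MaybeEndStallLocked();
  }

  // Returns true if the caller would be parked as a stalled writer.
  bool BeginStall() {
    std::lock_guard<std::mutex> lock(mu_);
    if (!ThresholdExceededLocked()) return false;
    ++stalled_;
    return true;
  }

  std::size_t stalled() {
    std::lock_guard<std::mutex> lock(mu_);
    return stalled_;
  }

 private:
  bool ThresholdExceededLocked() const {
    return buffer_size_ > 0 && usage_ >= buffer_size_;
  }

  void MaybeEndStallLocked() {
    // Deliberately no `if (buffer_size_ == 0) return;` here: an early exit
    // on "disabled" would leave already-stalled writers parked forever.
    if (ThresholdExceededLocked()) return;  // stall condition not resolved
    stalled_ = 0;                           // release every stalled writer
  }

  std::mutex mu_;
  std::size_t buffer_size_;
  std::size_t usage_ = 0;
  std::size_t stalled_ = 0;
};
```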

It doesn't significantly change the locking behavior: as before, writers won't
lock unless entering a stall condition, and `FreeMem` almost always locks if
stalling is allowed, but that is inevitable with the current design. Liveness is
guaranteed by the fact that if some writes are blocked, eventually all writes
will be blocked due to `stall_active_`, and eventually all memory is freed.

While at it, do a couple of optimizations:

- In `WBMStallInterface::Signal()` signal the CV only after releasing the
  lock. Signaling under the lock is a common pitfall, as it causes the woken-up
  thread to immediately go back to sleep because the mutex is still locked by
  the waker.

- Move all allocations and deallocations outside of the lock.
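
Both optimizations can be sketched as follows. The names (`Waiter`,
`StallQueue`, `WakeAll`) are hypothetical and the code is an illustration of
the two techniques, not the code in this PR. `Signal()` sets the flag under
the mutex and notifies after releasing it; the queue uses `std::list::splice`
so that list nodes are allocated before the lock is taken and freed after it
is released.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical waiter with its own mutex and condition variable.
class Waiter {
 public:
  void Block() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return signaled_; });
  }

  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      signaled_ = true;
    }  // mutex released here
    // Notify after unlocking, so the woken thread can take mu_ right away.
    cv_.notify_one();
  }

  bool signaled() {
    std::lock_guard<std::mutex> lock(mu_);
    return signaled_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Hypothetical queue of stalled waiters.
class StallQueue {
 public:
  void Enqueue(Waiter* w) {
    assert(w != nullptr);
    std::list<Waiter*> node{w};  // node allocated outside the lock
    std::lock_guard<std::mutex> lock(mu_);
    queue_.splice(queue_.end(), node);  // relinks the node, no allocation
  }

  // Wakes every queued waiter and returns how many were woken.
  std::size_t WakeAll() {
    std::list<Waiter*> drained;
    {
      std::lock_guard<std::mutex> lock(mu_);
      drained.splice(drained.end(), queue_);  // constant time, no allocation
    }
    for (Waiter* w : drained) w->Signal();
    return drained.size();
  }  // nodes of `drained` are freed here, outside the lock

 private:
  std::mutex mu_;
  std::list<Waiter*> queue_;
};
```

Because the critical sections contain only pointer relinking, the time spent
holding `mu_` does not depend on the allocator.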

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9009

Test Plan:
```
USE_CLANG=1 make -j64 all check
```

Reviewed By: akankshamahajan15

Differential Revision: D31550668

Pulled By: ot

fbshipit-source-id: 5125387c3dc7ecaaa2b8bbc736e58c4156698580
1 parent 430fd40
Tip revision: 3ebe8658d06dbb3958036887d8601a5290c04ba1 authored by Giuseppe Ottaviano on 12 October 2021, 07:14:41 UTC
Fix race in WriteBufferManager (#9009)
File Mode Size
.circleci
.github
buckifier
build_tools
cache
cmake
coverage
db
db_stress_tool
docs
env
examples
file
fuzz
hdfs
include
java
logging
memory
memtable
microbench
monitoring
options
plugin
port
table
test_util
third-party
tools
trace_replay
util
utilities
.clang-format -rw-r--r-- 138 bytes
.gitignore -rw-r--r-- 1.0 KB
.lgtm.yml -rw-r--r-- 67 bytes
.travis.yml -rw-r--r-- 8.1 KB
.watchmanconfig -rw-r--r-- 130 bytes
AUTHORS -rw-r--r-- 322 bytes
CMakeLists.txt -rw-r--r-- 49.3 KB
CODE_OF_CONDUCT.md -rw-r--r-- 3.3 KB
CONTRIBUTING.md -rw-r--r-- 706 bytes
COPYING -rw-r--r-- 17.7 KB
DEFAULT_OPTIONS_HISTORY.md -rw-r--r-- 1.5 KB
DUMP_FORMAT.md -rw-r--r-- 763 bytes
HISTORY.md -rw-r--r-- 185.8 KB
INSTALL.md -rw-r--r-- 7.8 KB
LANGUAGE-BINDINGS.md -rw-r--r-- 1.2 KB
LICENSE.Apache -rw-r--r-- 11.1 KB
LICENSE.leveldb -rw-r--r-- 1.5 KB
Makefile -rw-r--r-- 89.3 KB
PLUGINS.md -rw-r--r-- 320 bytes
README.md -rw-r--r-- 2.0 KB
ROCKSDB_LITE.md -rw-r--r-- 1.0 KB
TARGETS -rw-r--r-- 58.6 KB
USERS.md -rw-r--r-- 8.1 KB
Vagrantfile -rw-r--r-- 1017 bytes
WINDOWS_PORT.md -rw-r--r-- 12.5 KB
appveyor.yml -rw-r--r-- 3.3 KB
defs.bzl -rw-r--r-- 1.9 KB
issue_template.md -rw-r--r-- 294 bytes
src.mk -rw-r--r-- 42.8 KB
thirdparty.inc -rw-r--r-- 7.8 KB
