https://github.com/facebook/rocksdb
Permalinks
To reference or cite the objects present in the Software Heritage archive, permalinks based on SoftWare Hash IDentifiers (SWHIDs) must be used.
Select below a type of object currently browsed in order to display its associated SWHID and permalink.
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
6a43615 | anand76 | 06 April 2023, 16:38:52 UTC | Update HISTORY.md for 8.1.1 | 06 April 2023, 16:38:52 UTC |
af72046 | anand76 | 05 April 2023, 23:22:08 UTC | Ensure VerifyFileChecksums reads don't exceed readahead_size (#11328) Summary: VerifyFileChecksums currently interprets the readahead_size as a payload of readahead_size for calculating the checksum, plus a prefetch of an additional readahead_size. Hence each read is readahead_size * 2. This change treats it as chunks of readahead_size for checksum calculation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11328 Test Plan: Add a unit test Reviewed By: pdillinger Differential Revision: D44718781 Pulled By: anand1976 fbshipit-source-id: 79bae1ebaa27de2a13bc86f5910bf09356936e63 | 06 April 2023, 02:01:09 UTC |
501f359 | anand76 | 30 March 2023, 16:56:37 UTC | Remove platform009 and default to platform010 (#11333) Summary: Platform009 is no longer supported in fbcode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11333 Reviewed By: pdillinger, ltamasi Differential Revision: D44486431 Pulled By: anand1976 fbshipit-source-id: 99e19a70ebbb04ae750d39c33a110518bb25487e | 06 April 2023, 00:07:57 UTC |
fc1df4c | Hui Xiao | 05 April 2023, 22:09:03 UTC | Update version.h for 8.1.1 | 05 April 2023, 22:09:03 UTC |
3b6ac67 | Hui Xiao | 05 April 2023, 21:42:31 UTC | Fix initialization-order-fiasco in write_stall_stats.cc (#11355) Summary: **Context/Summary:** As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11355 Test Plan: - Ran previously failed tests and they succeed - Perf `./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3` Reviewed By: ajkr Differential Revision: D44719333 Pulled By: hx235 fbshipit-source-id: 23d22f314144071d97f7106ff1241c31c0bdf08b | 05 April 2023, 22:03:58 UTC |
d3dbed0 | Levi Tamasi | 22 March 2023, 20:33:50 UTC | Backport an internal change to regression_build_test.sh (#11319) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11319 Reviewed By: cbi42 Differential Revision: D44308743 Pulled By: ltamasi fbshipit-source-id: ffd054e9f4162797cfe1ef78240ad2501f78bbbd | 22 March 2023, 20:44:35 UTC |
f8741f1 | Levi Tamasi | 18 March 2023, 17:49:48 UTC | Update version in HISTORY.md | 18 March 2023, 17:49:48 UTC |
cb58477 | Hui Xiao | 18 March 2023, 16:51:58 UTC | New stat rocksdb.{cf|db}-write-stall-stats exposed in a structural way (#11300) Summary: **Context/Summary:** Users are interested in figuring out what has caused write stall. - Refactor write stall related stats from property `kCFStats` into its own db property `rocksdb.cf-write-stall-stats` as a map or string. For now, this only contains count of different combination of (CF-scope `WriteStallCause`) + (`WriteStallCondition`) - Add new `WriteStallCause::kWriteBufferManagerLimit` to reflect write stall caused by write buffer manager - Add new `rocksdb.db-write-stall-stats`. For now, this only contains `WriteStallCause::kWriteBufferManagerLimit` + `WriteStallCondition::kStopped` - Expose functions in new class `WriteStallStatsMapKeys` for examining the above two properties returned as map - Misc: rename/comment some write stall InternalStats for clarity Pull Request resolved: https://github.com/facebook/rocksdb/pull/11300 Test Plan: - New UT - Stress test `python3 tools/db_crashtest.py blackbox --simple --get_property_one_in=1` - Perf test: Both converge very slowly at similar rates but post-change has higher average ops/sec than pre-change even though they are run at the same time. ``` ./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 ``` pre-change: ``` fillseq [AVG 15 runs] : 1176 (± 732) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1052.671 micros/op 949 ops/sec 105.267 seconds 100000 operations; 0.5 MB/s fillseq [AVG 16 runs] : 1162 (± 685) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1387.330 micros/op 720 ops/sec 138.733 seconds 100000 operations; 0.4 MB/s fillseq [AVG 17 runs] : 1136 (± 646) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1232.011 micros/op 811 ops/sec 123.201 seconds 100000 operations; 0.4 MB/s fillseq [AVG 18 runs] : 1118 (± 610) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1282.567 micros/op 779 ops/sec 128.257 seconds 100000 operations; 0.4 MB/s fillseq [AVG 19 runs] : 1100 (± 578) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1914.336 micros/op 522 ops/sec 191.434 seconds 100000 operations; 0.3 MB/s fillseq [AVG 20 runs] : 1071 (± 551) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1227.510 micros/op 814 ops/sec 122.751 seconds 100000 operations; 0.4 MB/s fillseq [AVG 21 runs] : 1059 (± 525) ops/sec; 0.5 (± 0.3) MB/sec ``` post-change: ``` fillseq [AVG 15 runs] : 1226 (± 732) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1323.825 micros/op 755 ops/sec 132.383 seconds 100000 operations; 0.4 MB/s fillseq [AVG 16 runs] : 1196 (± 687) ops/sec; 0.6 (± 0.4) MB/sec fillseq : 1223.905 micros/op 817 ops/sec 122.391 seconds 100000 operations; 0.4 MB/s fillseq [AVG 17 runs] : 1174 (± 647) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1168.996 micros/op 855 ops/sec 116.900 seconds 100000 operations; 0.4 MB/s fillseq [AVG 18 runs] : 1156 (± 611) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1348.729 micros/op 741 ops/sec 134.873 seconds 100000 operations; 0.4 MB/s fillseq [AVG 19 runs] : 1134 (± 579) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1196.887 micros/op 835 ops/sec 119.689 seconds 100000 operations; 0.4 MB/s fillseq [AVG 20 runs] : 1119 (± 550) ops/sec; 0.6 (± 0.3) MB/sec fillseq : 1193.697 micros/op 837 ops/sec 119.370 seconds 100000 operations; 0.4 MB/s fillseq [AVG 21 runs] : 1106 (± 524) ops/sec; 0.6 (± 0.3) MB/sec ``` Reviewed By: ajkr Differential Revision: D44159541 Pulled By: hx235 fbshipit-source-id: 8d29efb70001fdc52d34535eeb3364fc3e71e40b | 18 March 2023, 16:51:58 UTC |
204fcff | Peter Dillinger | 18 March 2023, 03:23:49 UTC | HyperClockCache support for SecondaryCache, with refactoring (#11301) Summary: Internally refactors SecondaryCache integration out of LRUCache specifically and into a wrapper/adapter class that works with various Cache implementations. Notably, this relies on separating the notion of async lookup handles from other cache handles, so that HyperClockCache doesn't have to deal with the problem of allocating handles from the hash table for lookups that might fail anyway, and might be on the same key without support for coalescing. (LRUCache's hash table can incorporate previously allocated handles thanks to its pointer indirection.) Specifically, I'm worried about the case in which hundreds of threads try to access the same block and probing in the hash table degrades to linear search on the pile of entries with the same key. This change is a big step in the direction of supporting stacked SecondaryCaches, but there are obstacles to completing that. Especially, there is no SecondaryCache hook for evictions to pass from one to the next. It has been proposed that evictions be transmitted simply as the persisted data (as in SaveToCallback), but given the current structure provided by the CacheItemHelpers, that would require an extra copy of the block data, because there's intentionally no way to ask for a contiguous Slice of the data (to allow for flexibility in storage). `AsyncLookupHandle` and the re-worked `WaitAll()` should be essentially prepared for stacked SecondaryCaches, but several "TODO with stacked secondaries" issues remain in various places. It could be argued that the stacking instead be done as a SecondaryCache adapter that wraps two (or more) SecondaryCaches, but at least with the current API that would require an extra heap allocation on SecondaryCache Lookup for a wrapper SecondaryCacheResultHandle that can transfer a Lookup between secondaries. We could also consider trying to unify the Cache and SecondaryCache APIs, though that might be difficult if `AsyncLookupHandle` is kept a fixed struct. ## cache.h (public API) Moves `secondary_cache` option from LRUCacheOptions to ShardedCacheOptions so that it is applicable to HyperClockCache. ## advanced_cache.h (advanced public API) * Add `Cache::CreateStandalone()` so that the SecondaryCache support wrapper can use it. * Add `SetEvictionCallback()` / `eviction_callback_` so that the SecondaryCache support wrapper can use it. Only a single callback is supported for efficiency. If there is ever a need for more than one, hopefully that can be handled with a broadcast callback wrapper. These are essentially the two "extra" pieces of `Cache` for pulling out specific SecondaryCache support from the `Cache` implementation. I think it's a good trade-off as these are reasonable, limited, and reusable "cut points" into the `Cache` implementations. * Remove async capability from standard `Lookup()` (getting rid of awkward restrictions on pending Handles) and add `AsyncLookupHandle` and `StartAsyncLookup()`. As noted in the comments, the full struct of `AsyncLookupHandle` is exposed so that it can be stack allocated, for efficiency, though more data is being copied around than before, which could impact performance. (Lookup info -> AsyncLookupHandle -> Handle vs. Lookup info -> Handle) I could foresee a future in which a Cache internally saves a pointer to the AsyncLookupHandle, which means it's dangerous to allow it to be copyable or even movable. It also means it's not compatible with std::vector (which I don't like requiring as an API parameter anyway), so `WaitAll()` expects any contiguous array of AsyncLookupHandles. I believe this is best for common case efficiency, while behaving well in other cases also. For example, `WaitAll()` has no effect on default-constructed AsyncLookupHandles, which look like a completed cache miss. ## cacheable_entry.h A couple of functions are obsolete because Cache::Handle can no longer be pending. ## cache.cc Provides default implementations for new or revamped Cache functions, especially appropriate for non-blocking caches. ## secondary_cache_adapter.{h,cc} The full details of the Cache wrapper adding SecondaryCache support. Essentially replicates the SecondaryCache handling that was in LRUCache, but obviously refactored. There is a bit of logic duplication, where Lookup() is essentially a manually optimized version of StartAsyncLookup() and Wait(), but it's roughly a dozen lines of code. ## sharded_cache.h, typed_cache.h, charged_cache.{h,cc}, sim_cache.cc Simply updated for Cache API changes. ## lru_cache.{h,cc} Carefully remove SecondaryCache logic, implement `CreateStandalone` and eviction handler functionality. ## clock_cache.{h,cc} Expose existing `CreateStandalone` functionality, add eviction handler functionality. Light refactoring. ## block_based_table_reader* Mostly re-worked the only usage of async Lookup, which is in BlockBasedTable::MultiGet. Used arrays in place of autovector in some places for efficiency. Simplified some logic by not trying to process some cache results before they're all ready. Created new function `BlockBasedTable::GetCachePriority()` to reduce some pre-existing code duplication (and avoid making it worse). Fixed at least one small bug from the prior confusing mixture of async and sync Lookups. In MaybeReadBlockAndLoadToCache(), called by RetrieveBlock(), called by MultiGet() with wait=false, is_cache_hit for the block_cache_tracer entry would not be set to true if the handle was pending after Lookup and before Wait. ## Intended follow-up work * Figure out if there are any missing stats or block_cache_tracer work in refactored BlockBasedTable::MultiGet * Stacked secondary caches (see above discussion) * See if we can make up for the small MultiGet performance regression. * Study more performance with SecondaryCache * Items evicted from over-full LRUCache in Release were not being demoted to SecondaryCache, and still aren't to minimize unit test churn. Ideally they would be demoted, but it's an exceptional case so not a big deal. * Use CreateStandalone for cache reservations (save unnecessary hash table operations). Not a big deal, but worthy cleanup. * Somehow I got the contract for SecondaryCache::Insert wrong in #10945. (Doesn't take ownership!) That API comment needs to be fixed, but didn't want to mingle that in here. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11301 Test Plan: ## Unit tests Generally updated to include HCC in SecondaryCache tests, though HyperClockCache has some different, less strict behaviors that leads to some tests not really being set up to work with it. Some of the tests remain disabled with it, but I think we have good coverage without them. ## Crash/stress test Updated to use the new combination. ## Performance First, let's check for regression on caches without secondary cache configured. Adding support for the eviction callback is likely to have a tiny effect, but it shouldn't be worrisome. LRUCache could benefit slightly from less logic around SecondaryCache handling. We can test with cache_bench default settings, built with DEBUG_LEVEL=0 and PORTABLE=0. ``` (while :; do base/cache_bench --cache_type=hyper_clock_cache | grep Rough; done) | awk '{ sum += $9; count++; print $0; print "Average: " int(sum / count) }' ``` **Before** this and #11299 (which could also have a small effect), running for about an hour, before & after running concurrently for each cache type: HyperClockCache: 3168662 (average parallel ops/sec) LRUCache: 2940127 **After** this and #11299, running for about an hour: HyperClockCache: 3164862 (average parallel ops/sec) (0.12% slower) LRUCache: 2940928 (0.03% faster) This is an acceptable difference IMHO. Next, let's consider essentially the worst case of new CPU overhead affecting overall performance. MultiGet uses the async lookup interface regardless of whether SecondaryCache or folly are used. We can configure a benchmark where all block cache queries are for data blocks, and all are hits. Create DB and test (before and after tests running simultaneously): ``` TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16 TEST_TMPDIR=/dev/shm base/db_bench -benchmarks=multireadrandom[-X30] -readonly -multiread_batched -batch_size=32 -num=30000000 -bloom_bits=16 -cache_size=6789000000 -duration 20 -threads=16 ``` **Before**: multireadrandom [AVG 30 runs] : 3444202 (± 57049) ops/sec; 240.9 (± 4.0) MB/sec multireadrandom [MEDIAN 30 runs] : 3514443 ops/sec; 245.8 MB/sec **After**: multireadrandom [AVG 30 runs] : 3291022 (± 58851) ops/sec; 230.2 (± 4.1) MB/sec multireadrandom [MEDIAN 30 runs] : 3366179 ops/sec; 235.4 MB/sec So that's roughly a 3% regression, on kind of a *worst case* test of MultiGet CPU. Similar story with HyperClockCache: **Before**: multireadrandom [AVG 30 runs] : 3933777 (± 41840) ops/sec; 275.1 (± 2.9) MB/sec multireadrandom [MEDIAN 30 runs] : 3970667 ops/sec; 277.7 MB/sec **After**: multireadrandom [AVG 30 runs] : 3755338 (± 30391) ops/sec; 262.6 (± 2.1) MB/sec multireadrandom [MEDIAN 30 runs] : 3785696 ops/sec; 264.8 MB/sec Roughly a 4-5% regression. Not ideal, but not the whole story, fortunately. Let's also look at Get() in db_bench: ``` TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X30] -readonly -num=30000000 -bloom_bits=16 -cache_size=6789000000 -duration 20 -threads=16 ``` **Before**: readrandom [AVG 30 runs] : 2198685 (± 13412) ops/sec; 153.8 (± 0.9) MB/sec readrandom [MEDIAN 30 runs] : 2209498 ops/sec; 154.5 MB/sec **After**: readrandom [AVG 30 runs] : 2292814 (± 43508) ops/sec; 160.3 (± 3.0) MB/sec readrandom [MEDIAN 30 runs] : 2365181 ops/sec; 165.4 MB/sec That's showing roughly a 4% improvement, perhaps because of the secondary cache code that is no longer part of LRUCache. But weirdly, HyperClockCache is also showing 2-3% improvement: **Before**: readrandom [AVG 30 runs] : 2272333 (± 9992) ops/sec; 158.9 (± 0.7) MB/sec readrandom [MEDIAN 30 runs] : 2273239 ops/sec; 159.0 MB/sec **After**: readrandom [AVG 30 runs] : 2332407 (± 11252) ops/sec; 163.1 (± 0.8) MB/sec readrandom [MEDIAN 30 runs] : 2335329 ops/sec; 163.3 MB/sec Reviewed By: ltamasi Differential Revision: D44177044 Pulled By: pdillinger fbshipit-source-id: e808e48ff3fe2f792a79841ba617be98e48689f5 | 18 March 2023, 03:23:49 UTC |
eac6b6d | anand76 | 17 March 2023, 21:57:09 UTC | Ignore async_io ReadOption if FileSystem doesn't support it (#11296) Summary: In PosixFileSystem, IO uring support is opt-in. If the support is not enabled by the user, then ignore the async_io ReadOption in MultiGet and iteration at the top, rather than follow the async_io codepath and transparently switch to sync IO at the FileSystem layer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11296 Test Plan: Add new unit tests Reviewed By: akankshamahajan15 Differential Revision: D44045776 Pulled By: anand1976 fbshipit-source-id: a0881bf763ca2fde50b84063d0068bb521edd8b9 | 17 March 2023, 21:57:09 UTC |
a72d55c | Levi Tamasi | 17 March 2023, 21:47:29 UTC | Increase the stress test coverage of GetEntity (#11303) Summary: The `GetEntity` API is currently used in the stress tests for verification purposes; this patch extends the coverage by adding a mode where all point lookups in the non-batched, batched, and CF consistency stress tests are done using this API. The PR also includes a bit of refactoring to eliminate some boilerplate code around the wide-column consistency checks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11303 Test Plan: Ran stress tests of the batched, non-batched, and CF consistency varieties. Reviewed By: akankshamahajan15 Differential Revision: D44148503 Pulled By: ltamasi fbshipit-source-id: fecdbfd3e65a459bbf16ab7aa7b9173e19240077 | 17 March 2023, 21:47:29 UTC |
291300e | hackingthekernel | 16 March 2023, 23:57:03 UTC | add c-api for allowing FIFO compaction (#11156) Summary: Addressing issue https://github.com/facebook/rocksdb/issues/11079 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11156 Reviewed By: pdillinger Differential Revision: D42869964 Pulled By: ajkr fbshipit-source-id: 58214901d4e072c568d4c5cf0944a0b1c60de897 | 16 March 2023, 23:57:03 UTC |
ccaa322 | Peter Dillinger | 16 March 2023, 00:51:44 UTC | Simplify tracking entries already in SecondaryCache (#11299) Summary: In preparation for factoring secondary cache support out of individual Cache implementations, we can get rid of the "in secondary cache" flag on entries through a workable hack: when an entry is promoted from secondary, it is inserted in primary using a helper that lacks secondary cache support, thus preventing re-insertion into secondary cache through existing logic. This adds to the complexity of building CacheItemHelpers, because you always have to be able to get to an equivalent helper without secondary cache support, but that complexity is reasonably isolated within RocksDB typed_cache.h and test code. gcc-7 seems to have problems with constexpr constructor referencing `this` so removed constexpr support on CacheItemHelper. Also refactored some related test code to share common code / functionality. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11299 Test Plan: existing tests Reviewed By: anand1976 Differential Revision: D44101453 Pulled By: pdillinger fbshipit-source-id: 7a59d0a3938ee40159c90c3e65d7004f6a272345 | 16 March 2023, 00:51:44 UTC |
664dabd | nccx | 15 March 2023, 22:29:28 UTC | Add Microsoft Bing as a user (#11270) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11270 Reviewed By: pdillinger Differential Revision: D43811584 Pulled By: ajkr fbshipit-source-id: f27e55395644a469840785685646456f6b1452fc | 15 March 2023, 22:29:28 UTC |
bab5f9a | Hui Xiao | 15 March 2023, 21:02:43 UTC | Add new stat rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss|hit} (#11265) Summary: **Context/Summary:** We are adding new stats to measure behavior of prefetched tail size and look up into this buffer The stat collection is done in FilePrefetchBuffer but only for prefetched tail buffer during table open for now using FilePrefetchBuffer enum. It's cleaner than the alternative of implementing in upper-level call places of FilePrefetchBuffer for table open. It also has the benefit of extensible to other types of FilePrefetchBuffer if needed. See db bench for perf regression concern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11265 Test Plan: **- Piggyback on existing test** **- rocksdb.table.open.prefetch.tail.miss is harder to UT so I manually set prefetch tail read bytes to be small and run db bench.** ``` ./db_bench -db=/tmp/testdb -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 -use_direct_reads=true ``` ``` rocksdb.table.open.prefetch.tail.read.bytes P50 : 4096.000000 P95 : 4096.000000 P99 : 4096.000000 P100 : 4096.000000 COUNT : 225 SUM : 921600 rocksdb.table.open.prefetch.tail.miss COUNT : 91 rocksdb.table.open.prefetch.tail.hit COUNT : 1034 ``` **- No perf regression observed in db_bench** SETUP command: create same db with ~900 files for pre-change/post-change. ``` ./db_bench -db=/tmp/testdb -benchmarks="fillseq" -key_size=32 -value_size=512 -num=500000 -write_buffer_size=655360 -disable_auto_compactions=true -target_file_size_base=16777216 -compression_type=none ``` TEST command 60 runs or til convergence: as suggested by anand1976 and akankshamahajan15, vary `seek_nexts` and `async_io` in testing. ``` ./db_bench -use_existing_db=true -db=/tmp/testdb -statistics=false -cache_size=0 -cache_index_and_filter_blocks=false -benchmarks=seekrandom[-X60] -num=50000 -seek_nexts={10, 500, 1000} -async_io={0|1} -use_direct_reads=true ``` async io = 0, direct io read = true | seek_nexts = 10, 30 runs | seek_nexts = 500, 12 runs | seek_nexts = 1000, 6 runs -- | -- | -- | -- pre-post change | 4776 (± 28) ops/sec; 24.8 (± 0.1) MB/sec | 288 (± 1) ops/sec; 74.8 (± 0.4) MB/sec | 145 (± 4) ops/sec; 75.6 (± 2.2) MB/sec post-change | 4790 (± 32) ops/sec; 24.9 (± 0.2) MB/sec | 288 (± 3) ops/sec; 74.7 (± 0.8) MB/sec | 143 (± 3) ops/sec; 74.5 (± 1.6) MB/sec async io = 1, direct io read = true | seek_nexts = 10, 54 runs | seek_nexts = 500, 6 runs | seek_nexts = 1000, 4 runs -- | -- | -- | -- pre-post change | 3350 (± 36) ops/sec; 17.4 (± 0.2) MB/sec | 264 (± 0) ops/sec; 68.7 (± 0.2) MB/sec | 138 (± 1) ops/sec; 71.8 (± 1.0) MB/sec post-change | 3358 (± 27) ops/sec; 17.4 (± 0.1) MB/sec | 263 (± 2) ops/sec; 68.3 (± 0.8) MB/sec | 139 (± 1) ops/sec; 72.6 (± 0.6) MB/sec Reviewed By: ajkr Differential Revision: D43781467 Pulled By: hx235 fbshipit-source-id: a706a18472a8edb2b952bac3af40eec803537f2a | 15 March 2023, 21:02:43 UTC |
601efe3 | Peter Dillinger | 15 March 2023, 19:08:17 UTC | Misc cleanup of block cache code (#11291) Summary: ... ahead of a larger change. * Rename confusingly named `is_in_sec_cache` to `kept_in_sec_cache` * Unify naming of "standalone" block cache entries (was "detached" in clock_cache) * Remove some unused definitions in clock_cache.h (leftover from a previous revision) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11291 Test Plan: usual tests and CI, no behavior changes Reviewed By: anand1976 Differential Revision: D43984642 Pulled By: pdillinger fbshipit-source-id: b8bf0c5b90a932a88bcbdb413b2f256834aedf97 | 15 March 2023, 19:08:17 UTC |
11cb6af | Hui Xiao | 14 March 2023, 23:53:20 UTC | Fix bug of prematurely excluded CF in atomic flush contains unflushed data that should've been included in the atomic flush (#11148) Summary: **Context:** Atomic flush should guarantee recoverability of all data of seqno up to the max seqno of the flush. It achieves this by ensuring all such data are flushed by the time this atomic flush finishes through `SelectColumnFamiliesForAtomicFlush()`. However, our crash test exposed the following case where an excluded CF from an atomic flush contains unflushed data of seqno less than the max seqno of that atomic flush and loses its data with `WriteOptions::DisableWAL=true` in face of a crash right after the atomic flush finishes . ``` ./db_stress --preserve_unverified_changes=1 --reopen=0 --acquire_snapshot_one_in=0 --adaptive_readahead=1 --allow_data_in_errors=True --async_io=1 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=15 --bottommost_compression_type=none --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=0 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=134217727 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=lz4hc --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir=$exp --fail_if_options_file_error=0 --fifo_allow_compaction=0 --file_checksum_impl=none --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=1 --manual_wal_flush_one_in=100 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=4 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=0 --periodic_compaction_seconds=100 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=32 --readahead_size=16384 --readpercent=50 --recycle_log_file_num=0 --ribbon_starting_level=6 --secondary_cache_fault_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=10 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=none --write_buffer_size=524288 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=30 & pid=$! sleep 0.2 sleep 10 kill $pid sleep 0.2 ./db_stress --ops_per_thread=1 --preserve_unverified_changes=1 --reopen=0 --acquire_snapshot_one_in=0 --adaptive_readahead=1 --allow_data_in_errors=True --async_io=1 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=15 --bottommost_compression_type=none --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=0 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=134217727 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=lz4hc --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir=$exp --fail_if_options_file_error=0 --fifo_allow_compaction=0 --file_checksum_impl=none --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=1 --manual_wal_flush_one_in=100 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=4 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=0 --periodic_compaction_seconds=100 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=32 --readahead_size=16384 --readpercent=50 --recycle_log_file_num=0 --ribbon_starting_level=6 --secondary_cache_fault_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=10 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=none --write_buffer_size=524288 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=30 & pid=$! sleep 0.2 sleep 40 kill $pid sleep 0.2 Verification failed for column family 6 key 0000000000000239000000000000012B0000000000000138 (56622): value_from_db: , value_from_expected: 4A6331754E4F4C4D42434041464744455A5B58595E5F5C5D5253505156575455, msg: Value not found: NotFound: Crash-recovery verification failed :( No writes or ops? Verification failed :( ``` The bug is due to the following: - When atomic flush is used, an empty CF is legally [excluded](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_filesnapshot.cc#L39) in `SelectColumnFamiliesForAtomicFlush` as the first step of `DBImpl::FlushForGetLiveFiles` before [passing](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_filesnapshot.cc#L42) the included CFDs to `AtomicFlushMemTables`. - But [later](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_impl/db_impl_compaction_flush.cc#L2133) in `AtomicFlushMemTables`, `WaitUntilFlushWouldNotStallWrites` will [release the db mutex](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_impl/db_impl_compaction_flush.cc#L2403), during which data@seqno N can be inserted into the excluded CF and data@seqno M can be inserted into one of the included CFs, where M > N. - However, data@seqno N in an already-excluded CF is thus excluded from this atomic flush while we seqno N is less than seqno M. **Summary:** - Replace `SelectColumnFamiliesForAtomicFlush()`-before-`AtomicFlushMemTables()` with `SelectColumnFamiliesForAtomicFlush()`-after-wait-within-`AtomicFlushMemTables()` so we ensure no write affecting the recoverability of this atomic job (i.e, change to max seqno of this atomic flush or insertion of data with less seqno than the max seqno of the atomic flush to excluded CF) can happen after calling `SelectColumnFamiliesForAtomicFlush()`. - For above, refactored and clarified comments on `SelectColumnFamiliesForAtomicFlush()` and `AtomicFlushMemTables()` for clearer semantics of passed-in CFDs to atomic-flush Pull Request resolved: https://github.com/facebook/rocksdb/pull/11148 Test Plan: - New unit test failed before the fix and passes after - Make check - Rehearsal stress test Reviewed By: ajkr Differential Revision: D42799871 Pulled By: hx235 fbshipit-source-id: 13636b63e9c25c5895857afc36ea580d57f6d644 | 14 March 2023, 23:53:20 UTC |
2a23bee | Peter Dillinger | 14 March 2023, 03:41:55 UTC | Use CacheWrapper in more places (#11295) Summary: ... to simplify code and make it less prone to needless updates on refactoring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11295 Test Plan: existing tests (no functional changes intended) Reviewed By: hx235 Differential Revision: D44040260 Pulled By: pdillinger fbshipit-source-id: 1b6badb5c8ca673db0903bfaba3cfbc986f386be | 14 March 2023, 03:41:55 UTC |
4988192 | Levi Tamasi | 14 March 2023, 01:43:27 UTC | Rename a recently added PerfContext counter (#11294) Summary: The patch renames the counter added in https://github.com/facebook/rocksdb/issues/11284 for better consistency with the existing naming scheme. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11294 Test Plan: `make check` Reviewed By: jowlyzhang Differential Revision: D44035964 Pulled By: ltamasi fbshipit-source-id: 8b1a2a03ee728148365367e0ecc1fcf462f62191 | 14 March 2023, 01:43:27 UTC |
648e972 | Peter Dillinger | 13 March 2023, 21:19:59 UTC | Document DB::Resume(), fix LockWALInEffect test (#11290) Summary: In rare cases seeing failures like this ``` [ RUN ] DBWriteTestInstance/DBWriteTest.LockWALInEffect/2 db/db_write_test.cc:653: Failure Put("key3", "value") Corruption: Not active ``` in a test with no explicit threading. This is likely because of the unpredictability of background auto-resume. I didn't really know this feature, in part because DB::Resume() was undocumented. So I believe I have fixed the test and documented the API function. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11290 Test Plan: 1000s of stress runs of the test with gtest-parallel Reviewed By: anand1976 Differential Revision: D43984583 Pulled By: pdillinger fbshipit-source-id: d30dec120b4864e193751b2e33ff16834d313db3 | 13 March 2023, 21:19:59 UTC |
9aa3b6f | Changyu Bi | 13 March 2023, 18:06:59 UTC | Support range deletion tombstones in `CreateColumnFamilyWithImport` (#11252) Summary: CreateColumnFamilyWithImport() did not support range tombstones for two reasons: 1. it uses point keys of a input file to determine its boundary (smallest and largest internal key), which means range tombstones outside of the point key range will be effectively dropped. 2. it does not handle files with no point keys. Also included a fix in external_sst_file_ingestion_job.cc where the blocks read in `GetIngestedFileInfo()` can be added to block cache now (issue fixed in https://github.com/facebook/rocksdb/pull/6429). This PR adds support for exporting and importing column family with range tombstones. The main change is to add smallest internal key and largest internal key to `SstFileMetaData` that will be part of the output of `ExportColumnFamily()`. Then during `CreateColumnFamilyWithImport(...,const ExportImportFilesMetaData& metadata,...)`, file boundaries can be set from `metadata` directly. This is needed since when file boundaries are extended by range tombstones, sometimes they cannot be deduced from a file's content alone. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11252 Test Plan: - added unit tests that fails before this change Closes https://github.com/facebook/rocksdb/issues/11245 Reviewed By: ajkr Differential Revision: D43577443 Pulled By: cbi42 fbshipit-source-id: 6bff78e583cc50c44854994dea0a8dd519398f2f | 13 March 2023, 18:06:59 UTC |
fbd603d | Alan Paxton | 10 March 2023, 20:26:09 UTC | Reverse wrong order of parameter names for Java WriteBatchWithIndex#iteratorWithBase (#11280) Summary: Fix for https://github.com/facebook/rocksdb/issues/11008 `Java_org_rocksdb_WriteBatchWithIndex_iteratorWithBase` takes parameters `(… jlong jwbwi_handle, jlong jcf_handle, jlong jbase_iterator_handle, jlong jread_opts_handle)` while `WriteBatchWithIndex.java` declares `private native long iteratorWithBase(final long handle, final long baseIteratorHandle, final long cfHandle, final long readOptionsHandle)`. Luckily the only call to `iteratorWithBase` passes the parameters in the correct order for the implementation `(… cfHandle, baseIteratorHandle …)` This type checks because the types are the same (long words). The code is currently used correctly, it is just extremely misleading. Swap the names of the 2 parameters in the Java method so that the correct usage is clear. There already exist test methods which call the API correctly and only succeed because of that. These continue to work. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11280 Reviewed By: cbi42 Differential Revision: D43874798 Pulled By: ajkr fbshipit-source-id: b59bc930bf579f4e0804f0effd4fb17f4225d60c | 10 March 2023, 20:26:09 UTC |
969d4e1 | Jaepil Jeong | 10 March 2023, 00:42:57 UTC | Fix compile errors in Clang due to unused variables depending on the build configuration (#11234) Summary: This PR fixes compilation errors in Clang due to unused variables like the below: ``` [109/329] Building CXX object CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o FAILED: CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o ccache /opt/homebrew/opt/llvm/bin/clang++ -DGFLAGS=1 -DGFLAGS_IS_A_DLL=0 -DHAVE_FULLFSYNC -DJEMALLOC_NO_DEMANGLE -DLZ4 -DOS_MACOSX -DROCKSDB_JEMALLOC -DROCKSDB_LIB_IO_POSIX -DROCKSDB_NO_DYNAMIC_EXTENSION -DROCKSDB_PLATFORM_POSIX -DSNAPPY -DTBB -DZLIB -DZSTD -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -I/Users/jaepil/app/include -I/opt/homebrew/include -I/opt/homebrew/opt/llvm/include -I/opt/homebrew/opt/llvm/include/c++/v1 -W -Wextra -Wall -pthread -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing -Wno-invalid-offsetof -fno-omit-frame-pointer -momit-leaf-frame-pointer -march=armv8-a+crc+crypto -Wno-unused-function -Werror -O2 -g -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -std=gnu++20 -MD -MT CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o -MF CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o.d -o CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o -c /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/db/version_edit_handler.cc /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/db/version_edit_handler.cc:30:10: error: variable 'recovered_edits' set but not used [-Werror,-Wunused-but-set-variable] size_t recovered_edits = 0; ^ 1 error generated. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11234 Reviewed By: cbi42 Differential Revision: D43458604 Pulled By: ajkr fbshipit-source-id: d8c50e1a108887b037a120cd9f19374ddaeee817 | 10 March 2023, 00:42:57 UTC |
7a07afe | zhangliangkai1992 | 09 March 2023, 21:11:25 UTC | DBWithTTLImpl::IsStale overflow when ttl is 15 years (#11279) Summary: Fix DBWIthTTLImpl::IsStale overflow Pull Request resolved: https://github.com/facebook/rocksdb/pull/11279 Reviewed By: cbi42 Differential Revision: D43875039 Pulled By: ajkr fbshipit-source-id: 3e5feb8c4c4480bf1421b0763ade3d2e459ec028 | 09 March 2023, 21:11:25 UTC |
daeec50 | Alan Paxton | 09 March 2023, 21:11:00 UTC | Add instructions for installing googlebenchmark (#11282) Summary: Per the discussion in https://groups.google.com/g/rocksdb/c/JqhlvSs6ZEs/m/bnXZ7Q--AAAJ It seems non-obvious that googlebenchmark must be installed manually before microbenchmarks can be run. I have added more detail to the installation instructions to make it clearer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11282 Reviewed By: cbi42 Differential Revision: D43874724 Pulled By: ajkr fbshipit-source-id: f64a4ac4914cb057955d1ca965885f8822ca7764 | 09 March 2023, 21:11:00 UTC |
1de6976 | akankshamahajan | 09 March 2023, 17:16:20 UTC | Fix hang in async_io benchmarks in regression script (#11285) Summary: Fix hang in async_io benchmarks in regression script. I changed the order of benchmarks and that somehow fixed the issue of hang. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11285 Test Plan: Ran it manually Reviewed By: pdillinger Differential Revision: D43937431 Pulled By: akankshamahajan15 fbshipit-source-id: 7c43075d3be6b8f41d08e845664012768b769661 | 09 March 2023, 17:16:20 UTC |
1d52438 | Levi Tamasi | 09 March 2023, 02:22:11 UTC | Add a PerfContext counter for merge operands applied in point lookups (#11284) Summary: The existing PerfContext counter `internal_merge_count` only tracks the Merge operands applied during range scans. The patch adds a new counter called `internal_merge_count_point_lookups` to track the same metric for point lookups (`Get` / `MultiGet` / `GetEntity` / `MultiGetEntity`), and also fixes a couple of cases in the iterator where the existing counter wasn't updated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11284 Test Plan: `make check` Reviewed By: jowlyzhang Differential Revision: D43926082 Pulled By: ltamasi fbshipit-source-id: 321566d8b4cf0a3b6c9b73b7a5c984fb9bb492e9 | 09 March 2023, 02:22:11 UTC |
6c65bf1 | akankshamahajan | 07 March 2023, 23:07:49 UTC | Decrease duration time for internally debugging the regression_script (#11283) Summary: Internally, the benchmark is going on hang state whereas when run on same host manually, it passes. Decrease the duration to 5s to figure out how much time it is taking to complete the benchmark. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11283 Test Plan: Ran manually internally Reviewed By: hx235 Differential Revision: D43882260 Pulled By: akankshamahajan15 fbshipit-source-id: 9ea44164773d4df4fc05cd817b7e011426c4d428 | 07 March 2023, 23:07:49 UTC |
e010732 | Peter Dillinger | 06 March 2023, 19:53:09 UTC | Tests verifying non-zero checksums of zero bytes (#11260) Summary: Adds unit tests verifying that a block payload and checksum of all zeros is not falsely considered valid data. The test exhaustively checks that for blocks up to some length (default 20K, more exhaustively 10M) of all zeros do not produce a block checksum of all zeros. Also small refactoring of an existing checksum test to use parameterized test. (Suggest hiding whitespace changes for review.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11260 Test Plan: this is the test, manual run with `ROCKSDB_THOROUGH_CHECKSUM_TEST=1` to verify up to 10M. Reviewed By: hx235 Differential Revision: D43706192 Pulled By: pdillinger fbshipit-source-id: 95e721c320ca928e7fa2400c2570fb359cc30b1f | 06 March 2023, 19:53:09 UTC |
13357de | akankshamahajan | 06 March 2023, 19:22:21 UTC | Add support for parameters setting related to async_io benchmarks (#11262) Summary: Provide support in benchmark regression to use different options to be used in async_io benchamark only - "$`MAX_READAHEAD_SIZE`", $`INITIAL_READAHEAD_SIZE`", "$`NUM_READS_FOR_READAHEAD_SIZE`". If user wants to run set these parameters for all benchmarks then these parameters need to be set in OPTION file instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11262 Test Plan: Ran manually Reviewed By: anand1976 Differential Revision: D43725567 Pulled By: akankshamahajan15 fbshipit-source-id: 28c3462dd785ffd646d44560fa9c92bc6a8066e5 | 06 March 2023, 19:22:21 UTC |
a1a3b23 | Levi Tamasi | 06 March 2023, 17:50:39 UTC | Deflake/fix BlobSourceCacheReservationTest.IncreaseCacheReservationOnFullCache (#11273) Summary: `BlobSourceCacheReservationTest.IncreaseCacheReservationOnFullCache` is both flaky and also doesn't do what its name says. The patch changes this test so it actually tests increasing the cache reservation, hopefully also deflaking it in the process. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11273 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D43800935 Pulled By: ltamasi fbshipit-source-id: 5eb54130dfbe227285b0e14f2084aa4b89f0b107 | 06 March 2023, 17:50:39 UTC |
50e9b3f | Peter Dillinger | 05 March 2023, 16:21:57 UTC | Default print stack traces with GDB on Linux (#11272) Summary: On Linux systems using full ASLR, including CircleCI, the old backtrace()+addr2line stack traces are pretty useless, as seen in some failures under ASSERT_STATUS_CHECKED=1 LIB_MODE=static. Use gdb by default for stack traces under Linux. More detail in code comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11272 Test Plan: manual testing locally and on CircleCI with ssh Reviewed By: anand1976 Differential Revision: D43786211 Pulled By: pdillinger fbshipit-source-id: f8c7c77f774b504fbdf7c786ff2430cbc8f5b939 | 05 March 2023, 16:21:57 UTC |
e168c1b | Peter Dillinger | 05 March 2023, 16:21:16 UTC | Use FaultInjectionTestFS in DBWriteTest.LockWALInEffect (#11271) Summary: Existing use of FaultInjectionTestEnv shows rare TSAN errors with parallel Sync and Flush. This appears to be fixed in FaultInjectionTestFS. (Sigh, code duplication and divergence.) Example failure: https://app.circleci.com/pipelines/github/facebook/rocksdb/24631/workflows/fc2a66f0-f21c-48d6-a944-3885bcff50a4/jobs/571928 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11271 Test Plan: wasn't able to reproduce locally but stress tested the updated test with gtest-parallel -r1000 and TSAN. Reviewed By: ajkr Differential Revision: D43779477 Pulled By: pdillinger fbshipit-source-id: a019b0f1d4045a26a15ab08aab63828a398f6d3e | 05 March 2023, 16:21:16 UTC |
ddde1e6 | Igor Canadi | 04 March 2023, 04:55:31 UTC | Avoid ColumnFamilyDescriptor copy (#10978) Summary: Hi. :) Noticed we are copying ColumnFamilyDescriptor here because my process crashed during copy constructor (cause unrelated) Pull Request resolved: https://github.com/facebook/rocksdb/pull/10978 Reviewed By: cbi42 Differential Revision: D41473924 Pulled By: ajkr fbshipit-source-id: 58a3473f2d7b24918f79d4b2726c20081c5e95b4 | 04 March 2023, 04:55:31 UTC |
d053926 | Changyu Bi | 03 March 2023, 20:17:30 UTC | Improve documentation for MergingIterator (#11161) Summary: Add some comments to try to explain how/why MergingIterator works. Made some small refactoring, mostly in MergingIterator::SkipNextDeleted() and MergingIterator::SeekImpl(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/11161 Test Plan: crash test with small key range: ``` python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=6000 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10 ``` Reviewed By: ajkr Differential Revision: D42860994 Pulled By: cbi42 fbshipit-source-id: 3f0c1c9c6481a7f468bf79d823998907a8116e9e | 03 March 2023, 20:17:30 UTC |
95d67f3 | Levi Tamasi | 03 March 2023, 17:53:13 UTC | Fix/clarify/extend the API comments of CompactionFilter (#11261) Summary: The patch makes the following changes to the API comments: * Some general comments about snapshots, thread safety, and user-defined timestamps are moved to a more prominent place at the top of the file. * Detailed descriptions are added for each `ValueType` and `Decision`, fixing and extending some existing comments (e.g. that of `kRemove`, which suggested that key-values are simply removed from the output, while in reality base values are converted to tombstones) and adding detailed comments that were missing (e.g. `kPurge` and `kChangeWideColumnEntity`). * Updated/extended the comments of `FilterV2/V3` and `FilterBlobByKey`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11261 Reviewed By: akankshamahajan15 Differential Revision: D43714314 Pulled By: ltamasi fbshipit-source-id: 835f4b1bdac1ce0e291155186095211303260729 | 03 March 2023, 17:53:13 UTC |
8dfcfd4 | Yu Zhang | 01 March 2023, 21:28:54 UTC | Fix backward iteration issue when user defined timestamp is enabled in BlobDB (#11258) Summary: During backward iteration, blob verification would fail because the user key (ts included) in `saved_key_` doesn't match the blob. This happens because during`FindValueForCurrentKey`, `saved_key_` is not updated when the user key(ts not included) is the same for all cases except when `timestamp_lb_` is specified. This breaks the blob verification logic when user defined timestamp is enabled and `timestamp_lb_` is not specified. Fix this by always updating `saved_key_` when a smaller user key (ts included) is seen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11258 Test Plan: `make check` `./db_blob_basic_test --gtest_filter=DBBlobWithTimestampTest.IterateBlobs` Run db_bench (built with DEBUG_LEVEL=0) to demonstrate that no overhead is introduced with: `./db_bench -user_timestamp_size=8 -db=/dev/shm/rocksdb -disable_wal=1 -benchmarks=fillseq,seekrandom[-W1-X6] -reverse_iterator=1 -seek_nexts=5` Baseline: - seekrandom [AVG 6 runs] : 72188 (± 1481) ops/sec; 37.2 (± 0.8) MB/sec With this PR: - seekrandom [AVG 6 runs] : 74171 (± 1427) ops/sec; 38.2 (± 0.7) MB/sec Reviewed By: ltamasi Differential Revision: D43675642 Pulled By: jowlyzhang fbshipit-source-id: 8022ae8522d1f66548821855e6eed63640c14e04 | 01 March 2023, 21:28:54 UTC |
cf09917 | anand76 | 28 February 2023, 18:36:56 UTC | Add filter/index/data secondary cache hits stats (#11246) Summary: Add more stats for better visibility into the usefulness of the secondary cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11246 Test Plan: Add a new unit test Reviewed By: akankshamahajan15 Differential Revision: D43521364 Pulled By: anand1976 fbshipit-source-id: a92f04884e738a9bf40ad4047acaaaea343838a7 | 28 February 2023, 18:36:56 UTC |
b7e7350 | yihuang | 27 February 2023, 19:39:38 UTC | fix: add extern and ROCKSDB_LIBRARY_API to two c apis (#11217) Summary: add extern and `ROCKSDB_LIBRARY_API ` to `rocksdb_property_int` and `rocksdb_property_int_cf`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11217 Reviewed By: cbi42 Differential Revision: D43522968 Pulled By: ajkr fbshipit-source-id: 4cd4e136f3890fc17e0a1f9e7ac4e517e4d79afa | 27 February 2023, 19:39:38 UTC |
3c9eed6 | Levi Tamasi | 24 February 2023, 18:33:00 UTC | Enable moving a string or PinnableSlice into PinnableWideColumns (#11248) Summary: This makes it possible to eliminate some copies in `GetEntity` / `MultiGetEntity`, in particular when `Merge`s or blobs are involved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11248 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D43544215 Pulled By: ltamasi fbshipit-source-id: bc4c8955a24bbd8bc4ab098e72133ead757f9707 | 24 February 2023, 18:33:00 UTC |
af7872f | Yu Zhang | 24 February 2023, 01:00:04 UTC | Fix a TestGet failure when user defined timestamp is enabled (#11249) Summary: Stressing small DB with small number of keys and user-defined timestamp enabled usually fails pretty quickly in TestGet. Example command to reproduce the failure: ` tools/db_crashtest.py blackbox --enable_ts --simple --delrangepercent=0 --delpercent=5 --max_key=100 --interval=3 --write_buffer_size=262144 --target_file_size_base=262144 --max_bytes_for_level_base=262144 --subcompactions=1` Example failure: `error : inconsistent values for key 0000000000000009000000000000000A7878: expected state has the key, Get() returns NotFound.` Fixes this test failure by refreshing the read up to timestamp to the most up to date timestamp, a.k.a now, after a key is locked. Without this, things could happen in this order and cause a test failure: <table> <tr> <th>TestGet thread</th> <th> A writing thread</th> </tr> <tr> <td>read_opts.timestamp = GetNow()</td> <td></td> </tr> <tr> <td></td> <td>Lock key, do write</td> </tr> <tr> <td>Lock key, read(read_opts) return NotFound</td> <td></td> </tr> </table> Pull Request resolved: https://github.com/facebook/rocksdb/pull/11249 Reviewed By: ltamasi Differential Revision: D43551302 Pulled By: jowlyzhang fbshipit-source-id: 26877ab379bdb97acd2682a2632bc29718427f38 | 24 February 2023, 01:00:04 UTC |
f007b8f | Yu Zhang | 22 February 2023, 23:44:59 UTC | Support iter_start_ts in integrated BlobDB (#11244) Summary: Fixed an issue during backward iteration when `iter_start_ts` is set in an integrated BlobDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11244 Test Plan: ```make check ./db_blob_basic_test --gtest_filter="DBBlobWithTimestampTest.IterateBlobs" tools/db_crashtest.py --stress_cmd=./db_stress --cleanup_cmd='' --enable_ts whitebox --random_kill_odd 888887 --enable_blob_files=1``` Reviewed By: ltamasi Differential Revision: D43506726 Pulled By: jowlyzhang fbshipit-source-id: 2cdc19ebf8da909d8d43d621353905784949a9f0 | 22 February 2023, 23:44:59 UTC |
229297d | Changyu Bi | 22 February 2023, 20:28:18 UTC | Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) Summary: A second attempt after https://github.com/facebook/rocksdb/issues/10802, with bug fixes and refactoring. This PR updates compaction logic to take range tombstones into account when determining whether to cut the current compaction output file (https://github.com/facebook/rocksdb/issues/4811). Before this change, only point keys were considered, and range tombstones could cause large compactions. For example, if the current compaction outputs is a range tombstone [a, b) and 2 point keys y, z, they would be added to the same file, and may overlap with too many files in the next level and cause a large compaction in the future. This PR also includes ajkr's effort to simplify the logic to add range tombstones to compaction output files in `AddRangeDels()` ([https://github.com/facebook/rocksdb/issues/11078](https://github.com/facebook/rocksdb/pull/11078#issuecomment-1386078861)). The main change is for `CompactionIterator` to emit range tombstone start keys to be processed by `CompactionOutputs`. A new class `CompactionMergingIterator` is introduced to replace `MergingIterator` under `CompactionIterator` to enable emitting of range tombstone start keys. Further improvement after this PR include cutting compaction output at some grandparent boundary key (instead of the next output key) when cutting within a range tombstone to reduce overlap with grandparents. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11113 Test Plan: * added unit test in db_range_del_test * crash test with a small key range: `python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=600 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10` Reviewed By: ajkr Differential Revision: D42655709 Pulled By: cbi42 fbshipit-source-id: 8367e36ef5640e8f21c14a3855d4a8d6e360a34c | 22 February 2023, 20:28:18 UTC |
9fa9bec | ywave | 22 February 2023, 13:44:03 UTC | fix -Wrange-loop-analysis in Apple clang version 12.0.0 (clang-1200.0.32.29) (#11240) Summary: Fix complain ``` db/db_impl/db_impl_compaction_flush.cc:417:19: error: loop variable 'bg_flush_arg' of type 'const rocksdb::DBImpl::BGFlushArg' creates a copy from type 'const rocksdb::DBImpl::BGFlushArg' [-Werror,-Wrange-loop-analysis] for (const auto bg_flush_arg : bg_flush_args) { ^ db/db_impl/db_impl_compaction_flush.cc:417:8: note: use reference type 'const rocksdb::DBImpl::BGFlushArg &' to prevent copying for (const auto bg_flush_arg : bg_flush_args) { ^~~~~~~~~~~~~~~~~~~~~~~~~ & db/db_impl/db_impl_compaction_flush.cc:2911:21: error: loop variable 'bg_flush_arg' of type 'const rocksdb::DBImpl::BGFlushArg' creates a copy from type 'const rocksdb::DBImpl::BGFlushArg' [-Werror,-Wrange-loop-analysis] for (const auto bg_flush_arg : bg_flush_args) { ^ db/db_impl/db_impl_compaction_flush.cc:2911:10: note: use reference type 'const rocksdb::DBImpl::BGFlushArg &' to prevent copying for (const auto bg_flush_arg : bg_flush_args) { ^~~~~~~~~~~~~~~~~~~~~~~~~ & ``` from ```sh xxx@MacBook-Pro / % g++ -v Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1 Apple clang version 12.0.0 (clang-1200.0.32.29) Target: x86_64-apple-darwin21.6.0 Thread model: posix ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11240 Reviewed By: cbi42 Differential Revision: D43458729 Pulled By: ajkr fbshipit-source-id: 26e110f83451509463a1bc308f737ccb693c9f45 | 22 February 2023, 13:44:03 UTC |
2860804 | Andrew Kryczka | 21 February 2023, 23:21:33 UTC | Update HISTORY.md and version.h for 8.0 release (#11238) Summary: 8.0.fb branch is cut so changes going forward will be part of 8.1. Updated version.h and HISTORY.md accordingly Pull Request resolved: https://github.com/facebook/rocksdb/pull/11238 Reviewed By: cbi42 Differential Revision: D43428345 Pulled By: ajkr fbshipit-source-id: d344b6e504c81a85563ae9d3705b11c533b1cd43 | 21 February 2023, 23:21:33 UTC |
476b015 | anand76 | 21 February 2023, 20:53:55 UTC | Revert enabling IO uring in db_stress (#11242) Summary: IO uring usage is causing crash test failures due to bad cqe data being returned in the uring. Revert the change to enable IO uring in db_stress, and also re-enable async_io in CircleCI so that code path can be tested. Added the -use_io_uring flag to db_stress that, when false, will wrap the default env in db_stress to emulate async IO. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11242 Reviewed By: akankshamahajan15 Differential Revision: D43470569 Pulled By: anand1976 fbshipit-source-id: 7c69ac3f53a79ade31d37313f815f1a4b6108b75 | 21 February 2023, 20:53:55 UTC |
1b48ecc | Changyu Bi | 21 February 2023, 19:57:58 UTC | Fix an assertion failure in DBIter::SeekToLast() when user-defined timestamp is enabled (#11223) Summary: in DBIter::SeekToLast(), key() can be called when iter is invalid and fails the following assertion: ``` ./db/db_iter.h:153: virtual rocksdb::Slice rocksdb::DBIter::key() const: Assertion `valid_' failed. ``` This happens when `iterate_upper_bound` and timestamp_lb_ are set. SeekForPrev(*iterate_upper_bound_) positions the iterator on the same user key as *iterate_upper_bound_. A subsequent PrevInternal() call makes the iterator invalid just be the call to key(). This PR fixes this issue by setting updating the seek key to have max sequence number AND max timestamp when the seek key has the same user key as *iterate_upper_bound_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11223 Test Plan: - Added a unit test that would fail the above assertion before this fix. Reviewed By: jowlyzhang Differential Revision: D43283600 Pulled By: cbi42 fbshipit-source-id: 0dd3999845b722584679bbc95be2664b266005ba | 21 February 2023, 19:57:58 UTC |
ea85148 | leipeng | 21 February 2023, 19:26:30 UTC | DBIter::FindNextUserEntryInternal: do not PrepareValue for `Delete` (#11211) Summary: `kTypeDeletion/kTypeDeletionWithTimestamp/kTypeSingleDeletion` does not need access iter value, so omit `PrepareValue`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11211 Reviewed By: ajkr Differential Revision: D43253068 Pulled By: cbi42 fbshipit-source-id: 1945c7f8a90b6909128a0553b62d9fd1078b0a08 | 21 February 2023, 19:26:30 UTC |
ebfca2c | Changyu Bi | 21 February 2023, 19:12:22 UTC | Fix comment for option `periodic_compaction_seconds` (#11227) Summary: the comment for option `periodic_compaction_seconds` only mentions support for Leveled and FIFO compaction, while the implementation supports all compaction styles after https://github.com/facebook/rocksdb/issues/5970. This PR updates comment to reflect this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11227 Reviewed By: ajkr Differential Revision: D43325046 Pulled By: cbi42 fbshipit-source-id: 2364dcb5a01cd098ad52c818fe10d621445e2188 | 21 February 2023, 19:12:22 UTC |
83bc03a | HuangYi | 21 February 2023, 18:52:09 UTC | add c api to set option fail_if_not_bottommost_level (#11158) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11158 Reviewed By: cbi42 Differential Revision: D42870647 Pulled By: ajkr fbshipit-source-id: 1b71a1dd415c34c332cecf60c68ce37fe4393e2a | 21 February 2023, 18:52:09 UTC |
cfe50f7 | HuangYi | 21 February 2023, 18:00:43 UTC | add c api for HyperClockCache (#11110) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11110 Reviewed By: cbi42 Differential Revision: D42660941 Pulled By: ajkr fbshipit-source-id: e977d9b76dfd5d8c62335f961c275f3b810503d7 | 21 February 2023, 18:00:43 UTC |
142b18d | Matt Jurik | 21 February 2023, 17:10:03 UTC | C-API: Support multi-CF flush (#11112) Summary: This PR adds support to the c-api bindings for calling `Flush()` with multiple column families, which is useful for performing atomic flushes (assuming also that the db has been opened with `atomic_flush = true`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/11112 Reviewed By: cbi42 Differential Revision: D42666382 Pulled By: ajkr fbshipit-source-id: 82f05bf32d28452d85c79ea42411c8fea961fd87 | 21 February 2023, 17:10:03 UTC |
fcd816d | Andrew Kryczka | 19 February 2023, 01:30:15 UTC | Add missing override keyword in env_win.h functions (#11232) Summary: I couldn't figure out why this causes failures in our 8.0 release to fbcode while this issue appears to not be new in 8.0. Anyways, we can add the missing `override` keywords to these functions as the compiler insists. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11232 Reviewed By: pdillinger Differential Revision: D43420656 Pulled By: ajkr fbshipit-source-id: da748eeef6ba38dd113dbe4b5143d7558daf38dd | 19 February 2023, 01:30:15 UTC |
d471268 | Alan Paxton | 17 February 2023, 21:03:41 UTC | Fix Java API ComparatorOptions use after delete error (#11176) Summary: The problem ------------- ComparatorOptions is AutoCloseable. AbstractComparator does not hold a reference to its ComparatorOptions, but the native C++ ComparatorJniCallback holds a reference to the ComparatorOptions’ native C++ options structure. This gets deleted when the ComparatorOptions is closed, either explicitly, or as part of try-with-resources. Later, the deleted C++ options structure gets used by the callback and the comparator options are effectively random. The original bug report https://github.com/facebook/rocksdb/issues/8715 was caused by a GC-initiated finalization closing the still-in-use ComparatorOptions. As of 7.0, finalization of RocksDB objects no longer closes them, which worked round the reported bug, but still left ComparatorOptions with a potentially broken lifetime. In any case, we encourage API clients to use the try-with-resources model, and so we need it to work. And if they don't use it, they leak resources. The solution ------------- The solution implemented here is to make a copy of the native C++ options object into the ComparatorJniCallback, rather than a reference. Then the deletion of the native object held by ComparatorOptions is *correctly* deleted when its scope is closed in try/finally. Testing ------- We added a regression unit test based on the original test for the reported ticket. This checkin closes https://github.com/facebook/rocksdb/issues/8715 We expect that there are more instances of "lifecycle" bugs in the Java API. They are a major source of support time/cost, and we note that they could be addressed as a whole using the model proposed/prototyped in https://github.com/facebook/rocksdb/pull/10736 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11176 Reviewed By: cbi42 Differential Revision: D43160885 Pulled By: pdillinger fbshipit-source-id: 60b54215a02ad9abb17363319650328c00a9ad62 | 17 February 2023, 21:03:41 UTC |
b6640c3 | mrambacher | 17 February 2023, 20:54:07 UTC | Remove FactoryFunc from LoadXXXObject (#11203) Summary: The primary purpose of the FactoryFunc was to support LITE mode where the ObjectRegistry was not available. With the removal of LITE mode, the function was no longer required. Note that the MergeOperator had some private classes defined in header files. To gain access to their constructors (and name methods), the class definitions were moved into header files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11203 Reviewed By: cbi42 Differential Revision: D43160255 Pulled By: pdillinger fbshipit-source-id: f3a465fd5d1a7049b73ecf31e4b8c3762f6dae6c | 17 February 2023, 20:54:07 UTC |
25e1365 | Andrew Kryczka | 17 February 2023, 18:58:46 UTC | Merge operator failed subcode (#11231) Summary: From HISTORY.md: Added a subcode of `Status::Corruption`, `Status::SubCode::kMergeOperatorFailed`, for users to identify corruption failures originating in the merge operator, as opposed to RocksDB's internally identified data corruptions. This is a followup to https://github.com/facebook/rocksdb/issues/11092, where we gave users the ability to keep running a DB despite merge operator failing. Now that the DB keeps running despite such failures, they want to be able to distinguish such failures from real corruptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11231 Test Plan: updated unit test Reviewed By: akankshamahajan15 Differential Revision: D43396607 Pulled By: ajkr fbshipit-source-id: 17fbcc779ad724dafada8abd73efd38e1c5208b9 | 17 February 2023, 18:58:46 UTC |
6aef1a0 | Andrew Kryczka | 17 February 2023, 17:03:37 UTC | Use CacheDependencies() at start of ApproximateKeyAnchors() (#11230) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11230 Test Plan: - setup command: `$ ./db_bench -benchmarks=fillrandom,compact -compression_type=none -num=1000000 -write_buffer_size=4194304 -target_file_size_base=4194304 -use_direct_io_for_flush_and_compaction=true -partition_index_and_filters=true -bloom_bits=10 -metadata_block_size=1024` - measure small read count bucketed by size: `$ strace -fye pread64 ./db_bench.ctrl -use_existing_db=true -benchmarks=compact -compaction_readahead_size=4194304 -compression_type=none -num=1000000 -write_buffer_size=4194304 -target_file_size_base=4194304 -use_direct_io_for_flush_and_compaction=true -partition_index_and_filters=true -bloom_bits=10 -metadata_block_size=1024 -subcompactions=4 -cache_size=1048576000 2>&1 >/dev/null | awk '/= [0-9]+$/{print "[", int($NF / 1024), "KB,", int(1 + $NF / 1024), "KB)"}' | sort -n -k 2 | uniq -c | head -3` - before: ``` 1119 [ 0 KB, 1 KB) 1 [ 6 KB, 7 KB) 2 [ 7 KB, 8 KB) ``` - after: ``` 242 [ 0 KB, 1 KB) 1 [ 6 KB, 7 KB) 2 [ 7 KB, 8 KB) ``` Reviewed By: pdillinger Differential Revision: D43388507 Pulled By: ajkr fbshipit-source-id: a02413c9f615b00784700646825a9870ee10f3a7 | 17 February 2023, 17:03:37 UTC |
68e4581 | akankshamahajan | 17 February 2023, 02:33:06 UTC | Return NotSupported in scan if IOUring not supported and enable IOUring in db_stress for async_io testing (#11197) Summary: - Return NotSupported in scan if IOUring not supported if async_io is enabled - Enable IOUring in db_stress for async_io testing - Disable async_io in circleci crash testing as circleci doesn't support IOUring Pull Request resolved: https://github.com/facebook/rocksdb/pull/11197 Test Plan: CircleCI jobs Reviewed By: anand1976 Differential Revision: D43096313 Pulled By: akankshamahajan15 fbshipit-source-id: c2c53a87636950c0243038b9f5bd0d91608e4fda | 17 February 2023, 02:33:06 UTC |
64a1f76 | Peter Dillinger | 17 February 2023, 01:22:27 UTC | Customize CompressedSecondaryCache by block kind (#11204) Summary: Added `do_not_compress_roles` to `CompressedSecondaryCacheOptions` to disable compression on certain kinds of block. Filter blocks are now not compressed by CompressedSecondaryCache by default. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11204 Test Plan: unit test added Reviewed By: anand1976 Differential Revision: D43147698 Pulled By: pdillinger fbshipit-source-id: db496975ae975fa18f157f93fe131a16315ac875 | 17 February 2023, 01:22:27 UTC |
88056ea | Peter Dillinger | 16 February 2023, 16:07:45 UTC | Re-add memory_allocator.h include from cache.h (#11229) Summary: Enough users of NewJemallocNodumpAllocator() with cache.h to justify keeping it. (Reverting one little part of https://github.com/facebook/rocksdb/issues/11192) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11229 Test Plan: CI Reviewed By: ajkr Differential Revision: D43337140 Pulled By: pdillinger fbshipit-source-id: 886b27b96b395619a4209f51b9b7787f4fe89e57 | 16 February 2023, 16:07:45 UTC |
ab22e79 | Levi Tamasi | 16 February 2023, 01:08:25 UTC | Support using MultiGetEntity as verification method in stress tests (#11228) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11228 Reviewed By: akankshamahajan15 Differential Revision: D43332120 Pulled By: ltamasi fbshipit-source-id: 15f32cf335aecb7e654da24ecafc6e010dc65194 | 16 February 2023, 01:08:25 UTC |
94ec433 | Levi Tamasi | 15 February 2023, 19:50:18 UTC | Mention the new MultiGetEntity API in HISTORY.md (#11226) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11226 Reviewed By: akankshamahajan15 Differential Revision: D43317602 Pulled By: ltamasi fbshipit-source-id: 4b7e063848d3cfbdb9f0c0f54d68aeab8a82595c | 15 February 2023, 19:50:18 UTC |
9794acb | Levi Tamasi | 15 February 2023, 17:34:17 UTC | Add a new MultiGetEntity API (#11222) Summary: The new `MultiGetEntity` API can be used to get a consistent view of a batch of keys, with the results presented as wide-column entities. Similarly to `GetEntity` and the iterator's `columns` API, if the entry corresponding to the key is a wide-column entity to start with, it is returned as-is, and if it is a plain key-value, it is wrapped into an entity with a single default column. Implementation-wise, the new API shares the logic of the batched `MultiGet` API (via the `MultiGetCommon` methods). Both single-CF and multi-CF `MultiGetEntity` APIs are provided, and blobs are also supported. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11222 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D43256950 Pulled By: ltamasi fbshipit-source-id: 47fb2cb7e2d0470e3580f43fdb2fe9e51f0e7005 | 15 February 2023, 17:34:17 UTC |
6d5e860 | akankshamahajan | 15 February 2023, 02:54:47 UTC | Fix regression script for async_io benchmarks (#11224) Summary: Fix regression script for async_io benchmark using incorrect ops and threads and wrong benchmark name during reporting results. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11224 Test Plan: Ran manually Reviewed By: anand1976 Differential Revision: D43287658 Pulled By: akankshamahajan15 fbshipit-source-id: 433e2caa0e51268e72a875549ab8f7f92a7a4216 | 15 February 2023, 02:54:47 UTC |
1969815 | sdong | 14 February 2023, 20:17:23 UTC | Remove docs/Gemfile.lock and update github-pages version (#11173) Summary: One system reports that a dependency in docs/Gemfile.lock is out-of-date and has a risk. I don't see a point of having Gemfile.lock checked in and dealing with dependencies all the time at all. It should be able to regenerated using `bundle install`. Update Gemfile file to a later version too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11173 Test Plan: Run bundle install bundle exec jekyll serve --host=0.0.0.0 and see website working locally. Reviewed By: ajkr Differential Revision: D42897698 fbshipit-source-id: aeaf065c28b8f6582f1af1b5ffbbd5fa194afe24 | 14 February 2023, 20:17:23 UTC |
c19672c | Yu Zhang | 13 February 2023, 21:40:02 UTC | Enable crash test to run BlobDB together with user-defined timestamp (#11199) Summary: I missed a stress test code sanity check when enabling this combination of tests. This PR addresses that, the "iter_start_ts" function for user defined timestamp feature is not supported when BlobDB is enabled. It's disabled for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11199 Test Plan: Locally always enable BlobDB and run tools/db_crashtest.py --stress_cmd=./db_stress --cleanup_cmd='' --enable_ts whitebox --random_kill_odd 888887 Reviewed By: ltamasi Differential Revision: D43245657 Pulled By: jowlyzhang fbshipit-source-id: 4cae19817bb1afd50a76f9e0e49f006fb5c0b211 | 13 February 2023, 21:40:02 UTC |
42d6652 | Wentian Guo | 13 February 2023, 10:21:38 UTC | remove dependency on options.h for port_posix.h andport_win.h (#11214) Summary: The files in `port/`, such as `port_posix.h`, are layering over the system libraries, so shouldn't include the DB-specific files like `options.h`. This PR remove this dependency. # How The reason that `port_posix.h` (or `port_win.h`) include `options.h` is to use `CpuPriority`, as there is a method `SetCpuPriority()` in `port_posix.h` that uses `CpuPriority.` - I think `SetCpuPriority()` make sense to exist in `port_posix.h` as it provides has platform-dependent implementation - `CpuPriority` enum is defined in `env.h`, but used in `rocksdb/include` and `port/`. Hence, let us define `CpuPriority` enum in a common file, say `port_defs.h`, such that both directories `rocksdb/include` and `port/` can include. When we remove this dependency, some other files have compile errors because they can't find definitions, so add header files to resolve # Test make all check -j Pull Request resolved: https://github.com/facebook/rocksdb/pull/11214 Reviewed By: pdillinger Differential Revision: D43196910 Pulled By: guowentian fbshipit-source-id: 70deccb72844cfb08fcc994f76c6ef6df5d55ab9 | 13 February 2023, 10:21:38 UTC |
a72f591 | akankshamahajan | 10 February 2023, 18:13:19 UTC | Fix a minor bug in the regression script during assigning value (#11215) Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/11215 Test Plan: Ran manually Reviewed By: pdillinger Differential Revision: D43194634 Pulled By: akankshamahajan15 fbshipit-source-id: 336a08a9076b222d7000e4eb2a87fc36b863b05b | 10 February 2023, 18:13:19 UTC |
ab2157f | akankshamahajan | 10 February 2023, 02:36:53 UTC | Extend existing benchmarks seekrandom and multiread to run with async_io (#11170) Summary: ======================================================================= Benchmark seekrandom_asyncio ======================================================================= db_bench_cmd=$(which time) -p ./db_bench --benchmarks=seekrandom --db=/tmp/rocksdb/regression_test/db --wal_dir= --use_existing_db=0 --perf_level=1 --disable_auto_compactions --threads=1 --num=1073741824 --reads=1073741824 --writes=1073741824 --deletes=1073741824 --key_size=100 --value_size=900 --cache_size=1073741824 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=10 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --num_high_pri_threads=4 --num_low_pri_threads=16 --seed=1675181789 --multiread_batched=true --batch_size=128 --multiread_stride=12 --async_io=true --optimize_multiget_for_io=false 2>&1 RocksDB: version 8.0.0 ======================================================================= Benchmark multireadrandom_asyncio ==================================================================== db_bench_cmd=$(which time) -p ./db_bench --benchmarks=multireadrandom --db=/tmp/rocksdb/regression_test/db --wal_dir= --use_existing_db=0 --perf_level=1 --disable_auto_compactions --threads=1 --num=1073741824 --reads=1073741824 --writes=1073741824 --deletes=1073741824 --key_size=100 --value_size=900 --cache_size=1073741824 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=10 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --num_high_pri_threads=4 --num_low_pri_threads=16 --seed=1675181841 --multiread_batched=true --batch_size=128 --multiread_stride=12 --async_io=true --optimize_multiget_for_io=true 2>&1 RocksDB: version 8.0.0 Date: Tue Jan 31 08:17:22 2023 CPU: 32 * Intel Xeon Processor (Skylake) CPUCache: 16384 KB Pull Request resolved: https://github.com/facebook/rocksdb/pull/11170 Reviewed By: ajkr, anand1976 Differential Revision: D42889107 Pulled By: akankshamahajan15 fbshipit-source-id: b819be2bd5f00d1db654b9e829b84f11e6bcab92 | 10 February 2023, 02:36:53 UTC |
3cacd4b | Peter Dillinger | 09 February 2023, 20:12:02 UTC | Put Cache and CacheWrapper in new public header (#11192) Summary: The definition of the Cache class should not be needed by the vast majority of RocksDB users, so I think it is just distracting to include it in cache.h, which is primarily needed for configuring and creating caches. This change moves the class to a new header advanced_cache.h. It is just cut-and-paste except for modifying the class API comment. In general, operations on shared_ptr<Cache> should continue to work when only a forward declaration of Cache is available, as long as all the Cache instances provided are already shared_ptr. See https://stackoverflow.com/a/17650101/454544 Also, the most common way to customize a Cache is by wrapping an existing implementation, so it makes sense to provide CacheWrapper in the public API. This was a cut-and-paste job except removing the implementation of Name() so that derived classes must provide it. Intended follow-up: consolidate Release() into one function to reduce customization bugs / confusion Pull Request resolved: https://github.com/facebook/rocksdb/pull/11192 Test Plan: `make check` Reviewed By: anand1976 Differential Revision: D43055487 Pulled By: pdillinger fbshipit-source-id: 7b05492df35e0f30b581b4c24c579bc275b6d110 | 09 February 2023, 20:12:02 UTC |
b7747bb | Peter Dillinger | 09 February 2023, 17:21:55 UTC | Attempt fix flaky DBWriteTest.LockWALInEffect (#11209) Summary: Example failure: ``` [ RUN ] DBWriteTestInstance/DBWriteTest.LockWALInEffect/1 db/db_write_test.cc:646: Failure Put("key3", "value") Corruption: Not active ``` Presumably from a background compaction prior to Put. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11209 Test Plan: watch CI Reviewed By: akankshamahajan15 Differential Revision: D43147727 Pulled By: pdillinger fbshipit-source-id: a1c34ac5ab124bfe2f23205a30777990056e9082 | 09 February 2023, 17:21:55 UTC |
34bb3dd | Peter Dillinger | 09 February 2023, 04:14:57 UTC | Improve SmallEnumSet (#11178) Summary: In anticipation of using this to represent sets of CacheEntryRole for including or excluding kinds of blocks in block cache tiers, add significant new features to SmallEnumSet, including at least: * List initialization * Applicative constexpr operations * copy/move/equality ops * begin/end/const_iterator for iteration * Better comments Pull Request resolved: https://github.com/facebook/rocksdb/pull/11178 Test Plan: unit tests added/expanded Reviewed By: ltamasi Differential Revision: D42973723 Pulled By: pdillinger fbshipit-source-id: 40783486feda931c3f7c6fcc9a300acd6a4b0a0a | 09 February 2023, 04:14:57 UTC |
ee5305f | Peter Dillinger | 09 February 2023, 01:00:46 UTC | Mitigate presumed OOM in CircleCI (#11206) Summary: We've seen many instances of build-linux-static_lib-alt_namespace-status_checked failing like this: ``` g++: fatal error: Killed signal terminated program cc1plus compilation terminated. make: *** [Makefile:2507: utilities/transactions/transaction_test.o] Error 1 ``` It's understandable that so many static linking jobs could exhaust memory. The executor only has 16 vcores, so going from 32 down to 24 shouldn't hurt build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11206 Test Plan: will watch CI Reviewed By: ajkr Differential Revision: D43137246 Pulled By: pdillinger fbshipit-source-id: 050b0f700c285dd913bcae8b4a76a44d04bb0356 | 09 February 2023, 01:00:46 UTC |
77b61ab | anand76 | 08 February 2023, 20:05:49 UTC | Fix bug in WAL streaming uncompression (#11198) Summary: Fix a bug in the calculation of the input buffer address/offset in log_reader.cc. The bug is when consecutive fragments of a compressed record are located at the same offset in the log reader buffer, the second fragment input buffer is treated as a leftover from the previous input buffer. As a result, the offset in the `ZSTD_inBuffer` is not reset. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11198 Test Plan: Add a unit test in log_test.cc that fails without the fix and passes with it. Reviewed By: ajkr, cbi42 Differential Revision: D43102692 Pulled By: anand1976 fbshipit-source-id: aa2648f4802c33991b76a3233c5a58d4cc9e77fd | 08 February 2023, 20:05:49 UTC |
876d281 | Levi Tamasi | 08 February 2023, 00:17:39 UTC | Add compaction filter support for wide-column entities (#11196) Summary: The patch adds compaction filter support for wide-column entities by introducing a new `CompactionFilter` API called `FilterV3`. This API is called for regular key-values, merge operands, and wide-column entities as well. It is passed the existing value/operand or wide-column structure and it can update the value or columns or keep/delete/etc. the key-value as usual. For compatibility, the default implementation of `FilterV3` keeps all wide-column entities and falls back to calling `FilterV2` for plain old key-values and merge operands. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11196 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D43094147 Pulled By: ltamasi fbshipit-source-id: 75acabe9a35254f7f404ba6173ee9c2774382ebd | 08 February 2023, 00:17:39 UTC |
6650ca2 | Hui Xiao | 07 February 2023, 22:11:53 UTC | Remove a couple deprecated convenience.h APIs (#11120) Summary: **Context/Summary:** As instructed by convenience.h comments, a few deprecated APIs are removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11120 Test Plan: - make check & CI - eyeball check on test semantics. Reviewed By: pdillinger Differential Revision: D42937507 Pulled By: hx235 fbshipit-source-id: a9e4709387da01b1d0e9148c2e210f02e9746ee1 | 07 February 2023, 22:11:53 UTC |
b5827c8 | Peter Dillinger | 07 February 2023, 19:26:55 UTC | Revert to LIB_MODE=static for optimized builds (#11195) Summary: Continuous performance testing indicates there's a small performance hit with shared library (-fPIC) builds, so while retaining the motivation for https://github.com/facebook/rocksdb/issues/11168, we set the default for DEBUG_LEVEL=0 Makefile builds back to LIB_MODE=static. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11195 Test Plan: CI, with some updated checks and removal of some now obsolete LIB_MODE overrides Reviewed By: cbi42 Differential Revision: D43090576 Pulled By: pdillinger fbshipit-source-id: 755fe5d07005f85caf24e16f90228ffd46a6e250 | 07 February 2023, 19:26:55 UTC |
68fa90c | Symious | 07 February 2023, 18:14:14 UTC | Add kForceOptimized option to jni (#11181) Summary: Currently the option of "KForceOptimized" is not included in CompactRangeOptions.BottommostLevelCompaction. This PR is to add this option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11181 Reviewed By: ajkr Differential Revision: D43056453 Pulled By: cbi42 fbshipit-source-id: 22fd53f980ab1a86c61dd42e948902542065128f | 07 February 2023, 18:14:14 UTC |
54d7208 | Peter Dillinger | 07 February 2023, 03:44:25 UTC | Fix regression_test.sh for LIB_MODE=shared default (#11194) Summary: Need to scp the .so files. Switched to tar+ssh to support symlinks, faster handling of multiple files, and compression. Also fixing some holes in 'make clean' as I've noticed files like 'librocksdb.so.7.7.0', 'librocksdb_test_debug.so', 'librocksdb_tools_debug.so' hanging around after `make clean` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11194 Test Plan: Manually triggered regression test runs with change, manual `make clean` https://fburl.com/sandcastle/gnxy5lvc https://fburl.com/sandcastle/4pxodwh7 Reviewed By: cbi42 Differential Revision: D43069065 Pulled By: pdillinger fbshipit-source-id: 48552b5980956784a1fdb40638d9e8ad6db51900 | 07 February 2023, 03:44:25 UTC |
9b66331 | Hui Xiao | 07 February 2023, 00:10:03 UTC | Simplify TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL) (#11186) Summary: **Context/Summary**: Simplify `TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL)` based on https://github.com/facebook/rocksdb/pull/11016#pullrequestreview-1205020134 and delete unused sync points. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11186 Test Plan: - UT failed before fix in https://github.com/facebook/rocksdb/pull/10892 and passes after - Check UT not flaky when running with https://app.circleci.com/pipelines/github/facebook/rocksdb/21985/workflows/5f6cc355-78c1-46d8-89ee-0fd679725a8a/jobs/540878 Reviewed By: ajkr Differential Revision: D43034723 Pulled By: hx235 fbshipit-source-id: f503774987b8f3718505f99e95080a7fad28ac66 | 07 February 2023, 00:10:03 UTC |
92e8874 | Peter Dillinger | 06 February 2023, 22:35:15 UTC | Fix more CircleCI jobs for LIB_MODE=shared default (#11193) Summary: There are a set of jobs using libbenchmark that have linker failures with new default LIB_MODE=shared. This change adds build-linux-run-microbench to the set using LIB_MODE=static to work around the linker failures. I haven't dug into how to fix them. There is another set of jobs using folly that have linker failures with new default LIB_MODE=shared. I tried fixing these by adding --shared-libs to the folly build, but that doesn't work. It kinda looks like the folly shared libs build is simply broken with the boost dependency: ``` /usr/bin/ld: /tmp/fbcode_builder_getdeps-ZrootZprojectZthird-partyZfollyZbuildZfbcode_builder-root/installed/boost-Z1Z72zV-c0-0f3HkylpzONnr1dsHYDaR2GyTLzYdkck/lib/libboost_filesystem.a(exception.o): relocation R_X86_64_PC32 against symbol `_ZTVN5boost10filesystem16filesystem_errorE' can not be used when making a shared object; recompile with -fPIC /usr/bin/ld: final link failed: bad value collect2: error: ld returned 1 exit status ``` I tried updating folly to the latest commit and that didn't help. Otherwise, I didn't dig deeper into fixing that so have added build-linux-clang-13-asan-ubsan-with-folly to the set using LIB_MODE=static Also since I saw a flaky failure (not the first time), increased the timeout on build-linux-unity-and-headers job. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11193 Test Plan: CI Reviewed By: cbi42 Differential Revision: D43061203 Pulled By: pdillinger fbshipit-source-id: c641671f93087f0214ea261ea895bccf657cb1a9 | 06 February 2023, 22:35:15 UTC |
4a51900 | Alan Paxton | 06 February 2023, 19:13:39 UTC | CI Benchmarking. Reduce runtime further as overhead appears to have risen. (#11189) Summary: We had miscalculated (not sure if I suddenly can’t count, or if there is something else going on), and need to leave more overhead to get the benchmarks to run reliably under 1 hour. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11189 Reviewed By: cbi42 Differential Revision: D43052045 Pulled By: ajkr fbshipit-source-id: 3fe68432ed76a1f87d34129b0246e6b6a70a49f2 | 06 February 2023, 19:13:39 UTC |
27cf091 | Peter Dillinger | 04 February 2023, 00:49:54 UTC | Fix compile gettid on older Linux (#11184) Summary: Seen only in post-PR CI job benchmark-linux. Some context: https://stackoverflow.com/questions/30680550/c-gettid-was-not-declared-in-this-scope Pull Request resolved: https://github.com/facebook/rocksdb/pull/11184 Test Plan: watch CI Reviewed By: cbi42 Differential Revision: D43013891 Pulled By: pdillinger fbshipit-source-id: 48b3b7231080a0f803fdc36d13946d5524770302 | 04 February 2023, 00:49:54 UTC |
cf756ed | Peter Dillinger | 03 February 2023, 23:28:52 UTC | Use LIB_MODE=shared build by default with make (#11168) Summary: With https://github.com/facebook/rocksdb/issues/11150 this becomes a practical change that I think is overall good for developer efficiency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11168 Test Plan: More efficient build of all unit tests and tools: ``` $ git clean -fdx $ du -sh . 522M . $ /usr/bin/time make -j32 LIB_MODE=static ... 14270.63user 1043.33system 11:19.85elapsed 2252%CPU (0avgtext+0avgdata 1929944maxresident)k ... $ du -sh . 62G . $ ``` Vs. ``` $ git clean -fdx $ du -sh . 522M . $ /usr/bin/time make -j32 LIB_MODE=shared ... 9479.87user 478.26system 7:20.82elapsed 2258%CPU (0avgtext+0avgdata 1929272maxresident)k ... $ du -sh . 5.4G . $ ``` So 1/3 less build time and >90% less space usage. Individual unit test edit-compile-run is not too different. Modifying an average unit test source file: ``` $ touch db/version_builder_test.cc $ /usr/bin/time make -j32 LIB_MODE=static version_builder_test ... 34.74user 3.37system 0:38.29elapsed 99%CPU (0avgtext+0avgdata 945520maxresident)k ``` Vs. ``` $ touch db/version_builder_test.cc $ /usr/bin/time make -j32 LIB_MODE=shared version_builder_test ... 116.26user 43.91system 0:28.65elapsed 559%CPU (0avgtext+0avgdata 675160maxresident)k ``` A little faster with shared. However, modifying an average DB implementation file has an extra linking step with shared lib: ``` $ touch db/db_impl/db_impl_files.cc $ /usr/bin/time make -j32 LIB_MODE=static version_builder_test ... 33.17user 5.13system 0:39.70elapsed 96%CPU (0avgtext+0avgdata 945544maxresident)k ``` Vs. ``` $ touch db/db_impl/db_impl_files.cc $ /usr/bin/time make -j32 LIB_MODE=shared version_builder_test ... 40.80user 4.66system 0:45.54elapsed 99%CPU (0avgtext+0avgdata 1056340maxresident)k ``` A little slower with shared. On the whole, should be faster and lighter weight because of the many unit test files case Reviewed By: cbi42 Differential Revision: D42894004 Pulled By: pdillinger fbshipit-source-id: 9e827e52ace79b86f849b6a24466e318b4b605a7 | 03 February 2023, 23:28:52 UTC |
e17f310 | Peter Dillinger | 03 February 2023, 21:21:03 UTC | Support stack traces with gdb (and debugger invocation) (#11150) Summary: LIB_MODE=shared is much more efficient for building all the unit tests but comes with the downside of ugly stack traces, generally missing name demangling and source line info. Searching the internet suggests the reliable way to get stack traces with dynamic loading is with gdb. This change automatically tries to use gdb to get a stack trace if built with LIB_MODE=shared, and only on Linux because that's where we have the capability to attach to the proper thread. (We could revise the exact conditions in the future.) If there's a failure invoking gdb, it falls back on the old method. Obscure details of making the output reasonable / pretty are in the source code comments. Based on this, it was easy to make it so that running a test command with ROCKSDB_DEBUG=1 would invoke gdb whenever the stack trace handler was invoked, so I included that. Intended follow-up: make LIB_MODE=shared the new default `make` build config Pull Request resolved: https://github.com/facebook/rocksdb/pull/11150 Test Plan: manual, mostly by injecting an "assert(false)" into a unit test and trying different build modes etc. Although gdb is slower to start showing stack trace output, it seems overall faster in many if not most cases, presumably because it doesn't reload the symbol table for each stack entry. At least with parallel test runs, having many tests dumping stacks with the old method can take so long it appears to hang the test run. Reviewed By: cbi42 Differential Revision: D42894064 Pulled By: pdillinger fbshipit-source-id: 608143309d8c69c40049c9a4abcde4f22e87b4d8 | 03 February 2023, 21:21:03 UTC |
0cf1008 | Peter Dillinger | 03 February 2023, 21:00:04 UTC | Deprecate write_global_seqno and default to false (#11179) Summary: This option has long been intended to be set to false by default and deprecated. It might never be practical to completely remove the feature, so that we can continue to test for backward compatibility by keeping the ability to generate DBs in the old way. Also improved API comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11179 Test Plan: existing tests (with one tiny update) Reviewed By: hx235 Differential Revision: D42973927 Pulled By: pdillinger fbshipit-source-id: e9bc161cb933266e094aea2dff8cc03753c39dab | 03 February 2023, 21:00:04 UTC |
390cc0b | Peter Dillinger | 03 February 2023, 20:08:37 UTC | Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) Summary: Fixes https://github.com/facebook/rocksdb/issues/11160 By counting the number of stalls placed on a write queue, we can check in UnlockWAL() whether the stall present at the start of UnlockWAL() has been cleared by the end, or wait until it's cleared. More details in code comments and new unit test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11172 Test Plan: unit test added. Yes, it uses sleep to amplify failure on buggy behavior if present, but using a sync point to only allow new behavior would fail with the old code only because it doesn't contain the new sync point. Basically, using a sync point in UnlockWAL() could easily mask a regression by artificially limiting key behaviors. The test would only check that UnlockWAL() invokes code that *should* do the right thing, without checking that it *does* the right thing. Reviewed By: ajkr Differential Revision: D42894341 Pulled By: pdillinger fbshipit-source-id: 15c9da0ca383e6aec845b29f5447d76cecbf46c3 | 03 February 2023, 20:08:37 UTC |
63da9cf | anand76 | 03 February 2023, 00:35:27 UTC | Return any errors returned by ReadAsync to the MultiGet caller (#11171) Summary: Currently, we incorrectly return a Status::Corruption to the MultiGet caller if the file system ReadAsync cannot issue a read and returns an error for some reason, such as IOStatus::NotSupported(). In this PR, we copy the ReadAsync error to the request status so it can be returned to the user. Tests: Update existing unit tests and add a new one for this scenario Pull Request resolved: https://github.com/facebook/rocksdb/pull/11171 Reviewed By: akankshamahajan15 Differential Revision: D42950057 Pulled By: anand1976 fbshipit-source-id: 85ffcb015fa6c064c311f8a28488fec78c487869 | 03 February 2023, 00:35:27 UTC |
701a19c | Yu Zhang | 03 February 2023, 00:22:32 UTC | Enable crash test for user-defined timestamp and BlobDB combination (#11163) Summary: Enable the set of crash test for when user defined timestamp is enabled in combination with BlobDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11163 Test Plan: `make check` and `db_stress`/`db_crashtest.py` with various combinations. Reviewed By: ltamasi Differential Revision: D42906457 Pulled By: jowlyzhang fbshipit-source-id: 6bec6449a4213b536c787420ff30a7d17b676deb | 03 February 2023, 00:22:32 UTC |
fec5c8d | changyubi | 02 February 2023, 23:15:09 UTC | Remove NUMA setting for benchmark-linux (#11180) Summary: benchmark-linux is failing on main branch after https://github.com/facebook/rocksdb/issues/11074 with the following error msg: ``` /usr/bin/time -f '%e %U %S' -o /tmp/benchmark-results/8.0.0/benchmark_overwriteandwait.t1.s0.log.time numactl --interleave=all timeout 1200 ./db_bench --benchmarks=overwrite,waitforcompaction,stats --use_existing_db=1 --sync=0 --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=4 --max_write_buffer_number=8 --undefok=use_blob_cache,use_shared_block_and_blob_cache,blob_cache_size,blob_cache_numshardbits,prepopulate_blob_cache,multiread_batched,cache_low_pri_pool_ratio,prepopulate_block_cache --db=/tmp/rocksdb-benchmark-datadir --wal_dir=/tmp/rocksdb-benchmark-datadir --num=20000000 --key_size=20 --value_size=400 --block_size=8192 --cache_size=10737418240 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=none --bytes_per_sync=1048576 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --cache_low_pri_pool_ratio=0 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --report_interval_seconds=1 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --num_levels=8 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --duration=600 --threads=1 --merge_operator="put" --seed=1675372532 --report_file=/tmp/benchmark-results/8.0.0/benchmark_overwriteandwait.t1.s0.log.r.csv 2>&1 | tee -a /tmp/benchmark-results/8.0.0/benchmark_overwriteandwait.t1.s0.log /usr/bin/time: cannot run numactl: No such file or directory ``` This PR removes the newly added NUMA setting. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11180 Test Plan: check next main branch run for benchmark-linux Reviewed By: ajkr Differential Revision: D42975930 Pulled By: cbi42 fbshipit-source-id: f084d39aeba9877c0752502e879c5e612b507653 | 02 February 2023, 23:15:09 UTC |
6781009 | Alan Paxton | 02 February 2023, 19:11:40 UTC | CI Benchmarking. Small configuration changes based on performance analysis. (#11074) Summary: First, we made a small reduction in DURATION_RW as runs were exceeding 1 hour and colliding with subsequent runs. See Mark Callaghan’s blog post at http://smalldatum.blogspot.com/2023/01/variance-in-rocksdb-benchmarks-on-cloud.html Configuration parameters which are not consistent with the following email from Mark (see the blog post for more context) have been updated. Where Mark has defined the parameter and we haven't, we define it explicitly. We will need to further monitor for an expected reduction in variance of test times: To match what I did: --- nsecs=1800 dbdir=/data/m/rx resultdir=bm.lc.nt1.cm1.d0 env WRITE_BUFFER_SIZE_MB=16 TARGET_FILE_SIZE_BASE_MB=16 MAX_BYTES_FOR_LEVEL_BASE_MB=64 MAX_BACKGROUND_JOBS=4 NUM_KEYS=20000000 CACHE_SIZE_MB=10240 DURATION_RW=$nsecs DURATION_RO=$nsecs MB_WRITE_PER_SEC=2 NUM_THREADS=1 COMPRESSION_TYPE=none CACHE_INDEX_AND_FILTER_BLOCKS=1 VALUE_SIZE=400 NUMA=1 MIN_LEVEL_TO_COMPRESS=3 COMPACTION_STYLE=leveled bash benchmark_compare.sh $dbdir $resultdir 7.8.fb env WRITE_BUFFER_SIZE_MB=16 TARGET_FILE_SIZE_BASE_MB=16 MAX_BYTES_FOR_LEVEL_BASE_MB=64 MAX_BACKGROUND_JOBS=4 NUM_KEYS=200000000 CACHE_SIZE_MB=10240 DURATION_RW=$nsecs DURATION_RO=$nsecs MB_WRITE_PER_SEC=2 NUM_THREADS=1 COMPRESSION_TYPE=lz4 CACHE_INDEX_AND_FILTER_BLOCKS=1 VALUE_SIZE=400 NUMA=1 MIN_LEVEL_TO_COMPRESS=3 COMPACTION_STYLE=leveled bash benchmark_compare.sh $dbdir $resultdir 7.8.fb Pull Request resolved: https://github.com/facebook/rocksdb/pull/11074 Reviewed By: ajkr Differential Revision: D42969668 Pulled By: cbi42 fbshipit-source-id: 1ea4e6a3901be4016108f93817eb58f74baac21a | 02 February 2023, 19:11:40 UTC |
6af16ac | Andrew Kryczka | 02 February 2023, 15:50:55 UTC | Update HISTORY.md for #11136 (#11177) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11177 Reviewed By: cbi42 Differential Revision: D42948946 Pulled By: ajkr fbshipit-source-id: 783d3d9007faaa036923a0364cdd0bfbd8e78062 | 02 February 2023, 15:50:55 UTC |
df680b2 | Levi Tamasi | 01 February 2023, 18:03:07 UTC | Clean up InvokeFilterIfNeeded a bit (#11174) Summary: The patch makes some code quality enhancements in `CompactionIterator::InvokeFilterIfNeeded` including the renaming of `filter` (which is most likely a remnant of the days before the `FilterV2` API when the compaction filter used to return a boolean) to `decision`, the removal of some outdated comments, the elimination of an `error` flag which was only used in one failure case out of many, as well as some small stylistic improvements. (Some the above will also come in handy when adding compaction filter support for wide-column entities.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11174 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D42901408 Pulled By: ltamasi fbshipit-source-id: ab382d59a4990c5dfe1cee219d49e1d80902b666 | 01 February 2023, 18:03:07 UTC |
071c338 | Andrew Kryczka | 01 February 2023, 00:57:49 UTC | Allow canceling manual compaction while waiting for conflicting compaction (#11165) Summary: This PR adds logic to the `RunManualCompaction()` loop to check for cancellation before waiting on any conflicting compactions to finish. In case of cancellation, `RunManualCompaction()` no longer waits on conflicting compactions Pull Request resolved: https://github.com/facebook/rocksdb/pull/11165 Test Plan: repro test case Reviewed By: cbi42 Differential Revision: D42864058 Pulled By: ajkr fbshipit-source-id: ea4dd1a8f294abe212905495a8fbe8f07fca3f5a | 01 February 2023, 00:57:49 UTC |
753d4d5 | Levi Tamasi | 31 January 2023, 18:17:48 UTC | Support using GetEntity as a verification method in the non-batched stress tests (#11144) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11144 Test Plan: Ran a simple blackbox crash test. Reviewed By: akankshamahajan15 Differential Revision: D42791464 Pulled By: ltamasi fbshipit-source-id: 8eb6e62f0bc47f709816136ff3ded0a41d04fab8 | 31 January 2023, 18:17:48 UTC |
a82021c | Levi Tamasi | 31 January 2023, 17:59:25 UTC | Fix a bug where GetEntity would expose a blob reference (#11162) Summary: The patch fixes a feature interaction bug between BlobDB and the `GetEntity` API: without the patch, `GetEntity` would return the blob reference (wrapped into a single-column entity) instead of the actual blob value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11162 Test Plan: `make check` Reviewed By: akankshamahajan15 Differential Revision: D42854092 Pulled By: ltamasi fbshipit-source-id: f750d0ff57def107da16f545077ddce9860ff21a | 31 January 2023, 17:59:25 UTC |
94e3bee | Peter Dillinger | 31 January 2023, 06:52:30 UTC | Cleanup, improve, stress test LockWAL() (#11143) Summary: The previous API comments for LockWAL didn't provide much about why you might want to use it, and didn't really meet what one would infer its contract was. Also, LockWAL was not in db_stress / crash test. In this change: * Implement a counting semantics for LockWAL()+UnlockWAL(), so that they can safely be used concurrently across threads or recursively within a thread. This should make the API much less bug-prone and easier to use. * Make sure no UnlockWAL() is needed after non-OK LockWAL() (to match RocksDB conventions) * Make UnlockWAL() reliably return non-OK when there's no matching LockWAL() (for debug-ability) * Clarify API comments on LockWAL(), UnlockWAL(), FlushWAL(), and SyncWAL(). Their exact meanings are not obvious, and I don't think it's appropriate to talk about implementation mutexes in the API comments, but about what operations might block each other. * Add LockWAL()/UnlockWAL() to db_stress and crash test, mostly to check for assertion failures, but also checks that latest seqno doesn't change while WAL is locked. This is simpler to add when LockWAL() is allowed in multiple threads. * Remove unnecessary use of sync points in test DBWALTest::LockWal. There was a bug during development of above changes that caused this test to fail sporadically, with and without this sync point change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11143 Test Plan: unit tests added / updated, added to stress/crash test Reviewed By: ajkr Differential Revision: D42848627 Pulled By: pdillinger fbshipit-source-id: 6d976c51791941a31fd8fbf28b0f82e888d9f4b4 | 31 January 2023, 06:52:30 UTC |
36174d8 | sdong | 31 January 2023, 03:45:47 UTC | DB Stress to fix a false assertion (#11164) Summary: Seeting this error in stress test: db_stress: internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:2459: void rocksdb::StressTest::Open(rocksdb::SharedState *): Assertion `txn_db_ == nullptr' failed. Received signal 6 (Aborted) ...... It doesn't appear that txn_db_ is set to nullptr at all. We set ithere. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11164 Test Plan: Run db_stress transaction and non-transation with low kill rate and see restarting without assertion Reviewed By: ajkr Differential Revision: D42855662 fbshipit-source-id: 06816d37cce9c94a81cb54ab238fb73aa102ed46 | 31 January 2023, 03:45:47 UTC |
24ac53d | Yu Zhang | 30 January 2023, 18:21:21 UTC | Use user key on sst file for blob verification for Get and MultiGet (#11105) Summary: Use the user key on sst file for blob verification for `Get` and `MultiGet` instead of the user key passed from caller. Add tests for `Get` and `MultiGet` operations when user defined timestamp feature is enabled in a BlobDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11105 Test Plan: make V=1 db_blob_basic_test ./db_blob_basic_test --gtest_filter="DBBlobTestWithTimestamp.*" Reviewed By: ltamasi Differential Revision: D42716487 Pulled By: jowlyzhang fbshipit-source-id: 5987ecbb7e56ddf46d2467a3649369390789506a | 30 January 2023, 18:21:21 UTC |
79e57a3 | akankshamahajan | 27 January 2023, 22:51:39 UTC | Move ExternalSSTTestEnv to FileSystemWrapper (#11139) Summary: Migrate ExternalSSTTestEnv to FileSystemWrapper Pull Request resolved: https://github.com/facebook/rocksdb/pull/11139 Reviewed By: anand1976 Differential Revision: D42780180 Pulled By: akankshamahajan15 fbshipit-source-id: 9a4448c9fe5186b518235fe11e1a34dcad897cdd | 27 January 2023, 22:51:39 UTC |