https://github.com/facebook/rocksdb

sort by:
Revision Author Date Message Commit Date
fcf3d75 Update HISTORY.md for PR 9273 (#9282) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9282 Reviewed By: akankshamahajan15 Differential Revision: D33027844 Pulled By: ltamasi fbshipit-source-id: 7540d36010414311bc39610fff92a6498be1570c 10 December 2021, 22:56:20 UTC
417828b Update HISTORY.md and version.h for 6.27.3 10 December 2021, 18:26:22 UTC
1251252 Fix segmentation fault in table_options.prepopulate_block_cache when used with partition_filters (#9263) Summary: When table_options.prepopulate_block_cache is set to BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly and table_options.partition_filters is also set true, then there is segmentation failure when top level filter is fetched because its entered with wrong type in cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9263 Test Plan: Updated unit tests; Ran db_stress: make crash_test -j32 Reviewed By: pdillinger Differential Revision: D32936566 Pulled By: akankshamahajan15 fbshipit-source-id: 8bd79e53830d3e3c1bb79787e1ffbc3cb46d4426 10 December 2021, 17:53:08 UTC
f181158 Fix an issue with MemTableRepFactory::CreateFromString (#9273) Summary: If ignore_unsupported_options=true, then it is possible for MemTableRepFactory::CreateFromString to succeed without setting a result (result=nullptr). This would cause the original value to be overwritten with null and an error would be raised later when PrepareOptions is invoked. Added unit test for this condition. Will add (in another PR unless required by reviewers) comparable tests for all of the other Customizable classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9273 Reviewed By: ltamasi Differential Revision: D32990365 Pulled By: mrambacher fbshipit-source-id: b150724c3f5ae7346357b3866244fd93466875c7 09 December 2021, 21:40:03 UTC
1217630 Fix GetOptionsPtr for Wrapped Customizable; Allow null options map (#9213) Summary: 1. Fix GetOptionsPtr for Wrapped (Inner() != nullptr) Customizable objects. This allows the inner options to be returned via this method. 2. Allow the option type map to be nullptr. This allows objects to be registered as options (for GetOptionsPtr) but not be used by the configuration methods. Added tests as appropriate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9213 Reviewed By: zhichao-cao Differential Revision: D32718882 Pulled By: mrambacher fbshipit-source-id: 563203d1f006a2629060feb31c5dff9a233e1e83 09 December 2021, 21:39:49 UTC
8e58830 Fix fstatfs call for compilation on 32 bit systems (#9251) Summary: On some 32-bit systems, BTRFS_SUPER_MAGIC is unsigned while __fsword_t is signed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9251 Reviewed By: ajkr Differential Revision: D32961651 Pulled By: pdillinger fbshipit-source-id: 78e85fc1336f304a21e4d5961e60957c90daed63 09 December 2021, 07:16:24 UTC
5ac7843 Update version to 6.27.2 01 December 2021, 19:43:55 UTC
0ddd74c History Update for #9234 Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: 01 December 2021, 19:40:09 UTC
db04357 Fix bug in rocksdb internal automatic prefetching (#9234) Summary: After introducing adaptive_readahead, the original flow got broken. Readahead size was set to 0 because of which rocksdb wasn't be able to do automatic prefetching which it enables after seeing sequential reads. This PR fixes it. ---------------------------------------------------------------------------------------------------- Before this patch: b_bench -use_existing_db=true -db=/tmp/prefix_scan -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 6.27 Date: Tue Nov 30 11:56:50 2021 CPU: 24 * Intel Core Processor (Broadwell) CPUCache: 16384 KB Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 512 bytes each (256 bytes after compression) Entries: 5000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 2594.0 MB (estimated) FileSize: 1373.3 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ DB path: [/tmp/prefix_scan] seekrandom : 5356367.174 micros/op 0 ops/sec; 29.4 MB/s (23 of 23 found) ---------------------------------------------------------------------------------------------------- After the patch: ./db_bench -use_existing_db=true -db=/tmp/prefix_scan -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 6.27 Date: Tue Nov 30 14:38:33 2021 CPU: 24 * Intel Core Processor (Broadwell) CPUCache: 16384 KB Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 512 bytes each (256 bytes after compression) Entries: 5000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 2594.0 MB (estimated) FileSize: 1373.3 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ DB path: [/tmp/prefix_scan] seekrandom : 456504.277 micros/op 2 ops/sec; 359.8 MB/s (264 of 264 found) Pull Request resolved: https://github.com/facebook/rocksdb/pull/9234 Test Plan: Ran ./db_bench -db=/data/mysql/rocksdb/prefix_scan -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_d irect_io_for_flush_and_compaction=true -target_file_size_base=16777216 and then ./db_bench -use_existing_db=true -db=/data/mysql/rocksdb/prefix_scan -benchmarks="seekrandom" -key_size=32 -value_siz e=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 and compared the results. Reviewed By: anand1976 Differential Revision: D32743965 Pulled By: akankshamahajan15 fbshipit-source-id: b950fba68c91963b7deb5c20acdf471bc60251f5 01 December 2021, 17:19:55 UTC
e15c8e6 Update version to 6.27.1 29 November 2021, 17:37:41 UTC
0b31668 HISTORY for #9208 Summary: Update HISTORY for bug fix. Test Plan: n/a 29 November 2021, 17:36:44 UTC
57a817d Fix bug affecting GetSortedWalFiles, Backups, Checkpoint (#9208) Summary: Saw error like this: `Backup failed -- IO error: No such file or directory: While opening a file for sequentially reading: /dev/shm/rocksdb/rocksdb_crashtest_blackbox/004426.log: No such file or directory` Unfortunately, GetSortedWalFiles (used by Backups, Checkpoint, etc.) relies on no file deletions happening while its operating, which means not only disabling (more) deletions, but ensuring any pending deletions are completed. Two fixes related to this: * There was a gap in several places between decrementing pending_purge_obsolete_files_ and incrementing bg_purge_scheduled_ where the db mutex would be released and GetSortedWalFiles (and others) could get false information that no deletions are pending. * The fix to https://github.com/facebook/rocksdb/issues/8591 (disabling deletions in GetSortedWalFiles) seems incomplete because it doesn't prevent pending deletions from occuring during the operation (if deletions not already disabled, the case that was to be fixed by the change). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9208 Test Plan: existing tests (it's hard to write a test for interleavings that are now excluded - this is what stress test is for) Reviewed By: ajkr Differential Revision: D32630675 Pulled By: pdillinger fbshipit-source-id: a121e3da648de130cd24d44c524232f4eb22f178 29 November 2021, 17:21:19 UTC
8e43279 More improvements to output for CircleCI (#9201) Summary: More follow-up to https://github.com/facebook/rocksdb/issues/9193 + https://github.com/facebook/rocksdb/issues/9188 * Even though we need to print ETA updates to avoid hitting the 10min timeout, we need to avoid printing an update if there's no actual progress, so that hung tests will timeout after 10 min rather than 5 hours. * When there is a hung test, it's really annoying to track down which test is hung, so if no progress is observed for 1 minute, we run ps once to show what is running. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9201 Test Plan: manual and CircleCI Reviewed By: jay-zhuang Differential Revision: D32612028 Pulled By: pdillinger fbshipit-source-id: 00f8ea70fc5fec9ede28ff74287d90fc73854aad 29 November 2021, 17:19:49 UTC
c48bae0 Fix internal build error (#9195) Summary: Internal build reported: ``` rocksdb/listener.h:470:3: error: extra ';' inside a struct [-Werror,-Wextra-semi] ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9195 Test Plan: import to fbcode and compile. Reviewed By: akankshamahajan15 Differential Revision: D32590138 Pulled By: riversand963 fbshipit-source-id: ca4ed9cca210a1a9a12d3de17c789ef9151c57e8 22 November 2021, 17:40:00 UTC
d561432 Fix some CI output (#9193) Summary: Address some issues with https://github.com/facebook/rocksdb/issues/9188 * Internal CI doesn't render \r as anything, so use \n for "not connected to terminal" case * CircleCI apparently uses a pseudo-tty for output and although rerdirect stdout (because of EAGAIN bug) we don't redirect stderr, so it is detected as a terminal-connected case. Fix by redirecting stderr also. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9193 Test Plan: manual, CI Reviewed By: akankshamahajan15 Differential Revision: D32581128 Pulled By: pdillinger fbshipit-source-id: 5ae7c3209128d8dbd4153c5b9fdb2b810e6deb2e 20 November 2021, 18:21:58 UTC
4340e1f Disable the QPS verification in test temporally (#9190) Summary: Disable the QPS verification in test temporally, which causes the test failure due to different system delays. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9190 Test Plan: make check Reviewed By: siying Differential Revision: D32576289 Pulled By: zhichao-cao fbshipit-source-id: 1df972e77dd82eed5af3462e5db5e141aadf8fae 20 November 2021, 07:47:43 UTC
8101643 Update HISTORY and version.h for 6.27 release (#9192) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9192 Reviewed By: ltamasi Differential Revision: D32578141 Pulled By: riversand963 fbshipit-source-id: 16216451c87e383ca8fd309acf15106e46172aaa 20 November 2021, 06:11:56 UTC
3a9f557 Update HISTORY for PR 9187 (#9191) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9191 Reviewed By: riversand963 Differential Revision: D32577939 Pulled By: ltamasi fbshipit-source-id: 3c52067a0c3e9219c1aafdb711718dfcce5dedf5 20 November 2021, 04:07:11 UTC
43ac7a2 Fix an assertion failure when ManifestTailer switches to new Manifest in multi-cf mode (#9143) Summary: Original unit test fail to test the case of multi-cf mode switching to new manifest. The assertion failure will trigger when the primary instance reopens and secondary continues to tail the newly-created MANIFEST. Fix the assertion failure and update existing unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9143 Test Plan: make check Reviewed By: ltamasi Differential Revision: D32574233 Pulled By: riversand963 fbshipit-source-id: 857ddbe994019091276458abebcf8e2b65340468 20 November 2021, 03:53:40 UTC
dc5de45 Support readahead during compaction for blob files (#9187) Summary: The patch adds a new BlobDB configuration option `blob_compaction_readahead_size` that can be used to enable prefetching data from blob files during compaction. This is important when using storage with higher latencies like HDDs or remote filesystems. If enabled, prefetching is used for all cases when blobs are read during compaction, namely garbage collection, compaction filters (when the existing value has to be read from a blob file), and `Merge` (when the value of the base `Put` is stored in a blob file). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9187 Test Plan: Ran `make check` and the stress/crash test. Reviewed By: riversand963 Differential Revision: D32565512 Pulled By: ltamasi fbshipit-source-id: 87be9cebc3aa01cc227bec6b5f64d827b8164f5d 20 November 2021, 01:53:47 UTC
cd4ea67 Fix backward compatibility with 2.5 through 2.7 (#9189) Summary: A bug in https://github.com/facebook/rocksdb/issues/9163 can cause checksum verification to fail if parsing a properties block fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9189 Test Plan: check_format_compatible.sh (never quite works locally but this particular case seems fixed using variants of SHORT_TEST=1). And added new unit test case. Reviewed By: ajkr Differential Revision: D32574626 Pulled By: pdillinger fbshipit-source-id: 6fa5c8595737b71a3c3d011a52daf6d6c08715d7 20 November 2021, 01:31:01 UTC
6cde8d2 Deprecating `iter_start_seqnum` and `preserve_deletes` (#9091) Summary: `ReadOptions::iter_start_seqnum` and `DBOptions::preserve_deletes` are deprecated, please try using user defined timestamp feature instead. The feature is used to support differential snapshots, but not well maintained (https://github.com/facebook/rocksdb/issues/6837, https://github.com/facebook/rocksdb/issues/8472) and the interface is not user friendly which returns an internal key from the iterator. The user defined timestamp feature is a more flexible feature to support similar usecase, please switch to that if you have such usecase. The deprecated feature will be removed in a future release. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9091 Test Plan: check LOG Fix https://github.com/facebook/rocksdb/issues/9090 Reviewed By: ajkr Differential Revision: D32071750 Pulled By: jay-zhuang fbshipit-source-id: b882c4668dd1bf26ce03c4c192f1bba584bf6104 20 November 2021, 00:55:45 UTC
3ce4d4f Print failures in parallel `make check` (#9188) Summary: Generating megabytes of successful test output has caused issues / inconveniences for CI and internal sandcastle runs. This changes their configuration to only print output from failed tests. (Successful test output is still available in files under t/.) This likewise changes default behavior of parallel `make check` as a quick team poll showed interest in that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9188 Test Plan: Seed some test failures and observe * `make -j24 check` (new behavior) * `PRINT_PARALLEL_OUTPUTS=1 make -j24 check` (old CI behavior) * `QUIET_PARALLEL_TESTS=1 make -j24 check` (old manual run behavior) Reviewed By: siying Differential Revision: D32567392 Pulled By: pdillinger fbshipit-source-id: 8d8fb64aebd16bca103b11e3bd1f13c488a69611 20 November 2021, 00:55:45 UTC
ad40b0b Some small changes to RocksJava build (#9186) Summary: 1. Added a target for building a bundle jar for Sonatype Nexus - sometimes if the OSS Maven Central is misbehaving, it is quicker to upload a bundle to be processed for release. 2. Simplify the publish code by using a for-loop. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9186 Reviewed By: jay-zhuang Differential Revision: D32564469 Pulled By: ajkr fbshipit-source-id: aaceac27e9143fb65b61dad2a46df346586672cd 19 November 2021, 20:23:15 UTC
e12753e Track each SST's timestamp information as user properties (#9093) Summary: Track each SST's timestamp information as user properties https://github.com/facebook/rocksdb/issues/8959 Rockdb has supported user-defined timestamp feature. Application can specify a timestamp when writing each k-v pair. When data flush from memory to disk file called SST files. Each SST files consist of multiple data blocks and several metadata blocks. Among the metadata blocks, there is one called Properties block that tracks some pre-defined properties of this SST file. This PR is for collecting the properties of min and max timestamps of all keys in the file. With those properties the SST file is more convenient to tell whether the keys in the SST have timestamps or not. The changes involved are as follows: 1) Add a class TimestampTablePropertiesCollector to collect min/max timestamp when add keys to table, The way TimestampTablePropertiesCollector use to compare timestamp of key should defined by user by implementing the Comparator::CompareTimestamp function in the user defined comparator. 2) Add corresponding unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9093 Reviewed By: ltamasi Differential Revision: D32406927 Pulled By: riversand963 fbshipit-source-id: 25922971b7e67bacf4d53a1fb67c4c5ddaa61573 19 November 2021, 19:37:06 UTC
12117b2 Fix flaky DBTest2.RateLimitedCompactionReads (#9185) Summary: DBTest2.RateLimitedCompactionReads sometime shows following failure: what(): db/db_test2.cc:3976: Failure Expected equality of these values: i + 1 Which is: 4 NumTableFilesAtLevel(0) Which is: 0 The assertion itself doesn't appear to be correct. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9185 Test Plan: Removing an assertion shouldn't break anything. Reviewed By: ajkr Differential Revision: D32549530 fbshipit-source-id: 9993372d8af89161f903337a13f3e316e690a6b8 19 November 2021, 18:08:59 UTC
1e8322c Fix a bug in FlushJob picking more memtables beyond synced WALs (#9142) Summary: After RocksDB 6.19 and before this PR, RocksDB FlushJob may pick more memtables to flush beyond synced WALs. This can be problematic if there are multiple column families, since it can prematurely advance the flushed column family's log_number. Should subsequent attempts fail to sync the latest WALs and the database goes through a recovery, it may detect corrupted WAL number below the flushed column family's log number and complain about column family inconsistency. To fix, we record the maximum memtable ID of the column family being flushed. Then we call SyncClosedLogs() so that all closed WALs at the time when memtable ID is recorded will be synced. I also disabled a unit test temporarily due to reasons described in https://github.com/facebook/rocksdb/issues/9151 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9142 Test Plan: make check Reviewed By: ajkr Differential Revision: D32299956 Pulled By: riversand963 fbshipit-source-id: 0da75888177d91905cf8c9d00605b73afb5970a7 19 November 2021, 17:56:00 UTC
8cf4294 Adhere to per-DB concurrency limit when bottom-pri compactions exist (#9179) Summary: - Fixed bug where bottom-pri manual compactions were counting towards `bg_compaction_scheduled_` instead of `bg_bottom_compaction_scheduled_`. It seems to have no negative effect. - Fixed bug where automatic compaction scheduling did not consider `bg_bottom_compaction_scheduled_`. Now automatic compactions cannot be scheduled that exceed the per-DB compaction concurrency limit (`max_compactions`) when some existing compactions are bottommost. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9179 Test Plan: new unit test for manual/automatic. Also verified the existing automatic/automatic test ("ConcurrentBottomPriLowPriCompactions") hanged until changing it to explicitly enable concurrency. Reviewed By: riversand963 Differential Revision: D32488048 Pulled By: ajkr fbshipit-source-id: 20c4c0693678e81e43f85ed3cc3402fcf26e3310 19 November 2021, 01:31:50 UTC
4a7c1dc Add listener API that notifies on IOError (#9177) Summary: Add a new API in listener.h that notifies about IOErrors on Read/Write/Append/Flush etc. The API reports about IOStatus, filename, Operation name, offset and length. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9177 Test Plan: Added new unit tests Reviewed By: anand1976 Differential Revision: D32470627 Pulled By: akankshamahajan15 fbshipit-source-id: 189a717033590ae227b3beae8b1e7e185e4cdc12 19 November 2021, 01:11:19 UTC
d949323 Check that newIteratorWithBase regardless of WBWI Overwrite Mode (#8134) Summary: The behaviour of WBWI has changed when calling newIteratorWithBase when overwrite is set to true or false. This PR simply adds tests to assert the new correct behaviour. Closes https://github.com/facebook/rocksdb/issues/7370 Closes https://github.com/facebook/rocksdb/pull/8134 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9107 Reviewed By: pdillinger Differential Revision: D32099475 Pulled By: mrambacher fbshipit-source-id: 245f483f73db866cc8a51219a2bff2e09e59faa0 18 November 2021, 19:53:09 UTC
230660b Improve / clean up meta block code & integrity (#9163) Summary: * Checksums are now checked on meta blocks unless specifically suppressed or not applicable (e.g. plain table). (Was other way around.) This means a number of cases that were not checking checksums now are, including direct read TableProperties in Version::GetTableProperties (fixed in meta_blocks ReadTableProperties), reading any block from PersistentCache (fixed in BlockFetcher), read TableProperties in SstFileDumper (ldb/sst_dump/BackupEngine) before table reader open, maybe more. * For that to work, I moved the global_seqno+TableProperties checksum logic to the shared table/ code, because that is used by many utilies such as SstFileDumper. * Also for that to work, we have to know when we're dealing with a block that has a checksum (trailer), so added that capability to Footer based on magic number, and from there BlockFetcher. * Knowledge of trailer presence has also fixed a problem where other table formats were reading blocks including bytes for a non-existant trailer--and awkwardly kind-of not using them, e.g. no shared code checking checksums. (BlockFetcher compression type was populated incorrectly.) Now we only read what is needed. * Minimized code duplication and differing/incompatible/awkward abstractions in meta_blocks.{cc,h} (e.g. SeekTo in metaindex block without parsing block handle) * Moved some meta block handling code from table_properties*.* * Moved some code specific to block-based table from shared table/ code to BlockBasedTable class. The checksum stuff means we can't completely separate it, but things that don't need to be in shared table/ code should not be. * Use unique_ptr rather than raw ptr in more places. (Note: you can std::move from unique_ptr to shared_ptr.) Without enhancements to GetPropertiesOfAllTablesTest (see below), net reduction of roughly 100 lines of code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9163 Test Plan: existing tests and * Enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to verify that checksums are now checked on direct read of table properties by TableCache (new test would fail before this change) * Also enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to test putting table properties under old meta name * Also generally enhanced that same test to actually test what it was supposed to be testing already, by kicking things out of table cache when we don't want them there. Reviewed By: ajkr, mrambacher Differential Revision: D32514757 Pulled By: pdillinger fbshipit-source-id: 507964b9311d186ae8d1131182290cbd97a99fa9 18 November 2021, 19:43:44 UTC
f429466 Fix the analyzer test failure caused by inaccurate timing wait (#9181) Summary: Fix the analyzer test failure caused by inaccurate timing wait. The wait time at different system might be different or cause the delay, now we do not accurately count the lines. Only in a very rare extreme case, test will ignore the part exceed the timing of 1 second. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9181 Test Plan: make check Reviewed By: pdillinger Differential Revision: D32511319 Pulled By: zhichao-cao fbshipit-source-id: e694c8cb465c750cfa5a43dab3eff6707b9a11c8 18 November 2021, 19:28:38 UTC
74544d5 Account Bloom/Ribbon filter construction memory in global memory limit (#9073) Summary: Note: This PR is the 4th part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073) and will rebase/merge only after the first three PRs (https://github.com/facebook/rocksdb/pull/9070, https://github.com/facebook/rocksdb/pull/9071, https://github.com/facebook/rocksdb/pull/9130) merge. **Context:** Similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track memory usage during (new) Bloom Filter (i.e,FastLocalBloom) and Ribbon Filter (i.e, Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`. The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true` which by default is set to `false`. We [decided](https://github.com/facebook/rocksdb/pull/9073#discussion_r741548409) not to have separate option for separate memory user in table building therefore their memory accounting are all bundled under one general option. **Summary:** - Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction: - hash entries (i.e`hash_entries`.size(), we bucket-charge hash entries during insertion for performance), - banding (Ribbon Filter only, `bytes_coeff_rows` +`bytes_result_rows` + `bytes_backtrack`), - final filter (i.e, `mutable_buf`'s size). - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for final filter in `XXPH3BitsFilterBuilder` as well as explicitly delete the filter bits builder when done with the final filter in block based table. - Added option fo run `filter_bench` with this memory reservation feature Pull Request resolved: https://github.com/facebook/rocksdb/pull/9073 Test Plan: - Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory` - To address the concern for slow test: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suit running locally - Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter - Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over **20** run as below: - FastLocalBloom - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`: - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0) - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`: - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0) - new feature of RibbonFilter with fallback (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` : - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0) - Ribbon128 - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`: - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0) - new feature (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `: - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0) - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` : - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0) - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console. Reviewed By: pdillinger Differential Revision: D31991348 Pulled By: hx235 fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e 18 November 2021, 17:42:20 UTC
4f678b5 Don't allow parallel crash_test in Makefile (#9180) Summary: Using deps for running blackbox and whitebox allows them to be parallelized, which doesn't seem to be working well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9180 Test Plan: make -j24 crash_test Reviewed By: siying Differential Revision: D32500851 Pulled By: pdillinger fbshipit-source-id: 364288c8d023b93e7ca2724ea40edae2f4eb0407 17 November 2021, 19:19:48 UTC
c9539ed Fix integer overflow in TraceOptions (#9157) Summary: Hello from a happy user of rocksdb java :-) Default constructor of TraceOptions is supposed to initialize size to 64GB but the expression contains an integer overflow. Simple test case with JShell: ``` jshell> 64 * 1024 * 1024 * 1024 $1 ==> 0 jshell> 64L * 1024 * 1024 * 1024 $2 ==> 68719476736 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9157 Reviewed By: pdillinger, zhichao-cao Differential Revision: D32369273 Pulled By: mrambacher fbshipit-source-id: 6a0c95fff7a91f27ff15d65b662c6b101756b450 17 November 2021, 16:41:48 UTC
2225f06 Remove incremental ID from background thread pool names (#9165) Summary: `pthread_setname_np()` fails on attempts to assign oversized names like "rocksdb:bottom10", which resulted in some thread name updates being lost. We do not need the ID suffix so I removed it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9165 Test Plan: ``` $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -max_background_flushes=123 -max_background_compactions=456 -num_bottom_pri_threads=789 -duration=60 ``` While above is running: ``` $ ps -o 'comm' -Lp `pidof db_bench` | grep '^rocksdb:' | sort | uniq -c 789 rocksdb:bottom 123 rocksdb:high 456 rocksdb:low ``` Reviewed By: pdillinger Differential Revision: D32415077 Pulled By: ajkr fbshipit-source-id: a0e013101e26a78bc5eca73509293ef4bf22254f 17 November 2021, 02:26:12 UTC
d95ffba Parallelize sandcastle tests (#9178) Summary: Some tests are timing out, so increase parallelism but also keep test output on console Large credit to jay-zhuang who is currently unavailable to revise & land https://github.com/facebook/rocksdb/issues/9135 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9178 Test Plan: make check and valgrind_test, with and without PRINT_PARALLEL_OUTPUTS=1 https://www.internalfb.com/intern/sandcastle/group/nonce/6211105410048528/ Reviewed By: riversand963 Differential Revision: D32476680 Pulled By: pdillinger fbshipit-source-id: 8844947416e5baf4435e1ef6e33aeecc45621a89 17 November 2021, 00:45:13 UTC
b694cd0 Add tiered storage related read bytes stats to Statistic (#9123) Summary: Add the 3 read bytes counter to the Statistic, which will be used by storage tiering and get the information for files with different temperature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9123 Test Plan: added new testing cases. Reviewed By: siying Differential Revision: D32154745 Pulled By: zhichao-cao fbshipit-source-id: b7905d6dae469a72428742364ec07b634b6f15da 16 November 2021, 23:17:17 UTC
1178d34 Fix portable mac shared_library ld flags (#9149) Summary: Move the 'macosx-version-min' arg to the front of PLATFORM_SHARED_LDFLAGS so that it doesn't get concatenated with the library name. Fixes https://github.com/facebook/rocksdb/issues/9146 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9149 Reviewed By: mrambacher Differential Revision: D32396101 Pulled By: pdillinger fbshipit-source-id: aefcf53384e64d399049f158779acc3a4e54a8fe 16 November 2021, 20:17:17 UTC
f8c685c Check for and disallow shared key space in block caches (#9172) Summary: We have three layers of block cache that often use the same key but map to different physical data: * BlockBasedTableOptions::block_cache * BlockBasedTableOptions::block_cache_compressed * BlockBasedTableOptions::persistent_cache If any two of these happen to share an underlying implementation and key space (insertion into one shows up in another), then memory safety is broken. The simplest case is block_cache == block_cache_compressed. (Credit mrambacher for asking about this case in a review.) With this change, we explicitly check for overlap and preemptively and safely fail with a Status code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9172 Test Plan: test added. Crashes without new check Reviewed By: anand1976 Differential Revision: D32465659 Pulled By: pdillinger fbshipit-source-id: 3876b45b6dce6167e5a7a642725ddc86b96f8e40 16 November 2021, 19:16:05 UTC
28f54e7 fix compile errors in db/kv_checksum.h (#9173) Summary: When defining a template class, the constructor should be specified simply using the class name; it does not take template arguments.a Apparently older versions of gcc and clang did not complain about this syntax, but gcc 11.x and recent versions of clang both complain about this file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9173 Test Plan: When building with platform010 I got compile errors in this file both in `mode/dev` (clang) and in `mode/opt-gcc`. This diff fixes the compile failures. Reviewed By: ajkr Differential Revision: D32455881 Pulled By: simpkins fbshipit-source-id: 0682910d9e2cdade94ce1e77973d47ac04d9f7e2 16 November 2021, 18:20:50 UTC
230f18b Improve parallel test suite runner (#9160) Summary: * Parallel `make check` would pass if a test binary failed to list gtest tests. This is now likely to report as a failure. * Crazy perl was generating some extra incorrect test names causing extra files and binary invocations. Fixed with cleaner awk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9160 Test Plan: For first part, add an 'assert(false);' to start of hash_test main and see 'make check' pass before, and fail after. For second part, inspect t/ directory before vs. after. Number of executed tests is same: $ cat log* | grep 'PASSED.*test' | awk '{ tot += $4; } END { print tot; }' 10469 Reviewed By: ajkr Differential Revision: D32372006 Pulled By: pdillinger fbshipit-source-id: 185b3db2b67e3f9198eb75322e4d0493e4fc1beb 16 November 2021, 17:59:21 UTC
cff7819 Fix BackupEngine's internal callers of GenericRateLimiter::Request() not honoring bytes <= GetSingleBurstBytes() (#9063) Summary: **Context:** Some existing internal calls of `GenericRateLimiter::Request()` in backupable_db.cc and newly added internal calls in https://github.com/facebook/rocksdb/pull/8722/ do not make sure `bytes <= GetSingleBurstBytes()` as required by rate_limiter https://github.com/facebook/rocksdb/blob/master/include/rocksdb/rate_limiter.h#L47. **Impacts of this bug include:** (1) In debug build, when `GenericRateLimiter::Request()` requests bytes greater than `GenericRateLimiter:: kMinRefillBytesPerPeriod = 100` byte, process will crash due to assertion failure. See https://github.com/facebook/rocksdb/pull/9063#discussion_r737034133 and for possible scenario (2) In production build, although there will not be the above crash due to disabled assertion, the bug can lead to a request of small bytes being blocked for a long time by a request of same priority with insanely large bytes from a different thread. See updated https://github.com/facebook/rocksdb/wiki/Rate-Limiter ("Notice that although....the maximum bytes that can be granted in a single request have to be bounded...") for more info. There is an on-going effort to move rate-limiting to file wrapper level so rate limiting in `BackupEngine` and this PR might be made obsolete in the future. **Summary:** - Implemented loop-calling `GenericRateLimiter::Request()` with `bytes <= GetSingleBurstBytes()` as a static private helper function `BackupEngineImpl::LoopRateLimitRequestHelper` -- Considering make this a util function in `RateLimiter` later or do something with `RateLimiter::RequestToken()` - Replaced buggy internal callers with this helper function wherever requested byte is not pre-limited by `GetSingleBurstBytes()` - Removed the minimum refill bytes per period enforced by `GenericRateLimiter` since it is useless and prevents testing `GenericRateLimiter` for extreme case with small refill bytes per period. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9063 Test Plan: - Added a new test that failed the assertion before this change and now passes - It exposed bugs in [the write during creation in `CopyOrCreateFile()`](https://github.com/facebook/rocksdb/blob/df7cc66e171dfa665e34d293717242784195e1da/utilities/backupable/backupable_db.cc#L2034-L2043), [the read of table properties in `GetFileDbIdentities()`](https://github.com/facebook/rocksdb/blob/df7cc66e171dfa665e34d293717242784195e1da/utilities/backupable/backupable_db.cc#L2372-L2378), [some read of metadata in `BackupMeta::LoadFromFile()`](https://github.com/facebook/rocksdb/blob/df7cc66e171dfa665e34d293717242784195e1da/utilities/backupable/backupable_db.cc#L2726) - Passing Existing tests Reviewed By: ajkr Differential Revision: D31824535 Pulled By: hx235 fbshipit-source-id: d2b3dea7a64e2a4b1e6a59fca322f0800a4fcbcc 16 November 2021, 17:52:16 UTC
2035798 Update TransactionUtil::CheckKeyForConflict to also use timestamps (#9162) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9162 Existing TransactionUtil::CheckKeyForConflict() performs only seq-based conflict checking. If user-defined timestamp is enabled, it should perform conflict checking based on timestamps too. Update TransactionUtil::CheckKey-related methods to verify the timestamp of the latest version of a key is smaller than the read timestamp. Note that CheckKeysForConflict() is not updated since it's used only by optimistic transaction, and we do not plan to update it in this upcoming batch of diffs. Existing GetLatestSequenceForKey() returns the sequence of the latest version of a specific user key. Since we support user-defined timestamp, we need to update this method to also return the timestamp (if enabled) of the latest version of the key. This will be needed for snapshot validation. Reviewed By: ltamasi Differential Revision: D31567960 fbshipit-source-id: 2e4a14aed267435a9aa91bc632d2411c01946d44 15 November 2021, 20:52:18 UTC
9bb13c5 Use system-wide thread ID in info log lines (#9164) Summary: This makes it easier to debug with tools like `ps`. The change only applies to builds with glibc 2.30+ and _GNU_SOURCE extensions enabled. We could adopt it in more cases by using the syscall but this is enough for our build. Replaces https://github.com/facebook/rocksdb/issues/2973. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9164 Test Plan: - ran some benchmarks and correlated logged thread IDs with those shown by `ps -L`. - verified no noticeable regression in throughput for log heavy (more than 700k log lines and over 5k / second) scenario. Benchmark command: ``` $ TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom -compression_type=none -max_bytes_for_level_multiplier=2 -write_buffer_size=262144 -num_levels=7 -max_bytes_for_level_base=2097152 -target_file_size_base=524288 -level_compaction_dynamic_level_bytes=true -max_background_jobs=12 -num=20000000 ``` Results before: 15.9MB/s, 15.8MB/s, 16.0MB/s Results after: 16.3MB/s, 16.3MB/s, 15.8MB/s - Rely on CI to test the fallback behavior Reviewed By: riversand963 Differential Revision: D32399660 Pulled By: ajkr fbshipit-source-id: c24d44fdf7782faa616ef0a0964eaca3539d9c24 13 November 2021, 03:46:06 UTC
3295e9f Clarify max_write_buffer_size_to_maintain (#9154) Summary: I was unable to figure out the behavior by reading the old doc so attempted to write it differently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9154 Reviewed By: riversand963 Differential Revision: D32338843 Pulled By: ajkr fbshipit-source-id: e1e67720cd92572b195583e5ea2c592180d4fefd 13 November 2021, 03:31:03 UTC
8689936 Fix backward compatibility breakage in FileSystemWrapper (#9156) Summary: Implement the Name() method in FileSystemWrapper, since https://github.com/facebook/rocksdb/issues/8649 removed it and it can cause compilation failures. We can deprecate it in RocksDB 7.0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9156 Reviewed By: riversand963 Differential Revision: D32363977 Pulled By: anand1976 fbshipit-source-id: 1e5a2fec2ab0649255720d89abf5bac26bb64ded 12 November 2021, 01:59:18 UTC
17ce1ca Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056) Summary: RocksDB does auto-readahead for iterators on noticing more than two sequential reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read up to max_auto_readahead_size. However at each level, if iterator moves over next file, readahead_size starts again from 8KB. This PR introduces a new ReadOption "adaptive_readahead" which when set true will maintain readahead_size at each level. So when iterator moves from one file to another, new file's readahead_size will continue from previous file's readahead_size instead of scratch. However if reads are not sequential it will fall back to 8KB (default) with no prefetching for that block. 1. If block is found in cache but it was eligible for prefetch (block wasn't in Rocksdb's prefetch buffer), readahead_size will decrease by 8KB. 2. It maintains readahead_size for L1 - Ln levels. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9056 Test Plan: Added new unit tests Ran db_bench for "readseq, seekrandom, seekrandomwhilewriting, readrandom" with --adaptive_readahead=true and there was no regression if new feature is enabled. Reviewed By: anand1976 Differential Revision: D31773640 Pulled By: akankshamahajan15 fbshipit-source-id: 7332d16258b846ae5cea773009195a5af58f8f98 11 November 2021, 00:20:04 UTC
afcd325 Fix typo in env_win.h (#9138) Summary: overide -> override Pull Request resolved: https://github.com/facebook/rocksdb/pull/9138 Reviewed By: jay-zhuang Differential Revision: D32245235 Pulled By: mrambacher fbshipit-source-id: bed62b843925bed806c06ca3485d33bb45a56dc7 10 November 2021, 20:22:21 UTC
937fbcb Track per-SST user-defined timestamp information in MANIFEST (#9092) Summary: Track per-SST user-defined timestamp information in MANIFEST https://github.com/facebook/rocksdb/issues/8957 Rockdb has supported user-defined timestamp feature. Application can specify a timestamp when writing each k-v pair. When data flush from memory to disk file called SST files, file creation activity will commit to MANIFEST. This commit is for tracking timestamp info in the MANIFEST for each file. The changes involved are as follows: 1) Track max/min timestamp in FileMetaData, and fix invoved codes. 2) Add NewFileCustomTag::kMinTimestamp and NewFileCustomTag::kMinTimestamp in NewFileCustomTag ( in the kNewFile4 part ), and support invoved codes such as VersionEdit Encode and Decode etc. 3) Add unit test code for VersionEdit EncodeDecodeNewFile4, and fix invoved test codes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9092 Reviewed By: ajkr, akankshamahajan15 Differential Revision: D32252323 Pulled By: riversand963 fbshipit-source-id: d2642898d6e3ad1fef0eb866b98045408bd4e162 10 November 2021, 18:49:04 UTC
1a8eec4 Remove invalid RocksJava native entry (#9147) Summary: It seems that an incorrect native source file entry was introduced in https://github.com/facebook/rocksdb/pull/8999. For some reason it appears that CI was not run against that PR, and so the problem was not detected. This PR fixes the problem by removing the invalid entry, allowing RocksJava to build correctly again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9147 Reviewed By: pdillinger Differential Revision: D32300976 fbshipit-source-id: dbd763b806bacf0fc08f4deaf07c63d0a266c4cf 10 November 2021, 01:21:58 UTC
fc93553 Fix an error on GCC 4.8.5 where -Werror=unused-parameter fails (#9144) Summary: Before this fix compilation with GCC 4.8.5 20150623 (Red Hat 4.8.5-36) would fail with the following error: ``` CC jls/db/db_impl/db_impl.o In file included from ./env/file_system_tracer.h:8:0, from ./file/random_access_file_reader.h:15, from ./file/file_prefetch_buffer.h:15, from ./table/format.h:13, from ./table/internal_iterator.h:14, from ./db/pinned_iterators_manager.h:12, from ./db/range_tombstone_fragmenter.h:15, from ./db/memtable.h:22, from ./db/memtable_list.h:16, from ./db/column_family.h:17, from ./db/db_impl/db_impl.h:22, from db/db_impl/db_impl.cc:9: ./include/rocksdb/file_system.h:108:8: error: unused parameter 'opts' [-Werror=unused-parameter] struct FileOptions : EnvOptions { ^ db/db_impl/db_impl.cc: In member function 'virtual rocksdb::Status rocksdb::DBImpl::SetDBOptions(const std::unordered_map<std::basic_string<char>, std::basic_string<char> >&)': db/db_impl/db_impl.cc:1230:36: note: synthesized method 'rocksdb::FileOptions& rocksdb::FileOptions::operator=(const rocksdb::FileOptions&)' first required here file_options_for_compaction_ = FileOptions(new_db_options); ^ CC jls/db/db_impl/db_impl_compaction_flush.o cc1plus: all warnings being treated as errors make[1]: *** [jls/db/db_impl/db_impl.o] Error 1 make[1]: *** Waiting for unfinished jobs.... make[1]: Leaving directory `/rocksdb-local-build' make: *** [rocksdbjavastatic] Error 2 Makefile:2222: recipe for target 'rocksdbjavastaticdockerarm64v8' failed make: *** [rocksdbjavastaticdockerarm64v8] Error 2 ``` This was detected on both ppc64le and arm64v8, however it does not seem to appear in the same GCC 4.8 version we use for x64 in CircleCI - https://app.circleci.com/pipelines/github/facebook/rocksdb/9691/workflows/c2a94367-14f3-4039-be95-325c34643d41/jobs/227906 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9144 Reviewed By: riversand963 Differential Revision: D32290770 fbshipit-source-id: c90a54ba2a618e1ff3660fff3f3368ab36c3c527 10 November 2021, 01:07:03 UTC
a113cec Fix a bug in timestamp-related GC (#9116) Summary: For multiple versions (ts + seq) of the same user key, if they cross the boundary of `full_history_ts_low_`, we should retain the version that is visible to the `full_history_ts_low_`. Namely, we keep the internal key with the largest timestamp smaller than `full_history_ts_low`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9116 Test Plan: make check Reviewed By: ltamasi Differential Revision: D32261514 Pulled By: riversand963 fbshipit-source-id: e10f47c254c04c05261440051e4f50cb7d95474e 09 November 2021, 21:08:55 UTC
2fbe32b RAII support for per cache reservation through handle (#9130) Summary: Note: This PR is the 3rd PR of a bigger PR stack (https://github.com/facebook/rocksdb/issues/9073) and depends on the second PR (https://github.com/facebook/rocksdb/pull/9071). **See changes from this PR only https://github.com/facebook/rocksdb/pull/9130/commits/00447324d082136b0e777d3ab6a3df3a8452c633** Context: pdillinger brought up a good [point](https://github.com/facebook/rocksdb/pull/9073#discussion_r741478309) about lacking RAII support for per cache reservation in `CacheReservationManager` when reviewing https://github.com/facebook/rocksdb/pull/9073. To summarize the discussion, the current API `CacheReservationManager::UpdateCacheReservation()` requires callers to explicitly calculate and pass in a correct`new_mem_used` to release a cache reservation (if they don't want to rely on the clean-up during `CacheReservationManager`'s destruction - such as they want to release it earlier). While this implementation has convenience in some use-case such as `WriteBufferManager`, where [reservation](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L69-L91) and [release](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L109-L129) amounts do not necessarily correspond symmetrically and thus a flexible `new_mem_used` inputing is needed, it can be prone to caller's calculation error as well as cause a mass of codes in releasing cache in other use-case such as filter construction, where reservation and release amounts do correspond symmetrically and many code paths requiring a cache release, as [pointed](https://github.com/facebook/rocksdb/pull/9073#discussion_r741478309) out by pdillinger. Therefore we decided to provide a new API in `CacheReservationManager` to update reservation with better RAII support for per cache reservation, using a handle to manage the life time of that particular cache reservation. - Added a new class `CacheReservationHandle` - Added a new API `CacheReservationManager::MakeCacheReservation()` that outputs a `CacheReservationHandle` for managing the reservation - Updated class comments to clarify two different cache reservation methods Tests: - Passing new tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/9130 Reviewed By: pdillinger Differential Revision: D32199446 Pulled By: hx235 fbshipit-source-id: 1cba7c636e5ecfb55b0c1e0c2d218cc9b5b30b4e 09 November 2021, 20:06:28 UTC
ffd6085 Add new API CacheReservationManager::GetTotalMemoryUsage() (#9071) Summary: Note: This PR is the 2nd PR of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073). Context: `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` accepts an accumulated total memory used (e.g, used 10MB so far) instead of usage change (e.g, increase by 5 MB, decrease by 5 MB). It has benefits including consolidating API for increase and decrease as described in https://github.com/facebook/rocksdb/pull/8506. However, not every `CacheReservationManager` user keeps track of this accumulated total memory usage. For example, Bloom/Ribbon Filter construction (e.g, [here](https://github.com/facebook/rocksdb/blob/822d729fcd9f7af9f371ca7168e52dbdab898e41/table/block_based/filter_policy.cc#L587) in https://github.com/facebook/rocksdb/pull/9073) does not while WriteBufferManager and compression dictionary buffering do. Considering future users might or might not keep track of this counter and implementing this counter within `CacheReservationManager` is easy due to the passed-in `std::size_t new_memory_used` in calling `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)`, it is proposed to add a new API `CacheReservationManager::GetTotalMemoryUsage()`. As noted in the API comments, since `CacheReservationManager` is NOT thread-safe, external synchronization is needed in calling `UpdateCacheReservation()` if you want `GetTotalMemoryUsed()` returns the indeed latest memory used. - Added and updated private counter `memory_used_` every time `CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used)` is called regardless if the call returns non-okay status - Added `CacheReservationManager::GetTotalMemoryUsage()` to return `memory_used_` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9071 Test Plan: - Passing new tests - Passing existing tests Reviewed By: ajkr Differential Revision: D31887813 Pulled By: hx235 fbshipit-source-id: 9a09f0c8683822673260362894c878b61ee60ceb 09 November 2021, 16:17:03 UTC
efaef9b cleanup error_handler related code (#9098) Summary: Remove code not in use, add comments, remove redundant code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9098 Test Plan: make check Reviewed By: anand1976 Differential Revision: D32027219 Pulled By: zhichao-cao fbshipit-source-id: 253aae926c87726268af6c027bf805dc9156c8a8 08 November 2021, 23:49:17 UTC
a747807 Fix small issues (#5896) Summary: The individual commits in this PR should be self-explanatory. All small and _very_ low-priority changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5896 Reviewed By: riversand963 Differential Revision: D18065108 Pulled By: mrambacher fbshipit-source-id: 236b1a1d9d21f982cc08aa67027108dde5eaf280 08 November 2021, 20:32:38 UTC
dddb791 Enable a few unit tests to use custom Env objects (#9087) Summary: Allow compaction_job_test, db_io_failure_test, dbformat_test, deletefile_test, and fault_injection_test to use a custom Env object. Also move ```RegisterCustomObjects``` declaration to a header file to simplify things. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9087 Test Plan: Run manually using "buck test rocksdb/src:compaction_job_test_fbcode" etc. Reviewed By: riversand963 Differential Revision: D32007222 Pulled By: anand1976 fbshipit-source-id: 99af58559e25bf61563dfa95dc46e31fa7375792 08 November 2021, 19:05:59 UTC
78556c1 Secondary cache error injection (#9002) Summary: Implement secondary cache error injection in db_stress. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9002 Reviewed By: zhichao-cao Differential Revision: D31874896 Pulled By: anand1976 fbshipit-source-id: 8cf04c061a4a44efa0fe88423d05cade67b85f73 08 November 2021, 18:27:27 UTC
e5b34f5 Fb 5789 max total WAL size clarification (#9108) Summary: Add clarification/extension to comments on max_total_wal_size and the Java wrapper MaxTotalWalSize to better explain the effect of the option on log file sizes. Closes https://github.com/facebook/rocksdb/issues/5789 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9108 Reviewed By: pdillinger Differential Revision: D32066640 Pulled By: mrambacher fbshipit-source-id: 7d5affc87e4119019054af9c884a2ea01d68f5b7 08 November 2021, 16:54:37 UTC
be351f4 Restore Java 7 Compatibility (#9103) Summary: RocksDB should still compile on Java 7. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9103 Reviewed By: pdillinger Differential Revision: D32067561 Pulled By: mrambacher fbshipit-source-id: bbe9c18c8007ab3e113de4add56a84c9bde61c8e 08 November 2021, 16:21:02 UTC
8ef5b9d Update MySQLStyleTransactionTest to use SingleDelete (#9062) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9062 Real MySQL-style transactions in MyRocks uses SingleDelete, which is missing in our existint MySQLStyleTransactionTest. Ths diff by lth fills the gap in test coverage. Reviewed By: lth Differential Revision: D31813015 fbshipit-source-id: 196ad761de30ae9ea1f92257058dfc265f211892 07 November 2021, 14:41:12 UTC
9e788be Source files dependencies detection for RocksDB plugins. (#9120) Summary: Otherwise a rebuild is not done if a RocksDB plugin header file is changed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9120 Test Plan: Build RocksDB with a plugin. Change a header file of the RocksDB plugin and rebuild. Signed-off-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com> Reviewed By: riversand963 Differential Revision: D32223303 Pulled By: ajkr fbshipit-source-id: 76d31b10fe915906edc181c7b6398a09b7d079ee 06 November 2021, 18:50:17 UTC
5aad38f Deflake DBBasicTestWithTimestampCompressionSettings.PutAndGetWithComp… (#9136) Summary: …action ``` db/db_with_timestamp_basic_test.cc:2643: Failure db_->CompactFiles(compact_opt, handles_[cf], collector->GetFlushedFiles(), static_cast<int>(kNumTimestamps - i)) Invalid argument: A compaction must contain at least one file. ``` Able to be reproduced by run multiple test in parallel. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9136 Test Plan: ``` gtest-parallel ./db_with_timestamp_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampCompressionSettings.PutAndGetWithCompaction/12 -r 100 -w 100 ``` Reviewed By: riversand963 Differential Revision: D32197734 Pulled By: jay-zhuang fbshipit-source-id: aeb0d6e9b37312f577e203ca81bb7a0f14d4e7ce 06 November 2021, 00:57:50 UTC
3fbddb1 Refactor and unify blob file saving and the logic that finds the oldest live blob file (#9122) Summary: The patch refactors and unifies the logic in `VersionBuilder::SaveBlobFilesTo` and `VersionBuilder::GetMinOldestBlobFileNumber` by introducing a generic helper that can "merge" the list of `BlobFileMetaData` in the base version with the list of `MutableBlobFileMetaData` representing the updated state after applying a sequence of `VersionEdit`s. This serves as groundwork for subsequent changes that will enable us to determine whether a blob file is live after applying a sequence of edits without calling `VersionBuilder::SaveTo`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9122 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D32151472 Pulled By: ltamasi fbshipit-source-id: 11622b475866de823334b8bc21b0e99d913af97e 05 November 2021, 23:50:52 UTC
3018a3e Minor improvement to CacheReservationManager/WriteBufferManager/CompressionDictBuilding (#9139) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9139 Reviewed By: zhichao-cao Differential Revision: D32211415 Pulled By: hx235 fbshipit-source-id: 39ce036ba34e1fb4a1992a33ac6904a4a943301d 05 November 2021, 23:13:47 UTC
5237b39 Fix assertion error during compaction with write-prepared txn enabled (#9105) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9105 The user contract of SingleDelete is that: a SingleDelete can only be issued to a key that exists and has NOT been updated. For example, application can insert one key `key`, and uses a SingleDelete to delete it in the future. The `key` cannot be updated or removed using Delete. In reality, especially when write-prepared transaction is being used, things can get tricky. For example, a prepared transaction already writes `key` to the memtable after a successful Prepare(). Afterwards, should the transaction rollback, it will insert a Delete into the memtable to cancel out the prior Put. Consider the following sequence of operations. ``` // operation sequence 1 Begin txn Put(key) Prepare() Flush() Rollback txn Flush() ``` There will be two SSTs resulting from above. One of the contains a PUT, while the second one contains a Delete. It is also known that releasing a snapshot can lead to an L0 containing only a SD for a particular key. Consider the following operations following the above block. ``` // operation sequence 2 db->Put(key) db->SingleDelete(key) Flush() ``` The operation sequence 2 can result in an L0 with only the SD. Should there be a snapshot for conflict checking created before operation sequence 1, then an attempt to compact the db may hit the assertion failure below, because ikey_.type is Delete (from a rollback). ``` else if (clear_and_output_next_key_) { assert(ikey_.type == kTypeValue || ikey_.type == kTypeBlobIndex); } ``` To fix the assertion failure, we can skip the SingleDelete if we detect an earlier Delete in the same snapshot interval. Reviewed By: ltamasi Differential Revision: D32056848 fbshipit-source-id: 23620a91e28562d91c45cf7e95f414b54b729748 05 November 2021, 22:29:18 UTC
7d74d34 Fixing more loosely formatted issue (#9140) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9140 Upon checking, some jobs still have loosely formatted issue. Also, some started failing due to my previous buggy change. https://fburl.com/scuba/sandcastle/e9mhejsc https://fburl.com/scuba/duplo_events/dbxx4215 Reviewed By: jay-zhuang Differential Revision: D32213703 fbshipit-source-id: 806f872b820afe275cf62459988c9d6f668af35c 05 November 2021, 19:21:38 UTC
28bab0e Improve comments on options.writable_file_max_buffer_size (#9131) Summary: Comments of options.writable_file_max_buffer_size mentioned Windows, which is confusing. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9131 Reviewed By: anand1976 Differential Revision: D32187003 fbshipit-source-id: 1f134d7ecdc4a9d13825d461ab1da56769b9455f 04 November 2021, 23:38:09 UTC
caadf09 Add options.manual_wal_flush to db_bench (#9132) Summary: It is useful to add options.manual_wal_flush to db_bench Pull Request resolved: https://github.com/facebook/rocksdb/pull/9132 Test Plan: Run the benchamrk with the option. Reviewed By: ltamasi Differential Revision: D32188060 fbshipit-source-id: a70835d3cad0f30095218dfda1daff0a432892e5 04 November 2021, 23:03:47 UTC
8be121b Fixing loosely formatted child job specs, double quotes (#9125) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9125 See task for more context Reviewed By: jay-zhuang Differential Revision: D32157345 fbshipit-source-id: be6631c6400fd66e6891e71dc0235798c594010a 04 November 2021, 22:34:51 UTC
2a3511a Fix -Werror=type-limits seen in Travis (#9128) Summary: Work around annoying compiler warning-as-error from https://github.com/facebook/rocksdb/issues/9113 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9128 Test Plan: CI Reviewed By: jay-zhuang Differential Revision: D32181499 Pulled By: pdillinger fbshipit-source-id: d7e5f7857a29f7ba47c49c3aee7150b5763b65d9 04 November 2021, 22:01:10 UTC
1ababeb Deallocate payload of BlockBasedTableBuilder::Rep::FilterBlockBuilder earlier for Full/PartitionedFilter (#9070) Summary: Note: This PR is the 1st part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073). Context: Previously, the payload (i.e, filter data) within `BlockBasedTableBuilder::Rep::FilterBlockBuilder` object is not deallocated until `BlockBasedTableBuilder` is deallocated, despite it is no longer useful after its related `filter_content` being written. - Transferred the payload (i.e, the filter data) out of `BlockBasedTableBuilder::Rep::FilterBlockBuilder` object - For PartitionedFilter: - Unified `filters` and `filter_gc` lists into one `std::deque<FilterEntry> filters` by adding a new field `last_filter_entry_key` and storing the `std::unique_ptr filter_data` with the `Slice filter` in the same entry - Reset `last_filter_data` in the case where `filters` is empty, which should be as by then we would've finish using all the `Slice filter` - Deallocated the payload by going out of scope as soon as we're done with using the `filter_content` associated with the payload - This is an internal interface change at the level of `FilterBlockBuilder::Finish()`, which leads to touching the inherited interface in `BlockBasedFilterBlockBuilder`. But for that, the payload transferring is ignored. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9070 Test Plan: - The main focus is to catch segment fault error during `FilterBlockBuilder::Finish()` and `BlockBasedTableBuilder::Finish()` and interface mismatch. Relying on existing CI tests is enough as `assert(false)` was temporarily added to verify the new logic of transferring ownership indeed run Reviewed By: pdillinger Differential Revision: D31884933 Pulled By: hx235 fbshipit-source-id: f73ecfbea13788d4fc058013ace27230110b52f4 04 November 2021, 20:35:38 UTC
a64c8ca Sanitize negative request bytes in GenericRateLimiter::Request and clarify API (#9112) Summary: Context: Surprisingly, there isn't any sanitization against negative `int64_t bytes` in `GenericRateLimiter::Request(int64_t bytes, const Env::IOPriority pri, Statistics* stats)`. A negative `bytes` can be passed in and incorrectly increases `available_bytes_` by subtracting the negative `bytes` from `available_bytes_`, such as [here](https://github.com/facebook/rocksdb/blob/main/util/rate_limiter.cc#L138) and [here](https://github.com/facebook/rocksdb/blob/main/util/rate_limiter.cc#L283), which are incorrect behaviors. - Sanitized negative request bytes by rounding it up to 0 - Added notes to public and internal API Pull Request resolved: https://github.com/facebook/rocksdb/pull/9112 Test Plan: - Rely on existing tests Reviewed By: ajkr Differential Revision: D32085364 Pulled By: hx235 fbshipit-source-id: b1b6066b2dd5ffc7bcbfb07069ca65a33578251b 04 November 2021, 17:11:53 UTC
dfedc74 Some checksum code refactoring (#9113) Summary: To prepare for adding checksum to footer and "context aware" checksums. This also brings closely related code much closer together. Recently added `BlockBasedTableBuilder::ComputeBlockTrailer` for testing is made obsolete in the refactoring, as testing the checksums can happen at a lower level of abstraction. Also now checking for unrecognized checksum type on reading footer, rather than later on use. Also removed an obsolete function delcaration. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9113 Test Plan: existing tests worked before refactoring to remove `ComputeBlockTrailer`. And then refactored+improved tests using it. Reviewed By: mrambacher Differential Revision: D32090149 Pulled By: pdillinger fbshipit-source-id: 2879da683c1498ea85a3b70dace9b6d9f6b47b6e 04 November 2021, 16:09:34 UTC
312d9c4 Re-enable 390x+cmake* Travis jobs (#9110) Summary: Revert "Temporarily disable s390x+cmake* Travis jobs (https://github.com/facebook/rocksdb/issues/9095)" This reverts commit f2d11b3fdceb0886304a5a7c6302121610ebf39f. I have now uploaded the CMake deb for s390x provided by jonathan-albrecht-ibm to our S3 bucket. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9110 Reviewed By: ajkr Differential Revision: D32082903 Pulled By: riversand963 fbshipit-source-id: b7243d19fc133e665a8654e3b528c4f53d5b11d1 04 November 2021, 03:30:15 UTC
9b53f14 Fixed a bug in CompactionIterator when write-preared transaction is used (#9060) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9060 RocksDB bottommost level compaction may zero out an internal key's sequence if the key's sequence is in the earliest_snapshot. In write-prepared transaction, checking the visibility of a certain sequence in a specific released snapshot may return a "snapshot released" result. Therefore, it is possible, after a certain sequence of events, a PUT has its sequence zeroed out, but a subsequent SingleDelete of the same key will still be output with its original sequence. This violates the ascending order of keys and leads to incorrect result. The solution is to use an extra variable `last_key_seq_zeroed_` to track the information about visibility in earliest snapshot. With this variable, we can know for sure that a SingleDelete is in the earliest snapshot even if the said snapshot is released during compaction before processing the SD. Reviewed By: ltamasi Differential Revision: D31813016 fbshipit-source-id: d8cff59d6f34e0bdf282614034aaea99be9174e1 03 November 2021, 22:55:00 UTC
5681014 Fixing child jobs of rocksdb for duplo deprecation (#9117) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9117 This fixes the loosely formatted json child job spec. I.e., trailing commas Reviewed By: jay-zhuang Differential Revision: D32119606 fbshipit-source-id: 0ef571ccbef0e638df18f17288611d3b5774f129 03 November 2021, 21:56:26 UTC
2910264 Skip directory fsync for filesystem btrfs (#8903) Summary: Directory fsync might be expensive on btrfs and it may not be needed. Here are 4 directory fsync cases: 1. creating a new file: dir-fsync is not needed on btrfs, as long as the new file itself is synced. 2. renaming a file: dir-fsync is not needed if the renamed file is synced. So an API `FsyncAfterFileRename(filename, ...)` is provided to sync the file on btrfs. By default, it just calls dir-fsync. 3. deleting files: dir-fsync is forced by set `IOOptions.force_dir_fsync = true` 4. renaming multiple files (like backup and checkpoint): dir-fsync is forced, the same as above. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8903 Test Plan: run tests on btrfs and non btrfs Reviewed By: ajkr Differential Revision: D30885059 Pulled By: jay-zhuang fbshipit-source-id: dd2730b31580b0bcaedffc318a762d7dbf25de4a 03 November 2021, 19:21:27 UTC
0817227 Refactor the detailed consistency checks and the SST saving logic in VersionBuilder (#9099) Summary: The patch refactors the parts of `VersionBuilder` that deal with SST file comparisons. Specifically, it makes the following changes: * Turns `NewestFirstBySeqNo` and `BySmallestKey` from free-standing functions into function objects. Note: `BySmallestKey` has a pointer to the `InternalKeyComparator`, while `NewestFirstBySeqNo` is completely stateless. * Eliminates the wrapper `FileComparator`, which was essentially an unnecessary DIY virtual function call mechanism. * Refactors `CheckConsistencyDetails` and `SaveSSTFilesTo` using helper function templates that take comparator/checker function objects. Using static polymorphism eliminates the need to make runtime decisions about which comparator to use. * Extends some error messages returned by the consistency checks and makes them more uniform. * Removes some incomplete/redundant consistency checks from `VersionBuilder` and `FilePicker`. * Improves const correctness in several places. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9099 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D32027503 Pulled By: ltamasi fbshipit-source-id: 621326ae41f4f55f7ad6a91abbd6e666d5c7857c 03 November 2021, 18:52:47 UTC
2b60621 Don't call OnTableFileCreated with OK for empty+deleted file (#9118) Summary: EventListener::OnTableFileCreated was previously called with OK status and file_size==0 in cases of no SST file contents written (because there was no content to add) and the empty file deleted before calling the listener. This could lead to a stress test assertion failure added in https://github.com/facebook/rocksdb/issues/9054. This changes the status to Aborted, to align with the API doc: "... if the file is successfully created. Now it will also be called on failure case. User can check info.status to see if it succeeded or not." For internal purposes, this case is considered "success" but for listener purposes, no SST file is (successfully) created. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9118 Test Plan: test case added + existing db_stress Reviewed By: ajkr, riversand963 Differential Revision: D32120232 Pulled By: pdillinger fbshipit-source-id: a804e2e0a52598018d3b182da97804d402ffcdfa 03 November 2021, 15:43:27 UTC
21f8a57 Fix TSAN report on MemPurge test (#9115) Summary: TSAN reported data race on count variables in MemPurgeBasic test. This suggests the test could fail if mempurges were slow enough that they don't complete before the count variables being checked, but injecting a long sleep into MemPurge (outside DB mutex) confirms that blocked writes ensure enough mempurges/flushes happen to make the test pass. All the possible different values on testing should be OK to make the test pass. So this change makes the variables atomic so that up-to-date value is always read and TSAN report suppressed. I have also used `.exchange(0)` to make the checking less stateful by "popping off" all the accumulated counts. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9115 Test Plan: updated test, watch for any flakiness Reviewed By: riversand963 Differential Revision: D32114432 Pulled By: pdillinger fbshipit-source-id: c985609d39896a0d8f69ebc87b221e688609bdd8 03 November 2021, 04:54:29 UTC
67a7b74 Clarify setting `CompressionOptions::max_dict_bytes > 0` will charge block cache (#9119) Summary: Add it to the API comment. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9119 Reviewed By: hx235 Differential Revision: D32124238 Pulled By: ajkr fbshipit-source-id: d1f82037417d883f2000f2d62995a7708dda77c6 03 November 2021, 04:43:50 UTC
82afa01 Some API clarifications (#9080) Summary: * Clarify that RocksDB is not exception safe on many of our callback and extension interfaces * Clarify FSRandomAccessFile::MultiRead implementations must accept non-sorted inputs (see https://github.com/facebook/rocksdb/issues/8953) * Clarify ConcurrentTaskLimiter and SstFileManager are not (currently) extensible interfaces * Mark WriteBufferManager as `final`, so it is then clearly not a callback interface, even though it smells like one * Clarify TablePropertiesCollector Status returns are mostly ignored Pull Request resolved: https://github.com/facebook/rocksdb/pull/9080 Test Plan: comments only (except WriteBufferManager final) Reviewed By: ajkr Differential Revision: D31968782 Pulled By: pdillinger fbshipit-source-id: 11b648ce3ce3c5e5bdc02d2eafc7ea4b864bd1d2 03 November 2021, 03:30:07 UTC
f72c834 Make FileSystem a Customizable Class (#8649) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8649 Reviewed By: zhichao-cao Differential Revision: D32036059 Pulled By: mrambacher fbshipit-source-id: 4f1e7557ecac52eb849b83ae02b8d7d232112295 02 November 2021, 16:07:11 UTC
cfc57f5 Mention PR 9100 in HISTORY.md (#9111) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9111 Reviewed By: riversand963 Differential Revision: D32076310 Pulled By: ltamasi fbshipit-source-id: 81e30a02ded87c0f1a42985db42e80b62235ba11 01 November 2021, 22:22:32 UTC
ec9082d Regression tests for tickets fixed by previous change. (#9019) Summary: closes https://github.com/facebook/rocksdb/issues/5891 closes https://github.com/facebook/rocksdb/issues/2001 Java BytewiseComparator is now unsigned compliant, consistent with the default C++ comparator, which has always been thus. Consequently 2 tickets reporting the previous broken state can be closed. This test confirms that the following issues were in fact resolved by a change made between 6.2.2 and 6.22.1, to wit https://github.com/facebook/rocksdb/commit/7242dae7 which as part of its effect, changed the Java bytewise comparators. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9019 Reviewed By: pdillinger Differential Revision: D31610910 Pulled By: mrambacher fbshipit-source-id: 664230f1377a1aa270136edd63eea2c206b907e9 01 November 2021, 22:06:47 UTC
560fe70 Add new API CacheReservationManager::GetDummyEntrySize() (#9072) Summary: Note: it might conflict with another CRM related PR https://github.com/facebook/rocksdb/pull/9071 and so will merge after that's merged. Context: As `CacheReservationManager` being used by more memory users, it is convenient to retrieve the dummy entry size for `CacheReservationManager` instead of hard-coding `256 * 1024` in writing tests. Plus it allows more flexibility to change our implementation on dummy entry size. A follow-up PR is needed to replace those hard-coded dummy entry size value in `db_test2.cc`, `db_write_buffer_manager_test.cc`, `write_buffer_manager_test.cc`, `table_test.cc` and the ones introduced in https://github.com/facebook/rocksdb/pull/9072#issue-1034326069. - Exposed the private static constexpr `kDummyEntrySize` through public static `CacheReservationManager::GetDummyEntrySize()` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9072 Test Plan: - Passing new tests - Passing existing tests Reviewed By: ajkr Differential Revision: D32043684 Pulled By: hx235 fbshipit-source-id: ddefc6921c052adab6a2cda2394eb26da3076a50 01 November 2021, 21:46:09 UTC
a2b9be4 Try to start TTL earlier with kMinOverlappingRatio is used (#8749) Summary: Right now, when options.ttl is set, compactions are triggered around the time when TTL is reached. This might cause extra compactions which are often bursty. This commit tries to mitigate it by picking those files earlier in normal compaction picking process. This is only implemented using kMinOverlappingRatio with Leveled compaction as it is the default value and it is more complicated to change other styles. When a file is aged more than ttl/2, RocksDB starts to boost the compaction priority of files in normal compaction picking process, and hope by the time TTL is reached, very few extra compaction is needed. In order for this to work, another change is made: during a compaction, if an output level file is older than ttl/2, cut output files based on original boundary (if it is not in the last level). This is to make sure that after an old file is moved to the next level, and new data is merged from the upper level, the new data falling into this range isn't reset with old timestamp. Without this change, in many cases, most files from one level will keep having old timestamp, even if they have newer data and we stuck in it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8749 Test Plan: Add a unit test to test the boosting logic. Will add a unit test to test it end-to-end. Reviewed By: jay-zhuang Differential Revision: D30735261 fbshipit-source-id: 503c2d89250b22911eb99e72b379be154de3428e 01 November 2021, 21:36:31 UTC
a5ec5e3 Minor improvement to #8428 (Account for dictionary-building buffer in global memory limit) (#9032) Summary: Summary/Context: - Renamed `cache_rev_mng` to `compression_dict_buffer_cache_res_mgr` - It is to distinguish with other potential `cache_res_mgr` in `BlockBasedTableBuilder` and to use correct short-hand for the words "reservation", "manager" - Added `table_options.block_cache == nullptr` in additional to `table_options.no_block_cache == true` to be conditions where we don't create a `CacheReservationManager` - Theoretically `table_options.no_block_cache == true` is equivalent to `table_options.block_cache == nullptr` by API. But since segment fault will be generated by passing `nullptr` into `CacheReservationManager`'s constructor, it does not hurt to directly verify `table_options.block_cache != nullptr` before passing in - Renamed `is_cache_full` to `exceeds_global_block_cache_limit` - It is to hide implementation detail of cache reservation and to emphasize on the concept/design intent of caping memory within global block cache limit Pull Request resolved: https://github.com/facebook/rocksdb/pull/9032 Test Plan: - Passing existing tests Reviewed By: ajkr Differential Revision: D32005807 Pulled By: hx235 fbshipit-source-id: 619fd17bb924199de3db5924d8ab7dae53b1efa2 01 November 2021, 21:28:09 UTC
230c98f fix histogram NUM_FILES_IN_SINGLE_COMPACTION (#9026) Summary: currently histogram `NUM_FILES_IN_SINGLE_COMPACTION` just counted files in first level of compaction input, this fix counts files in all levels of compaction input. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9026 Reviewed By: ajkr Differential Revision: D31668241 Pulled By: jay-zhuang fbshipit-source-id: c02f6c4a5df9fbf0b7510036594811152e8738af 01 November 2021, 19:57:27 UTC
b1c27a5 Add a consistency check that prevents the overflow of garbage in blob files (#9100) Summary: The number or total size of garbage blobs in any given blob file can never exceed the number or total size of all blobs in the file. (This would be a similar error to e.g. attempting to remove from the LSM tree an SST file that has already been removed.) The patch builds on https://github.com/facebook/rocksdb/issues/9085 and adds a consistency check to `VersionBuilder` that prevents the above from happening. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9100 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D32048982 Pulled By: ltamasi fbshipit-source-id: 6f7e0793bf534ad04c3359cc0f696b8e4e5ef81c 01 November 2021, 19:32:14 UTC
73e6b89 Java wrapper for blob_gc_force_threshold as blobGarbageCollectionForceThreshold (#9109) Summary: Extra option added as a supplement to https://github.com/facebook/rocksdb/pull/8999 Closes https://github.com/facebook/rocksdb/issues/8221 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9109 Reviewed By: mrambacher Differential Revision: D32065039 Pulled By: ltamasi fbshipit-source-id: 6c484050a30fe0523850a8a3c95dc85b0a501362 01 November 2021, 18:59:10 UTC
2b70224 remove bad extra RecordTick(stats_, WRITE_WITH_WAL) (#9064) Summary: This PR fix wrong ticker `WRITE_WITH_WAL`. `RecordTick(WRITE_WITH_WAL)` will be called later in `WriteToWAL` and `ConcurrentWriteToWAL`. Fixes: 1. Delete these two extra `RecordTick(WRITE_WITH_WAL)` 2. Fix corresponding test case Pull Request resolved: https://github.com/facebook/rocksdb/pull/9064 Reviewed By: ajkr Differential Revision: D31944459 Pulled By: riversand963 fbshipit-source-id: f1aa8d2a4320456bc357bc5b0902032f7dcad086 01 November 2021, 18:43:14 UTC
0e12b1d Update buckify scripts (#9104) Summary: https://github.com/facebook/rocksdb/commit/49af999954c0c130fefdb5f4bafc919c18341521 updates RocksDB buckifier script directly via fbcode. We need to make sure that the following command run in RocksDB repo generate the same TARGETS file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9104 Test Plan: ``` $python buckifier/buckify_rocksdb.py ``` Verify that TARGETS file does not have uncommitted changes. Reviewed By: jay-zhuang Differential Revision: D32055387 Pulled By: riversand963 fbshipit-source-id: 19cf1b8145095b6df625958458189680e543e3ba 01 November 2021, 17:11:18 UTC
01bd86a InternalStats::DumpCFMapStat: fix sum.w_amp (#9065) Summary: sum `w_amp` will be a very large number`(bytes_written + bytes_written_blob)` when there is no any flush and ingest. This PR set sum `w_amp` to zero if there is no any flush and ingest, this is conform to per-level `w_amp` computation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9065 Reviewed By: ajkr Differential Revision: D31943994 Pulled By: riversand963 fbshipit-source-id: acbef5e331debebfad09e0e0d8d0885ebbc00609 01 November 2021, 06:11:43 UTC
d263505 Avoid div-by-zero error in db_stress (#9086) Summary: If a column family has 0 levels, then existing `TestCompactFiles(...)` may hit divide-by-zero. To fix, return early if the cf is empty. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9086 Test Plan: TBD Reviewed By: ajkr Differential Revision: D31986799 Pulled By: riversand963 fbshipit-source-id: 48f7dfb2b2b47cfc1315cb71ca80eb230d947f17 01 November 2021, 05:16:03 UTC
8e59a1d Attempt to deflake ListenerTest.MultiCF (#9084) Summary: EventListenerTest.MultiCF uses TestFlushListener which has members flushed_dbs_ and flushed_column_family_names_ that are not protected by locks. This implicitly indicates that we need to ensure the methods accessing these data structures in a single threaded way. In other tests, e.g. MultiDBMultiListeners, we use TEST_WaitForFlushMemtable() to wait until all memtables of a given column family are flushed, hence no pending flush threads will concurrently call OnFlushCompleted() and cause data race for flushed_dbs_. To fix a test failure, we should do the same for MultiCF. Example data race stack traces reported by TSAN ``` Read of size 8 at 0x7b6000002840 by main thread: #0 std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::size() const /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_vector.h:655:40 https://github.com/facebook/rocksdb/issues/1 rocksdb::EventListenerTest_MultiCF_Test::TestBody() /home/circleci/project/db/listener_test.cc:380:7 Previous write of size 8 at 0x7b6000002840 by thread T2: #0 void std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::_M_emplace_back_aux<rocksdb::DB* const&>(rocksdb::DB* const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/vector.tcc:442:26 https://github.com/facebook/rocksdb/issues/1 std::vector<rocksdb::DB*, std::allocator<rocksdb::DB*> >::push_back(rocksdb::DB* const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/stl_vector.h:923:4 https://github.com/facebook/rocksdb/issues/2 rocksdb::TestFlushListener::OnFlushCompleted(rocksdb::DB*, rocksdb::FlushJobInfo const&) /home/circleci/project/db/listener_test.cc:255:18 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9084 Test Plan: ./listener_test --gtest_filter=EventListenerTest.MultiCF Reviewed By: jay-zhuang Differential Revision: D31952259 Pulled By: riversand963 fbshipit-source-id: 94a7f29e4e9466ead42418944eb2247fc32bd499 01 November 2021, 05:12:15 UTC
8f4f302 Attempt to deflake DBFlushTest.FireOnFlushCompletedAfterCommittedResult (#9083) Summary: DBFlushTest.FireOnFlushCompletedAfterCommittedResult uses test sync points to coordinate interleaving of different threads. Before this PR, the test writes some data to memtable, triggers a manual flush, and triggers a second manual flush after a first bg flush thread starts executing. Though unlikely, it is possible for the second bg flush thread to run faster than the first bg flush thread and deques flush queue first. In this case, the original test will fail. The fix is to wait until the first bg flush thread deques the flush queue before triggering second manual flush. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9083 Test Plan: ./db_flush_test --gtest_filter=DBFlushTest.FireOnFlushCompletedAfterCommittedResult Reviewed By: jay-zhuang Differential Revision: D31951239 Pulled By: riversand963 fbshipit-source-id: f32d7cdabe6ad6808fd18e54e663936dc0a9edb4 01 November 2021, 05:08:48 UTC
49af999 internal_repo_rocksdb/repo Reviewed By: DrMarcII Differential Revision: D32033741 fbshipit-source-id: af12d9d72f109a4a2837cb64e02fa0dbc9175711 30 October 2021, 02:34:39 UTC
back to top