https://github.com/facebook/rocksdb

49ce8a1 Update version.h and HISTORY.md for 8.9.1 08 December 2023, 20:44:09 UTC
b8d41c5 Turn the default Timer in PeriodicTaskScheduler into a leaky Meyers singleton (#12128) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12128 The patch turns the `Timer` Meyers singleton in `PeriodicTaskScheduler::Default()` into one of the leaky variety in order to prevent static destruction order issues. Reviewed By: akankshamahajan15 Differential Revision: D51963950 fbshipit-source-id: 0fc34113ad03c51fdc83bdb8c2cfb6c9f6913948 08 December 2023, 20:38:07 UTC
3b1ce12 Update version.h to correct version 8.9 30 November 2023, 19:19:11 UTC
17e3628 Merge memtable only UDT bugfix 22 November 2023, 00:35:45 UTC
8bcbed2 Update HISTORY.md and version.h for 8.9.fb release 22 November 2023, 00:33:46 UTC
84a54e1 Fix some bugs in index builder and reader for the UDT in memtable only feature (#12062) Summary: These bugs surfaced while I was trying to add the stress test for the feature: Bug 1) On the index building path: the optimization to use the user key instead of the internal key as the separator needed a bit of tweaking for when user-defined timestamps can be removed. Even though the user keys look different now and are eligible to be used as separators, once their user-defined timestamps are removed, they could be equal and that invariant no longer stands. Bug 2) On the index reading path: one path that builds the second level index iterator for `PartitionedIndexReader` is not passing the corresponding `user_defined_timestamps_persisted` flag. As a result, the default `true` value is used, leading to no minimum timestamps being padded when they should be. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12062 Test Plan: For bug 1): added a separate unit test `BlockBasedTableReaderTest::Get` to exercise the `Get` API. It's a different code path from `MultiGet` so it's worth having its own test. Also, in order to cover the bug, the test is modified to generate key values with the same user provided key, different timestamps and different sequence numbers. The test reads back different versions of the same user provided key. `MultiGet` takes one `ReadOptions` with one read timestamp so we cannot easily test retrieving different versions of the same key. For bug 2): simply added options `BlockBasedTableOptions.metadata_cache_options.partition_pinning = PinningTier::kAll` to exercise all the index iterator creating paths. Reviewed By: ltamasi Differential Revision: D51508280 Pulled By: jowlyzhang fbshipit-source-id: 8b174d3d70373c0599266ac1f467f2bd4d7ea6e5 21 November 2023, 22:05:02 UTC
d3e015f Fix compact_files_example (#12084) Summary: The option "write_buffer_size" has changed from 4MB to 64MB by default, so the compact_files_example will not work as expected: the test data written is only about 50MB and will not trigger compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12084 Reviewed By: cbi42 Differential Revision: D51499959 Pulled By: ajkr fbshipit-source-id: 4f4b25ebc4b6bb568501adc8e97813edcddceea8 21 November 2023, 17:34:59 UTC
04cbc77 Add missing license to source files (#12083) Summary: Fixes https://github.com/facebook/rocksdb/issues/12079. Fixed missing licenses in "\*.h" and "\*.cc" files Pull Request resolved: https://github.com/facebook/rocksdb/pull/12083 Reviewed By: cbi42 Differential Revision: D51489634 Pulled By: ajkr fbshipit-source-id: 764bfee257b9d6603fd7606a55664b7537e1898f 21 November 2023, 16:36:30 UTC
336a74d Add some asserts in ~CacheWithSecondaryAdapter (#12082) Summary: Add some asserts in the `CacheWithSecondaryAdapter` destructor to help debug a crash test failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12082 Reviewed By: cbi42 Differential Revision: D51486041 Pulled By: anand1976 fbshipit-source-id: 76537beed31ba27ab9ac8b4ce6deb775629e3be5 21 November 2023, 01:48:17 UTC
fb5c8c7 Do not compare op_type in `WithinPenultimateLevelOutputRange()` (#12081) Summary: `WithinPenultimateLevelOutputRange()` is updated in https://github.com/facebook/rocksdb/issues/12063 to check internal key range. However, the op_type of a key can change during compaction, e.g. MERGE -> PUT, which can make a key larger so that it falls out of the penultimate output range. This has caused stress test failures with error message "Unsafe to store Seq later than snapshot in the last level if per_key_placement is enabled". So update `WithinPenultimateLevelOutputRange()` to only check user key and sequence number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12081 Test Plan: * This repro can produce the corruption within a few runs. Ran it a few times after the fix and did not see a Corruption failure. ``` python3 ./tools/db_crashtest.py whitebox --test_tiered_storage --random_kill_odd=888887 --use_merge=1 --writepercent=100 --readpercent=0 --prefixpercent=0 --delpercent=0 --delrangepercent=0 --iterpercent=0 --write_buffer_size=419430 --column_families=1 --read_fault_one_in=0 --write_fault_one_in=0 ``` Reviewed By: ajkr Differential Revision: D51481202 Pulled By: cbi42 fbshipit-source-id: cad6b65099733e03071b496e752bbdb09cf4db82 21 November 2023, 01:07:28 UTC
39d3347 Fix build on FreeBSD (#11218) (#12078) Summary: Fixes https://github.com/facebook/rocksdb/issues/11218 Changes from https://github.com/facebook/rocksdb/issues/10881 broke FreeBSD builds with: env/io_posix.h:39:9: error: 'POSIX_MADV_NORMAL' macro redefined [-Werror,-Wmacro-redefined] This commit fixes FreeBSD builds by ignoring MADV defines. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12078 Reviewed By: cbi42 Differential Revision: D51452802 Pulled By: ajkr fbshipit-source-id: 0a1f5a90954e7d257a95794277a843ac77f3a709 20 November 2023, 18:11:16 UTC
b059c56 Add missing copyright header (#12076) Summary: Required for open source repo. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12076 Reviewed By: ajkr Differential Revision: D51449839 Pulled By: cbi42 fbshipit-source-id: 4a25a3422880db3f28a2834d966341935db32530 19 November 2023, 17:50:59 UTC
7780e98 add write_buffer_manager setter into options and tests in c bindings, (#12007) Summary: following https://github.com/facebook/rocksdb/pull/11710 - add test on wbm c api - add a setter for WBM in `DBOptions` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12007 Reviewed By: cbi42 Differential Revision: D51430042 Pulled By: ajkr fbshipit-source-id: 608bc4d3ed35a84200459d0230b35be64b3475f7 17 November 2023, 19:34:05 UTC
4e58cc6 Check internal key range when compacting from last level to penultimate level (#12063) Summary: The test failure in https://github.com/facebook/rocksdb/issues/11909 shows that we may compact keys outside of internal key range of penultimate level input files from last level to penultimate level, which can potentially cause overlapping files in the penultimate level. This PR updates the `Compaction::WithinPenultimateLevelOutputRange()` to check internal key range instead of user key. Other fixes: * skip range del sentinels when deciding output level for tiered compaction Pull Request resolved: https://github.com/facebook/rocksdb/pull/12063 Test Plan: - existing unit tests - apply the fix to https://github.com/facebook/rocksdb/issues/11905 and run `./tiered_compaction_test --gtest_filter="*RangeDelsCauseFileEndpointsToOverlap*"` Reviewed By: ajkr Differential Revision: D51288985 Pulled By: cbi42 fbshipit-source-id: 70085db5f5c3b15300bcbc39057d57b83fd9902a 17 November 2023, 18:50:40 UTC
2f9ea81 Add HyperClockCache Java API. (#12065) Summary: Fix https://github.com/facebook/rocksdb/issues/11510 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12065 Reviewed By: ajkr Differential Revision: D51406695 Pulled By: cbi42 fbshipit-source-id: b9e32da5f9bcafb5365e4349f7295be90d5aa7ba 16 November 2023, 23:46:31 UTC
a9bd525 Add Qdrant to USERS.md (#12072) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12072 Reviewed By: cbi42 Differential Revision: D51398080 Pulled By: ajkr fbshipit-source-id: 1043f2b012bd744e9c53c638e1ba56a3e0392e11 16 November 2023, 18:35:08 UTC
6d10f8d add WriteBufferManager to c api (#11710) Summary: I want to use the `WriteBufferManager` in my rust project, which requires exposing it through the c api, just like `Cache` is. Hopefully the changes are fairly straightforward! Pull Request resolved: https://github.com/facebook/rocksdb/pull/11710 Reviewed By: cbi42 Differential Revision: D51166518 Pulled By: ajkr fbshipit-source-id: cd266ff1e4a7ab145d05385cd125a8390f51f3fc 16 November 2023, 18:34:00 UTC
9202db1 Consider archived WALs for deletion more frequently (#12069) Summary: Fixes https://github.com/facebook/rocksdb/issues/11000. That issue pointed out that RocksDB was slow to delete archived WALs in case time-based and size-based expiration were enabled, and the time-based threshold (`WAL_ttl_seconds`) was small. This PR prevents the delay by taking into account `WAL_ttl_seconds` when deciding the frequency to process archived WALs for deletion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12069 Reviewed By: pdillinger Differential Revision: D51262589 Pulled By: ajkr fbshipit-source-id: e65431a06ee96f4c599ba84a27d1aedebecbb003 15 November 2023, 23:42:28 UTC
2222cae Make CacheWithSecondaryAdapter reservation accounting more robust (#12059) Summary: `CacheWithSecondaryAdapter` can distribute placeholder reservations across the primary and secondary caches. The current implementation of the accounting is quite complicated in order to avoid using a mutex. This may cause the accounting to be slightly off after changes to the cache capacity and ratio, resulting in assertion failures. There's also a bug in the unlikely event that the total reservation exceeds the cache capacity. Furthermore, the current implementation is difficult to reason about. This PR simplifies it by doing the accounting while holding a mutex. The reservations are processed in 1MB chunks in order to avoid taking a lock too frequently. As a side effect, this also removes the restriction against increasing the compressed secondary cache capacity after decreasing it to 0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12059 Test Plan: Existing unit tests, and a new test for capacity increase from 0 Reviewed By: pdillinger Differential Revision: D51278686 Pulled By: anand1976 fbshipit-source-id: 7e1ad2c50694772997072dd59cab35c93c12ba4f 15 November 2023, 00:25:52 UTC
a660e07 Build RocksDBJava on Windows with Java8. (#12068) Summary: At the moment RocksDBJava uses the default CircleCI JVM on Windows builds. This can change and has changed in the past, which can cause some incompatibilities. This PR addresses the problem by explicitly installing and using Liberica JDK 8, as Java 8 is the primary target for RocksDBJava. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12068 Reviewed By: cbi42 Differential Revision: D51307233 Pulled By: ajkr fbshipit-source-id: 9cb4e173d8a9ac42e5f9fda1daf012302942fdbc 14 November 2023, 22:39:31 UTC
37064d6 Add encfs plugin link (#12070) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12070 Reviewed By: jaykorean Differential Revision: D51307148 Pulled By: ajkr fbshipit-source-id: d04335506becd5970802f87ab0573b6307479222 14 November 2023, 15:33:21 UTC
65d71ee Fix warnings when using API (#12066) Summary: Fixes https://github.com/facebook/rocksdb/issues/11457. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12066 Reviewed By: cbi42 Differential Revision: D51259966 Pulled By: ajkr fbshipit-source-id: a158b6f341b6b48233d917bfe4d00b639dbd8619 14 November 2023, 04:03:44 UTC
e7896f0 Enable unit test `PrecludeLastLevelTest.RangeDelsCauseFileEndpointsToOverlap` (#12064) Summary: Fixes https://github.com/facebook/rocksdb/issues/11909. The test passes after the change in https://github.com/facebook/rocksdb/issues/11917 to start the mock clock from a non-zero time. The reason for the test failing is a bit complicated: - The Put here https://github.com/pdillinger/rocksdb/blob/e4ad4a0ef1b852dc203311fb885c673c891f08e0/db/compaction/tiered_compaction_test.cc#L2045 happens before the mock clock advances beyond 0. - This causes oldest_key_time_ to be 0 for the memtable. - oldest_ancester_time of the first L0 file becomes 0 - L0 -> L5/6 compaction output files set `oldest_ancester_time` to the current time due to these lines: https://github.com/facebook/rocksdb/blob/509947ce2c970d296fd0d868455d560c7f778a57/db/compaction/compaction_job.cc#L1898C34-L1904. - This causes some small sequence numbers to be mapped to the current time: https://github.com/facebook/rocksdb/blob/509947ce2c970d296fd0d868455d560c7f778a57/db/compaction/compaction_job.cc#L301 - Keys in L6 are being moved up to L5 due to the unexpected seqno_to_time mapping - When compacting keys from the last level to the penultimate level, we only check that keys are within the user key range of penultimate level input files. If we compact the following file 3 with file 1 and output keys to L5, we can get the reported inconsistency bug. ``` L5: file 1 [K5@20, K10@kMaxSeqno], file 2 [K10@30, K14@34) L6: file 3 [K6@5, K10@20] ``` https://github.com/facebook/rocksdb/issues/12063 will add fixes to check internal key range when compacting keys from the last level up to the penultimate level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12064 Test Plan: the unit test passes Reviewed By: ajkr Differential Revision: D51281149 Pulled By: cbi42 fbshipit-source-id: 00b7f026c453454d9f3af5b2de441383a96f0c62 13 November 2023, 23:26:52 UTC
8b8f6c6 ColumnFamilyHandle Nullcheck in GetEntity and MultiGetEntity (#12057) Summary: - Add missing null check for ColumnFamilyHandle in `GetEntity()` - `FailIfCfHasTs()` now returns `Status::InvalidArgument()` if `column_family` is null. `MultiGetEntity()` can rely on this for cfh null check. - Added `DeleteRange` API using Default Column Family to be consistent with other major APIs (This was also causing Java Test failure after the `FailIfCfHasTs()` change) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12057 Test Plan: - Updated `DBWideBasicTest::GetEntityAsPinnableAttributeGroups` to include null CF case - Updated `DBWideBasicTest::MultiCFMultiGetEntityAsPinnableAttributeGroups` to include null CF case Reviewed By: jowlyzhang Differential Revision: D51167445 Pulled By: jaykorean fbshipit-source-id: 1c1e44fd7b7df4d2dc3bb2d7d251da85bad7d664 13 November 2023, 22:30:04 UTC
b3ffca0 DBImpl::DelayWrite: Remove bad WRITE_STALL histogram (#12067) Summary: When the delay didn't happen, the histogram WRITE_STALL is still recorded, while the ticker STALL_MICROS is not recorded. This is a bug: neither WRITE_STALL nor STALL_MICROS should be recorded when the delay did not happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12067 Reviewed By: cbi42 Differential Revision: D51263133 Pulled By: ajkr fbshipit-source-id: bd82d8328fe088d613991966e83854afdabc6a25 13 November 2023, 20:48:44 UTC
9fb6851 fix(StackableDB): Resume API (#12060) Summary: When I call `DBWithTTLImpl::Resume()`, it returns `Status::NotSupported`. Did `StackableDB` miss this API? Thanks! Pull Request resolved: https://github.com/facebook/rocksdb/pull/12060 Reviewed By: jaykorean Differential Revision: D51202742 Pulled By: ajkr fbshipit-source-id: 5e01a54a42efd81fd57b3c992b9af8bc45c59c9c 13 November 2023, 20:09:58 UTC
509947c Quarantine files in a limbo state after a manifest error (#12030) Summary: Part of the procedure to handle a manifest IO error is to disable file deletion in case some files in a limbo state get deleted prematurely. This is not ideal because: 1) not all the VersionEdits whose commit encounters such an error contain updates for files, so disabling file deletion is sometimes not necessary. 2) `EnableFileDeletion` has a force mode that could make other threads accidentally disrupt this procedure in recovery. 3) Disabling file deletion as a whole is also not as efficient as more precisely protecting impacted files from being prematurely deleted. This PR replaces this mechanism with tracking such files and quarantining them from being deleted in `ErrorHandler`. These are the types of files being actively tracked in quarantine in this PR: 1) new table files and blob files from a background job 2) an old manifest file whose immediately following new manifest file's CURRENT file creation gets into an unclear state. The current handling is not sufficient to make sure the old manifest file is kept in case it's needed. Note that WAL logs are not part of the quarantine because `min_log_number_to_keep` is a safe mechanism: it's only updated after successful manifest commits, so it can prevent this premature deletion issue from happening. We track these files' file numbers because they share the same file number space. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12030 Test Plan: Modified existing unit tests Reviewed By: ajkr Differential Revision: D51036774 Pulled By: jowlyzhang fbshipit-source-id: 84ef26271fbbc888ef70da5c40fe843bd7038716 11 November 2023, 16:11:11 UTC
0ffc0c7 Allow `TtlMergeOperator` to wrap an unregistered `MergeOperator` (#12056) Summary: Followed mrambacher's first suggestion in https://github.com/facebook/rocksdb/pull/12044#issuecomment-1800706148. This change allows serializing a `TtlMergeOperator` that wraps an unregistered `MergeOperator`. Such a `TtlMergeOperator` cannot be loaded (validation will fail in `TtlMergeOperator::ValidateOptions()`), but that is OK for us currently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12056 Reviewed By: hx235 Differential Revision: D51125097 Pulled By: ajkr fbshipit-source-id: 8ed3705e8d36ab473673b9198eea6db64397ed15 11 November 2023, 00:57:17 UTC
c6c683a Remove the default force behavior for `EnableFileDeletion` API (#12001) Summary: Disabling file deletion can be critical for operations like making a backup, or recovery from a manifest IO error (for now). Ideally, as long as there is one caller requesting file deletion disabled, it should be kept disabled until all callers agree to re-enable it. So this PR removes the default forcing behavior for the `EnableFileDeletion` API, and users need to explicitly pass the argument if they insist on doing so, knowing the consequences of what can potentially be disrupted. This PR removes the API's default argument value, so it will cause breakage for all users that are relying on the default value, regardless of whether the forcing behavior is critical for them. When fixing this breakage, it's good to check if the forcing behavior is indeed needed and the potential disruption is OK. This PR also makes unit tests that do not need the force behavior do a regular enable file deletion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12001 Reviewed By: ajkr Differential Revision: D51214683 Pulled By: jowlyzhang fbshipit-source-id: ca7b1ebf15c09eed00f954da2f75c00d2c6a97e4 10 November 2023, 22:35:54 UTC
5ef92b8 Add rocksdb_options_set_cf_paths (#11151) Summary: This PR adds a missing set function for rocksdb_options in the C-API: rocksdb_options_set_cf_paths(). Without this function, users cannot specify different paths for different column families as it will fall back to db_paths. As a bonus, this PR also includes rocksdb_sst_file_metadata_get_directory() to the C api -- a missing public function that will also make the test easier to write. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11151 Test Plan: Augment existing c_test to verify the specified cf_path. Reviewed By: hx235 Differential Revision: D51201888 Pulled By: ajkr fbshipit-source-id: 62a96451f26fab60ada2005ede3eea8e9b431f30 10 November 2023, 19:36:11 UTC
73d223c Add auto_tuned option to RateLimiter C API (#12058) Summary: #### Problem While the RocksDB C API does have the RateLimiter API, it does not expose the auto_tuned option. #### Summary of Change This PR exposes auto_tuned RateLimiter option in RocksDB C API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12058 Test Plan: Augment the C API existing test to cover the new API. Reviewed By: cbi42 Differential Revision: D51201933 Pulled By: ajkr fbshipit-source-id: 5bc595a9cf9f88f50fee797b729ba96f09ed8266 10 November 2023, 17:53:09 UTC
dfaf4dc Stubs for piping write time (#12043) Summary: As titled. This PR contains the API and stubbed implementation for piping write time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12043 Reviewed By: pdillinger Differential Revision: D51076575 Pulled By: jowlyzhang fbshipit-source-id: 3b341263498351b9ccaff27cf35d5aeb5bdf0cf1 09 November 2023, 23:58:07 UTC
c4c62c2 Support using an environment variable to test custom encryption plugins (#12025) Summary: The CreateEnvTest.CreateEncryptedFileSystem unit test is meant to verify the creation functionality of EncryptedFileSystem, but currently it supports only the built-in CTREncryptionProvider class. This patch makes it possible to use the environment variable `TEST_FS_URI`, which is useful for testing custom encryption plugins. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12025 Reviewed By: anand1976 Differential Revision: D50799656 Pulled By: ajkr fbshipit-source-id: dbcacfefbf07de9c7803f7707b34c5193bec17bf 09 November 2023, 18:45:13 UTC
e90e982 Drop wal record when sequence is illegal (#11985) Summary: - Our database is corrupted, causing the sequence numbers of some WAL records to be invalid (but the `record_checksum` looks fine). - When we RecoverLogFiles in WALRecoveryMode::kPointInTimeRecovery, `assert(seq <= kMaxSequenceNumber)` will fail. - When an illegal sequence number is found, can we drop the rest of the file to recover as much data as possible? Thanks! Pull Request resolved: https://github.com/facebook/rocksdb/pull/11985 Reviewed By: anand1976 Differential Revision: D50698039 Pulled By: ajkr fbshipit-source-id: 1e42113b58823088d7c0c3a92af5b3efbb5f5296 09 November 2023, 18:43:16 UTC
f9b7877 Ensure `target_include_directories()` is called with correct target name (#12055) Summary: `${PROJECT_NAME}` isn't guaranteed to match a target name when an artefact suffix is specified. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12055 Reviewed By: anand1976 Differential Revision: D51125532 Pulled By: ajkr fbshipit-source-id: cd1f4a5b11eb517c379e3ee3f78592f7e606a034 09 November 2023, 18:41:38 UTC
f337533 Ensure and clarify how RocksDB calls TablePropertiesCollector's functions (#12053) Summary: **Context/Summary:** Given the word "finish", it's intuitive for users to assume `TablePropertiesCollector::Finish()` is called only once by RocksDB internals. However, this is currently not true, as RocksDB also calls this function in `BlockBased/PlainTableBuilder::GetTableProperties()` to populate user collected properties on demand. This PR avoids that by moving that population to where we first call `Finish()` (i.e., `NotifyCollectTableCollectorsOnFinish`). Bonus: clarified in the API that `GetReadableProperties()` will be called after `Finish()` and added a UT to ensure that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12053 Test Plan: - Modified test `DBPropertiesTest.GetUserDefinedTableProperties` to ensure `Finish()` is only called once. - Existing tests, particularly `db_properties_test, table_properties_collector_test`, verify that the functionality of `NotifyCollectTableCollectorsOnFinish` and `GetReadableProperties()` is not broken by this change. Reviewed By: ajkr Differential Revision: D51095434 Pulled By: hx235 fbshipit-source-id: 1c6275258f9b99dedad313ee8427119126817973 08 November 2023, 22:00:36 UTC
65cde19 Safer wrapper for std::atomic, use in HCC (#12051) Summary: See new atomic.h file comments for motivation. I have updated HyperClockCache to use the new atomic wrapper, fixing a few cases where an implicit conversion was accidentally used and therefore mixing std::memory_order_seq_cst where release/acquire ordering (or relaxed) was intended. There probably wasn't a real bug because I think all the cases happened to be in single-threaded contexts like constructors/destructors or statistical ops like `GetCapacity()` that don't need any particular ordering constraints. Recommended follow-up: * Replace other uses of std::atomic to help keep them safe from bugs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12051 Test Plan: Did some local correctness stress testing with cache_bench. Also triggered 15 runs of fbcode_blackbox_crash_test and saw no related failures (just 3 failures in ~CacheWithSecondaryAdapter(), already known) No performance difference seen before & after running simultaneously: ``` (while ./cache_bench -cache_type=fixed_hyper_clock_cache -populate_cache=0 -cache_size=3000000000 -ops_per_thread=500000 -threads=12 -histograms=0 2>&1 | grep parallel; do :; done) | awk '{ s += $3; c++; print "Avg time: " (s/c);}' ``` ... for both fixed_hcc and auto_hcc. Reviewed By: jowlyzhang Differential Revision: D51090518 Pulled By: pdillinger fbshipit-source-id: eeb324facb3185584603f9ea0c4de6f32919a2d7 08 November 2023, 21:28:43 UTC
e406c26 Update the API comments of NewRandomRWFile() (#11820) Summary: Env::NewRandomRWFile() will not create the file if it doesn't exist, as shown by the test at https://github.com/facebook/rocksdb/blob/main/env/env_test.cc#L2208. This patch corrects the comments of Env::NewRandomRWFile(), which may otherwise mislead developers who use the RocksDB Env as a utility. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11820 Reviewed By: ajkr Differential Revision: D50176707 Pulled By: jowlyzhang fbshipit-source-id: a6ee469f549360de8d551a4fe8517b4450df7b15 08 November 2023, 20:28:00 UTC
9af25a3 Clean up AutoHyperClockTable::PurgeImpl (#12052) Summary: There was some unnecessary logic (e.g. a dead assignment to home_shift) left over from an earlier revision of the code. Also, rename the confusing ChainRewriteLock::new_head_ / GetNewHead() to saved_head_ / GetSavedHead(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/12052 Test Plan: existing tests Reviewed By: jowlyzhang Differential Revision: D51091499 Pulled By: pdillinger fbshipit-source-id: 4b191b60a2b16085681e59d49c4d97e802869db8 08 November 2023, 00:35:19 UTC
58f2a29 Expose Options::periodic_compaction_seconds through C API (#12019) Summary: fixes [11090](https://github.com/facebook/rocksdb/issues/11090) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12019 Reviewed By: jaykorean Differential Revision: D51076427 Pulled By: cbi42 fbshipit-source-id: de353ff66c7f73aba70ab3379e20d8c40f50d873 07 November 2023, 20:46:50 UTC
c181667 FIX new blog post (JNI performance) Locate images correctly (#12050) Summary: We set up the images / references to the images wrongly in https://github.com/facebook/rocksdb/pull/11818 Images should be in the docs/static/images/… directory with an absolute reference to /static/images/… Make it so. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12050 Reviewed By: pdillinger Differential Revision: D51079811 Pulled By: jaykorean fbshipit-source-id: 4c1ab80d313b70d0e60eec94086451d7b2814922 07 November 2023, 19:58:58 UTC
c06309c Not to print unnecessary commands in Makefile (#11978) Summary: When I run `make check`, there is a command that should not be printed to screen, which is shown below. ```text ... ... Generating parallel test scripts for util_merge_operators_test Generating parallel test scripts for write_batch_with_index_test make[2]: Leaving directory '/home/z/rocksdb' make[1]: Leaving directory '/home/z/rocksdb' GEN check make[1]: Entering directory '/home/z/rocksdb' $DEBUG_LEVEL is 1, $LIB_MODE is shared Makefile:185: Warning: Compiling in debug mode. Don't use the resulting binary in production printf '%s\n' '' \ 'To monitor subtest <duration,pass/fail,name>,' \ ' run "make watch-log" in a separate window' ''; \ { \ printf './%s\n' db_bloom_filter_test deletefile_test env_test c_test; \ find t -name 'run-*' -print; \ } \ | perl -pe 's,(^.*MySQLStyleTransactionTest.*$|^.*SnapshotConcurrentAccessTest.*$|^.*SeqAdvanceConcurrentTest.*$|^t/run-table_test-HarnessTest.Randomized$|^t/run-db_test-.*(?:FileCreationRandomFailure|EncodeDecompressedBlockSizeTest)$|^.*RecoverFromCorruptedWALWithoutFlush$),100 $1,' | sort -k1,1gr | sed 's/^[.0-9]* //' \ | grep -E '.' \ | grep -E -v '"^$"' \ | build_tools/gnu_parallel -j100% --plain --joblog=LOG --eta --gnu \ --tmpdir=/dev/shm/rocksdb.6lop '{} >& t/log-{/} || bash -c "cat t/log-{/}; exit $?"' ; \ parallel_retcode=$? ; \ awk '{ if ($7 != 0 || $8 != 0) { if ($7 == "Exitval") { h = $0; } else { if (!f) print h; print; f = 1 } } } END { if(f) exit 1; }' < LOG ; \ awk_retcode=$?; \ if [ $parallel_retcode -ne 0 ] || [ $awk_retcode -ne 0 ] ; then exit 1 ; fi To monitor subtest <duration,pass/fail,name>, run "make watch-log" in a separate window Computers / CPU cores / Max jobs to run 1:local / 16 / 16 ``` The `printf` command will make the output confusing. It would be better not to print it. 
**Before Change** ![image](https://github.com/facebook/rocksdb/assets/30565051/92cf681a-40b7-462e-ae5b-23eeacbb8f82) **After Change** ![image](https://github.com/facebook/rocksdb/assets/30565051/4a70b04b-e4ef-4bed-9ce0-d942ed9d132e) **Test Plan** Not applicable. This is a trivial change, only to add a `@` before a Makefile command, and it will not impact any workflows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11978 Reviewed By: jaykorean Differential Revision: D51076606 Pulled By: cbi42 fbshipit-source-id: dc079ab8f60a5a5b9d04a83888884657b2e442ff 07 November 2023, 19:44:20 UTC
16ae354 AutoHCC: Improve/fix allocation/detection of grow homes (#12047) Summary: This change simplifies some code and logic by introducing a new atomic field that tracks the next slot to grow into. It should offer slightly better performance during the growth phase (not measurable; see Test Plan below) and fix a suspected (but unconfirmed) bug like this: * Thread 1 is in non-trivial SplitForGrow() with grow_home=n. * Thread 2 reaches Grow() with grow_home=2n, and waits at the start of SplitForGrow() for the rewrite lock on n. By this point, the head at 2n is marked with the new shift amount but no chain is locked. * Thread 3 reaches Grow() with grow_home=4n, and waits before SplitForGrow() for the rewrite lock on n. By this point, the head at 4n is marked with the new shift amount but no chain is locked. * Thread 4 reaches Grow() with grow_home=8n and meets no resistance to proceeding through a SplitForGrow() on an empty chain, permanently missing out on any entries from chain n that should have ended up here. This is fixed by not updating the shift amount at the grow_home head until we have checked the preconditions that Grow()s feeding into this one have completed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12047 Test Plan: Some manual cache_bench stress runs, and about 20 triggered runs of fbcode_blackbox_crash_test No discernible performance difference on this benchmark, running before & after in parallel for a few minutes: ``` (while ./cache_bench -cache_type=auto_hyper_clock_cache -populate_cache=0 -cache_size=3000000000 -ops_per_thread=50000 -threads=12 -histograms=0 2>&1 | grep parallel; do :; done) | awk '{ s += $3; c++; print "Avg time: " (s/c);}' ``` Reviewed By: jowlyzhang Differential Revision: D51017007 Pulled By: pdillinger fbshipit-source-id: 5f6d6a6194fc966f94693f3205ed75c87cdad269 07 November 2023, 18:40:39 UTC
2adef53 AttributeGroups - PutEntity Implementation (#11977) Summary: Write Path for AttributeGroup Support. The new `PutEntity()` API uses `WriteBatch` and atomically writes WideColumns entities in multiple Column Families. Combined the release note from PR https://github.com/facebook/rocksdb/issues/11925 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11977 Test Plan: - `DBWideBasicTest::MultiCFMultiGetEntityAsPinnableAttributeGroups` updated - `WriteBatchTest::AttributeGroupTest` added - `WriteBatchTest::AttributeGroupSavePointTest` added Reviewed By: ltamasi Differential Revision: D50457122 Pulled By: jaykorean fbshipit-source-id: 4997b265e415588ce077933082dcd1ac3eeae2cd 07 November 2023, 00:52:51 UTC
92dc5f3 AutoHCC: fix a bug with "blind" Insert (#12046) Summary: I have finally tracked down and fixed a bug affecting AutoHCC that was causing CI crash test assertion failures in AutoHCC when using secondary cache, but I was only able to reproduce locally a couple of times, after very long runs/repetitions. It turns out that the essential feature used by secondary cache to trigger the bug is Insert without keeping a handle, which is otherwise rarely used in RocksDB and not incorporated into cache_bench (also used for targeted correctness stress testing) until this change (new option `-blind_insert_percent`). The problem was in copying some logic from FixedHCC that makes the entry "sharable" but unreferenced once populated, if no reference is to be saved. The problem in AutoHCC is that we can only add the entry to a chain after it is in the sharable state, and must be removed from the chain while in the "under (de)construction" state and before it is back in the "empty" state. Also, it is possible for Lookup to find entries that are not connected to any chain, by design for efficiency, and for Release to erase_if_last_ref. Therefore, we could have * Thread 1 starts to Insert a cache entry without keeping ref, and pauses before adding to the chain. * Thread 2 finds it with Lookup optimizations, and then does Release with `erase_if_last_ref=true` causing it to trigger erasure on the entry. It successfully locks the home chain for the entry and purges any entries pending erasure. It is OK that this entry is not found on the chain, as another thread is allowed to remove it from the chain before we are able to (but after it is marked for (de)construction). And after the purge of the chain, the entry is marked empty.
* Thread 1 resumes in adding the slot (presumed entry) to the home chain for what was being inserted, but that now violates invariants and sets up a race or double-chain-reference as another thread could insert a new entry in the slot and try to insert into a different chain. This is easily fixed by holding on to a reference until inserted onto the chain. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12046 Test Plan: As I don't have a reliable local reproducer, I triggered 20 runs of internal CI on fbcode_blackbox_crash_test that were previously failing in AutoHCC with about 1/3 probability, and they all passed. Also re-enabling AutoHCC in the crash test with this change. (Revert https://github.com/facebook/rocksdb/issues/12000) Reviewed By: jowlyzhang Differential Revision: D51016979 Pulled By: pdillinger fbshipit-source-id: 3840fb829d65b97c779d8aed62a4a4a433aeff2b 07 November 2023, 00:06:01 UTC
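The fix described above — "holding on to a reference until inserted onto the chain" — can be sketched with a plain refcount. This is a hypothetical illustration with invented names, not the HCC implementation:

```cpp
#include <atomic>

// Sketch: even when the caller keeps no handle, the inserter pins the entry
// with one reference until it is linked into its home chain, so a concurrent
// Lookup + Release(erase_if_last_ref=true) cannot drive the refcount to zero
// and recycle the slot mid-insert.
struct EntrySketch {
  std::atomic<int> refs{0};
  std::atomic<bool> chained{false};
};

void BlindInsert(EntrySketch& e) {
  e.refs.fetch_add(1);    // pin before the entry becomes findable
  e.chained.store(true);  // ... link into the home chain here ...
  e.refs.fetch_sub(1);    // unpin only once it is safely on the chain
}
```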
0ecfc4f AttributeGroups - GetEntity Implementation (#11943) Summary: Implementation of `GetEntity()` API that returns wide-column entities as AttributeGroups from multiple column families for a single key. Regarding the definition of Attribute groups, please see the detailed example description in PR https://github.com/facebook/rocksdb/issues/11925 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11943 Test Plan: - `DBWideBasicTest::GetEntityAsPinnableAttributeGroups` added - Will enable the new API in `db_stress` after merging Reviewed By: ltamasi Differential Revision: D50195794 Pulled By: jaykorean fbshipit-source-id: 218d54841ac7e337de62e13b1233b0a99bd91af3 06 November 2023, 23:04:41 UTC
2dab137 Mark more files for periodic compaction during offpeak (#12031) Summary: - The struct previously named `OffpeakTimeInfo` has been renamed to `OffpeakTimeOption` to indicate that it's a user-configurable option. Additionally, a new struct, `OffpeakTimeInfo`, has been introduced, which includes two fields: `is_now_offpeak` and `seconds_till_next_offpeak_start`. This change prevents the need to parse the `daily_offpeak_time_utc` string twice. - It's worth noting that we may consider adding more fields to the `OffpeakTimeInfo` struct, such as `elapsed_seconds` and `total_seconds`, as needed for further optimization. - Within `VersionStorageInfo::ComputeFilesMarkedForPeriodicCompaction()`, we've adjusted the `allowed_time_limit` to include files that are expected to expire by the next offpeak start. - We might explore further optimizations, such as evenly distributing files to mark during offpeak hours, if the initial approach results in marking too many files simultaneously during the first scoring in offpeak hours. The primary objective of this PR is to prevent periodic compactions during non-offpeak hours when offpeak hours are configured. We'll start with this straightforward solution and assess whether it suffices for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12031 Test Plan: Unit Tests added - `DBCompactionTest::LevelPeriodicCompactionOffpeak` for Leveled - `DBTestUniversalCompaction2::PeriodicCompaction` for Universal Reviewed By: cbi42 Differential Revision: D50900292 Pulled By: jaykorean fbshipit-source-id: 267e7d3332d45a5d9881796786c8650fa0a3b43d 06 November 2023, 19:43:59 UTC
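The two fields of the new `OffpeakTimeInfo` struct can be computed along these lines — a hedged sketch using seconds-since-midnight arithmetic with a wrap-around window; the function and parameter names are invented here, not RocksDB's code:

```cpp
// Sketch: decide whether "now" falls in a daily UTC offpeak window and how
// long until the next window starts. All times are seconds since midnight.
struct OffpeakInfoSketch {
  bool is_now_offpeak;
  int seconds_till_next_offpeak_start;
};

OffpeakInfoSketch GetOffpeakInfo(int now_s, int start_s, int end_s) {
  constexpr int kDay = 24 * 3600;
  // A window like 23:00-05:00 wraps past midnight.
  bool in_window = (start_s <= end_s)
                       ? (now_s >= start_s && now_s <= end_s)
                       : (now_s >= start_s || now_s <= end_s);
  int till_start = (start_s - now_s + kDay) % kDay;
  return {in_window, till_start};
}
```

With `seconds_till_next_offpeak_start` in hand, the periodic-compaction check can extend its `allowed_time_limit` to also mark files expected to expire by the next offpeak start, as the commit describes.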
a399bbc More fixes and enhancements for cache_bench (#12041) Summary: Mostly things for using cache_bench for stress/correctness testing. * Make secondary_cache_uri option work with HCC (forgot to update when secondary support was added for HCC) * Add -pinned_ratio option to keep more than just one entry per thread pinned. This can be important for testing eviction stress. * Add -vary_capacity_ratio for testing dynamically changing capacity. Also added some overrides to CacheWrapper to help with diagnostic output. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12041 Test Plan: manual, make check Reviewed By: jowlyzhang Differential Revision: D51013430 Pulled By: pdillinger fbshipit-source-id: 7914adc1218f0afacace05ccd77d3bfb91a878d0 06 November 2023, 17:59:09 UTC
6979e9d Create blog post from report on JNI performance work (#11818) Summary: We did some investigation into the performance of JNI for workloads emulating how data is carried between Java and C++ for RocksDB. The repo for our performance work lives at https://github.com/evolvedbinary/jni-benchmarks This is a report text from that work, extracted as a blog post. Along with some supporting files (png, pdf of graphs). Pull Request resolved: https://github.com/facebook/rocksdb/pull/11818 Reviewed By: jaykorean Differential Revision: D50907467 Pulled By: pdillinger fbshipit-source-id: ec6a43c83bd9ad94a3d11cfd87031e613acf7659 06 November 2023, 17:15:00 UTC
520c64f Add missing status check in ExternalSstFileIngestionJob and ImportColumnFamilyJob (#12042) Summary: .. and update some unit tests that failed with this change. See comment in ExternalSSTFileBasicTest.IngestFileWithCorruptedDataBlock for more explanation. The missing status check is not caught by `ASSERT_STATUS_CHECKED=1` due to this line: https://github.com/facebook/rocksdb/blob/8505b26db19871a8c8782a35a7b5be9d321d45e0/table/block_based/block.h#L394. Will explore if we can remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12042 Test Plan: existing unit tests. Reviewed By: ajkr Differential Revision: D50994769 Pulled By: cbi42 fbshipit-source-id: c91615bccd6094a91634c50b98401d456cbb927b 06 November 2023, 15:41:36 UTC
19768a9 Add jni Support for API CreateColumnFamilyWithImport (#11646) Summary: - Add the following missing options to src/main/java/org/rocksdb/ImportColumnFamilyOptions.java and in java/rocksjni/import_column_family_options.cc in RocksJava. - Add the struct to src/main/java/org/rocksdb/ExportImportFilesMetaData.java and in java/rocksjni/export_import_files_metadatajni.cc in RocksJava. - Add New Java API `createColumnFamilyWithImport` to src/main/java/org/rocksdb/RocksDB.java - Add New Java API `exportColumnFamily` to src/main/java/org/rocksdb/Checkpoint.java Pull Request resolved: https://github.com/facebook/rocksdb/pull/11646 Test Plan: - added unit tests for exportColumnFamily in org.rocksdb.CheckpointTest - added unit tests for createColumnFamilyWithImport to org.rocksdb.ImportColumnFamilyTest Reviewed By: ajkr Differential Revision: D50889700 Pulled By: cbi42 fbshipit-source-id: d623b35e445bba62a0d3c007d74352e937678f6c 06 November 2023, 15:38:42 UTC
b48480c Enable `TestIterateAgainstExpected()` in more crash tests (#12040) Summary: db_stress flag `verify_iterator_with_expected_state_one_in` is only enabled in the crash test if the --simple flag is set. This PR enables it for all supported crash tests by enabling it by default. This adds coverage for the --txn and --enable_ts crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12040 Test Plan: ran crash tests that previously disabled this flag for a few hours ``` python3 ./tools/db_crashtest.py blackbox --verify_iterator_with_expected_state_one_in=1 --txn --txn_write_policy=[0,1,2] python3 ./tools/db_crashtest.py blackbox --verify_iterator_with_expected_state_one_in=1 --enable_ts ``` Reviewed By: ajkr, hx235 Differential Revision: D50980001 Pulled By: cbi42 fbshipit-source-id: 3daf6b4c32bdddc5df057240068162aa1a907587 03 November 2023, 23:27:11 UTC
8505b26 Fix stress test error message for black/whitebox test to catch failures (#12039) Summary: black/whitebox crash test relies on error/fail keywords in stderr to catch stress test failures. If a db_stress run prints an error message without these keywords, and is then killed before it gracefully exits and prints out "Verification failed" here (https://github.com/facebook/rocksdb/blob/2648e0a747303e63796315049b9005c7320356c0/db_stress_tool/db_stress_driver.cc#L256), the error won't be caught. This is more likely to happen if db_stress is printing a stack trace. This PR fixes some error messages. Ideally in the future we should not rely on searching for keywords in stderr to determine failed stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12039 Test Plan: ``` Added the following change on top of this PR to simulate exit without a relevant keyword: @@ -1586,6 +1587,8 @@ class NonBatchedOpsStressTest : public StressTest { assert(thread); assert(!rand_column_families.empty()); assert(!rand_keys.empty()); + fprintf(stderr, "Inconsistency"); + thread->shared->SafeTerminate(); python3 ./tools/db_crashtest.py blackbox --simple --verify_iterator_with_expected_state_one_in=1 --interval=10 will print a stack trace but continue to run db_stress. ``` Reviewed By: jaykorean Differential Revision: D50960076 Pulled By: cbi42 fbshipit-source-id: 5c60a1be04ce4a43adbd33f040d54434f2ae24c9 03 November 2023, 16:53:22 UTC
2648e0a Fix a bug when ingest plaintable sst file (#11969) Summary: PlainTable doesn't support SeekToLast, and GetIngestedFileInfo uses SeekToLast without checking the iterator's validity. We are using IngestExternalFile or CreateColumnFamilyWithImport with some sst files in PlainTable format. But after running for a while, compaction errors often happen. Such as ![image](https://github.com/facebook/rocksdb/assets/13954644/b4fa49fc-73fc-49ce-96c6-f198a30800b8) I simply added some std::cerr logging to find out why. ![image](https://github.com/facebook/rocksdb/assets/13954644/2cf1d5ff-48cc-4125-b917-87090f764fcd) It shows that the smallest key is always equal to the largest key. ![image](https://github.com/facebook/rocksdb/assets/13954644/6d43e978-0be0-4306-aae3-f9e4ae366395) Then I found the root cause is that PlainTable does not support SeekToLast, so the smallest key is always the same as the largest. I tried to write a unit test, but it's not easy to reproduce this error. (This PR is similar to https://github.com/facebook/rocksdb/pull/11266. Sorry for opening another PR) Pull Request resolved: https://github.com/facebook/rocksdb/pull/11969 Reviewed By: ajkr Differential Revision: D50933854 Pulled By: cbi42 fbshipit-source-id: 6c6af53c1388922cbabbe64ed3be1cdc58df5431 02 November 2023, 20:45:37 UTC
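The defensive pattern the fix implies — don't trust `SeekToLast()` without checking validity — can be sketched as follows. This uses a toy iterator with invented names, not the RocksDB iterator API:

```cpp
#include <optional>
#include <string>

// Toy iterator: a format that cannot iterate backward (like PlainTable)
// leaves the iterator invalid after SeekToLast().
struct Iter {
  bool supports_backward;
  bool valid = false;
  std::string last_key;
  void SeekToLast() { valid = supports_backward; }
  const std::string& key() const { return last_key; }
};

// Report failure instead of silently reusing a stale/smallest key.
std::optional<std::string> LargestKey(Iter& it) {
  it.SeekToLast();
  if (!it.valid) {
    return std::nullopt;
  }
  return it.key();
}
```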
a429105 Save the correct user comparator name in OPTIONS file (#12037) Summary: I noticed the user comparator name in the OPTIONS file can be incorrect while working on a recent stress test failure. The name of the comparator retrieved via the "Comparator::GetRootComparator" API is saved in the OPTIONS file as the user comparator. The intention was to get the user comparator wrapped in the internal comparator. However `ImmutableCFOptions.user_comparator` has always been a user comparator of type `Comparator`. The corresponding `GetRootComparator` API is also defined only for user comparator type `Comparator`, not the internal key comparator type `InternalKeyComparator`. For the built-in comparators `BytewiseComparator` and `ReverseBytewiseComparator`, there is no difference between `Comparator::Name` and `Comparator::GetRootComparator::Name` because these built-in comparators' root comparator is themselves. However, for the built-in comparators `BytewiseComparatorWithU64Ts` and `ReverseBytewiseComparatorWithU64Ts`, there are differences. So this change updates the logic to persist the user comparator's name, not its root comparator's name. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12037 Test Plan: The restore flow in the stress test, which relies on converting an Options object to a string and back to an Options object, is updated to help validate that the comparator object can be correctly serialized and deserialized with the OPTIONS file mechanism. Updated a unit test to use a comparator that has a root comparator that is not itself. Reviewed By: cbi42 Differential Revision: D50909750 Pulled By: jowlyzhang fbshipit-source-id: 9086d7135c7a6f4b5565fb47fce194ea0a024f52 02 November 2023, 20:27:59 UTC
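A minimal sketch of why `Name()` and `GetRootComparator()->Name()` diverge for a timestamp-aware comparator — toy classes with invented names; the real interface lives in RocksDB's `Comparator`:

```cpp
#include <string>

// Toy comparator interface: only the naming/root aspects matter here.
class CmpSketch {
 public:
  virtual ~CmpSketch() = default;
  virtual const char* Name() const = 0;
  // By default a comparator is its own root.
  virtual const CmpSketch* GetRootComparator() const { return this; }
};

class BytewiseSketch : public CmpSketch {
 public:
  const char* Name() const override { return "sketch.Bytewise"; }
};

// Wraps a root comparator and adds timestamp awareness; its name differs
// from its root's name, so persisting the root's name loses information.
class BytewiseWithU64TsSketch : public CmpSketch {
 public:
  explicit BytewiseWithU64TsSketch(const CmpSketch* root) : root_(root) {}
  const char* Name() const override { return "sketch.Bytewise.u64ts"; }
  const CmpSketch* GetRootComparator() const override { return root_; }

 private:
  const CmpSketch* root_;
};
```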
8e1adab add RocksDB#clipColumnFamily to Java API (#11868) Summary: ### main change: - add java clipColumnFamily api in Rocksdb.java The method signature of the new API is ``` public void clipColumnFamily(final ColumnFamilyHandle columnFamilyHandle, final byte[] beginKey, final byte[] endKey) ``` ### Test add unit test RocksDBTest#clipColumnFamily() Pull Request resolved: https://github.com/facebook/rocksdb/pull/11868 Reviewed By: jaykorean Differential Revision: D50889783 Pulled By: cbi42 fbshipit-source-id: 7f545171ad9adb9c20bdd92efae2e6bc55d5703f 02 November 2023, 15:00:08 UTC
4b013dc Remove VersionEdit's friends pattern (#12024) Summary: Almost every private member of VersionEdit has its own getter and setter. The current code accesses them with a combination of direct private-member access and getters/setters. There are no obvious benefits to this pattern except potential performance gains. I tried this simple benchmark with the friends pattern removed completely, and there is no obvious regression. So I think it would be good to remove VersionEdit's friends completely. ```TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num_column_families=10 -num=50000000``` With change: fillseq : 2.994 micros/op 333980 ops/sec 149.710 seconds 50000000 operations; 36.9 MB/s fillseq : 3.033 micros/op 329656 ops/sec 151.673 seconds 50000000 operations; 36.5 MB/s fillseq : 2.991 micros/op 334369 ops/sec 149.535 seconds 50000000 operations; 37.0 MB/s Without change: fillseq : 3.015 micros/op 331715 ops/sec 150.732 seconds 50000000 operations; 36.7 MB/s fillseq : 3.044 micros/op 328553 ops/sec 152.182 seconds 50000000 operations; 36.3 MB/s fillseq : 3.091 micros/op 323520 ops/sec 154.550 seconds 50000000 operations; 35.8 MB/s Pull Request resolved: https://github.com/facebook/rocksdb/pull/12024 Reviewed By: pdillinger Differential Revision: D50806066 Pulled By: jowlyzhang fbshipit-source-id: 35d287ce638a38c30f243f85992e615b4c90eb27 01 November 2023, 19:04:11 UTC
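The before/after of dropping the friends pattern can be sketched like this (an invented member name, not the real `VersionEdit` fields): outside code must now go through the accessors instead of touching the private member directly.

```cpp
#include <cstdint>

class VersionEditSketch {
 public:
  // Accessors replace direct member access by befriended classes.
  uint64_t log_number() const { return log_number_; }
  void SetLogNumber(uint64_t n) { log_number_ = n; }

 private:
  // friend class VersionSet;  // removed: no more direct member access
  uint64_t log_number_ = 0;
};
```

The benchmark in the commit suggests such trivial accessors inline well enough that there is no measurable cost.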
04225a2 Fix for RecoverFromRetryableBGIOError starting with recovery_in_prog_ false (#11991) Summary: cbi42 helped investigation and found a potential scenario where `RecoverFromRetryableBGIOError()` may start with `recovery_in_prog_ ` set as false. (and other booleans like `bg_error_` and `soft_error_no_bg_work_`) **Thread 1** - `StartRecoverFromRetryableBGIOError()`): (mutex held) sets `recovery_in_prog_ = true` **Thread 1's `recovery_thread_`** - (waits for mutex and acquires it) - `RecoverFromRetryableBGIOError()` -> `ResumeImpl()` -> `ClearBGError()`: sets `recovery_in_prog_ = false` - `ClearBGError()` -> `NotifyOnErrorRecoveryEnd()`: releases `mutex` **Thread 2** - `StartRecoverFromRetryableBGIOError()`): (mutex held) sets `recovery_in_prog_ = true` - Waits for Thread 1 (`recovery_thread_`) to finish **Thread 1's `recovery_thread_`** - re-lock mutex in `NotifyOnErrorRecoveryEnd()` - Still inside `RecoverFromRetryableBGIOError()`: sets `recovery_in_prog_ = false` - Done **Thread 2's `recovery_thread_`** - recovery thread started with `recovery_in_prog_` set as `false` # Fix - Remove double-clearing `bg_error_`, `recovery_in_prog_` and other fields after `ResumeImpl()` already returned `OK()`. - Minor typo and linter fixes in `DBErrorHandlingFSTest` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11991 Test Plan: - `DBErrorHandlingFSTest::MultipleRecoveryThreads` added to reproduce the scenario. - Adding `assert(recovery_in_prog_);` at the start of `ErrorHandler::RecoverFromRetryableBGIOError()` fails the test without the fix and succeeds with the fix as expected. Reviewed By: cbi42 Differential Revision: D50506113 Pulled By: jaykorean fbshipit-source-id: 6dabe01e9ecd3fc50bbe9019587f2f4858bed9c6 31 October 2023, 23:13:36 UTC
0b057a7 Initialize comparator explicitly in PrepareOptionsForRestoredDB() (#12034) Summary: This is to fix the below error seen in the stress test: ``` Failure in DB::Open in backup/restore with: Invalid argument: Cannot open a column family and disable user-defined timestamps feature if its existing persist_user_defined_timestamps flag is not false. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12034 Reviewed By: cbi42 Differential Revision: D50860689 Pulled By: jowlyzhang fbshipit-source-id: ebc6cf0a75caa43d3d3bd58e3d5c2ac754cc637c 31 October 2023, 23:10:48 UTC
e0c45c1 Fix the ZStd checksum (#12005) Summary: Somehow we had the wrong checksum when validating the ZStd 1.5.5 download for RocksJava in the previous Pull Request - https://github.com/facebook/rocksdb/pull/9304. This PR fixes that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12005 Reviewed By: jaykorean Differential Revision: D50840338 Pulled By: cbi42 fbshipit-source-id: 8a92779d3bef013d812eecb89aaaf33fc73991ec 31 October 2023, 19:23:34 UTC
2818a74 Initialize merge operator explicitly in PrepareOptionsForRestoredDB() (#12033) Summary: We are seeing the following stress test failure: `Failure in DB::Get in backup/restore with: Invalid argument: merge_operator is not properly initialized. Verification failed: Backup/restore failed: Invalid argument: merge_operator is not properly initialized.`. The reason is likely that `GetColumnFamilyOptionsFromString()` does not set merge operator if it's a customized merge operator. Fixing it by initializing merge operator explicitly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12033 Test Plan: this repro gives the error consistently before this PR ``` ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --async_io=0 --atomic_flush=1 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=1048576000000 --backup_one_in=50 --batch_protection_bytes_per_key=8 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=31.014388066505518 --bottommost_compression_type=lz4hc --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=33554432 --cache_type=fixed_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_ttl=10 --compressed_secondary_cache_ratio=0.0 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=4095 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 
--continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=10 --index_type=2 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 --long_running_snapshots=1 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=2 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 
--read_fault_one_in=0 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=10 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=35 ``` Reviewed By: hx235 Differential Revision: D50825558 Pulled By: cbi42 fbshipit-source-id: 8468dc0444c112415a515af8291ef3abec8a42de 31 October 2023, 14:39:41 UTC
76402c0 Fix incorrect parameters order in env_basic_test.cc (#11997) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11997 Reviewed By: jaykorean Differential Revision: D50608182 Pulled By: ajkr fbshipit-source-id: d33cfdb5adfea91175c8fa21e8b80e22f728f6c6 30 October 2023, 17:47:04 UTC
b3fd383 Remove build dependencies for java tests. (#12021) Summary: Final fix for https://github.com/facebook/rocksdb/issues/12013 - Reverted the changes to the CircleCI explicit image declaration. - Removed CMake dependencies between java classes and java test classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12021 Reviewed By: akankshamahajan15 Differential Revision: D50745392 Pulled By: ajkr fbshipit-source-id: 6a7a1da1e7e4da8da72130c9272915974e10fffc 30 October 2023, 17:08:19 UTC
60df39e Rate limiting stale sst files' deletion during recovery (#12016) Summary: As titled. If SstFileManager is available, deleting stale sst files will be delegated to it so it can be rate limited. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12016 Reviewed By: hx235 Differential Revision: D50670482 Pulled By: jowlyzhang fbshipit-source-id: bde5b76ea1d98e67f6b4f08bfba3db48e46aab4e 28 October 2023, 16:50:52 UTC
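Delegating stale-file deletion to SstFileManager amounts to spacing deletes against a bytes-per-second budget. A toy scheduler under that assumption (invented names; the real logic lives in RocksDB's delete scheduler) might look like:

```cpp
#include <cstdint>
#include <string>

struct PendingDelete {
  std::string file;
  uint64_t bytes;
};

// Sketch: instead of unlinking stale files inline during recovery, queue
// them and compute how long each unlink should wait so the cumulative
// deleted bytes stay within the configured rate.
class DeleteScheduler {
 public:
  explicit DeleteScheduler(uint64_t bytes_per_sec)
      : bytes_per_sec_(bytes_per_sec) {}

  // Returns how many seconds to wait before actually unlinking `f`.
  double Schedule(const PendingDelete& f) {
    double delay =
        bytes_per_sec_ ? scheduled_ / static_cast<double>(bytes_per_sec_) : 0.0;
    scheduled_ += f.bytes;
    return delay;
  }

 private:
  uint64_t bytes_per_sec_;
  uint64_t scheduled_ = 0;  // bytes already admitted for deletion
};
```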
212b5bf Deep-copy Options in restored db for stress test to avoid race with SetOptions() (#12015) Summary: **Context** DB open will persist the `Options` in memory to the options file and verify the file right after the write. The verification is done by comparing the options from parsing the written options file against the `Options` object in memory. Upon inconsistency, corruption such as https://github.com/facebook/rocksdb/blob/main/options/options_parser.cc#L725 will be returned. This verification assumes the `Options` object in memory is not changed from before the write till the verification. This assumption can break during [opening the restored db in stress test](https://github.com/facebook/rocksdb/blob/0f141352d8de2f743d222a6f2ff493a31dd2838c/db_stress_tool/db_stress_test_base.cc#L1784-L1799). This [line](https://github.com/facebook/rocksdb/blob/0f141352d8de2f743d222a6f2ff493a31dd2838c/db_stress_tool/db_stress_test_base.cc#L1770) makes it share some pointer options (e.g., `std::shared_ptr<const FilterPolicy> filter_policy`) with other threads (e.g., SetOptions()) in db stress. And since https://github.com/facebook/rocksdb/pull/11838, filter_policy's field `bloom_before_level` is now mutable via SetOptions(). Therefore we started to see stress test failures like the one below: ``` Failure in DB::Open in backup/restore with: IO error: DB::Open() failed --- Unable to persist Options file: IO error: Unable to persist options.: Corruption: [RocksDBOptionsParser]:failed the verification on BlockBasedTable::: filter_policy.id Verification failed: Backup/restore failed: IO error: DB::Open() failed --- Unable to persist Options file: IO error: Unable to persist options.: Corruption: [RocksDBOptionsParser]:failed the verification on BlockBasedTable::: filter_policy.id db_stress: db_stress_tool/db_stress_test_base.cc:479: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState*, std::string, rocksdb::Status) const: Assertion `false' failed.
``` **Summary** This PR uses "deep copy" of the `options_` by CreateXXXFromString() to avoid sharing pointer options. **Test plan** Run the below db stress command that failed before this PR and pass after ``` ./db_stress --column_families=1 --threads=2 --preserve_unverified_changes=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=10 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=1 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=0 --bottommost_compression_type=disable --bottommost_file_compaction_delay=86400 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=33554432 --cache_type=tiered_auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=2 --compaction_readahead_size=0 --compaction_ttl=0 --compressed_secondary_cache_ratio=0.3333333333333333 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=lz4 --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --db_write_buffer_size=8388608 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 
--file_checksum_impl=big --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=14 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 --long_running_snapshots=1 --manual_wal_flush_one_in=1000 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=2500 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=5 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=600 --subcompactions=3 --sync=0 --sync_fault_injection=0 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 
--top_level_index_pinning=2 --unpartitioned_pinning=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12015 Reviewed By: pdillinger Differential Revision: D50666136 Pulled By: hx235 fbshipit-source-id: 804acc23aecb4eedfe5c44f732e86291f2420b2b 28 October 2023, 00:07:39 UTC
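The root cause of the failure above is shallow-copied pointer options. A minimal sketch of the difference between sharing a pointer-held policy and deep-copying it (toy structs, not the real `Options`/`FilterPolicy`):

```cpp
#include <memory>

// Toy stand-ins: a mutable policy object held by shared_ptr inside options.
struct Policy {
  int bloom_before_level = 0;
};
struct Opts {
  std::shared_ptr<Policy> filter_policy;
};

// Shallow copy shares the same Policy object, so a concurrent
// SetOptions()-style mutation is visible through both copies.
Opts ShallowCopy(const Opts& o) { return o; }

// Deep copy (what rebuilding the options from a string achieves) owns an
// independent Policy and is immune to mutations of the original.
Opts DeepCopy(const Opts& o) {
  Opts c;
  c.filter_policy = std::make_shared<Policy>(*o.filter_policy);
  return c;
}
```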
e230e4d Make OffpeakTimeInfo available in VersionSet (#12018) Summary: As mentioned in https://github.com/facebook/rocksdb/issues/11893, we are going to use the offpeak time information to pre-process TTL-based compactions. To do so, we need to access `daily_offpeak_time_utc` in `VersionStorageInfo::ComputeCompactionScore()` where we pick the files to compact. This PR is to make the offpeak time information available at the time of compaction-scoring. We are not changing any compaction scoring logic just yet. Will follow up in a separate PR. There were two ways to achieve what we want. 1. Make `MutableDBOptions` available in `ColumnFamilyData` and `ComputeCompactionScore()` take `MutableDBOptions` along with `ImmutableOptions` and `MutableCFOptions`. 2. Make `daily_offpeak_time_utc` and `IsNowOffpeak()` available in `VersionStorageInfo`. We chose the latter as it involves smaller changes. This change includes the following - Introduction of `OffpeakTimeInfo` and `IsNowOffpeak()` has been moved from `MutableDBOptions` - `OffpeakTimeInfo` added to `VersionSet` and it can be set during construction and by `ChangeOffpeakTimeInfo()` - During `SetDBOptions()`, if offpeak time info needs to change, it calls `MaybeScheduleFlushOrCompaction()` to re-compute compaction scores and process compactions as needed Pull Request resolved: https://github.com/facebook/rocksdb/pull/12018 Test Plan: - `DBOptionsTest::OffpeakTimes` changed to include checks for `MaybeScheduleFlushOrCompaction()` calls and `VersionSet`'s OffpeakTimeInfo value change during `SetDBOptions()`. - `VersionSetTest::OffpeakTimeInfoTest` added to test `ChangeOffpeakTimeInfo()`. `IsNowOffpeak()` tests moved from `DBOptionsTest::OffpeakTimes` Reviewed By: pdillinger Differential Revision: D50723881 Pulled By: jaykorean fbshipit-source-id: 3cff0291936f3729c0e9c7750834b9378fb435f6 27 October 2023, 22:56:48 UTC
526f36b Remove extra semicolon (#12017) Summary: As titled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12017 Reviewed By: hx235 Differential Revision: D50670406 Pulled By: jowlyzhang fbshipit-source-id: 28b3acd930ee676d78ebb47144047ce233fc11c5 26 October 2023, 00:48:21 UTC
52be8f5 Add APIs to query secondary cache capacity and usage for TieredCache (#12011) Summary: In `TieredCache`, the underlying compressed secondary cache is hidden from the user. So we need a way to query the capacity, as well as the portion of cache reservation charged to the compressed secondary cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12011 Test Plan: Update the unit tests Reviewed By: akankshamahajan15 Differential Revision: D50651943 Pulled By: anand1976 fbshipit-source-id: 06d1cb5edb75a790c919bce718e2ff65f5908220 25 October 2023, 23:54:50 UTC
8ee009f Downgrade windows 2019 build to older image. (#12014) Summary: This should fix failed java windows build https://github.com/facebook/rocksdb/issues/12013 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12014 Reviewed By: ajkr Differential Revision: D50664503 Pulled By: akankshamahajan15 fbshipit-source-id: 3466ce42d3cae3f8e0beba88a18744d647a32a2c 25 October 2023, 22:43:05 UTC
0f14135 Fix race between flush error recovery and db destruction (#12002) Summary: **Context:** DB destruction will wait for ongoing error recovery through `EndAutoRecovery()` and join the recovery thread: https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/db_impl/db_impl.cc#L525 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L250 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L808-L823 However, due to a race between flush error recovery and db destruction, recovery can actually start after such wait during the db shutdown. The consequence is that the recovery thread created as part of this recovery will not be properly joined upon its destruction as part of the db destruction. It then crashes the program as below. ``` std::terminate() std::default_delete<std::thread>::operator()(std::thread*) const std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr() rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31) rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725) rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725) rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678) ``` **Summary:** This PR fixes it by considering whether EndAutoRecovery() has been called before creating such a thread. This fix is similar to how we currently [handle](https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L688-L694) this case inside the created recovery thread. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12002 Test Plan: A new UT reproduced the crash before this fix and passes after. Reviewed By: ajkr Differential Revision: D50586191 Pulled By: hx235 fbshipit-source-id: b372f6d7a94eadee4b9283b826cc5fb81779a093 25 October 2023, 18:59:09 UTC
f2c9075 Fix dead loop with kSkipAnyCorruptedRecords mode selected in some cases (#11955) (#11979) Summary: With a fragmented record spanning multiple blocks, if any following block is corrupted with arbitrary data and the interpreted log number is less than the current log number, the program will fall into an infinite loop due to not skipping the buffer's leading bytes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11979 Test Plan: existing unit tests Reviewed By: ajkr Differential Revision: D50604408 Pulled By: jowlyzhang fbshipit-source-id: e50a0c7e7c3d293fb9d5afec0a3eb4a1835b7a3b 25 October 2023, 16:16:24 UTC
dc87847 Fix windows build errors (rdtsc and fnptr) (#12008) Summary: Combining best parts of https://github.com/facebook/rocksdb/issues/11794 and https://github.com/facebook/rocksdb/issues/11766, fixing the CircleCI config in the latter. I was going to amend the latter but wasn't granted access. Ideally this would be ported back to 8.4 branch and crc32c part into 8.3 branch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12008 Test Plan: CI Reviewed By: hx235 Differential Revision: D50616172 Pulled By: pdillinger fbshipit-source-id: fa7f778bc281e881a140522e774f480c6d1e5f48 24 October 2023, 23:20:37 UTC
0ff7665 Fix low priority write may cause crash when it is rate limited (#11932) Summary: Fixed https://github.com/facebook/rocksdb/issues/11902 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11932 Reviewed By: akankshamahajan15 Differential Revision: D50573356 Pulled By: hx235 fbshipit-source-id: adeb1abdc43b523b0357746055ce4a2eabde56a1 24 October 2023, 21:41:46 UTC
ab15d33 Update history, version and format testing for 8.8 (#12004) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12004 Reviewed By: cbi42 Differential Revision: D50586984 Pulled By: hx235 fbshipit-source-id: 1480a8c2757340ebf83510557104aaa0e437b3ae 24 October 2023, 19:03:07 UTC
917fd87 Error out in case of std errors in blackbox test and export file in TARGETS Summary: - Right now in the blackbox test we don't exit on std::error output as we do in whitebox crash tests. As a result those errors are swallowed. It only errors out if the state is unexpected. One example that was noticed in the blackbox crash test - ``` stderr has error message: ***Error restoring historical expected values: Corruption: DB is older than any restorable expected state*** Running db_stress with pid=30454: /packages/rocksdb_db_stress_internal_repo/rocks_db_stress .... ``` - This diff also adds support to export the db_crashtest.py file to be used by a different repo. Reviewed By: ajkr Differential Revision: D50564889 fbshipit-source-id: 7bafbbc6179dc79467ca2b680fe83afc7850616a 24 October 2023, 18:46:18 UTC
99b371b Skip subsequent trace writes after encountering trace write failure (#11996) Summary: **Context/Summary:** We ignore trace writing status, e.g., https://github.com/facebook/rocksdb/blob/543191f2eacadf14e3aa6ff9a08f85a8ad82da95/db/db_impl/db_impl_write.cc#L221-L222 If a write into the trace file fails, subsequent trace writes will continue onto the same file. This will trigger the assertion `assert(sync_without_flush_called_)` intended to catch writes to a file that has previously seen an error, added in https://github.com/facebook/rocksdb/pull/10489, https://github.com/facebook/rocksdb/pull/10555 An alternative (rejected) is to handle the trace writing status at a higher level, e.g., https://github.com/facebook/rocksdb/blob/543191f2eacadf14e3aa6ff9a08f85a8ad82da95/db/db_impl/db_impl_write.cc#L221-L222. However, it makes sense to ignore such statuses considering tracing is not a critical but an assistive component of db operation, and this alternative requires more code changes. So it's better to handle the failure at a lower level, as this PR does. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11996 Test Plan: Added a new UT that failed before this PR and passes after Reviewed By: akankshamahajan15 Differential Revision: D50532467 Pulled By: hx235 fbshipit-source-id: f2032abafd94917adbf89a20841d15b448782a33 24 October 2023, 16:58:02 UTC
519f2a4 Add cache_bench to buck build (#11990) Summary: as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/11990 Test Plan: buck build in fbcode Reviewed By: hx235 Differential Revision: D50502851 Pulled By: pdillinger fbshipit-source-id: b046e4d8b90f1496e5a134faf2b936dec10922de 23 October 2023, 22:12:36 UTC
e81393e Add some stats to observe the usefulness of scan prefetching (#11981) Summary: Add stats for better observability of scan prefetching. It's only implemented for sync scans right now. These stats can help inform future improvements in scan prefetching. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11981 Test Plan: Add a new unit test Reviewed By: akankshamahajan15 Differential Revision: D50516505 Pulled By: anand1976 fbshipit-source-id: cb1cc6cf02df8295930a49c62b11870020df3f97 23 October 2023, 21:42:44 UTC
4d9f973 Disable AutoHCC in crash test (#12000) Summary: ... until I can reproduce and resolve assertion failures (mostly in PurgeImplLocked) seen in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12000 Test Plan: make blackbox_crash_test Reviewed By: hx235 Differential Revision: D50565984 Pulled By: pdillinger fbshipit-source-id: 5eea1638ff2683c41b4f65ee1ffc2398071911e7 23 October 2023, 19:23:13 UTC
4155087 Use manifest to persist pre-allocated seqnos (#11995) Summary: ... and other fixes for crash test after https://github.com/facebook/rocksdb/issues/11922. * When pre-allocating sequence numbers for establishing a time history, record that last sequence number in the manifest so that it is (most likely) restored on recovery even if no user writes were made or were recovered (e.g. no WAL). * When pre-allocating sequence numbers for establishing a time history, only do this for actually new DBs. * Remove the feature that ensures non-zero sequence number on creating the first column family with preserve/preclude option after initial DB::Open. Until fixed in a way compatible with the crash test, this creates a gap where some data written with active preserve/preclude option won't have a known associated time. Together, these ensure we don't upset the crash test by manipulating sequence numbers after initial DB creation (esp when re-opening with different options). (The crash test expects that the seqno after re-open corresponds to a known point in time from previous crash test operation, matching an expected DB state.) Follow-up work: * Re-fill the gap to ensure all data written under preserve/preclude settings have a known time estimate. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11995 Test Plan: Added to unit test SeqnoTimeTablePropTest.PrePopulateInDB Verified fixes two crash test scenarios: ## 1st reproducer First apply ``` diff --git a/db_stress_tool/expected_state.cc b/db_stress_tool/expected_state.cc index b483e154c..ef63b8d6c 100644 --- a/db_stress_tool/expected_state.cc +++ b/db_stress_tool/expected_state.cc @@ -333,6 +333,7 @@ Status FileExpectedStateManager::SaveAtAndAfter(DB* db) { s = NewFileTraceWriter(Env::Default(), soptions, trace_file_path, &trace_writer); } + if (getenv("CRASH")) assert(false); if (s.ok()) { TraceOptions trace_opts; trace_opts.filter |= kTraceFilterGet; ``` Then ``` mkdir -p /dev/shm/rocksdb_test/rocksdb_crashtest_expected mkdir -p /dev/shm/rocksdb_test/rocksdb_crashtest_whitebox rm -rf /dev/shm/rocksdb_test/rocksdb_crashtest_*/* CRASH=1 ./db_stress --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --destroy_db_initially=1 --manual_wal_flush_one_in=1000000 --clear_column_family_one_in=0 --preserve_internal_time_seconds=36000 ./db_stress --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --destroy_db_initially=0 --manual_wal_flush_one_in=1000000 --clear_column_family_one_in=0 --preserve_internal_time_seconds=0 ``` Without the fix you get ``` ... 
DB path: [/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox] (Re-)verified 34 unique IDs Error restoring historical expected values: Corruption: DB is older than any restorable expected state ``` ## 2nd reproducer First apply ``` diff --git a/db_stress_tool/db_stress_test_base.cc b/db_stress_tool/db_stress_test_base.cc index 62ddead7b..f2654980f 100644 --- a/db_stress_tool/db_stress_test_base.cc +++ b/db_stress_tool/db_stress_test_base.cc @@ -1126,6 +1126,7 @@ void StressTest::OperateDb(ThreadState* thread) { // OPERATION write TestPut(thread, write_opts, read_opts, rand_column_families, rand_keys, value); + if (getenv("CRASH")) assert(false); } else if (prob_op < del_bound) { assert(write_bound <= prob_op); // OPERATION delete ``` Then ``` rm -rf /dev/shm/rocksdb_test/rocksdb_crashtest_*/* CRASH=1 ./db_stress --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --destroy_db_initially=1 --manual_wal_flush_one_in=1000000 --clear_column_family_one_in=0 --disable_wal=1 --reopen=0 --preserve_internal_time_seconds=0 ./db_stress --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --destroy_db_initially=0 --manual_wal_flush_one_in=1000000 --clear_column_family_one_in=0 --disable_wal=1 --reopen=0 --preserve_internal_time_seconds=3600 ``` Without the fix you get ``` DB path: [/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox] (Re-)verified 34 unique IDs db_stress: db_stress_tool/expected_state.cc:380: virtual rocksdb::{anonymous}::ExpectedStateTraceRecordHandler::~ ExpectedStateTraceRecordHandler(): Assertion `IsDone()' failed. ``` Reviewed By: jowlyzhang Differential Revision: D50533346 Pulled By: pdillinger fbshipit-source-id: 1056be45c5b9e537c8c601b28c4b27431a782477 23 October 2023, 16:20:59 UTC
543191f Add bounds checking to WBWIIteratorImpl and respect bounds of ReadOptions in Transaction (#11680) Summary: Fix https://github.com/facebook/rocksdb/issues/11607 Fix https://github.com/facebook/rocksdb/issues/11679 Fix https://github.com/facebook/rocksdb/issues/11606 Fix https://github.com/facebook/rocksdb/issues/2343 Add bounds checking to `WBWIIteratorImpl`, which will be reflected in `BaseDeltaIterator::delta_iterator_::Valid()`, just like `BaseDeltaIterator::base_iterator_::Valid()`. In this way, the two sub-iterators become more aligned from `BaseDeltaIterator`'s perspective. Like `DBIter`, the added bounds checking caps seeks at either bound and invalidates the `WBWIIteratorImpl` iterator when it moves past the lower bound or reaches the upper bound. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11680 Test Plan: - A simple test added to write_batch_with_index_test.cc to exercise the bounds checking in `WBWIIteratorImpl`. - A sophisticated test added to transaction_test.cc to assert that `Transaction` with different write policies honors bounds in `ReadOptions`. It should, as long as the `BaseDeltaIterator` is correctly coordinating the two sub-iterators to perform iteration and bounds checking. Reviewed By: ajkr Differential Revision: D48125229 Pulled By: cbi42 fbshipit-source-id: c9acea52595aed1471a63d7ca6ef15d2a2af1367 20 October 2023, 20:28:28 UTC
d7567d5 Update libs for RocksJava Static build (#9304) Summary: Updates ZStd and Snappy to the latest versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9304 Reviewed By: ajkr Differential Revision: D33176708 Pulled By: cbi42 fbshipit-source-id: eb50db50557c433e19fcc7c2874329d1d6cba93f 20 October 2023, 17:38:27 UTC
ef0c3f0 Fix rare destructor bug in AutoHCC (#11988) Summary: and some other small enhancements/fixes: * The main bug fixed is that in some rare cases, the "published" table size might be smaller than the actual table size. This is a transient state that can happen with concurrent growth that is normally fixed after enough insertions, but if the cache is destroyed soon enough after growth, it could fail to fully destroy some entries and cause assertion failures. We can fix this by detecting the true table size in the destructor. * Change the "too many iterations" debug threshold from 512 to 768. We might have hit at least one false positive failure. (Failed despite legitimate operation.) * Added some stronger assertions in some places to aid in debugging. * Use COERCE_CONTEXT_SWITCH to make behavior of Grow less predictable in terms of thread interleaving. (Might add in more places.) This was useful in reproducing the destructor bug. * Fix some comments with typos or that were based on earlier revisions of the code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11988 Test Plan: Variants of this bug-finding command: ``` USE_CLANG=1 COMPILE_WITH_ASAN=1 COMPILE_WITH_UBSAN=1 COERCE_CONTEXT_SWITCH=1 DEBUG_LEVEL=2 make -j32 cache_bench && while ROCKSDB_DEBUG=1 ./cache_bench -cache_type=auto_hyper_clock_cache -histograms=0 -cache_size=80000000 -threads=32 -populate_cache=0 -ops_per_thread=1000 -num_shard_bits=0; do :; done ``` Reviewed By: jowlyzhang Differential Revision: D50470318 Pulled By: pdillinger fbshipit-source-id: d407a8bb0b6d2ddc598a954c319a1640136f12f2 19 October 2023, 21:51:22 UTC
2e514e4 Fix copyright header (#11986) Summary: Add missing copyright header Pull Request resolved: https://github.com/facebook/rocksdb/pull/11986 Test Plan: N/A Reviewed By: hx235 Differential Revision: D50461904 Pulled By: jaykorean fbshipit-source-id: b1b2704890f4a0bb3c9b464b01468e85a5365a8e 19 October 2023, 17:30:54 UTC
0836a2b New tickers on deletion compactions grouped by reasons (#11957) Summary: Context/Summary: as titled Pull Request resolved: https://github.com/facebook/rocksdb/pull/11957 Test Plan: piggyback on existing tests; fixed a failed test due to adding new stats Reviewed By: ajkr, cbi42 Differential Revision: D50294310 Pulled By: hx235 fbshipit-source-id: d99b97ebac41efc1bdeaf9ca7a1debd2927d54cd 19 October 2023, 01:00:07 UTC
5559001 Add RocksJava tests to CMake (#11756) Summary: Refactor the CMake build to allow running Java tests via CMake, including on Windows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11756 Reviewed By: hx235 Differential Revision: D50370739 Pulled By: akankshamahajan15 fbshipit-source-id: ae05cc08a0f9bb2a0a4f1ece02c523fb465bb817 18 October 2023, 19:51:50 UTC
a80e3f6 Add keyExists Java API (#11705) Summary: Add a new method to check if a key exists in the database. It avoids copying data between C++ and Java code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11705 Reviewed By: ajkr Differential Revision: D50370934 Pulled By: akankshamahajan15 fbshipit-source-id: ab2d42213fbebcaff919b0ffbbef9d45e88ca365 18 October 2023, 19:46:35 UTC
0bb3a26 Lazy load native library in Statistics constructor. (#11953) Summary: Should fix https://github.com/facebook/rocksdb/issues/9667 and follows the same approach as https://github.com/facebook/rocksdb/issues/11919 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11953 Reviewed By: hx235 Differential Revision: D50307456 Pulled By: ajkr fbshipit-source-id: 43e7e671e8b04875185b38284cefd4c3e11981fa 18 October 2023, 18:01:04 UTC
d5bc30b Enforce status checking after Valid() returns false for IteratorWrapper (#11975) Summary: ... when compiled with ASSERT_STATUS_CHECKED = 1. The main change is in iterator_wrapper.h. The remaining changes are just fixing existing unit tests. Adding this check to IteratorWrapper gives good coverage, as the class is used in many places, including child iterators under merging iterator, merging iterator under DB iter, file_iter under level iterator, etc. This change can catch the bug fixed in https://github.com/facebook/rocksdb/issues/11782. Future follow up: enable `ASSERT_STATUS_CHECKED=1` for stress test and for DEBUG_LEVEL=0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11975 Test Plan: * `ASSERT_STATUS_CHECKED=1 DEBUG_LEVEL=2 make -j32 J=32 check` * I tried to run the stress test with `ASSERT_STATUS_CHECKED=1`, but there is a lot of existing stress test code that ignores status checking and fails without the change in this PR. So defer that to a follow up task. Reviewed By: ajkr Differential Revision: D50383790 Pulled By: cbi42 fbshipit-source-id: 1a28ce0f5fdf1890f93400b26b3b1b3a287624ce 18 October 2023, 16:38:38 UTC
4226693 Remove documentation that marks user-defined timestamps feature as experimental (#11974) Summary: As titled. The most notable place that marks the feature as experimental is its wiki page. That was updated. And this PR removes the experimental marker from a few places for this feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11974 Reviewed By: ltamasi Differential Revision: D50383640 Pulled By: jowlyzhang fbshipit-source-id: 0bfe26ceda0793515f54b602cf3cd13d0737ec25 17 October 2023, 22:25:40 UTC
933ee29 Fix a race condition between recovery and backup (#11955) Summary: A race condition between recovery and backup can happen with error messages like this: ```Failure in BackupEngine::CreateNewBackup with: IO error: No such file or directory: While opening a file for sequentially reading: /dev/shm/rocksdb_test/rocksdb_crashtest_whitebox/002653.log: No such file or directory``` PR https://github.com/facebook/rocksdb/issues/6949 introduced disabling file deletion during error handling of manifest IO errors. The aforementioned race condition is caused by this chain of events: [Backup engine] disable file deletion [Recovery] disable file deletion <= this is optional for the race condition, it may or may not get called [Backup engine] get list of files to copy/link [Recovery] force enable file deletion .... some files referred to by the backup engine get deleted [Backup engine] copy/link file <= error: no file found This PR fixes this with: 1) The recovery thread currently force-enables file deletion as long as file deletion is disabled, regardless of whether the previous error handling was for a manifest IO error and disabled it in the first place. This means it could incorrectly enable file deletions intended to stay disabled by other threads, like backup threads and file snapshotting threads. This PR does this check explicitly before making the call. 2) `disable_delete_obsolete_files_` is designed as a counter to allow different threads to enable and disable file deletion separately. The recovery thread currently does a force enable of file deletion, because `ErrorHandler::SetBGError()` can be called multiple times by different threads when they receive a manifest IO error (details per PR https://github.com/facebook/rocksdb/issues/6949), resulting in `DBImpl::DisableFileDeletions` being called multiple times too. Making a force enable file deletion call that resets the counter `disable_delete_obsolete_files_` to zero is a workaround for this.
However, as shown in the race condition, it can incorrectly suppress other threads' (like a backup thread's) intention to keep file deletion disabled. <strike>This PR adds a `std::atomic<int> disable_file_deletion_count_` to the error handler to track the needed counter decrease more precisely</strike>. This PR tracks and caps file deletion enabling/disabling in the error handler. 3) For recovery, the section that finds obsolete files and purges them was moved to after the attempt to enable file deletion. The actual finding and purging is more likely to happen if file deletion was previously disabled and gets re-enabled now. An internal function `DBImpl::EnableFileDeletionsWithLock` was added to support changes 2) and 3). Some useful logging was explicitly added to keep those log messages around. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11955 Test Plan: existing unit tests Reviewed By: anand1976 Differential Revision: D50290592 Pulled By: jowlyzhang fbshipit-source-id: 73aa8331ca4d636955a5b0324b1e104a26e00c9b 17 October 2023, 20:18:04 UTC
9135a61 Fix corruption error in stress test for auto_readahead_size enabled (#11961) Summary: Fix the corruption error - "Corruption: first key in index doesn't match first key in block" - seen when auto_readahead_size is enabled. The error is caused by a bug: when index_iter_ moves forward, the first_internal_key of that index_iter_ is not copied, so the Slice points to a different key, resulting in a wrong comparison. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11961 Test Plan: Ran the stress test which reproduced this error. Reviewed By: anand1976 Differential Revision: D50310589 Pulled By: akankshamahajan15 fbshipit-source-id: 95d8320b8388f1e3822c32024f84754f3a20a631 17 October 2023, 19:21:08 UTC
2296c62 Perform java static checks in CI (#11221) Summary: Integrate pmd on the Java API to catch and report common Java coding problems; fix or suppress a basic set of PMD checks. Link pmd into java build / CI Add a pmd dependency to maven Add a jpmd target to Makefile which runs pmd Add a workflow to Circle CI which runs pmd Configure an initial default pmd for CI Repair lots of very simple PMD reports generated when we apply pmd-rules.xml Repair or add exceptions for PMD rules in the CloseResource category, which finds unclosed AutoCloseable resources. We special-case the configuration of CloseResource and use the // NOPMD comment in source to avoid reports where we are the API creating an AutoCloseable, and therefore returning an unclosed resource is correct behaviour. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11221 Reviewed By: akankshamahajan15 Differential Revision: D50369930 Pulled By: jowlyzhang fbshipit-source-id: a41c36b44b3bab7644df3e9cc16afbdf33b84f6b 17 October 2023, 17:04:35 UTC
84af7cf Sanitize db_stress arguments when secondary_cache_uri is not empty (#11967) Summary: When `secondary_cache_uri` is non-empty and the `cache_type` is not a tiered cache, then sanitize `compressed_secondary_cache_size` to 0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11967 Test Plan: Run crash test Reviewed By: akankshamahajan15 Differential Revision: D50346157 Pulled By: anand1976 fbshipit-source-id: 57bcbad2ec81fa736f1539a0a41ed6854ded2077 17 October 2023, 00:28:36 UTC
018eede Remove assertion from PrefetchAsync (#11965) Summary: Remove the assertion (roundup_len2 >= alignment) from PrefetchAsync, as for non direct_io the buffer size can be less than the alignment, triggering the assertion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11965 Test Plan: Ran the issue-causing db_stress without this assertion and the verification completes successfully. Reviewed By: anand1976 Differential Revision: D50328955 Pulled By: akankshamahajan15 fbshipit-source-id: 65f55ca230d2bbc63f4e2cc34c7273b22b515879 16 October 2023, 22:14:58 UTC
25d4379 Make rate limiter single burst bytes runtime changeable (#11923) Summary: Context/Summary: as titled Pull Request resolved: https://github.com/facebook/rocksdb/pull/11923 Test Plan: new UT Reviewed By: ajkr Differential Revision: D49941161 Pulled By: hx235 fbshipit-source-id: f75a4d07f3cdd86863ea22c57f2bcd3a621baaf3 16 October 2023, 17:21:35 UTC
2fd850c Remove write queue synchronization from WriteOptionsFile (#11951) Summary: This has become obsolete with the new `options_mutex_` in https://github.com/facebook/rocksdb/pull/11929 * Remove now-unnecessary parameter from WriteOptionsFile * Rename (and negate) other parameter for better clarity (the caller shouldn't tell the callee what the callee needs, just what the caller knows, provides, and requests) * Move a ROCKS_LOG_WARN (I/O) in WriteOptionsFile to outside of holding DB mutex. * Also *avoid* (but not always eliminate) write queue synchronization in SetDBOptions. Still needed if there was a change to WAL size limit or other configuration. * Improve some comments Pull Request resolved: https://github.com/facebook/rocksdb/pull/11951 Test Plan: existing unit tests and TSAN crash test local run Reviewed By: ajkr Differential Revision: D50247904 Pulled By: pdillinger fbshipit-source-id: 7dfe445c705ec013886a2adb7c50abe50d83af69 16 October 2023, 15:58:47 UTC
9ded9f7 Fix db_stress FaultInjectionTestFS set up before DB open (#11958) Summary: We saw frequent stress test failures with error messages like: ``` Verification failed for column family 0 key ...: value_from_db: , value_from_expected: ..., msg: GetEntity verification: Value not found: NotFound: ``` One cause for this is that data in the WAL is lost after a crash. We initialize FaultInjectionTestFS to be not direct writable when write_fault_injection is enabled (see code change). This can cause the first WAL created during DB open to be lost if db_stress is killed before the first WAL is synced. This PR initializes FaultInjectionTestFS to be direct writable. Note that FaultInjectionTestFS will be configured properly for write fault injection after DB open in `RunStressTestImpl()`. So this change should not affect write fault injection coverage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11958 Test Plan: a repro for the above bug: ``` Simulate crash before first WAL is sealed: --- a/db_stress_tool/db_stress_driver.cc +++ b/db_stress_tool/db_stress_driver.cc @@ -256,6 +256,7 @@ bool RunStressTestImpl(SharedState* shared) { fprintf(stderr, "Verification failed :(\n"); return false; } + exit(1); return true; } ./db_stress --clear_column_family_one_in=0 --column_families=1 --preserve_internal_time_seconds=60 --destroy_db_initially=0 --db=/dev/shm/rocksdb_crashtest_blackbox --db_write_buffer_size=2097152 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --reopen=0 --test_batches_snapshots=0 --threads=1 --ops_per_thread=100 --write_fault_one_in=1000 --sync_fault_injection=0 ./db_stress_main --clear_column_family_one_in=0 --column_families=1 --preserve_internal_time_seconds=60 --destroy_db_initially=0 --db=/dev/shm/rocksdb_crashtest_blackbox --db_write_buffer_size=2097152 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --reopen=0 --test_batches_snapshots=0 --sync_fault_injection=1 ```
Reviewed By: akankshamahajan15 Differential Revision: D50300347 Pulled By: cbi42 fbshipit-source-id: 3a4881d72197f5ece82364382a0100912e16c2d6 14 October 2023, 20:33:55 UTC
f3aef8c Add write operation to tracer only after successful callback (#11954) Summary: We saw optimistic transaction stress test failures like the following: ``` Verification failed for column family 0 key 000000000001E9AF000000000000012B00000000000000B5 (12535491): value_from_db: 010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, value_from_expected: , msg: Iterator verification: Unexpected value found ``` With ajkr's repro (see test plan), I found that we record duplicated writes to the tracer when optimistic transaction conflict checking fails. This PR fixes it by checking the callback status before recording a write operation to the tracer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11954 Test Plan: this reproduces the failure consistently ``` #!/bin/bash db=/dev/shm/rocksdb_crashtest_blackbox exp=/dev/shm/rocksdb_crashtest_expected rm -rf $db $exp && mkdir -p $exp && while ./db_stress \ --atomic_flush=1 \ --clear_column_family_one_in=0 \ --db=$db \ --db_write_buffer_size=2097152 \ --delpercent=0 \ --delrangepercent=0 \ --destroy_db_initially=0 \ --disable_wal=1 \ --expected_values_dir=$exp \ --iterpercent=0 \ --max_bytes_for_level_base=2097152 \ --max_key=250000 \ --memtable_prefix_bloom_size_ratio=0.5 \ --memtable_whole_key_filtering=1 \ --occ_lock_bucket_count=100 \ --occ_validation_policy=0 \ --ops_per_thread=10 \ --prefixpercent=0 \ --readpercent=0 \ --reopen=0 \ --target_file_size_base=524288 \ --test_batches_snapshots=0 \ --use_optimistic_txn=1 \ --use_txn=1 \ --value_size_mult=32 \ --write_buffer_size=524288 \ --writepercent=100 ; do : ; done ``` Reviewed By: akankshamahajan15 Differential Revision: D50284976 Pulled By: cbi42 fbshipit-source-id: 793e3cee186c8b4f406b29166efd8d9028695206 14 October 2023, 19:00:31 UTC
50b0879 Do not fail stress test when file ingestion return injected error (#11956) Summary: Currently, if file ingestion hits an injected error, the stress test is considered failed since it prints a message to stderr containing the keyword "error" and db_crashtest.py looks for it in stderr. This PR fixes it by printing the injected error to stdout. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11956 Test Plan: Check future stress test runs. Reviewed By: akankshamahajan15 Differential Revision: D50293537 Pulled By: cbi42 fbshipit-source-id: e74915b1b3c6876a61ab6933c4529780362ec02b 14 October 2023, 17:08:03 UTC