Revision history - refs/tags/v6.5.3 - origin: https://github.com/facebook/rocksdb

visit type:

Revision	Author	Date	Message	Commit Date
f48aa1c	Levi Tamasi	10 January 2020, 17:58:14 UTC	Bump up version to 6.5.3	10 January 2020, 17:58:14 UTC
e7d7b10	Levi Tamasi	16 December 2019, 23:15:42 UTC	Update HISTORY.md with the recent memtable trimming fixes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6194 Differential Revision: D19125292 Pulled By: ltamasi fbshipit-source-id: d41aca2755ec4bec07feedd6b561e8d18606a931	10 January 2020, 17:57:50 UTC
c0a5673	Levi Tamasi	16 December 2019, 21:13:42 UTC	Fix a data race related to memtable trimming (#6187) Summary: https://github.com/facebook/rocksdb/pull/6177 introduced a data race involving `MemTableList::InstallNewVersion` and `MemTableList::NumFlushed`. The patch fixes this by caching whether the current version has any memtable history (i.e. flushed memtables that are kept around for transaction conflict checking) in an `std::atomic<bool>` member called `current_has_history_`, similarly to how `current_memory_usage_excluding_last_` is handled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6187 Test Plan: ``` make clean COMPILE_WITH_TSAN=1 make db_test -j24 ./db_test ``` Differential Revision: D19084059 Pulled By: ltamasi fbshipit-source-id: 327a5af9700fb7102baea2cc8903c085f69543b9	10 January 2020, 17:54:56 UTC
8f121d8	Levi Tamasi	14 December 2019, 03:09:53 UTC	Do not schedule memtable trimming if there is no history (#6177) Summary: We have observed an increase in CPU load caused by frequent calls to `ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory` when using `max_write_buffer_size_to_maintain` to limit the amount of memtable history maintained for transaction conflict checking. Part of the issue is that trimming can potentially be scheduled even if there is no memtable history. The patch adds a check that fixes this. See also https://github.com/facebook/rocksdb/pull/6169. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6177 Test Plan: Compared `perf` output for ``` ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32 ``` before and after the change. There is a significant reduction for the call chain `rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape` even without https://github.com/facebook/rocksdb/pull/6169. Differential Revision: D19057445 Pulled By: ltamasi fbshipit-source-id: dff81882d7b280e17eda7d9b072a2d4882c50f79	10 January 2020, 17:53:31 UTC
afbfb3f	Levi Tamasi	13 December 2019, 20:45:49 UTC	Do not create/install new SuperVersion if nothing was deleted during memtable trim (#6169) Summary: We have observed an increase in CPU load caused by frequent calls to `ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory` when using `max_write_buffer_size_to_maintain` to limit the amount of memtable history maintained for transaction conflict checking. As it turns out, this is caused by the code creating and installing a new `SuperVersion` even if no memtables were actually trimmed. The patch adds a check to avoid this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6169 Test Plan: Compared `perf` output for ``` ./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32 ``` before and after the change. With the fix, the call chain `rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape` no longer registers in the `perf` report. Differential Revision: D19031509 Pulled By: ltamasi fbshipit-source-id: 02686fce594e5b50eba0710e4b28a9b808c8aa20	10 January 2020, 17:53:31 UTC
4cfbd87	Fosco Marotto	15 November 2019, 21:56:29 UTC	Update history and version for 6.5.2	15 November 2019, 21:56:29 UTC
8a72bb1	Peter Dillinger	14 November 2019, 14:18:23 UTC	More fixes to auto-GarbageCollect in BackupEngine (#6023) Summary: Production: * Fixes GarbageCollect (and auto-GC triggered by PurgeOldBackups, DeleteBackup, or CreateNewBackup) to clean up backup directory independent of current settings (except max_valid_backups_to_open; see issue https://github.com/facebook/rocksdb/issues/4997) and prior settings used with same backup directory. * Fixes GarbageCollect (and auto-GC) not to attempt to remove "." and ".." entries from directories. * Clarifies contract with users in modifying BackupEngine operations. In short, leftovers from any incomplete operation are cleaned up by any subsequent call to that same kind of operation (PurgeOldBackups and DeleteBackup considered the same kind of operation). GarbageCollect is available to clean up after all kinds. (NB: right now PurgeOldBackups and DeleteBackup will clean up after incomplete CreateNewBackup, but we aren't promising to continue that behavior.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6023 Test Plan: * Refactors open parameters to use an option enum, for readability, etc. (Also fixes an unused parameter bug in the redundant OpenDBAndBackupEngineShareWithChecksum.) * Fixes an apparent bug in ShareTableFilesWithChecksumsTransition in which old backup data was destroyed in the transition to be tested. That test is now augmented to ensure GarbageCollect (or auto-GC) does not remove shared files when BackupEngine is opened with share_table_files=false. * Augments DeleteTmpFiles test to ensure that CreateNewBackup does auto-GC when an incompletely created backup is detected. Differential Revision: D18453559 Pulled By: pdillinger fbshipit-source-id: 5e54e7b08d711b161bc9c656181012b69a8feac4	15 November 2019, 20:15:12 UTC
a6d4183	Peter Dillinger	09 November 2019, 03:13:41 UTC	Auto-GarbageCollect on PurgeOldBackups and DeleteBackup (#6015) Summary: Only if there is a crash, power failure, or I/O error in DeleteBackup, shared or private files from the backup might be left behind that are not cleaned up by PurgeOldBackups or DeleteBackup-- only by GarbageCollect. This makes the BackupEngine API "leaky by default." Even if it means a modest performance hit, I think we should make Delete and Purge do as they say, with ongoing best effort: i.e. future calls will attempt to finish any incomplete work from earlier calls. This change does that by having DeleteBackup and PurgeOldBackups do a GarbageCollect, unless (to minimize performance hit) this BackupEngine has already done a GarbageCollect and there have been no deletion-related I/O errors in that GarbageCollect or since then. Rejected alternative 1: remove meta file last instead of first. This would in theory turn partially deleted backups into corrupted backups, but code changes would be needed to allow the missing files and consider it acceptably corrupt, rather than failing to open the BackupEngine. This might be a reasonable choice, but I mostly rejected it because it doesn't solve the legacy problem of cleaning up existing lingering files. Rejected alternative 2: use a deletion marker file. If deletion started with creating a file that marks a backup as flagged for deletion, then we could reliably detect partially deleted backups and efficiently finish removing them. In addition to not solving the legacy problem, this could be precarious if there's a disk full situation, and we try to create a new file in order to delete some files. Ugh. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6015 Test Plan: Updated unit tests Differential Revision: D18401333 Pulled By: pdillinger fbshipit-source-id: 12944e372ce6809f3f5a4c416c3b321a8927d925	15 November 2019, 20:15:12 UTC
cb1dc29	anand76	12 November 2019, 00:57:49 UTC	Fix a buffer overrun problem in BlockBasedTable::MultiGet (#6014) Summary: The calculation in BlockBasedTable::MultiGet for the required buffer length for reading in compressed blocks is incorrect. It needs to take the 5-byte block trailer into account. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6014 Test Plan: Add a unit test DBBasicTest.MultiGetBufferOverrun that fails in asan_check before the fix, and passes after. Differential Revision: D18412753 Pulled By: anand1976 fbshipit-source-id: 754dfb66be1d5f161a7efdf87be872198c7e3b72	12 November 2019, 18:57:32 UTC
98e5189	anand76	07 November 2019, 20:00:45 UTC	Fix MultiGet crash when no_block_cache is set (#5991) Summary: This PR fixes https://github.com/facebook/rocksdb/issues/5975. In ```BlockBasedTable::RetrieveMultipleBlocks()```, we were calling ```MaybeReadBlocksAndLoadToCache()```, which is a no-op if neither uncompressed nor compressed block cache are configured. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5991 Test Plan: 1. Add unit tests that fail with the old code and pass with the new 2. make check and asan_check Cc spetrunia Differential Revision: D18272744 Pulled By: anand1976 fbshipit-source-id: e62fa6090d1a6adf84fcd51dfd6859b03c6aebfe	12 November 2019, 18:56:03 UTC
3353b71	Vijay Nadimpalli	21 October 2019, 19:07:58 UTC	Making platform 007 (gcc 7) default in build_detect_platform.sh (#5947) Summary: Making platform 007 (gcc 7) default in build_detect_platform.sh. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5947 Differential Revision: D18038837 Pulled By: vjnadimpalli fbshipit-source-id: 9ac2ddaa93bf328a416faec028970e039886378e	30 October 2019, 17:32:53 UTC
d72cceb	sdong	21 October 2019, 18:37:09 UTC	Fix VerifyChecksum readahead with mmap mode (#5945) Summary: A recent change introduced readahead inside VerifyChecksum(). However it is not compatible with mmap mode and generated wrong checksum verification failure. Fix it by not enabling readahead in mmap mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5945 Test Plan: Add a unit test that used to fail. Differential Revision: D18021443 fbshipit-source-id: 6f2eb600f81b26edb02222563a4006869d576bff	22 October 2019, 18:42:36 UTC
1d5083a	myabandeh	16 October 2019, 17:55:02 UTC	Bump up the version to 6.5.1	16 October 2019, 17:55:02 UTC
6ea6aa7	Maysam Yabandeh	16 October 2019, 14:56:51 UTC	Update HISTORY for SeekForPrev bug fix (#5925) Summary: Update history for the bug fix in https://github.com/facebook/rocksdb/pull/5907 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5925 Differential Revision: D17952605 Pulled By: maysamyabandeh fbshipit-source-id: 609afcbb2e4087f9153822c4d11193a75a7b0e7a	16 October 2019, 17:52:59 UTC
4229f6d	Maysam Yabandeh	12 October 2019, 03:28:36 UTC	Fix SeekForPrev bug with Partitioned Filters and Prefix (#5907) Summary: Partition Filters make use of a top-level index to find the partition that might have the bloom hash of the key. The index is with internal key format (before format version 3). Each partition contains the i) blooms of the keys in that range ii) bloom of prefixes of keys in that range, iii) the bloom of the prefix of the last key in the previous partition. When ::SeekForPrev(key), we first perform a prefix bloom test on the SST file. The partition however is identified using the full internal key, rather than the prefix key. The reason is to be compatible with the internal key format of the top-level index. This creates a corner case. Example: - SST k, Partition N: P1K1, P1K2 - SST k, top-level index: P1K2 - SST k+1, Partition 1: P2K1, P3K1 - SST k+1 top-level index: P3K1 When SeekForPrev(P1K3), it should point us to P1K2. However SST k top-level index would reject P1K3 since it is out of range. One possible fix would be to search with the prefix P1 (instead of full internal key P1K3) however the details of properly comparing prefix with full internal key might get complicated. The fix we apply in this PR is to look into the last partition anyway even if the key is out of range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5907 Differential Revision: D17889918 Pulled By: maysamyabandeh fbshipit-source-id: 169fd7b3c71dbc08808eae5a8340611ebe5bdc1e	16 October 2019, 17:51:46 UTC
73a35c6	anand76	07 October 2019, 23:54:33 UTC	Update HISTORY.md with a bug fix Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:	07 October 2019, 23:54:33 UTC
fc53ac8	anand76	04 October 2019, 03:52:17 UTC	Fix data block upper bound checking for iterator reseek case (#5883) Summary: When an iterator reseek happens with the user specifying a new iterate_upper_bound in ReadOptions, and the new seek position is at the end of the same data block, the Seek() ends up using a stale value of data_block_within_upper_bound_ and may return incorrect results. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5883 Test Plan: Added a new test case DBIteratorTest.IterReseekNewUpperBound. Verified that it failed due to the assertion failure without the fix, and passes with the fix. Differential Revision: D17752740 Pulled By: anand1976 fbshipit-source-id: f9b635ff5d6aeb0e1bef102cf8b2f900efd378e3	07 October 2019, 23:39:18 UTC
2060a00	sdong	01 October 2019, 23:58:47 UTC	Fix a previous revert	01 October 2019, 23:58:47 UTC
8986577	sdong	01 October 2019, 18:20:50 UTC	Revert "Merging iterator to avoid child iterator reseek for some cases (#5286)" (#5871) Summary: This reverts commit 9fad3e21eb90d215b6719097baba417bc1eeca3c. Iterator verification in stress tests sometimes fail for assertion table/block_based/block_based_table_reader.cc:2973: void rocksdb::BlockBasedTableIterator<TBlockIter, TValue>::FindBlockForward() [with TBlockIter = rocksdb::DataBlockIter; TValue = rocksdb::Slice]: Assertion `!next_block_is_out_of_bound \|\| user_comparator_.Compare(*read_options_.iterate_upper_bound, index_iter_->user_key()) <= 0' failed. It is likely to be linked to https://github.com/facebook/rocksdb/pull/5286 together with https://github.com/facebook/rocksdb/pull/5468 as the former PR makes some child iterator's seek being avoided, so that upper bound condition fails to be updated there. Strictly speaking, the former PR was merged before the latter one, but the latter one feels a more important improvement so I choose to revert the former one for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5871 Differential Revision: D17689196 fbshipit-source-id: 4ded5be68f67bee2782d31a29cb72ea68f59dd8c	01 October 2019, 21:41:58 UTC
749b35d	Fosco Marotto	13 September 2019, 18:53:36 UTC	Update history and version for 6.5 branch	13 September 2019, 18:53:36 UTC
6a17172	Peter Dillinger	13 September 2019, 18:04:52 UTC	Clean up + fix build scripts re: USE_SSE= and PORTABLE= (#5800) Summary: In preparing to utilize a new Intel instruction extension, I noticed problems with the existing build script in regard to the existing utilized extensions, either with USE_SSE or PORTABLE flags. * PORTABLE=0 was interpreted the same as PORTABLE=1. Now empty and 0 mean the same. (I guess you were not supposed to set PORTABLE= if you wanted non-portable--except that...) * The Facebook build script extensions would set PORTABLE=1 even if it's already set in a make var or environment. Now it does not override a non-empty setting, so use PORTABLE=0 for fully optimized build, overriding Facebook environment default. * Put in an explanation of the USE_SSE flag where it's used by build_detect_platform, and cleaned up some confusing/redundant associated logic. * If USE_SSE was set and expected intrinsics were not available, build_detect_platform would exit early but build would proceed with broken, incomplete configuration. Now warning is gracefully recovered. * If USE_SSE was set and expected intrinsics were not available, build would still try to use flags like -msse4.2 etc. which could lead to unexpected compilation failure or binary incompatibility. Now those flags are not used if the warning is issued. This should not break or change existing, valid build scripts. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5800 Test Plan: manual case testing Differential Revision: D17369543 Pulled By: pdillinger fbshipit-source-id: 4ee244911680ae71144d272c40aceea548e3ce88	13 September 2019, 18:07:13 UTC
9ba88a1	Lingjing You	13 September 2019, 17:47:47 UTC	Update history.md for option memtable_insert_hint_per_batch (#5799) Summary: Update history.md for option memtable_insert_hint_per_batch Pull Request resolved: https://github.com/facebook/rocksdb/pull/5799 Differential Revision: D17369186 fbshipit-source-id: 71d82f9d99d9a52d1475d1b0153670957b6111e9	13 September 2019, 17:51:32 UTC
27f516a	Ronak Sisodia	13 September 2019, 17:41:32 UTC	Update HISTORY.md for option to make write group size configurable (#5798) Summary: Update HISTORY.md for option to make write group size configurable . Pull Request resolved: https://github.com/facebook/rocksdb/pull/5798 Differential Revision: D17369062 fbshipit-source-id: 390a3fa0b01675e91879486a729cf2cc7624d106	13 September 2019, 17:43:09 UTC
aa2486b	Peter Dillinger	13 September 2019, 17:24:38 UTC	Refactor some confusing logic in PlainTableReader Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5780 Test Plan: existing plain table unit test Differential Revision: D17368629 Pulled By: pdillinger fbshipit-source-id: f25409cdc2f39ebe8d5cbb599cf820270e6b5d26	13 September 2019, 17:26:36 UTC
1a928c2	Lingjing You	12 September 2019, 23:53:31 UTC	Add insert hints for each writebatch (#5728) Summary: Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it. Bench result (qps): `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4` master: \| batch size \ thread num \| 1 \| 2 \| 4 \| 8 \| \| ----------------------- \| ------- \| ------- \| ------- \| ------- \| \| 1 \| 387883 \| 220790 \| 308294 \| 490998 \| \| 10 \| 1397208 \| 978911 \| 1275684 \| 1733395 \| \| 100 \| 2045414 \| 1589927 \| 1798782 \| 2681039 \| \| 1000 \| 2228038 \| 1698252 \| 1839877 \| 2863490 \| fillseq with writebatch hint: \| batch size \ thread num \| 1 \| 2 \| 4 \| 8 \| \| ----------------------- \| ------- \| ------- \| ------- \| ------- \| \| 1 \| 286005 \| 223570 \| 300024 \| 466981 \| \| 10 \| 970374 \| 813308 \| 1399299 \| 1753588 \| \| 100 \| 1962768 \| 1983023 \| 2676577 \| 3086426 \| \| 1000 \| 2195853 \| 2676782 \| 3231048 \| 3638143 \| Pull Request resolved: https://github.com/facebook/rocksdb/pull/5728 Differential Revision: D17297240 fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c	13 September 2019, 00:15:18 UTC
a378a4c	HouBingjian	12 September 2019, 23:52:41 UTC	arm64 crc prefetch optimise (#5773) Summary: prefetch data for following block，avoid cache miss when doing crc caculate I do performance test at kunpeng-920 server(arm-v8, 64core@2.6GHz) ./db_bench --benchmarks=crc32c --block_size=500000000 before optimise : 587313.500 micros/op 1 ops/sec; 811.9 MB/s (500000000 per op) after optimise : 289248.500 micros/op 3 ops/sec; 1648.5 MB/s (500000000 per op) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5773 Differential Revision: D17347339 fbshipit-source-id: bfcd74f0f0eb4b322b959be68019ddcaae1e3341	12 September 2019, 23:59:44 UTC
d35ffd5	Levi Tamasi	12 September 2019, 19:10:05 UTC	Temporarily disable hash index in stress tests (#5792) Summary: PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash tests, resulting in assertion failures in Block. This patch disables the hash index until we can pinpoint the root cause of these issues. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792 Test Plan: Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2 (binary search and partitioned index). Differential Revision: D17346777 Pulled By: ltamasi fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551	12 September 2019, 19:11:34 UTC
e8c2e68	Adam Retter	12 September 2019, 01:35:03 UTC	Fix RocksDB bug in block_cache_trace_analyzer.cc on Windows (#5786) Summary: This is required to compile on Windows with Visual Studio 2015. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786 Differential Revision: D17335994 fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011	12 September 2019, 01:36:41 UTC
d05c0fe	Ronak Sisodia	12 September 2019, 01:26:22 UTC	Option to make write group size configurable (#5759) Summary: The max batch size that we can write to the WAL is controlled by a static manner. So if the leader write is less than 128 KB we will have the batch size as leader write size + 128 KB else the limit will be 1 MB. Both of them are statically defined. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5759 Differential Revision: D17329298 fbshipit-source-id: a3d910629d8d8ca84ea39ad89c2b2d284571ded5	12 September 2019, 01:28:33 UTC
9eb3e1f	Shylock Hg	12 September 2019, 01:07:12 UTC	Use delete to disable automatic generated methods. (#5009) Summary: Use delete to disable automatic generated methods instead of private, and put the constructor together for more clear.This modification cause the unused field warning, so add unused attribute to disable this warning. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009 Differential Revision: D17288733 fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3	12 September 2019, 01:09:00 UTC
fcda80f	Wilfried Goesgens	12 September 2019, 00:58:34 UTC	record the timestamp on first configure (#4799) Summary: cmake doesn't re-generate the timestamp on subsequent builds causing rebuilds of the lib This improves compile time turn-arounds if you have rocksdb as a compileable library include, since with the state its now it will re-generate the time stamp .cc file each time you build, and thus re-compile + re-link the rocksdb library though anything in the source actually changed. The original timestamp is recorded into `CMakeCache.txt` and will remain there until you flush this cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4799 Differential Revision: D17290040 fbshipit-source-id: 28357fef3422693c9c19e88fa2873c8db0f662ed	12 September 2019, 01:00:02 UTC
dd2a35f	Andrew Kryczka	11 September 2019, 21:11:38 UTC	Support partitioned index and filters in stress/crash tests (#4020) Summary: - In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test - When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020 Test Plan: currently this is blocked on fixing the bug that crash test caught: ``` $ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ... Verification failed for column family 0 key 937501: Value not found: NotFound: Crash-recovery verification failed :( ``` Differential Revision: D8508683 Pulled By: maysamyabandeh fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629	11 September 2019, 21:13:38 UTC
20dd828	Andrew Kryczka	11 September 2019, 21:04:54 UTC	Avoid clock_gettime on pre-10.12 macOS versions (#5570) Summary: On older macOS like 10.10 we saw the following compiler error: ``` /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/env/env_posix.cc:845:19: error: use of undeclared identifier 'CLOCK_THREAD_CPUTIME_ID' clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts); ^ ``` According to mac's `man clock_gettime`: "These functions first appeared in Mac OSX 10.12". So we should not try to compile it on earlier versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5570 Test Plan: verified it compiles now on 10.10. Also did some investigation to ensure it does not cause regression on macOS 10.12+, although I do not have access to such an environment to really test. Differential Revision: D17322629 Pulled By: riversand963 fbshipit-source-id: e0a412223854f826b4d83e6d15c3739ff4620d7d	11 September 2019, 21:07:25 UTC
c85c87a	tongyingrui	11 September 2019, 19:02:49 UTC	test size was wrong in 'fillbatch' benchmark (#5198) Summary: for fillbatch benchmar, the numEntries should be [num_] but not [num_ / 1000] because numEntries is just the total entries we want to test Pull Request resolved: https://github.com/facebook/rocksdb/pull/5198 Differential Revision: D17274664 Pulled By: anand1976 fbshipit-source-id: f96e952babdbac63fb99d14e1254d478a10437be	11 September 2019, 19:04:44 UTC
2becafd	anand76	10 September 2019, 21:32:38 UTC	Fix Appveyor build due to signed/unsigned comparison Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5788 Test Plan: Travis CI and Appveyor should complete successfully. Differential Revision: D17287422 Pulled By: anand1976 fbshipit-source-id: d9408b692f78be95d0088b29b33f6a8ff40ec97b	10 September 2019, 21:34:37 UTC
eb9026f	anand76	10 September 2019, 18:03:31 UTC	Add a db_bench benchmark to warm up the row cache Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5707 Differential Revision: D17242698 Pulled By: anand1976 fbshipit-source-id: 5d1bfda3c9e8f56176ae391cae6c91e6262016b8	10 September 2019, 18:06:36 UTC
4d945c5	jsteemann	10 September 2019, 16:40:21 UTC	do a bit less work in the normal case (#5695) Summary: i.e. if alive logfile is not being moved to archive while we are in GetSortedWalsOfType() Pull Request resolved: https://github.com/facebook/rocksdb/pull/5695 Differential Revision: D17279489 Pulled By: vjnadimpalli fbshipit-source-id: 02bcf920a75b812edba8b87c6079b4e6fd5e683c	10 September 2019, 16:41:45 UTC
699e1b5	Richard He	10 September 2019, 01:11:02 UTC	Added support for SstFileReader JNI interface (#5556) Summary: Feature request as per https://github.com/facebook/rocksdb/issues/5538 issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5556 Differential Revision: D17219008 fbshipit-source-id: e31f18dec318416eac9dea8213bab31da96e1f3a	10 September 2019, 01:12:53 UTC
7af6ced	Peter Dillinger	09 September 2019, 22:24:54 UTC	Fix block allocation bug in new DynamicBloom (#5783) Summary: Bug found by valgrind. New DynamicBloom wasn't allocating in block sizes. New assertion added that probes starting in final word would be in bounds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5783 Test Plan: ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 valgrind --leak-check=full ./dynamic_bloom_test Differential Revision: D17270623 Pulled By: pdillinger fbshipit-source-id: 1e0407504b875133a771383cd488c70f91be2b87	09 September 2019, 22:26:43 UTC
108c619	Peter Dillinger	09 September 2019, 21:49:39 UTC	Add regression test for serialized Bloom filters (#5778) Summary: Check that we don't accidentally change the on-disk format of existing Bloom filter implementations, including for various CACHE_LINE_SIZE (by changing temporarily). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5778 Test Plan: thisisthetest Differential Revision: D17269630 Pulled By: pdillinger fbshipit-source-id: c77017662f010a77603b7d475892b1f0d5563d8b	09 September 2019, 21:51:30 UTC
fbab991	Wilfried Goesgens	09 September 2019, 18:22:28 UTC	upgrade gtest 1.7.0 => 1.8.1 for json result writing Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332 Differential Revision: D17242232 fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64	09 September 2019, 18:24:11 UTC
adbc25a	sdong	07 September 2019, 00:29:00 UTC	Rename InternalDBStatsType enum names (#5779) Summary: When building with clang 9, warning is reported for InternalDBStatsType type names shadowed the one for statistics. Rename them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5779 Test Plan: Build with clang 9 and see it passes. Differential Revision: D17239378 fbshipit-source-id: af28fb42066c738cd1b841f9fe21ab4671dafd18	07 September 2019, 00:31:10 UTC
cbfa729	houbingjian	07 September 2019, 00:01:45 UTC	cmakelist fix， add +crypto flag when use arm crc (#5750) Summary: cmake list add +crypto flag when use armv8 cpu the function crc32c_arm64 use HAVE_ARM64_CRYPTO to check if can enable arm-neon instructions : #ifdef HAVE_ARM64_CRYPTO /* Crc32c Parallel computation * Algorithm comes from Intel whitepaper: * crc-iscsi-polynomial-crc32-instruction-paper * * Input data is divided into three equal-sized blocks * Three parallel blocks (crc0, crc1, crc2) for 1024 Bytes * One Block: 42(BLK_LENGTH) * 8(step length: crc32c_u64) bytes */ but the cmakelist not check and pass crypto flag now I check the default Makefile has it: ifeq (,$(shell $(CXX) -fsyntax-only -march=armv8-a+crc -xc /dev/null 2>&1)) CXXFLAGS += -march=armv8-a+crc+crypto CFLAGS += -march=armv8-a+crc+crypto ARMCRC_SOURCE=1 endif Pull Request resolved: https://github.com/facebook/rocksdb/pull/5750 Differential Revision: D17242027 fbshipit-source-id: 443c9b89755b4bc34e265205ab922db1b2e14bde	07 September 2019, 00:03:21 UTC
78b8cfc	Maysam Yabandeh	06 September 2019, 22:23:44 UTC	WriteUnPrepared: Split ReadYourOwnWriteStress to three (#5776) Summary: ReadYourOwnWriteStress occasionally times out on some platforms. The patch splits it to three. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5776 Differential Revision: D17231743 Pulled By: maysamyabandeh fbshipit-source-id: d42eeaf22f61a48d50f9c404d98b1081ae8dac94	06 September 2019, 22:25:26 UTC
2208cc0	Manuel Ung	06 September 2019, 17:16:21 UTC	Fix build break in TransactionBaseImpl::TrackKey (#5771) Summary: Fix build broken in https://github.com/facebook/rocksdb/pull/5696. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5771 Differential Revision: D17217665 Pulled By: lth fbshipit-source-id: 7aa84a2a9b4feb7a3ab1cab174e09276430fe042	06 September 2019, 17:18:04 UTC
533e477	奏之章	06 September 2019, 00:50:45 UTC	Fix WriteBatchWithIndex with MergeOperator bug (#5577) Summary: ``` TEST_F(WriteBatchWithIndexTest, TestGetFromBatchAndDBMerge3) { DB* db; Options options; options.create_if_missing = true; std::string dbname = test::PerThreadDBPath("write_batch_with_index_test"); options.merge_operator = MergeOperators::CreateFromStringId("stringappend"); DestroyDB(dbname, options); Status s = DB::Open(options, dbname, &db); assert(s.ok()); ReadOptions read_options; WriteOptions write_options; FlushOptions flush_options; std::string value; WriteBatchWithIndex batch; ASSERT_OK(db->Put(write_options, "A", "1")); ASSERT_OK(db->Flush(flush_options, db->DefaultColumnFamily())); ASSERT_OK(batch.Merge("A", "2")); ASSERT_OK(batch.GetFromBatchAndDB(db, read_options, "A", &value)); ASSERT_EQ(value, "1,2"); delete db; DestroyDB(dbname, options); } ``` Fix ASSERT in batch.GetFromBatchAndDB() Pull Request resolved: https://github.com/facebook/rocksdb/pull/5577 Differential Revision: D16379847 fbshipit-source-id: b1320e24ec8e71350c525083cc0a16180a63f752	06 September 2019, 00:52:14 UTC
cfc2001	Richard He	06 September 2019, 00:33:35 UTC	Fixed FALLOC_FL_KEEP_SIZE undefined (#5614) Summary: Fix `error: ‘FALLOC_FL_KEEP_SIZE’` undeclared error in `io_posix.cc` during Vagrant build in CentOS as per issue https://github.com/facebook/rocksdb/issues/5599 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5614 Differential Revision: D17217960 fbshipit-source-id: ef736c51b16833107fd9ccc7917ed1def2a8d02c	06 September 2019, 00:37:21 UTC
eae9f04	Jeffrey Xiao	06 September 2019, 00:28:40 UTC	Initialized pinned_pos_ and pinned_seq_pos_ in FragmentedRangeTombstoneIterator (#5720) Summary: These uninitialized member variables can cause a key to not be pinned when it should be, causing erroneous behavior. For example ingesting a file with range deletion tombstones will yield an "external file have corrupted keys" on a Mac. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5720 Differential Revision: D17217673 fbshipit-source-id: cd7df7ce3ad9cf69c841c4d3dc6fd144eff9e212	06 September 2019, 00:30:29 UTC
83b9919	Yi Wu	06 September 2019, 00:18:36 UTC	Fix EncryptedEnv assert (#5735) Summary: Fixes https://github.com/facebook/rocksdb/issues/5734. By reading the code the assert don't quite make sense to me, since `dataSize` and `fileOffset` has no correlation. But my knowledge about `EncryptedEnv` is very limited. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5735 Test Plan: run `ENCRYPTED_ENV=1 ./db_encryption_test` Signed-off-by: Yi Wu <yiwu@pingcap.com> Differential Revision: D17133849 fbshipit-source-id: bb7262d308e5b2503c400b180edc252668df0ef0	06 September 2019, 00:21:42 UTC
43a5cdb	Andrew Kryczka	06 September 2019, 00:17:18 UTC	remove unused #include to fix musl libc build (#5583) Summary: The `#include "core_local.h"` was pulling in libgcc's `posix_memalign()` declaration. That declaration specifies `throw()` whereas musl libc's declaration does not. This was leading to the following compiler error when using musl libc: ``` In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/port/jemalloc_helper.h:26:0, from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.h:11, from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.cc:6: /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: error: declaration of 'int posix_memalign(void, size_t, size_t) throw ()' has a different exception specifier # define je_posix_memalign posix_memalign ^ /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: note: from previous declaration 'int posix_memalign(void, size_t, size_t)' # define je_posix_memalign posix_memalign ^ /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:202:38: note: in expansion of macro 'je_posix_memalign' JEMALLOC_EXPORT int JEMALLOC_NOTHROW je_posix_memalign(void memptr, ^~~~~~~~~~~~~~~~~ make[4]: * [CMakeFiles/rocksdb.dir/util/jemalloc_nodump_allocator.cc.o] Error 1 ``` Since `#include "core_local.h"` is not actually used, we can just remove it. I verified that fixes the build. There was a related PR here (https://github.com/facebook/rocksdb/issues/2188), although the problem description is slightly different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5583 Differential Revision: D16343227 fbshipit-source-id: 0386bc2b5fd55b2c3b5fba19382014efa52e44f8	06 September 2019, 00:18:49 UTC
ac97e69	HouBingjian	06 September 2019, 00:01:58 UTC	bloom test check fail on arm (#5745) Summary: FullFilterBitsBuilder::CalculateSpace use CACHE_LINE_SIZE which is 64@X86 but 128@ARM64 when it run bloom_test.FullVaryingLengths it failed on ARM64 server, the assert can be fixed by change 128->CACHE_LINE_SIZE2 as merged ASSERT_LE(FilterSize(), (size_t)((length 10 / 8) + CACHE_LINE_SIZE * 2 + 5)) << length; run bloom_test before fix: /root/rocksdb-master/util/bloom_test.cc:281: Failure Expected: (FilterSize()) <= ((size_t)((length * 10 / 8) + 128 + 5)), actual: 389 vs 383 200 [ FAILED ] FullBloomTest.FullVaryingLengths (32 ms) [----------] 4 tests from FullBloomTest (32 ms total) [----------] Global test environment tear-down [==========] 7 tests from 2 test cases ran. (116 ms total) [ PASSED ] 6 tests. [ FAILED ] 1 test, listed below: [ FAILED ] FullBloomTest.FullVaryingLengths after fix: Filters: 37 good, 0 mediocre [ OK ] FullBloomTest.FullVaryingLengths (90 ms) [----------] 4 tests from FullBloomTest (90 ms total) [----------] Global test environment tear-down [==========] 7 tests from 2 test cases ran. (174 ms total) [ PASSED ] 7 tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5745 Differential Revision: D17076047 fbshipit-source-id: e7beb5d55d4855fceb2b84bc8119a6b0759de635	06 September 2019, 00:03:24 UTC
b55b2f4	Peter Dillinger	05 September 2019, 21:57:39 UTC	Faster new DynamicBloom implementation (for memtable) (#5762) Summary: Since DynamicBloom is now only used in-memory, we're free to change it without schema compatibility issues. The new implementation is drawn from (with manifest permission) https://github.com/pdillinger/wormhashing/blob/303542a767437f56d8b66cea6ebecaac0e6a61e9/bloom_simulation_tests/foo.cc#L613 This has several speed advantages over the prior implementation: * Uses fastrange instead of % * Minimum logic to determine first (and all) probed memory addresses * (Major) Two probes per 64-bit memory fetch/write. * Very fast and effective (murmur-like) hash expansion/re-mixing. (At least on recent CPUs, integer multiplication is very cheap.) While a Bloom filter with 512-bit cache locality has about a 1.15x FP rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to 1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples, unlike the old implementation with more erratic FP rates. Especially for the memtable, we expect speed to outweigh somewhat higher FP rates. For example, a negative table query would have to be 1000x slower than a BF query to justify doubling BF query time to shave 10% off FP rate (working assumption around 1% FP rate). While that seems likely for SSTs, my data suggests a speed factor of roughly 50x for the memtable (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom filter, after this change). Thus, it's probably not worth even 5% more time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1% in absolute terms, and it's probably at least 20% slower to recoup that much FP rate from this new implementation. Because of this, we do not see a need for a 'locality' option that affects the MemTable Bloom filter and have decoupled the MemTable Bloom filter from Options::bloom_locality. Note that just 3% more memory to the Bloom filter (10.3 bits per key vs. just 10) is able to make up for the ~12% FP rate drop in the new implementation: [] # Nearly "ideal" FP-wise but reasonably fast cache-local implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ... [] # Close match to this new implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ... [] # Old locality=1 implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ... Also note the dramatic speed improvement vs. alternatives. -- Performance unit test: DynamicBloomTest.concurrent_with_perf is updated to report more precise timing data. (Measure running time of each thread, not just longest running thread, etc.) Results averaged over various sizes enabled with --enable_perf and 20 runs each; old dynamic bloom refers to locality=1, the faster of the old: old dynamic bloom, avg add latency = 65.6468 new dynamic bloom, avg add latency = 44.3809 old dynamic bloom, avg query latency = 50.6485 new dynamic bloom, avg query latency = 43.2186 old avg parallel add latency = 41.678 new avg parallel add latency = 24.5238 old avg parallel hit latency = 14.6322 new avg parallel hit latency = 12.3939 old avg parallel miss latency = 16.7289 new avg parallel miss latency = 12.2134 Tested on a dedicated 64-bit production machine at Facebook. Significant improvement all around. Despite now using std::atomic<uint64_t>, quick before-and-after test on a 32-bit machine (Intel Atom N270, released 2008) shows no regression in performance, in some cases modest improvement. -- Performance integration test (synthetic): with DEBUG_LEVEL=0, used TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000 and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01 300 runs each configuration. Write throughput change by enabling memtable bloom: Old locality=0: -3.06% Old locality=1: -2.37% New: -1.50% conclusion -> seems to substantially close the gap Readmissing throughput change by enabling memtable bloom: Old locality=0: +34.47% Old locality=1: +34.80% New: +33.25% conclusion -> maybe a small new penalty from FP rate Readrandom throughput change by enabling memtable bloom: Old locality=0: +31.54% Old locality=1: +31.13% New: +30.60% conclusion -> maybe also from FP rate (after memtable flush) -- Another conclusion we can draw from this new implementation is that the existing 32-bit hash function is not inherently crippling the Bloom filter speed or accuracy, below about 5 million keys. For speed, the implementation is essentially the same whether starting with 32-bits or 64-bits of hash; it just determines whether the first multiplication after fastrange is a pseudorandom expansion or needed re-mix. Note that this multiplication can occur while memory is fetching. For accuracy, in a standard configuration, you need about 5 million keys before you have about a 1.1x FP penalty due to using a 32-bit hash vs. 64-bit: [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ... [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762 Differential Revision: D17214194 Pulled By: pdillinger fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595	05 September 2019, 21:59:25 UTC
19e8c9b	jsteemann	05 September 2019, 20:57:59 UTC	use c++17's try_emplace if available (#5696) Summary: This avoids rehashing the key in TrackKey() in case the key is not already in the map of tracked keys, which will happen at least once per key used in a transaction. Additionally fix two typos. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5696 Differential Revision: D17210178 Pulled By: lth fbshipit-source-id: 7e2c28e9e505c1d1c1535d435250cf2b191a6fdf	05 September 2019, 20:59:40 UTC
20dec14	Peter Dillinger	05 September 2019, 17:03:42 UTC	Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767) Summary: DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767 Differential Revision: D17206963 Pulled By: pdillinger fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553	05 September 2019, 17:05:20 UTC
3f2723a	ENDOH takanao	04 September 2019, 21:31:02 UTC	fix checking the '-march' flag (#5766) Summary: Hi! guys, I got errors on the ARM machine. before: ```console $ make static_lib ... g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto' g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native ``` Thanks! Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766 Differential Revision: D17191117 fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868	04 September 2019, 21:34:28 UTC
f9fb9f1	Maysam Yabandeh	04 September 2019, 21:28:39 UTC	Add a unit test to detect infinite loops with reseek optimizations (#5727) Summary: Iterators reseek to the target key after iterating over max_sequential_skip_in_iterations invalid values. The logic is susceptible to an infinite loop bug, which has been present with WritePrepared Transactions up until 6.2 release. Although the bug is not present on master, the patch adds a unit test to prevent it from resurfacing again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5727 Differential Revision: D16952759 Pulled By: maysamyabandeh fbshipit-source-id: d0d973dddc8dfabd5a794931232aa4c862c74f51	04 September 2019, 21:31:10 UTC
229e6fb	Affan Dar	04 September 2019, 19:08:56 UTC	Adding DB::GetCurrentWalFile() API as a repliction/backup helper (#5765) Summary: Adding a light weight API to get last live WAL file name and size. Meant to be used as a helper for backup/restore tooling in a larger ecosystem such as MySQL with a MyRocks storage engine. Specifically within MySQL's backup/restore mechanism, this call can be made with a write lock on the mysql db to get a transactionally consistent snapshot of the current WAL file position along with other non-rocksdb log/data files. Without this, the alternative would be to take the aforementioned lock, scan the WAL dir for all files, find the last file and note its exact size as the rocksdb 'checkpoint'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5765 Differential Revision: D17172717 Pulled By: affandar fbshipit-source-id: f2fabafd4c0e6fc45f126670c8c88a9f84cb8a37	04 September 2019, 19:10:17 UTC
38b17ec	Yanqin Jin	04 September 2019, 18:36:47 UTC	Replace named comparator struct with lambda (#5768) Summary: Tiny code mod: replace a named comparator struct with anonymous lambda. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5768 Differential Revision: D17185141 Pulled By: riversand963 fbshipit-source-id: fabe367649931c33a39ad035dc707d2efc3ad5fc	04 September 2019, 18:38:34 UTC
cdb6334	git-hulk	03 September 2019, 18:21:47 UTC	MOD: trim last space and comma in perf context and iostat context ToString() Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5755 Differential Revision: D17165190 Pulled By: riversand963 fbshipit-source-id: a3a4633961bfe019bf360f97a4c4d36464e7fa0b	03 September 2019, 19:27:17 UTC
979fbdc	Vijay Nadimpalli	03 September 2019, 15:50:47 UTC	Persistent globally unique DB ID in manifest (#5725) Summary: Each DB has a globally unique ID. A DB can be physically copied around, or backed-up and restored, and the users should be identify the same DB. This unique ID right now is stored as plain text in file IDENTITY under the DB directory. This approach introduces at least two problems: (1) the file is not checksumed; (2) the source of truth of a DB is the manifest file, which can be copied separately from IDENTITY file, causing the DB ID to be wrong. The goal of this PR is solve this problem by moving the DB ID to manifest. To begin with we will write to both identity file and manifest. Write to Manifest is controlled via the flag write_dbid_to_manifest in Options and default is false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725 Test Plan: Added unit tests. Differential Revision: D16963840 Pulled By: vjnadimpalli fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9	03 September 2019, 15:52:24 UTC
44eca41	Yanqin Jin	31 August 2019, 01:27:43 UTC	Fix a bug in file ingestion (#5760) Summary: Before this PR, when the number of column families involved in a file ingestion exceeds 2, a bug in the looping logic prevents correct file number being assigned to each ingestion job. Also skip deleting non-existing hard links during cleanup-after-failure. Test plan (devserver) ``` $COMPILE_WITH_ASAN=1 make all $./external_sst_file_test --gtest_filter=ExternalSSTFileTest/ExternalSSTFileTest.IngestFilesIntoMultipleColumnFamilies_/ $makke check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5760 Differential Revision: D17142982 Pulled By: riversand963 fbshipit-source-id: 06c1847a4e7a402647bcf28d124e70f2a0f9daf6	31 August 2019, 01:29:07 UTC
672befe	Yanqin Jin	30 August 2019, 19:40:24 UTC	Fix assertion failure in FIFO compaction with TTL (#5754) Summary: Before this PR, the following sequence of events can cause assertion failure as shown below. Stack trace (partial): ``` (gdb) bt 2 0x00007f59b350ad15 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:92 3 0x00007f59b350adc3 in __GI___assert_fail (assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:101 4 0x0000000000492ccd in rocksdb::Compaction::MarkFilesBeingCompacted (this=<optimized out>, mark_as_compacted=<optimized out>) at db/compaction/compaction.cc:394 5 0x000000000049467a in rocksdb::Compaction::Compaction (this=0x7f59af013000, vstorage=0x7f581af53030, _immutable_cf_options=..., _mutable_cf_options=..., _inputs=..., _output_level=<optimized out>, _target_file_size=0, _max_compaction_bytes=0, _output_path_id=0, _compression=<incomplete type>, _compression_opts=..., _max_subcompactions=0, _grandparents=..., _manual_compaction=false, _score=4, _deletion_compaction=true, _compaction_reason=rocksdb::CompactionReason::kFIFOTtl) at db/compaction/compaction.cc:241 6 0x00000000004af9bc in rocksdb::FIFOCompactionPicker::PickTTLCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:101 7 0x00000000004b0771 in rocksdb::FIFOCompactionPicker::PickCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:201 8 0x00000000004838cc in rocksdb::ColumnFamilyData::PickCompaction (this=this@entry=0x7f59b31b3700, mutable_options=..., log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/column_family.cc:933 9 0x00000000004f3645 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=0x7f59b3176000, made_progress=made_progress@entry=0x7f59b1bfa6bf, job_context=job_context@entry=0x7f59b1bfa760, log_buffer=log_buffer@entry=0x7f59b1bfa930, prepicked_compaction=prepicked_compaction@entry=0x0, thread_pri=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2541 10 0x00000000004f5e2a in rocksdb::DBImpl::BackgroundCallCompaction (this=this@entry=0x7f59b3176000, prepicked_compaction=prepicked_compaction@entry=0x0, bg_thread_pri=bg_thread_pri@entry=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2312 11 0x00000000004f648e in rocksdb::DBImpl::BGWorkCompaction (arg=<optimized out>) at db/db_impl/db_impl_compaction_flush.cc:2087 ``` This can be caused by the following sequence of events. ``` Time \| thr bg_compact_thr1 bg_compact_thr2 \| write \| flush \| mark all l0 as being compacted \| write \| flush \| add cf to queue again \| mark all l0 as being \| compacted, fail the \| assertion V ``` Test plan (on devserver) Since bg_compact_thr1 and bg_compact_thr2 are two threads executing the same code, it is difficult to use sync point dependency to coordinate their execution. Therefore, I choose to use db_stress. ``` $TEST_TMPDIR=/dev/shm/rocksdb ./db_stress --periodic_compaction_seconds=1 --max_background_compactions=20 --format_version=2 --memtablerep=skip_list --max_write_buffer_number=3 --cache_index_and_filter_blocks=1 --reopen=20 --recycle_log_file_num=0 --acquire_snapshot_one_in=10000 --delpercent=4 --log2_keys_per_lock=22 --compaction_ttl=1 --block_size=16384 --use_multiget=1 --compact_files_one_in=1000000 --target_file_size_multiplier=2 --clear_column_family_one_in=0 --max_bytes_for_level_base=10485760 --use_full_merge_v1=1 --target_file_size_base=2097152 --checkpoint_one_in=1000000 --mmap_read=0 --compression_type=zstd --writepercent=35 --readpercent=45 --subcompactions=4 --use_merge=0 --write_buffer_size=4194304 --test_batches_snapshots=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --use_direct_reads=0 --compact_range_one_in=1000000 --open_files=-1 --destroy_db_initially=0 --progress_reports=0 --compression_zstd_max_train_bytes=0 --snapshot_hold_ops=100000 --enable_pipelined_write=0 --nooverwritepercent=1 --compression_max_dict_bytes=0 --max_key=1000000 --prefixpercent=5 --flush_one_in=1000000 --ops_per_thread=40000 --index_block_restart_interval=7 --cache_size=1048576 --compaction_style=2 --verify_checksum=1 --delrangepercent=1 --use_direct_io_for_flush_and_compaction=0 ``` This should see no assertion failure. Last but not least, ``` $COMPILE_WITH_ASAN=1 make -j32 all $make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5754 Differential Revision: D17109791 Pulled By: riversand963 fbshipit-source-id: 25fc46101235add158554e096540b72c324be078	30 August 2019, 19:42:01 UTC
9a44986	Paul O'Shannessy	30 August 2019, 06:19:10 UTC	Adopt Contributor Covenant Summary: In order to foster healthy open source communities, we're adopting the [Contributor Covenant](https://www.contributor-covenant.org/). It has been built by open source community members and represents a shared understanding of what is expected from a healthy community. Reviewed By: josephsavona, danobi, rdzhabarov Differential Revision: D17104640 fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49	30 August 2019, 06:21:01 UTC
a281822	Pratik Dhandharia	29 August 2019, 21:06:07 UTC	Lower the risk for users to run options.force_consistency_checks = true (#5744) Summary: Open-source users recently reported two occurrences of LSM-tree corruption (https://github.com/facebook/rocksdb/issues/5558 is one), which would be caught by options.force_consistency_checks = true. options.force_consistency_checks has a usability limitation because it crashes the service once inconsistency is detected. This makes the feature hard to use. Most users serve from multiple RocksDB shards per server and the impacts of crashing the service is higher than it should be. Instead, we just pass the error back to users without killing the service, and ask them to deal with the problem accordingly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5744 Differential Revision: D17096940 Pulled By: pdhandharia fbshipit-source-id: b6780039044e265f26ed2ad03c51f4abbe8b603c	29 August 2019, 21:07:37 UTC
1729779	anand76	29 August 2019, 19:12:05 UTC	Disable MultiGet row cache test in LITE mode (#5756) Summary: Row cache is not supported in LITE mode. So disable the test in that mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5756 Test Plan: make LITE=1 all check Differential Revision: D17115684 Pulled By: anand1976 fbshipit-source-id: e6433c2e528674645cea76cdfc80ddc473708fc2	29 August 2019, 19:13:28 UTC
c5e12eb	Jeremy Taylor	29 August 2019, 18:28:37 UTC	Add Crux to USERS.md Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5718 Differential Revision: D17096939 Pulled By: riversand963 fbshipit-source-id: 4301078d3ca3d54a1c7e841eccad95379cd1570d	29 August 2019, 18:29:48 UTC
ab0645a	Shafreeck Sea	29 August 2019, 17:55:40 UTC	Fix comment of function NotifyCollectTableCollectorsOnFinish (#5738) Summary: Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5738 Differential Revision: D17097075 Pulled By: riversand963 fbshipit-source-id: ed01b5f59e8eed262a49abe1f96552842d364af1	29 August 2019, 17:57:01 UTC
e105703	anand76	28 August 2019, 23:10:38 UTC	Support row cache with batched MultiGet (#5706) Summary: This PR adds support for row cache in ```rocksdb::TableCache::MultiGet```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5706 Test Plan: 1. Unit tests in db_basic_test 2. db_bench results with batch size of 2 (```Get``` is faster than ```MultiGet``` for single key) - Get - readrandom : 3.935 micros/op 254116 ops/sec; 28.1 MB/s (22870998 of 22870999 found) MultiGet - multireadrandom : 3.743 micros/op 267190 ops/sec; (24047998 of 24047998 found) Command used - TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench -use_existing_db=true -use_existing_keys=false -benchmarks="readtorowcache,[read\|multiread]random" -write_buffer_size=16777216 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -row_cache_size=4194304000 -batch_size=2 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=131072 Differential Revision: D17086297 Pulled By: anand1976 fbshipit-source-id: 85784378da913e05f1baf31ec1b4e7c9345e7f57	28 August 2019, 23:11:56 UTC
1daff8f	sdong	28 August 2019, 00:54:18 UTC	crash_test to skip compaction TTL for FIFO compaction. (#5749) Summary: https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749 Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0. Differential Revision: D17078292 fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d	28 August 2019, 00:55:37 UTC
1b4c104	Pratik Dhandharia	27 August 2019, 17:57:28 UTC	replace some reinterpret_cast with static_cast_with_check (#5740) Summary: This PR focuses on replacing some of the reinterpret_cast<DBImpl*> to static_cast_with_check<DBImpl, DB>. Files impacted: ./db/db_impl/db_impl_compaction_flush.cc ./db/write_batch.cc ./utilities/blob_db/blob_db_impl.cc ./utilities/transactions/pessimistic_transaction_db.cc ./utilities/transactions/transaction_base.cc ./utilities/transactions/write_prepared_txn_db.cc ./utilities/transactions/write_unprepared_txn_db.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/5740 Differential Revision: D17055691 Pulled By: pdhandharia fbshipit-source-id: 0f8034d1b32eade56e37d59c04b7bf236a81d8e8	27 August 2019, 17:59:11 UTC
1d6a10f	sdong	26 August 2019, 22:00:04 UTC	Extend stress test to cover periodic compaction and compaction TTL (#5741) Summary: Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there. Randomly select value for these two options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741 Test Plan: Run crash_test and see the perameters generated. Differential Revision: D17059515 fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097	26 August 2019, 22:03:25 UTC
ba0967b	Andrew Kryczka	26 August 2019, 21:59:24 UTC	Reduce severity of too many levels log message (#5742) Summary: This condition is now a normal occurrence during write burst so there is no need to warn the user about it. Here is a scenario where it happens under completely normal conditions. * Initially we have a DB of three levels (L0, L1, and L2) that is stable, i.e., compaction scores are all less than one. * Now a write burst comes along. At first L0 blows up a bit in size as compaction hasn't had a chance to catch up. * As a result of the above, `base_bytes_min` also increases since it is based on L0 size as of https://github.com/facebook/rocksdb/issues/4338 * If `base_bytes_min` increased enough (i.e., to be larger than L1), then we are shown the warning that the DB has more levels than necessary. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5742 Differential Revision: D17059221 fbshipit-source-id: e4a31d6eea42089a8d273095f19653991bd91bea	26 August 2019, 22:00:43 UTC
62829ff	jsteemann	26 August 2019, 18:24:40 UTC	reuse scratch buffer in transaction_log_reader (#5702) Summary: in order to avoid reallocations for a scratch std::string on every call to Next(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5702 Differential Revision: D16867803 fbshipit-source-id: 1391220a1b172b23336bbc71dc0c79ccf3b1c701	26 August 2019, 18:26:29 UTC
2f41ecf	Zhongyi Xie	23 August 2019, 20:54:09 UTC	Refactor trimming logic for immutable memtables (#5022) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	23 August 2019, 20:55:34 UTC
26293c8	DaiZhiwei	23 August 2019, 18:02:06 UTC	crc32c_arm64 performance optimization (#5675) Summary: Crc32c Parallel computation coding optimization: Macro unfolding removes the "for" loop and is good to decrease branch-miss in arm64 micro architecture 1024 Bytes is divided into 8(head) + 1008( 6 * 7 * 3 * 8 ) + 8(tail) three parts Macro unfolding 42 loops to 6 CRC32C7X24BYTESs 1 CRC32C7X24BYTES containing 7 CRC32C24BYTESs 1, crc32c_test [==========] Running 4 tests from 1 test case. [----------] Global test environment set-up. [----------] 4 tests from CRC [ RUN ] CRC.StandardResults [ OK ] CRC.StandardResults (1 ms) [ RUN ] CRC.Values [ OK ] CRC.Values (0 ms) [ RUN ] CRC.Extend [ OK ] CRC.Extend (0 ms) [ RUN ] CRC.Mask [ OK ] CRC.Mask (0 ms) [----------] 4 tests from CRC (1 ms total) [----------] Global test environment tear-down [==========] 4 tests from 1 test case ran. (1 ms total) [ PASSED ] 4 tests. 2, db_bench --benchmarks="crc32c" crc32c : 0.218 micros/op 4595390 ops/sec; 17950.7 MB/s (4096 per op) 3, repeated crc32c_test case 60000 times perf stat -e branch-miss -- ./crc32c_test before optimization: 739,426,504 branch-miss after optimization: 1,128,572 branch-miss Pull Request resolved: https://github.com/facebook/rocksdb/pull/5675 Differential Revision: D16989210 fbshipit-source-id: 7204e6069bb6ed066d49c2d1b3ac385065a98557	23 August 2019, 18:04:08 UTC
df8c307	Levi Tamasi	23 August 2019, 15:25:52 UTC	Revert to storing UncompressionDicts in the cache (#5645) Summary: PR https://github.com/facebook/rocksdb/issues/5584 decoupled the uncompression dictionary object from the underlying block data; however, this defeats the purpose of the digested ZSTD dictionary, since the whole point of the digest is to create it once and reuse it over and over again. This patch goes back to storing the uncompression dictionary itself in the cache (which should be now safe to do, since it no longer includes a Statistics pointer), while preserving the rest of the refactoring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5645 Test Plan: make asan_check Differential Revision: D16551864 Pulled By: ltamasi fbshipit-source-id: 2a7e2d34bb16e70e3c816506d5afe1d842057800	23 August 2019, 15:27:30 UTC
d8a27d9	sdong	22 August 2019, 23:30:30 UTC	Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729) Summary: AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same. Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729 Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed. Differential Revision: D16969791 fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163	22 August 2019, 23:32:55 UTC
202942b	Patrick Pei	22 August 2019, 23:20:20 UTC	Fix local includes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5722 Differential Revision: D16908380 fbshipit-source-id: 6a0e3cb2730b08d6012d3d7f31c937f01c399846	22 August 2019, 23:21:47 UTC
244e6f2	Maysam Yabandeh	22 August 2019, 15:47:36 UTC	Refactor MultiGet names in BlockBasedTable (#5726) Summary: To improve code readability, since RetrieveBlock already calls MaybeReadBlockAndLoadToCache, we avoid name similarity of the functions that call RetrieveBlock with MaybeReadBlockAndLoadToCache. The patch thus renames MaybeLoadBlocksToCache to RetrieveMultipleBlock and deletes GetDataBlockFromCache, which contains only two lines. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5726 Differential Revision: D16962535 Pulled By: maysamyabandeh fbshipit-source-id: 99e8946808ce4eb7857592b9003812e3004f92d6	22 August 2019, 15:49:00 UTC
9046bdc	anand76	21 August 2019, 17:21:41 UTC	Fix MultiGet() bug when whole_key_filtering is disabled (#5665) Summary: The batched MultiGet() implementation was not correctly handling bloom filter lookups when whole_key_filtering is disabled. It was incorrectly skipping keys not in the prefix_extractor domain, and not calling transform for keys in domain. This PR fixes both problems by moving the domain check and transformation to the FilterBlockReader. Tests: Unit test (confirmed failed before the fix) make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/5665 Differential Revision: D16902380 Pulled By: anand1976 fbshipit-source-id: a6be81ad68a6e37134a65246aec7a2c590eccf00	21 August 2019, 17:23:23 UTC
7bc18e2	Maysam Yabandeh	20 August 2019, 18:38:15 UTC	Disable snapshot refresh feature when snap_refresh_nanos is 0 (#5724) Summary: The comments of snap_refresh_nanos advertise that the snapshot refresh feature will be disabled when the option is set to 0. This contract is however not honored in the code: https://github.com/facebook/rocksdb/pull/5278 The patch fixes that and also adds an assert to ensure that the feature is not used when the option is zero. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5724 Differential Revision: D16918185 Pulled By: maysamyabandeh fbshipit-source-id: fec167287df7d85093e087fc39c0eb243e3bbd7e	20 August 2019, 18:40:07 UTC
3552473	sdong	20 August 2019, 17:40:59 UTC	Introduce IngestExternalFileOptions.verify_checksums_readahead_size (#5721) Summary: Recently readahead is introduced for checksum verifying. However, users cannot override the setting for the checksum verifying before external SST file ingestion. Introduce a new option for the purpose. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5721 Test Plan: Add a new unit test for it. Differential Revision: D16906896 fbshipit-source-id: 218ec37001ddcc05411cefddbe233d15ab308476	20 August 2019, 17:43:39 UTC
4c74dba	sdong	20 August 2019, 17:33:03 UTC	Bump up memory order of ref counting of ColumnFamilyData (#5723) Summary: We see this TSAN warning: WARNING: ThreadSanitizer: data race (pid=282806) Write of size 8 at 0x7b6c00000e38 by thread T16 (mutexes: write M1023578822185846136): #0 operator delete(void) <null> (libtsan.so.0+0x0000000795f8) https://github.com/facebook/rocksdb/issues/1 rocksdb::DBImpl::BackgroundFlush(bool, rocksdb::JobContext, rocksdb::LogBuffer, rocksdb::FlushReason, rocksdb::Env::Priority) db/db_impl/db_impl_compaction_flush.cc:2202 (db_flush_test+0x00000060b462) https://github.com/facebook/rocksdb/issues/2 rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority) db/db_impl/db_impl_compaction_flush.cc:2226 (db_flush_test+0x00000060cbd8) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::BGWorkFlush(void) db/db_impl/db_impl_compaction_flush.cc:2073 (db_flush_test+0x00000060d5ac) ...... Previous atomic write of size 4 at 0x7b6c00000e38 by main thread: #0 __tsan_atomic32_fetch_sub <null> (libtsan.so.0+0x00000006d721) https://github.com/facebook/rocksdb/issues/1 std::__atomic_base<int>::fetch_sub(int, std::memory_order) /mnt/gvfs/third-party2/libgcc/c67031f0f739ac61575a061518d6ef5038f99f90/7.x/platform007/5620abc/include/c++/7.3.0/bits/atomic_base.h:524 (db_flush_test+0x0000005f9e38) https://github.com/facebook/rocksdb/issues/2 rocksdb::ColumnFamilyData::Unref() db/column_family.h:286 (db_flush_test+0x0000005f9e38) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::FlushMemTable(rocksdb::ColumnFamilyData, rocksdb::FlushOptions const&, rocksdb::FlushReason, bool) db/db_impl/db_impl_compaction_flush.cc:1624 (db_flush_test+0x0000005f9e38) https://github.com/facebook/rocksdb/issues/4 rocksdb::DBImpl::TEST_FlushMemTable(rocksdb::ColumnFamilyData, rocksdb::FlushOptions const&) db/db_impl/db_impl_debug.cc:127 (db_flush_test+0x00000061ace9) https://github.com/facebook/rocksdb/issues/5 rocksdb::DBFlushTest_CFDropRaceWithWaitForFlushMemTables_Test::TestBody() db/db_flush_test.cc:320 (db_flush_test+0x0000004b44e5) https://github.com/facebook/rocksdb/issues/6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test, void (testing::Test::)(), char const*) third-party/gtest-1.7.0/fused-src/gtest/gtest-all.cc:3824 (db_flush_test+0x000000be2988) ...... It's still very clear the cause of the warning is because that TSAN treats results from relaxed atomic::fetch_sub() as non-atomic with the operation itself. We can make it more explicit by bumping up the order to CS. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5723 Test Plan: Run all existing test. Differential Revision: D16908250 fbshipit-source-id: bf17d39ed19058372bdf97f6440a743f88153021	20 August 2019, 17:34:33 UTC
8e12638	sdong	19 August 2019, 17:50:25 UTC	Slightly adjust atomic white box test's kill odd (#5717) Summary: Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout. If WAL is disabled, make the odds 5x likely to trigger. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717 Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out. Differential Revision: D16897237 fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c	19 August 2019, 17:51:59 UTC
e1c468d	sdong	16 August 2019, 23:40:09 UTC	Do readahead in VerifyChecksum() (#5713) Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413	16 August 2019, 23:42:56 UTC
e89b1c9	Zhongyi Xie	16 August 2019, 23:37:20 UTC	add missing check for hash index when calling BlockBasedTableIterator (#5712) Summary: Previous PR https://github.com/facebook/rocksdb/pull/3601 added support for making prefix_extractor dynamically mutable. However, there was a missing check for hash index when creating new BlockBasedTableIterator. While the check may be redundant because no other types of IndexReader makes uses of the flag, it is less error-prone to add the missing check so that future index reader implementation will not worry about violating the contract. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5712 Differential Revision: D16842052 Pulled By: miasantreble fbshipit-source-id: aef11c0ff7a690ed248f5b8fe23481cac486b381	16 August 2019, 23:39:49 UTC
f2bf0b2	Adam Retter	16 August 2019, 23:25:11 UTC	Fixes for building RocksJava releases on arm64v8 Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5674 Differential Revision: D16870338 fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7	16 August 2019, 23:27:50 UTC
35fe685	Kefu Chai	16 August 2019, 22:48:09 UTC	cmake: s/SNAPPY_LIBRARIES/snappy_LIBRARIES/ (#5687) Summary: fix the regression introduced by cc9fa7fc Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5687 Differential Revision: D16870212 fbshipit-source-id: 78b5519e1d2b03262d102ca530491254ddffdc38	16 August 2019, 22:49:23 UTC
e051560	sdong	16 August 2019, 22:34:49 UTC	Blacklist TransactionTest.GetWithoutSnapshot from valgrind_test (#5715) Summary: In valgrind_test, TransactionTest.GetWithoutSnapshot ran 2 hours and still didn't finish. Black list from valgrind_test to prevent timeout. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5715 Test Plan: run "make valgrind_test" and see whether the test is still generated. Differential Revision: D16866009 fbshipit-source-id: 92c78049b0bc1c2b9a0dfc1b7c8a9206b36f02f0	16 August 2019, 22:36:49 UTC
353a68d	Yanqin Jin	16 August 2019, 22:05:56 UTC	Update HISTORY.md for 6.4.0 (#5714) Summary: Update HISTORY.md by removing a feature from "Unreleased" to 6.4.0 after cherry-picking related commits to 6.4.fb branch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5714 Differential Revision: D16865334 Pulled By: riversand963 fbshipit-source-id: f17ede905a1dfbbcdf98806ca398c618cf54748a	16 August 2019, 22:09:20 UTC
a2e46ea	jsteemann	16 August 2019, 21:36:41 UTC	fix compiling with `-DNPERF_CONTEXT` (#5704) Summary: This was previously broken, as the performance context-related macro signatures in file monitoring/perf_context_imp.h deviated for the case when NPERF_CONTEXT was defined and when it was not. Update the macros for the `-DNPERF_CONTEXT` case, so it compiles. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5704 Differential Revision: D16867746 fbshipit-source-id: 05539724cb1f7955ecc42828365836a677759ad9	16 August 2019, 21:38:08 UTC
c2404d9	Eli Pozniansky	16 August 2019, 21:16:49 UTC	Optimizing ApproximateSize to create index iterator just once (#5693) Summary: VersionSet::ApproximateSize doesn't need to create two separate index iterators and do binary search for each in BlockBasedTable. So BlockBasedTable::ApproximateSize was added that creates the iterator once and uses it to calculate the data size between start and end keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5693 Differential Revision: D16774056 Pulled By: elipoz fbshipit-source-id: 53ce262e1a057788243bf30cd9b8aa6581df1a18	16 August 2019, 21:18:28 UTC
c762efc	sheng qiu	16 August 2019, 20:55:37 UTC	fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared (#5708) Summary: add "linux/falloc.h" in env/io_posix.cc to fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared Signed-off-by: sheng qiu <herbert1984106@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5708 Differential Revision: D16832922 fbshipit-source-id: 30e787c4a1b5a9724a8acfd68962ff5ec5f27d3e	16 August 2019, 20:58:05 UTC
40712df	Kefu Chai	16 August 2019, 20:54:23 UTC	ThreadPoolImpl::Impl::BGThreadWrapper() returns void (#5709) Summary: there is no need to return void*, as std::thread::thread(Func&& f, Args&&... args ) only requires `Func` to be callable. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5709 Differential Revision: D16832894 fbshipit-source-id: a1e1b876fa8d55589ef5feb5b27f3a435068b747	16 August 2019, 20:55:41 UTC
3a3dc29	Levi Tamasi	16 August 2019, 18:15:13 UTC	Update HISTORY.md for 6.3.2/6.4.0 and add a not-yet-released change Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5710 Test Plan: HISTORY.md-only change, no testing required. Differential Revision: D16836869 Pulled By: ltamasi fbshipit-source-id: 978148f1d14b0c46839a94d7ada8a5e8ecf73965	16 August 2019, 18:17:03 UTC
bd2c753	sdong	15 August 2019, 23:59:42 UTC	Add command "list_file_range_deletes" in ldb (#5615) Summary: Add a command in ldb so that users can print out tombstones in SST files. In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't return from the program, but return the status code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5615 Test Plan: Add a new unit test Differential Revision: D16550326 fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e	16 August 2019, 00:01:03 UTC
6ec2bf3	Maysam Yabandeh	15 August 2019, 21:39:47 UTC	Blog post for write_unprepared (#5711) Summary: Introducing write_unprepared feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5711 Differential Revision: D16838307 Pulled By: maysamyabandeh fbshipit-source-id: d9a4daf63dd0f855bea49c14ce84e6299f1401c7	15 August 2019, 21:41:13 UTC
d61d450	Jeffrey Xiao	15 August 2019, 03:58:59 UTC	Fix IngestExternalFile overlapping check (#5649) Summary: Previously, the end key of a range deletion tombstone was considered exclusive for the purposes of deletion, but considered inclusive when checking if two SSTables overlap. For example, an SSTable with a range deletion tombstone [a, b) would be considered overlapping with an SSTable with a range deletion tombstone [b, c). This commit fixes this check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5649 Differential Revision: D16808765 Pulled By: anand1976 fbshipit-source-id: 5c7ad1c027e4f778d35070e5dae1b8e6037e0d68	15 August 2019, 04:02:28 UTC
d92a59b	Levi Tamasi	15 August 2019, 01:13:14 UTC	Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705) Summary: PR https://github.com/facebook/rocksdb/issues/5298 (and subsequent related patches) unintentionally changed the semantics of cache_index_and_filter_blocks: historically, this option only affected the main index/filter block; with the changes, it affects index/filter partitions as well. This can cause performance issues when cache_index_and_filter_blocks is false since in this case, partitions are neither cached nor preloaded (i.e. they are loaded on demand upon each access). The patch reverts to the earlier behavior, that is, partitions are cached similarly to data blocks regardless of the value of the above option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5705 Test Plan: make check ./db_bench -benchmarks=fillrandom --statistics --stats_interval_seconds=1 --duration=30 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false ./db_bench -benchmarks=readrandom --use_existing_db --statistics --stats_interval_seconds=1 --duration=10 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false --cache_size=8000000000 Relevant statistics from the readrandom benchmark with the old code: rocksdb.block.cache.index.miss COUNT : 0 rocksdb.block.cache.index.hit COUNT : 0 rocksdb.block.cache.index.add COUNT : 0 rocksdb.block.cache.index.bytes.insert COUNT : 0 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 0 rocksdb.block.cache.filter.hit COUNT : 0 rocksdb.block.cache.filter.add COUNT : 0 rocksdb.block.cache.filter.bytes.insert COUNT : 0 rocksdb.block.cache.filter.bytes.evict COUNT : 0 With the new code: rocksdb.block.cache.index.miss COUNT : 2500 rocksdb.block.cache.index.hit COUNT : 42696 rocksdb.block.cache.index.add COUNT : 2500 rocksdb.block.cache.index.bytes.insert COUNT : 4050048 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 2500 rocksdb.block.cache.filter.hit COUNT : 4550493 rocksdb.block.cache.filter.add COUNT : 2500 rocksdb.block.cache.filter.bytes.insert COUNT : 10331040 rocksdb.block.cache.filter.bytes.evict COUNT : 0 Differential Revision: D16817382 Pulled By: ltamasi fbshipit-source-id: 28a516b0da1f041a03313e0b70b28cf5cf205d00	15 August 2019, 01:16:06 UTC
77273d4	Aaryaman Sagar	14 August 2019, 23:58:11 UTC	Fix TSAN failures in DistributedMutex tests (#5684) Summary: TSAN was not able to correctly instrument atomic bts and btr instructions, so when TSAN is enabled implement those with std::atomic::fetch_or and std::atomic::fetch_and. Also disable tests that fail on TSAN with false negatives (we know these are false negatives because this other verifiably correct program fails with the same TSAN error <link>) ``` make clean TEST_TMPDIR=/dev/shm/rocksdb OPT=-g COMPILE_WITH_TSAN=1 make J=1 -j56 folly_synchronization_distributed_mutex_test ``` This is the code that fails with the same false-negative with TSAN ``` namespace { class ExceptionWithConstructionTrack : public std::exception { public: explicit ExceptionWithConstructionTrack(int id) : id_{folly::to<std::string>(id)}, constructionTrack_{id} {} const char* what() const noexcept override { return id_.c_str(); } private: std::string id_; TestConstruction constructionTrack_; }; template <typename Storage, typename Atomic> void transferCurrentException(Storage& storage, Atomic& produced) { assert(std::current_exception()); new (&storage) std::exception_ptr(std::current_exception()); produced->store(true, std::memory_order_release); } void concurrentExceptionPropagationStress( int numThreads, std::chrono::milliseconds milliseconds) { auto&& stop = std::atomic<bool>{false}; auto&& exceptions = std::vector<std::aligned_storage<48, 8>::type>{}; auto&& produced = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumed = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumers = std::vector<std::thread>{}; for (auto i = 0; i < numThreads; ++i) { produced.emplace_back(new std::atomic<bool>{false}); consumed.emplace_back(new std::atomic<bool>{false}); exceptions.push_back({}); } auto producer = std::thread{[&]() { auto counter = std::vector<int>(numThreads, 0); for (auto i = 0; true; i = ((i + 1) % numThreads)) { try { throw ExceptionWithConstructionTrack{counter.at(i)++}; } catch (...) { transferCurrentException(exceptions.at(i), produced.at(i)); } while (!consumed.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } consumed.at(i)->store(false, std::memory_order_release); } }}; for (auto i = 0; i < numThreads; ++i) { consumers.emplace_back([&, i]() { auto counter = 0; while (true) { while (!produced.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } produced.at(i)->store(false, std::memory_order_release); try { auto storage = &exceptions.at(i); auto exc = folly::launder( reinterpret_cast<std::exception_ptr>(storage)); auto copy = std::move(exc); exc->std::exception_ptr::~exception_ptr(); std::rethrow_exception(std::move(copy)); } catch (std::exception& exc) { auto value = std::stoi(exc.what()); EXPECT_EQ(value, counter++); } consumed.at(i)->store(true, std::memory_order_release); } }); } std::this_thread::sleep_for(milliseconds); stop.store(true); producer.join(); for (auto& thread : consumers) { thread.join(); } } } // namespace ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5684 Differential Revision: D16746077 Pulled By: miasantreble fbshipit-source-id: 8af88dcf9161c05daec1a76290f577918638f79d	15 August 2019, 00:01:31 UTC

Newer
Older