89b3708 | straw | 01 June 2018, 16:33:10 UTC | add c api rocksdb_sstfilewriter_file_size Summary: Closes https://github.com/facebook/rocksdb/pull/3922 Differential Revision: D8208528 Pulled By: ajkr fbshipit-source-id: d384fe53cf526f2aadc7b79a423ce36dbd3ff224 | 01 June 2018, 16:43:59 UTC |
2a0dfaa | Zhongyi Xie | 01 June 2018, 04:27:08 UTC | fix PrefixExtractorChanged: pass raw pointer instead shared_ptr Summary: This should resolve the performance regression caused by the unnecessary copying of the shared_ptr. Closes https://github.com/facebook/rocksdb/pull/3937 Differential Revision: D8232330 Pulled By: miasantreble fbshipit-source-id: 7885bf7cd190b6f87164c52d6edd328298c13f97 | 01 June 2018, 04:42:50 UTC |
44cf849 | Maysam Yabandeh | 01 June 2018, 02:16:11 UTC | Fix the bug of some test scenarios being put after kEnd Summary: DBTestBase::OptionConfig includes the scenarios that unit tests could iterate over them by calling ChangeOptions(). Some of the options have been mistakenly put after kEnd which makes them essentially invisible to ChangeOptions() caller. This patch fixes it except for kUniversalSubcompactions which is left as TODO since it would break some unit tests. Closes https://github.com/facebook/rocksdb/pull/3935 Differential Revision: D8230748 Pulled By: maysamyabandeh fbshipit-source-id: edddb8fffcd161af1809fef24798ce118f8593db | 01 June 2018, 02:28:00 UTC |
2807678 | QingpingWang | 01 June 2018, 00:20:56 UTC | c api set bottommost level compaction Summary: Closes https://github.com/facebook/rocksdb/pull/3928 Differential Revision: D8224962 Pulled By: ajkr fbshipit-source-id: 3caf463509a935bff46530f27232a85ae7e4e484 | 01 June 2018, 00:30:50 UTC |
82089d5 | Siying Dong | 31 May 2018, 19:53:43 UTC | DBImpl::FindObsoleteFiles() not to call GetChildren() on the same path Summary: DBImpl::FindObsoleteFiles() may call GetChildren() multiple times if different CFs are on the same path. Fix it. Closes https://github.com/facebook/rocksdb/pull/3885 Differential Revision: D8084634 Pulled By: siying fbshipit-source-id: b471fbc251f6a05e9243304dc14c0831060cc0b0 | 31 May 2018, 19:58:33 UTC |
a35451e | maoyouxiang | 31 May 2018, 18:00:23 UTC | fix deadlock with enable_pipelined_write=true and max_successive_merges > 0 Summary: fix this https://github.com/facebook/rocksdb/issues/3916 Closes https://github.com/facebook/rocksdb/pull/3923 Differential Revision: D8215192 Pulled By: yiwu-arbug fbshipit-source-id: a4c2f839a91d92dc70906d2b7c6de0fe014a2422 | 31 May 2018, 18:13:14 UTC |
aaac6cd | Manuel Ung | 31 May 2018, 17:42:44 UTC | Add write unprepared classes by inheriting from write prepared Summary: Closes https://github.com/facebook/rocksdb/pull/3907 Differential Revision: D8218325 Pulled By: lth fbshipit-source-id: ff32d8dab4a159cd2762876cba4b15e3dc51ff3b | 31 May 2018, 17:47:42 UTC |
727eb88 | Jacquin Mininger | 31 May 2018, 00:45:02 UTC | Compile error in db bench tool Summary: Small format error below causes build to fail. I believe that this : ``` fprintf(stderr, "num reads to do %lu\n", reads_); ``` Can be changed to this: ``` fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_); ``` Successful build ``` CC utilities/blob_db/blob_dump_tool.o AR librocksdb_debug.a ar: creating archive librocksdb_debug.a /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols CC tools/db_bench.o CC tools/db_bench_tool.o tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] fprintf(stderr, "num reads to do %lu\n", reads_); ~~~ ^~~~~~ %lld 1 error generated. make: *** [tools/db_bench_tool.o] Error 1 ``` ``` $ cd rocksdb $ make all $ g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 9.1.0 (clang-902.0.39.1) Target: x86_64-apple-darwin17.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin ``` Closes https://github.com/facebook/rocksdb/pull/3909 Differential Revision: D8215710 Pulled By: siying fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751 | 31 May 2018, 01:01:36 UTC |
4dd80de | Siying Dong | 30 May 2018, 22:33:19 UTC | Remove tests from ROCKSDB_VALGRIND_RUN Summary: In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind. Closes https://github.com/facebook/rocksdb/pull/3924 Differential Revision: D8210184 Pulled By: siying fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a | 30 May 2018, 23:15:16 UTC |
a736255 | Anand Ananthabhotla | 29 May 2018, 22:42:14 UTC | Delete triggered compaction for universal style Summary: This is still WIP, but I'm hoping for early feedback on the overall approach. This patch implements deletion triggered compaction, which till now only worked for leveled, for universal style. SST files are marked for compaction by the CompactOnDeletionCollertor table property. This is expected to be used when free disk space is low and the user wants to reclaim space by deleting a bunch of keys. The deletions are expected to be dense. In such a situation, we want to avoid a full compaction due to its space overhead. The strategy used in this case is similar to leveled. We pick one file from the set of files marked for compaction. We then expand the inputs to a clean cut on the same level, and then pick overlapping files from the next non-mepty level. Picking files from the next level can cause the key range to expand, and we opportunistically expand inputs in the source level to include files wholly in this key range. The main side effect of this is that it breaks the property of no time range overlap between levels. This shouldn't break any functionality. Closes https://github.com/facebook/rocksdb/pull/3860 Differential Revision: D8124397 Pulled By: anand1976 fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf | 29 May 2018, 22:44:34 UTC |
724855c | Yi Wu | 29 May 2018, 22:11:22 UTC | Fix LRUCache missing null check on destruct Summary: Fix LRUCache missing null check on destruct. The check is needed if LRUCache::DisownData is called. Closes https://github.com/facebook/rocksdb/pull/3920 Differential Revision: D8191631 Pulled By: yiwu-arbug fbshipit-source-id: d5014f6e49b51692c18a25fb55ece935f5a023c4 | 29 May 2018, 22:13:09 UTC |
cf826de | Yanqin Jin | 29 May 2018, 19:26:57 UTC | Fix compilation error when OPT="-DROCKSDB_LITE". Summary: Closes https://github.com/facebook/rocksdb/pull/3917 Differential Revision: D8187733 Pulled By: riversand963 fbshipit-source-id: e4aa179cd0791ca77167e357f99de9afd4aef910 | 29 May 2018, 19:28:59 UTC |
03cda53 | Maysam Yabandeh | 29 May 2018, 19:09:01 UTC | Check for rep_->table_properties being nullptr Summary: The very old sst formats do not have table_properties and rep_->table_properties is thus nullptr. The recent patch in https://github.com/facebook/rocksdb/pull/3894 does not check for nullptr and hence makes it backward incompatible. This patch adds the check. Closes https://github.com/facebook/rocksdb/pull/3918 Differential Revision: D8188638 Pulled By: maysamyabandeh fbshipit-source-id: b1d986665ecf0b4d1c442adfa8a193b97707d47b | 29 May 2018, 19:13:55 UTC |
1c1bafa | 奏之章 | 28 May 2018, 18:13:17 UTC | Fix VersionStorageInfo::EstimateLiveDataSize seg fault Summary: `HandleEstimateLiveDataSize`'s `need_out_of_mutex` is true https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L412-L413 so , is will ref a `SuperVersion` https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/db_impl.cc#L1896-L1908 so , the param `version` of `InternalStats::HandleEstimateLiveDataSize` is safe , but `cfd_->current()` is not safe ! https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L790-L795 the `cfd_->current()` maybe invalid ... here's mongo-rocks crash backtrace ``` mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x41) [0x7fe3a3137c51] mongod(+0x2152E89) [0x7fe3a3136e89] mongod(+0x21534F6) [0x7fe3a31374f6] libpthread.so.0(+0xF5E0) [0x7fe39f5e45e0] mongod(rocksdb::InternalKeyComparator::Compare(rocksdb::Slice const&, rocksdb::Slice const&) const+0x17) [0x7fe3a22375a7] mongod(rocksdb::VersionStorageInfo::EstimateLiveDataSize() const+0x3AA) [0x7fe3a228daba] mongod(rocksdb::InternalStats::HandleEstimateLiveDataSize(unsigned long*, rocksdb::DBImpl*, rocksdb::Version*)+0x20) [0x7fe3a2250d70] mongod(rocksdb::DBImpl::GetIntPropertyInternal(rocksdb::ColumnFamilyData*, rocksdb::DBPropertyInfo const&, bool, unsigned long*)+0xEF) [0x7fe3a21e3dbf] ``` Closes https://github.com/facebook/rocksdb/pull/3912 Differential Revision: D8179944 Pulled By: yiwu-arbug fbshipit-source-id: 26f314a8f98f4c2dc4348745d759f26f0e8d95e1 | 28 May 2018, 18:27:08 UTC |
402b7aa | Maysam Yabandeh | 26 May 2018, 01:41:31 UTC | Exclude seq from index keys Summary: Index blocks have the same format as data blocks. The keys therefore similarly to the keys in the data blocks are internal keys, which means that in addition to the user key it also has 8 bytes that encodes sequence number and value type. This extra 8 bytes however is not necessary in index blocks since the index keys act as an separator between two data blocks. The only exception is when the last key of a block and the first key of the next block share the same user key, in which the sequence number is required to act as a separator. The patch excludes the sequence from index keys only if the above special case does not happen for any of the index keys. It then records that in the property block. The reader looks at the property block to see if it should expect sequence numbers in the keys of the index block.s Closes https://github.com/facebook/rocksdb/pull/3894 Differential Revision: D8118775 Pulled By: maysamyabandeh fbshipit-source-id: 915479f028b5799ca91671d67455ecdefbd873bd | 26 May 2018, 01:42:43 UTC |
8c3bf08 | Nathan VanBenschoten | 26 May 2018, 00:35:36 UTC | Check status when reading HashIndexPrefixesMetadataBlock Summary: This was missed in a refactor of `ReadBlockContents` (2f1a3a4). Closes https://github.com/facebook/rocksdb/pull/3906 Differential Revision: D8172648 Pulled By: ajkr fbshipit-source-id: 27e453b19795fea974bfed4721105be6f3a12090 | 26 May 2018, 00:42:51 UTC |
4543417 | Adam Retter | 25 May 2018, 22:06:47 UTC | Fix an issue with unnecessary capture in lambda expressions Summary: Closes https://github.com/facebook/rocksdb/issues/3900 Replaces https://github.com/facebook/rocksdb/pull/3901 I needed this to build v5.12.4 on Mac OS X (10.13.3). Closes https://github.com/facebook/rocksdb/pull/3904 Differential Revision: D8169357 Pulled By: sagar0 fbshipit-source-id: 85faac42168796e7def9250d0c221a9a03b84476 | 25 May 2018, 22:12:44 UTC |
aa53579 | Yanqin Jin | 25 May 2018, 18:45:12 UTC | Fix segfault caused by object premature destruction Summary: Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609). There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change. To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed. Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised. Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion. Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler. Test plan ``` $ make -j16 $ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion ``` Expected behavior: All tests should pass. Closes https://github.com/facebook/rocksdb/pull/3898 Differential Revision: D8149421 Pulled By: riversand963 fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438 | 25 May 2018, 18:57:51 UTC |
6e08916 | 奏之章 | 25 May 2018, 17:47:56 UTC | Fix Fadvise on closed file when reads use mmap Summary: ```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called. Closes https://github.com/facebook/rocksdb/pull/2764 Differential Revision: D8152515 Pulled By: ajkr fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65 | 25 May 2018, 17:57:57 UTC |
070319f | QingpingWang | 25 May 2018, 05:26:09 UTC | add flush_before_backup parameter to c api rocksdb_backup_engine_create_new_backup Summary: Add flush_before_backup to rocksdb_backup_engine_create_new_backup. make c api able to control the flush before backup behavior. Closes https://github.com/facebook/rocksdb/pull/3897 Differential Revision: D8157676 Pulled By: ajkr fbshipit-source-id: 88998c62f89f087bf8672398fd7ddafabbada505 | 25 May 2018, 05:28:52 UTC |
bc7e8d4 | Yi Wu | 24 May 2018, 22:45:49 UTC | LRUCache midpoint insertion Summary: Implement midpoint insertion strategy where new blocks will be insert to the middle of LRU list, then move the head on the first hit in cache. Closes https://github.com/facebook/rocksdb/pull/3877 Differential Revision: D8100895 Pulled By: yiwu-arbug fbshipit-source-id: f4bd83cb8be469e5d02072cfc8bd66011391f3da | 24 May 2018, 22:57:33 UTC |
3db8504 | Dmitri Smirnov | 24 May 2018, 22:05:00 UTC | Catchup with posix features Summary: Catch up with Posix features NewWritableRWFile must fail when file does not exists Implement Env::Truncate() Adjust Env options optimization functions Implement MemoryMappedBuffer on Windows. Closes https://github.com/facebook/rocksdb/pull/3857 Differential Revision: D8053610 Pulled By: ajkr fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e | 24 May 2018, 22:13:04 UTC |
c465509 | Kefu Chai | 24 May 2018, 18:59:27 UTC | port_posix: use posix_memalign() for aligned_alloc Summary: to workaround issue of http://tracker.ceph.com/issues/21422 . and in tcmalloc aligned_alloc and posix_memalign() are basically the same thing. the same applies to GNU glibc. fixes #3175 Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes https://github.com/facebook/rocksdb/pull/3862 Differential Revision: D8147930 Pulled By: yiwu-arbug fbshipit-source-id: 355afe93c4dd0a96a0d711ef190e8b86fbe8d11d | 24 May 2018, 19:13:16 UTC |
7a99c04 | Yi Wu | 24 May 2018, 01:53:17 UTC | refactor constructor of LRUCacheShard Summary: Update LRUCacheShard constructor so that adding new params to it don't need to add extra SetXXX() methods. Closes https://github.com/facebook/rocksdb/pull/3896 Differential Revision: D8128618 Pulled By: yiwu-arbug fbshipit-source-id: 6afa715de1493a50de413678761a765e3af9b83b | 24 May 2018, 01:57:42 UTC |
01bcc34 | Andrew Kryczka | 24 May 2018, 01:33:00 UTC | Introduce library-independent default compression level Summary: Previously we were using -1 as the default for every library, which was legacy from our zlib options. That worked for a while, but after zstd introduced https://github.com/facebook/zstd/commit/a146ee04ae5866b948be0c1911418e0436d80cb4, it started giving poor compression ratios by default in zstd. This PR adds a constant to RocksDB public API, `CompressionOptions::kDefaultCompressionLevel`, which will get translated to the default value specific to the compression library being used in "util/compression.h". The constant uses a number that appears to be larger than any library's maximum compression level. Closes https://github.com/facebook/rocksdb/pull/3895 Differential Revision: D8125780 Pulled By: ajkr fbshipit-source-id: 2db157a89118cd4f94577c2f4a0a5ff31c8391c6 | 24 May 2018, 01:42:08 UTC |
4011012 | Yanqin Jin | 23 May 2018, 22:59:16 UTC | Specify the underlying type of enums. Summary: Explicitly specify the underlying type of enums help developers understand the physical storage. Closes https://github.com/facebook/rocksdb/pull/3892 Differential Revision: D8107027 Pulled By: riversand963 fbshipit-source-id: a00efecbba46df4a3c8eed0994a2d4972ad1a1d3 | 23 May 2018, 23:12:59 UTC |
6c73a46 | Zhongyi Xie | 22 May 2018, 20:51:21 UTC | Fix a backward compatibility problem with table_properties being nullptr Summary: Currently when ldb built from master tries to open a DB from version 2.2, there will be a segfault because table_properties didn't exist back then. Closes https://github.com/facebook/rocksdb/pull/3890 Differential Revision: D8100914 Pulled By: miasantreble fbshipit-source-id: b255e8aedc54695432be2e704839c857dabdd65a | 22 May 2018, 20:57:17 UTC |
4420cb4 | Jacquin Mininger | 22 May 2018, 20:34:26 UTC | Fix Issue #3771: Slice ctor checks for nullptr and creates empty string Summary: Fix Issue #3771 : Check for nullptr in Slice constructor Slice ctor checks for nullptr and creates empty string if the string does not exist Closes https://github.com/facebook/rocksdb/pull/3887 Differential Revision: D8098852 Pulled By: ajkr fbshipit-source-id: 04471077defa9776ce7b8c389a61312ce31002fb | 22 May 2018, 20:41:56 UTC |
7db721b | Andrew Kryczka | 22 May 2018, 18:58:51 UTC | Avoid sleep in DBTest.GroupCommitTest to fix flakiness Summary: DBTest.GroupCommitTest would often fail when run under valgrind because its sleeps were insufficient to guarantee a group commit had multiple entries. Instead we can use sync point to force a leader to wait until a non-leader thread has enqueued its work, thus guaranteeing a leader can do group commit work for multiple threads. Closes https://github.com/facebook/rocksdb/pull/3883 Differential Revision: D8079429 Pulled By: ajkr fbshipit-source-id: 61dc50fad29d2c85547842f681288de60fa29049 | 22 May 2018, 19:16:25 UTC |
fcb3101 | Andrew Kryczka | 22 May 2018, 17:50:09 UTC | Avoid single-deleting merge operands in db_stress Summary: I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them. Closes https://github.com/facebook/rocksdb/pull/3878 Differential Revision: D8097346 Pulled By: ajkr fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1 | 22 May 2018, 17:58:36 UTC |
3db1ada | Siying Dong | 21 May 2018, 23:37:47 UTC | PersistRocksDBOptions() to use WritableFileWriter Summary: By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls to one write() file system call, which will be expensive to underlying Env without its own write buffering. Closes https://github.com/facebook/rocksdb/pull/3882 Differential Revision: D8080673 Pulled By: siying fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b | 21 May 2018, 23:42:22 UTC |
c3ebc75 | Zhongyi Xie | 21 May 2018, 21:33:55 UTC | Move prefix_extractor to MutableCFOptions Summary: Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users. This PR aims to make it possible to dynamically change bloom filter config. Closes https://github.com/facebook/rocksdb/pull/3601 Differential Revision: D7253114 Pulled By: miasantreble fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c | 21 May 2018, 21:43:11 UTC |
263ef52 | Yanqin Jin | 21 May 2018, 18:52:31 UTC | Update ColumnFamilyTest for multi-CF verification Summary: Change `keys_` from `set<string>` to `vector<set<string>>` so that each column family's keys are stored in one set. ajkr When you have a chance, can you PTAL? Thanks! Closes https://github.com/facebook/rocksdb/pull/3871 Differential Revision: D8056447 Pulled By: riversand963 fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386 | 21 May 2018, 18:57:42 UTC |
508a09f | Andrew Kryczka | 21 May 2018, 18:09:00 UTC | Print histogram count and sum in statistics string Summary: Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression. This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best... Closes https://github.com/facebook/rocksdb/pull/3863 Differential Revision: D8038831 Pulled By: ajkr fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c | 21 May 2018, 18:12:47 UTC |
7b65521 | Andrew Kryczka | 21 May 2018, 16:42:49 UTC | Assert keys/values pinned by range deletion meta-block iterators Summary: `RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()` so requires the data to which they point is pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the wrong TODO about increasing restart interval, which would enable key prefix encoding and break the assertion. Closes https://github.com/facebook/rocksdb/pull/3875 Differential Revision: D8063667 Pulled By: ajkr fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41 | 21 May 2018, 16:57:00 UTC |
e410501 | Andrew Kryczka | 21 May 2018, 16:35:46 UTC | Add missing test files to src.mk Summary: We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary. Closes https://github.com/facebook/rocksdb/pull/3876 Differential Revision: D8065389 Pulled By: ajkr fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b | 21 May 2018, 16:43:29 UTC |
ed4d339 | Zhongyi Xie | 19 May 2018, 04:44:07 UTC | fix a division by zero bug Summary: fixes the failing clang_analyze contrun test Closes https://github.com/facebook/rocksdb/pull/3872 Differential Revision: D8059241 Pulled By: miasantreble fbshipit-source-id: e8fc1838004fe16a823456188386b8b39429803b | 19 May 2018, 04:57:24 UTC |
26da367 | Siying Dong | 18 May 2018, 19:54:09 UTC | class Block to store num_restarts_ Summary: Right now, every Block::NewIterator() reads num_restarts_ from the block, which is already read in Block::Block(). This sometimes cause a CPU cache miss. Although fetching this cacheline can usually benefit follow-up block restart offset reading, as they are close to each other, it's almost free to get ride of this read by storing it in the Block class. Closes https://github.com/facebook/rocksdb/pull/3869 Differential Revision: D8052493 Pulled By: siying fbshipit-source-id: 9c72360f0c2d7329f3c198ce4eaedd2bc14b87c1 | 18 May 2018, 19:56:55 UTC |
a0c7b4d | Yanqin Jin | 18 May 2018, 14:58:59 UTC | Set the default value of max_manifest_file_size. Summary: In the past, the default value of max_manifest_file_size is uint64_t::MAX, allowing a long running RocksDB process to grow its MANIFEST file to take up the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB. siying miasantreble Please let me know whether this looks good to you. Thanks! Closes https://github.com/facebook/rocksdb/pull/3867 Differential Revision: D8051524 Pulled By: riversand963 fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72 | 18 May 2018, 15:11:55 UTC |
17af09f | Siying Dong | 18 May 2018, 01:24:20 UTC | Implement key shortening functions in ReverseBytewiseComparator Summary: Right now ReverseBytewiseComparator::FindShortestSeparator() doesn't really shorten key, and ReverseBytewiseComparator::FindShortestSuccessor() seems to return wrong results. The code is confusing too as it uses BytewiseComparatorImpl::FindShortestSeparator() but the function actually won't do anything if the the first key is larger than the second. Implement ReverseBytewiseComparator::FindShortestSeparator() and override ReverseBytewiseComparator::FindShortestSuccessor() to be empty. Closes https://github.com/facebook/rocksdb/pull/3836 Differential Revision: D7959762 Pulled By: siying fbshipit-source-id: 93acb621c16ce6f23e087ae4e19f7d84d1254683 | 18 May 2018, 01:27:16 UTC |
1d7ca20 | Zhongyi Xie | 18 May 2018, 00:51:46 UTC | add override to virtual functions Summary: this will fix the failing clang_check test Closes https://github.com/facebook/rocksdb/pull/3868 Differential Revision: D8050880 Pulled By: miasantreble fbshipit-source-id: 749932e2e4025f835c961c068d601e522a126da6 | 18 May 2018, 00:57:48 UTC |
aed7abb | Xin Tong | 18 May 2018, 00:49:06 UTC | Reorder field based on esan data Summary: Running. TEST_TMPDIR=/dev/shm ./buck-out/gen/rocks/tools/rocks_db_bench --benchmarks=readwhilewriting --num=5000000 -benchmark_write_rate_limit=2000000 --threads=32 Collected esan data and reorder field. Accesses to 4th and 6th fields take majority of the access. Group them. Overall, this struct takes 10%+ of the total accesses in the program. (637773011/6107964986) ==2433831== class rocksdb::InlineSkipList ==2433831== size = 48, count = 637773011, ratio = 112412, array access = 0 ==2433831== # 0: offset = 0, size = 2, count = 455137, type = i16 ==2433831== # 1: offset = 2, size = 2, count = 6, type = i16 ==2433831== # 2: offset = 4, size = 4, count = 182303, type = i32 ==2433831== # 3: offset = 8, size = 8, count = 263953900, type = %"class.rocksdb::MemTableRep::KeyComparator"* ==2433831== # 4: offset = 16, size = 8, count = 136409, type = %"class.rocksdb::Allocator"* ==2433831== # 5: offset = 24, size = 8, count = 366628820, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Node"* ==2433831== # 6: offset = 32, size = 4, count = 6280031, type = %"struct.std::atomic" = type { %"struct.std::__atomic_base" } ==2433831== # 7: offset = 40, size = 8, count = 136405, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Splice"* ==2433831==EfficiencySanitizer: total struct field access count = 6107964986 Before re-ordering [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting without-ro.log readwhilewriting : 0.036 micros/op 27545605 ops/sec; 26.8 MB/s (45954 of 5000000 found) readwhilewriting : 0.036 micros/op 28024240 ops/sec; 27.2 MB/s (43158 of 5000000 found) readwhilewriting : 0.037 micros/op 27345145 ops/sec; 27.1 MB/s (46725 of 5000000 found) readwhilewriting : 0.037 micros/op 27072588 ops/sec; 27.3 MB/s (42605 of 5000000 found) readwhilewriting : 0.034 micros/op 29578781 ops/sec; 28.3 MB/s (44294 of 5000000 found) readwhilewriting : 0.035 micros/op 28528304 ops/sec; 27.7 MB/s (44176 of 5000000 found) readwhilewriting : 0.037 micros/op 27075497 ops/sec; 26.5 MB/s (43763 of 5000000 found) readwhilewriting : 0.036 micros/op 28024117 ops/sec; 27.1 MB/s (40622 of 5000000 found) readwhilewriting : 0.037 micros/op 27078709 ops/sec; 27.6 MB/s (47774 of 5000000 found) readwhilewriting : 0.034 micros/op 29020689 ops/sec; 28.1 MB/s (45066 of 5000000 found) AVERAGE()=27.37 MB/s After re-ordering [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting ro.log readwhilewriting : 0.036 micros/op 27542409 ops/sec; 27.7 MB/s (46163 of 5000000 found) readwhilewriting : 0.036 micros/op 28021148 ops/sec; 28.2 MB/s (46155 of 5000000 found) readwhilewriting : 0.036 micros/op 28021035 ops/sec; 27.3 MB/s (44039 of 5000000 found) readwhilewriting : 0.036 micros/op 27538659 ops/sec; 27.5 MB/s (46781 of 5000000 found) readwhilewriting : 0.036 micros/op 28028604 ops/sec; 27.6 MB/s (44689 of 5000000 found) readwhilewriting : 0.036 micros/op 27541452 ops/sec; 27.3 MB/s (43156 of 5000000 found) readwhilewriting : 0.034 micros/op 29041338 ops/sec; 28.8 MB/s (44895 of 5000000 found) readwhilewriting : 0.036 micros/op 27784974 ops/sec; 26.3 MB/s (39963 of 5000000 found) readwhilewriting : 0.036 micros/op 27538892 ops/sec; 28.1 MB/s (46570 of 5000000 found) readwhilewriting : 0.038 micros/op 26622473 ops/sec; 27.0 MB/s (43236 of 5000000 found) AVERAGE()=27.58 MB/s Closes https://github.com/facebook/rocksdb/pull/3855 Reviewed By: siying Differential Revision: D8048781 Pulled By: trentxintong fbshipit-source-id: bc9807a9845e2a92cb171ce1ecb5a2c8a51f1481 | 18 May 2018, 00:57:48 UTC |
fa43948 | Fosco Marotto | 17 May 2018, 21:20:54 UTC | Update HISTORY and version for upcoming 5.14 Summary: Closes https://github.com/facebook/rocksdb/pull/3866 Differential Revision: D8043563 Pulled By: gfosco fbshipit-source-id: da4af20e604534602ac0e07943135513fd9a9f53 | 17 May 2018, 21:27:17 UTC |
7ccb35f | Siying Dong | 17 May 2018, 19:46:23 UTC | In instrumented mutex, take timing once for both of perf_context and statistics Summary: Closes https://github.com/facebook/rocksdb/pull/3427 Differential Revision: D6827236 Pulled By: siying fbshipit-source-id: d8a2cc525c90df625510565669f2659014259a8a | 17 May 2018, 19:56:53 UTC |
8bf555f | Mike Kolupaev | 17 May 2018, 09:44:14 UTC | Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs Summary: Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are: * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted. * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not. However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours). This PR changes the convention to: * If status() is not ok, Valid() always returns false. * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.) This does sacrifice the two use cases listed above, but siying said it's ok. Overview of the changes: * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario. * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed. * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator. To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types: Iterators that didn't need changes: * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator. * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator. Iterators with changes (see inline comments for details): * DBIter - an overhaul: - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back. - It had a few code paths silently discarding subiterator's status. The stress test caught a few. - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example. - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption. - It used to not reset status on seek for some types of errors. - Some simplifications and better comments. - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer. * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status. * ForwardIterator - changed to the new convention, also slightly simplified. * ForwardLevelIterator - fixed some bugs and simplified. * LevelIterator - simplified. * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_. * BlockBasedTableIterator - minor changes. * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid. * PlainTableIterator - some seeks used to not reset status. * CuckooTableIterator - tiny code cleanup. * ManagedIterator - fixed some bugs. * BaseDeltaIterator - changed to the new convention and fixed a bug. * BlobDBIterator - seeks used to not reset status. * KeyConvertingIterator - some small change. Closes https://github.com/facebook/rocksdb/pull/3810 Differential Revision: D7888019 Pulled By: al13n321 fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d | 17 May 2018, 09:56:56 UTC |
46fde6b | Maysam Yabandeh | 16 May 2018, 19:51:31 UTC | Fix race condition between log_.erase and log_.back Summary: log_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. log_.erase however does that with only holding mutex_. This causes a race condition with two_write_queues since logs_.back is read with holding only log_write_mutex_ (which is correct according to logs_ contract) but logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852 although I could not reproduce it. Fixes https://github.com/facebook/rocksdb/issues/3852 Closes https://github.com/facebook/rocksdb/pull/3859 Differential Revision: D8026103 Pulled By: maysamyabandeh fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28 | 16 May 2018, 20:01:33 UTC |
42cb477 | acelyc111 | 15 May 2018, 06:54:06 UTC | Fix geo_db may seek an error key when they have the same quadkey Summary: Closes https://github.com/facebook/rocksdb/pull/3832 Differential Revision: D7994326 Pulled By: miasantreble fbshipit-source-id: 84a81b35b97750360423a9d4eca5b5a14d002134 | 15 May 2018, 06:57:15 UTC |
12ad711 | Maysam Yabandeh | 15 May 2018, 03:56:44 UTC | Suppress tsan lock-order-inversion on FlushWAL Summary: TSAN reports a false alarm for lock-order-inversion in DBWriteTest.IOErrorOnWALWritePropagateToWriteThreadFollower but Open and FlushWAL are not run concurrently. Suppressing the error by skipping FlushWAL in the test until TSAN is fixed. The alternative would be to use ``` TSAN_OPTIONS="suppressions=tsan-suppressions.txt" ./db_write_test ``` but it does not seem straightforward to integrate it to our test infra. Closes https://github.com/facebook/rocksdb/pull/3854 Differential Revision: D8000202 Pulled By: maysamyabandeh fbshipit-source-id: fde33483d963a7ad84d3145123821f64960a4802 | 15 May 2018, 04:13:35 UTC |
3d7dc75 | Andrew Kryczka | 14 May 2018, 21:44:04 UTC | Bottommost level-based compactions in bottom-pri pool Summary: This feature was introduced for universal compaction in cc01985d. At that point we thought it'd be used only to prevent long-running universal full compactions from blocking short-lived upper-level compactions. Now we have a level compaction user who could benefit from it since they use more expensive compression algorithm in the bottom level. So enable it for level. Closes https://github.com/facebook/rocksdb/pull/3835 Differential Revision: D7957179 Pulled By: ajkr fbshipit-source-id: 177285d2cef3b650b6a4d81dc5db84bc441c9fe4 | 14 May 2018, 21:57:15 UTC |
ebb823f | Sagar Vemuri | 14 May 2018, 18:01:41 UTC | Fix db_stress build on mac Summary: I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`. ``` $ make db_stress -j4 ... tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *') status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size); ^~~~~ ./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0; ^ 1 error generated. make: *** [tools/db_stress.o] Error 1 make: *** Waiting for unfinished jobs.... ``` Closes https://github.com/facebook/rocksdb/pull/3839 Differential Revision: D7979236 Pulled By: sagar0 fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62 | 14 May 2018, 18:14:07 UTC |
718c1c9 | Maysam Yabandeh | 14 May 2018, 17:53:32 UTC | Pass manual_wal_flush also to the first wal file Summary: Currently manual_wal_flush if set in the options will be used only for the wal files created during wal switch. The configuration thus does not affect the first wal file. The patch fixes that and also update the related unit tests. This PR is built on top of https://github.com/facebook/rocksdb/pull/3756 Closes https://github.com/facebook/rocksdb/pull/3824 Differential Revision: D7909153 Pulled By: maysamyabandeh fbshipit-source-id: 024ed99d2555db06bf096c902b998e432bb7b9ce | 14 May 2018, 17:57:56 UTC |
66c7aa3 | Maysam Yabandeh | 11 May 2018, 22:08:16 UTC | Clarify the ownership of root db after TransactionDB::Open Summary: The patch clarifies the ownership of the root db after TransactionDB::Open. If it is a success the ownership if with the TransactionDB, and the root db will be deleted when the destructor of the base class, StackableDB, is called. If it is failure, the temporarily created root db will also be deleted properly. The patch also includes lots of useful formatting changes. Closes https://github.com/facebook/rocksdb/pull/3714 upon which this patch is built. Closes https://github.com/facebook/rocksdb/pull/3806 Differential Revision: D7878010 Pulled By: maysamyabandeh fbshipit-source-id: f54f3942e29434143ae5a2423ceec9c7072cd4c2 | 11 May 2018, 22:14:03 UTC |
3272bc0 | Sergey Elin | 11 May 2018, 18:14:20 UTC | Fix formatting in log message Summary: Add missing space. Closes https://github.com/facebook/rocksdb/pull/3826 Differential Revision: D7956059 Pulled By: miasantreble fbshipit-source-id: 3aeba76385f8726399a3086c46de710636a31191 | 11 May 2018, 18:28:54 UTC |
072ae67 | Andrew Kryczka | 10 May 2018, 02:26:43 UTC | Apply use_direct_io_for_flush_and_compaction to writes only Summary: Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used. This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background direct writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously. Closes https://github.com/facebook/rocksdb/pull/3829 Differential Revision: D7915443 Pulled By: ajkr fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279 | 10 May 2018, 02:42:58 UTC |
d19f568 | Andrew Kryczka | 09 May 2018, 20:32:03 UTC | Refactor argument handling in db_crashtest.py Summary: - Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option. - Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default - Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params. - Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication Closes https://github.com/facebook/rocksdb/pull/3809 Differential Revision: D7885779 Pulled By: ajkr fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8 | 09 May 2018, 20:42:41 UTC |
3690276 | Siying Dong | 09 May 2018, 17:24:11 UTC | Disallow to open RandomRW file if the file doesn't exist Summary: The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case. Closes https://github.com/facebook/rocksdb/pull/3827 Differential Revision: D7913719 Pulled By: siying fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492 | 09 May 2018, 17:27:26 UTC |
ddfd252 | Siying Dong | 09 May 2018, 17:18:05 UTC | Make BlockIter final Summary: Now BlockBasedTableIterator directly uses BlockIter. By making BlockIter final, we can prevent unintended virtual function overriding. Closes https://github.com/facebook/rocksdb/pull/3828 Differential Revision: D7933816 Pulled By: siying fbshipit-source-id: 026a08cb5c5b6d3d6f44743152b4251da4756f2c | 09 May 2018, 17:27:26 UTC |
f92cd2f | Dmitri Smirnov | 09 May 2018, 17:04:55 UTC | Introduce and use the option to disable stall notifications structures Summary: and code. Removing this helps with insert performance. Closes https://github.com/facebook/rocksdb/pull/3830 Differential Revision: D7921030 Pulled By: siying fbshipit-source-id: 84e80d50a7ef96f5441c51c9a0d089c50217cce2 | 09 May 2018, 17:13:53 UTC |
cee138c | Huachao Huang | 08 May 2018, 19:06:01 UTC | Add missing options in BuildColumnfamilyOptions Summary: soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit are added to BuildColumnfamilyOptions. Closes https://github.com/facebook/rocksdb/pull/3823 Differential Revision: D7909246 Pulled By: maysamyabandeh fbshipit-source-id: 89032efbf6b5bd302ea50cbd7a234977984a1fca | 08 May 2018, 19:13:18 UTC |
4bf169f | Andrew Kryczka | 08 May 2018, 19:03:29 UTC | Disable readahead when using mmap for reads Summary: `ReadaheadRandomAccessFile` had an unwritten assumption, which was that its wrapped file's `Read()` function always copies into the provided scratch buffer. Actually this was not true when the wrapped file was `PosixMmapReadableFile`, whose `Read()` implementation does no copying and instead returns a `Slice` pointing directly into the `mmap`'d memory region. This PR: - prevents `ReadaheadRandomAccessFile` from ever wrapping mmap readable files - adds an assert for the assumption `ReadaheadRandomAccessFile` makes about the wrapped file's use of scratch buffer Closes https://github.com/facebook/rocksdb/pull/3813 Differential Revision: D7891513 Pulled By: ajkr fbshipit-source-id: dc64a55222d6af280c39a1852ee39e9e9d7cde7d | 08 May 2018, 19:13:18 UTC |
1d9f24d | Tongliang Liao | 07 May 2018, 21:21:18 UTC | Link jemalloc Summary: Fix undefined reference to `malloc_*` linking errors on Linux. Closes https://github.com/facebook/rocksdb/pull/3817 Differential Revision: D7899066 Pulled By: ajkr fbshipit-source-id: 18c46569a59608388d6240f1b8ec20c2d2557dec | 07 May 2018, 21:28:36 UTC |
9470ee4 | Tongliang Liao | 07 May 2018, 21:17:49 UTC | Allows other cmake-specific "true" for USE_RTTI. Summary: People also use ON/OFF, TRUE/FALSE and other switch options that is allowed by cmake. Closes https://github.com/facebook/rocksdb/pull/3814 Differential Revision: D7899032 Pulled By: ajkr fbshipit-source-id: b71511af59e0a78eedafb639b5002c47050bf3c2 | 07 May 2018, 21:28:36 UTC |
6d6e01c | Tongliang Liao | 07 May 2018, 21:15:29 UTC | Search paths provided by intel's "tbbvars.sh". Summary: TBBROOT and LIBRARY_PATH are set in env by the script. With TBB 2018 the library path is $TBBROOT/lib/intel64/gcc4.7 for anything above gcc 4.7, which is both compiler and architecture related. We cannot simply do ${TBB_ROOT_DIR}/lib. Closes https://github.com/facebook/rocksdb/pull/3815 Differential Revision: D7899006 Pulled By: ajkr fbshipit-source-id: 159ab1f6a5c40452ed6aa8d79300206953d916c2 | 07 May 2018, 21:28:36 UTC |
d72a51e | Maysam Yabandeh | 07 May 2018, 19:15:54 UTC | Split FaultInjectionTest.FaultTest to avoid timeout Summary: tsan flavor of this test occasionally times out in our test infra. The patch split the test to two, each working on half of the option range. Before: [ OK ] FaultTest/FaultInjectionTest.FaultTest/0 (5918 ms) [ OK ] FaultTest/FaultInjectionTest.FaultTest/1 (5336 ms) After: [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/0 (2930 ms) [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/1 (2676 ms) [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/2 (2759 ms) [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/3 (2546 ms) Closes https://github.com/facebook/rocksdb/pull/3819 Differential Revision: D7894975 Pulled By: maysamyabandeh fbshipit-source-id: 809f1411cbcc27f8aa71a6b29a16b039f51b67c9 | 07 May 2018, 19:29:58 UTC |
72942ad | LingBin | 04 May 2018, 23:37:39 UTC | Recommit "Avoid adding tombstones of the same file to RangeDelAggregator multiple times" Summary: The origin commit #3635 will hurt performance for users who aren't using range deletions, because unneeded std::set operations, so it was reverted by commit 44653c7b7aabe821e671946e732dda7ae6b43d1b. (see #3672) To fix this, move the set to and add a check in , i.e., file will be added only if is non-nullptr. The db_bench command which find the performance regression: > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 > --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 > --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 > -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none Before and after the modification, I re-run this command on the machine, the results of are as follows: **fillrandom** Table | P50 | P75 | P99 | P99.9 | P99.99 | ---- | --- | --- | --- | ----- | ------ | before commit | 5.92 | 8.57 | 19.63 | 980.97 | 12196.00 | after commit | 5.91 | 8.55 | 19.34 | 965.56 | 13513.56 | **seekrandomwhilewriting** Table | P50 | P75 | P99 | P99.9 | P99.99 | ---- | --- | --- | --- | ----- | ------ | before commit | 1418.62 | 1867.01 | 3823.28 | 4980.99 | 9240.00 | after commit | 1450.54 | 1880.61 | 3962.87 | 5429.60 | 7542.86 | Closes https://github.com/facebook/rocksdb/pull/3800 Differential Revision: D7874245 Pulled By: ajkr fbshipit-source-id: 2e8bec781b3f7399246babd66395c88619534a17 | 04 May 2018, 23:45:15 UTC |
4c5a323 | Andrew Kryczka | 04 May 2018, 23:33:38 UTC | Fix db_stress memory leak ASAN error Summary: In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed. Closes https://github.com/facebook/rocksdb/pull/3804 Differential Revision: D7874694 Pulled By: ajkr fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b | 04 May 2018, 23:45:15 UTC |
fc522bd | Maysam Yabandeh | 04 May 2018, 22:14:54 UTC | Evenly split HarnessTest.Randomized Summary: Currently HarnessTest.Randomized is already split but some of the splits are faster than the others. The reason is that each split takes a continuous range of the generated args and the test with later args takes longer to finish. The patch evenly split the args among splits in a round robin fashion. Before: ``` [ OK ] HarnessTest.Randomized1n2 (2278 ms) [ OK ] HarnessTest.Randomized3n4 (1095 ms) [ OK ] HarnessTest.Randomized5 (658 ms) [ OK ] HarnessTest.Randomized6 (1258 ms) [ OK ] HarnessTest.Randomized7 (6476 ms) [ OK ] HarnessTest.Randomized8 (8182 ms) ``` After ``` [ OK ] HarnessTest.Randomized1 (2649 ms) [ OK ] HarnessTest.Randomized2 (2645 ms) [ OK ] HarnessTest.Randomized3 (2577 ms) [ OK ] HarnessTest.Randomized4 (2490 ms) [ OK ] HarnessTest.Randomized5 (2553 ms) [ OK ] HarnessTest.Randomized6 (2560 ms) [ OK ] HarnessTest.Randomized7 (2501 ms) [ OK ] HarnessTest.Randomized8 (2574 ms) ``` Closes https://github.com/facebook/rocksdb/pull/3808 Differential Revision: D7882663 Pulled By: maysamyabandeh fbshipit-source-id: 09b749a9684b6d7d65466aa4b00c5334a49e833e | 04 May 2018, 22:28:06 UTC |
171f415 | Maysam Yabandeh | 04 May 2018, 22:14:20 UTC | Rename vars to satisfy unity built Summary: Tested by "make unity_test" Closes https://github.com/facebook/rocksdb/pull/3807 Differential Revision: D7882657 Pulled By: maysamyabandeh fbshipit-source-id: 84862c18d7f2fc762bd96ad070eaeb6936e45159 | 04 May 2018, 22:28:06 UTC |
4d40b10 | Fosco Marotto | 04 May 2018, 22:09:40 UTC | Add USE_RTTI and default behavior to CMakeLists Summary: Proposed fix for #3701 Closes https://github.com/facebook/rocksdb/pull/3801 Differential Revision: D7868264 Pulled By: gfosco fbshipit-source-id: 013963ed3d172c8dc2abd1dd5982580082ca5d2d | 04 May 2018, 22:13:03 UTC |
6fc1bcc | Andrew Kryczka | 04 May 2018, 20:40:58 UTC | Fix crash test allocation error under TSAN Summary: We were seeing the following error: "ThreadSanitizer: DenseSlabAllocator overflow. Dying." It is fixable by mmap'ing a smaller region for keys' expected values, which this PR achieves by reducing the number of keys. Closes https://github.com/facebook/rocksdb/pull/3803 Differential Revision: D7874478 Pulled By: ajkr fbshipit-source-id: 433939f5cb92410ab4777d540cb0cc2ee0fe6c2e | 04 May 2018, 20:44:04 UTC |
a703432 | Zhongyi Xie | 03 May 2018, 23:35:46 UTC | MaxFileSizeForLevel: adjust max_file_size for dynamic level compaction Summary: `MutableCFOptions::RefreshDerivedOptions` always assume base level is L1, which is not true when `level_compaction_dynamic_level_bytes=true` and Level based compaction is used. This PR fixes this by recomputing `max_file_size` at query time (in `MaxFileSizeForLevel`) Fixes https://github.com/facebook/rocksdb/issues/3229 In master: ``` Level Files Size(MB) -------------------- 0 14 846 1 0 0 2 0 0 3 0 0 4 0 0 5 15 366 6 11 481 Cumulative compaction: 3.83 GB write, 2.27 GB read ``` In branch: ``` Level Files Size(MB) -------------------- 0 9 544 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 445 935 Cumulative compaction: 2.91 GB write, 1.46 GB read ``` db_bench command used: ``` ./db_bench --benchmarks="fillrandom,deleterandom,fillrandom,levelstats,stats" --statistics -deletes=5000 -db=tmp -compression_type=none --num=20000 -value_size=100000 -level_compaction_dynamic_level_bytes=true -target_file_size_base=2097152 -target_file_size_multiplier=2 ``` Closes https://github.com/facebook/rocksdb/pull/3755 Differential Revision: D7721381 Pulled By: miasantreble fbshipit-source-id: 39afb8503190bac3b466adf9bbf2a9b3655789f8 | 03 May 2018, 23:42:13 UTC |
934f96d | Dmitri Smirnov | 03 May 2018, 23:07:41 UTC | Better destroydb Summary: Delete archive directory before WAL folder since archive may be contained as a subfolder. Also improve loop readability. Closes https://github.com/facebook/rocksdb/pull/3797 Differential Revision: D7866378 Pulled By: riversand963 fbshipit-source-id: 0c45d97677ce6fbefa3f8d602ef5e2a2a925e6f5 | 03 May 2018, 23:13:09 UTC |
a8d77ca | Maysam Yabandeh | 03 May 2018, 22:58:42 UTC | Speedup ManualCompactionTest.Test Summary: ManualCompactionTest.Test occasionally times out in tsan flavor of our test infra. The patch reduces the number of keys to make the test run faster. The change does not seem to negatively impact the coverage of the test. Closes https://github.com/facebook/rocksdb/pull/3802 Differential Revision: D7865596 Pulled By: maysamyabandeh fbshipit-source-id: b4f60e32c3ae1677e25506f71c766e33fa985785 | 03 May 2018, 23:13:09 UTC |
d595492 | Siying Dong | 03 May 2018, 22:35:11 UTC | Skip deleted WALs during recovery Summary: This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic. Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction) This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2. Closes https://github.com/facebook/rocksdb/pull/3765 Differential Revision: D7747618 Pulled By: siying fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729 | 03 May 2018, 22:43:09 UTC |
cfb8665 | Maysam Yabandeh | 03 May 2018, 01:09:55 UTC | WritePrepared Txn: enable rollback in stress test Summary: Rollback was disabled in stress test since there was a concurrency issue in WritePrepared rollback algorithm. The issue is fixed by caching the column family handles in WritePrepared to skip getting them from the db when needed for rollback. Tested by running transaction stress test under tsan. Closes https://github.com/facebook/rocksdb/pull/3785 Differential Revision: D7793727 Pulled By: maysamyabandeh fbshipit-source-id: d81ab6fda0e53186ca69944cfe0712ce4869451e | 03 May 2018, 01:13:05 UTC |
5bed8a0 | Maysam Yabandeh | 03 May 2018, 01:05:03 UTC | WritePrepared Txn: split SeqAdvanceConcurrentTest Summary: The tsan flavor of SeqAdvanceConcurrentTest times out in our test infra. The patch splits it into 10 tests. On my vm before: [ OK ] WritePreparedTransactionTest/WritePreparedTransactionTest.SeqAdvanceConcurrentTest/0 (5194 ms) after: [ OK ] OneWriteQueue/SeqAdvanceConcurrentTest.SeqAdvanceConcurrentTest/0 (1906 ms) Closes https://github.com/facebook/rocksdb/pull/3799 Differential Revision: D7854515 Pulled By: maysamyabandeh fbshipit-source-id: 4fbac42a1f974326cbc237f8cb9d6232d379c431 | 03 May 2018, 01:13:05 UTC |
6cab318 | Zhongyi Xie | 01 May 2018, 22:42:06 UTC | avoid double delete on dummy record insertion failure Summary: When the dummy record insertion fails, there is no need to explicitly delete the block as it will be registered for cleanup regardless. Closes https://github.com/facebook/rocksdb/pull/3688 Differential Revision: D7537741 Pulled By: miasantreble fbshipit-source-id: fcd3a3d3d382ee8e2c7ced0a4980e683d93a16d6 | 01 May 2018, 23:01:28 UTC |
acb61b7 | Dmitri Smirnov | 01 May 2018, 20:38:36 UTC | Adjust pread/pwrite to return Status Summary: Returning bytes_read causes the caller to call GetLastError() to report failure but the lasterror may be overwritten by then so we lose the error code. Fix up CMake file to include xpress source code only when needed. Fix warning for the uninitialized var. Closes https://github.com/facebook/rocksdb/pull/3795 Differential Revision: D7832935 Pulled By: anand1976 fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b | 01 May 2018, 20:42:46 UTC |
19fde54 | Andrew Kryczka | 01 May 2018, 20:18:01 UTC | initialize local variable for UBSAN in PosixEnv function Summary: this is a repeat commit of a8a28da2159648a2f72c35ea507371df8a97a2a9, which got reverted together with 6afe22db2e667799d8c903db61750d676bffe152, but forgotten about when that commit was un-reverted in 46152d53bf58748fc3ed0681d8970c342bcfc47a. Closes https://github.com/facebook/rocksdb/pull/3796 Differential Revision: D7826077 Pulled By: ajkr fbshipit-source-id: edb22375da56e2feda50c5b35f942f4d2d52b19c | 01 May 2018, 20:27:05 UTC |
46152d5 | Andrew Kryczka | 30 April 2018, 19:23:45 UTC | Second attempt at db_stress crash-recovery verification Summary: - Original commit: a4fb1f8c049ee9d61a9da8cf23b64d2c7d36a33f - Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e667799d8c903db61750d676bffe152 This PR includes the contents of the original commit plus two bug fixes, which are: - In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs. - Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported. Closes https://github.com/facebook/rocksdb/pull/3793 Differential Revision: D7811671 Pulled By: ajkr fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30 | 30 April 2018, 19:27:34 UTC |
282099f | Vincent Lee | 30 April 2018, 18:27:25 UTC | fix missing perfcontext destroy declare in C API Summary: `rocksdb_perfcontext_destroy` declare is missing in C API. Closes https://github.com/facebook/rocksdb/pull/3787 Differential Revision: D7816490 Pulled By: ajkr fbshipit-source-id: 3a488607bfc897c7ce846a1b3c2b7af693134d0d | 30 April 2018, 18:43:09 UTC |
c9ace1d | Victor Grishchenko | 27 April 2018, 23:46:38 UTC | expose WAL iterator in the C API Summary: A minor change: I wrapped TransactionLogIterator for the C API. I needed that for the golang binding. Closes https://github.com/facebook/rocksdb/pull/3304 Differential Revision: D6628736 Pulled By: miasantreble fbshipit-source-id: 3374f3c64b1d7b225696b8767090917761e2f30a | 27 April 2018, 23:56:59 UTC |
6afe22d | Andrew Kryczka | 27 April 2018, 19:54:16 UTC | revert db_stress crash-recovery verification Summary: crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures. Closes https://github.com/facebook/rocksdb/pull/3786 Differential Revision: D7794516 Pulled By: ajkr fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33 | 27 April 2018, 19:57:01 UTC |
459bb90 | Zhongyi Xie | 27 April 2018, 19:12:43 UTC | remove prefixscanrandom from db_bench help Summary: fix issue reported in https://github.com/facebook/rocksdb/issues/3757 Closes https://github.com/facebook/rocksdb/pull/3784 Differential Revision: D7794107 Pulled By: miasantreble fbshipit-source-id: 43535074fcb82adb5656bcb916284b2dfc5cbb64 | 27 April 2018, 19:13:19 UTC |
ed7a95b | Huachao Huang | 27 April 2018, 18:48:21 UTC | Add max_subcompactions as a compaction option Summary: Sometimes we want to compact files as fast as possible, but don't want to set a large `max_subcompactions` in the `DBOptions` by default. I add a `max_subcompactions` options to `CompactionOptions` so that we can choose a proper concurrency dynamically. Closes https://github.com/facebook/rocksdb/pull/3775 Differential Revision: D7792357 Pulled By: ajkr fbshipit-source-id: 94f54c3784dce69e40a229721a79a97e80cd6a6c | 27 April 2018, 18:57:39 UTC |
7dfbe33 | Yanqin Jin | 27 April 2018, 18:11:12 UTC | Rename pending_compaction_ to queued_for_compaction_. Summary: We use `queued_for_flush_` to indicate a column family has been added to the flush queue. Similarly and to be consistent in our naming, we need to use `queued_for_compaction_` to indicate a column family has been added to the compaction queue. In the past we used `pending_compaction_` which can also be ambiguous. Closes https://github.com/facebook/rocksdb/pull/3781 Differential Revision: D7790063 Pulled By: riversand963 fbshipit-source-id: 6786b11a4fcaea36dc9b4672233dbe042f921804 | 27 April 2018, 18:12:01 UTC |
513b5ce | Yanqin Jin | 27 April 2018, 04:09:53 UTC | Rename pending_flush_ to queued_for_flush_. Summary: With ColumnFamilyData::pending_flush_, we have the following code snippet in DBImpl::ScheedulePendingFlush ``` if (!cfd->pending_flush() && cfd->imm()->IsFlushPending()) { ... } ``` `Pending` is ambiguous, and I feel `queued_for_flush` is a better name, especially for the sake of readability. Closes https://github.com/facebook/rocksdb/pull/3777 Differential Revision: D7783066 Pulled By: riversand963 fbshipit-source-id: f1bd8c8bfe5eafd2c94da0d8566c9b2b6bb57229 | 27 April 2018, 04:12:51 UTC |
37cd617 | Nathan VanBenschoten | 27 April 2018, 04:08:46 UTC | Add virtual Truncate method to Env Summary: This change adds a virtual `Truncate` method to `Env`, which truncates the named file to the specified size. At the moment, this is only supported for `MockEnv`, but other `Env's` could be extended to override the method too. This is the same approach that methods like `LinkFile` and `AreSameFile` have taken. This is useful for any user of the in-memory `Env`. The implementation's header is not exported, so before this change, it was impossible to access it's already existing `Truncate` method. Closes https://github.com/facebook/rocksdb/pull/3779 Differential Revision: D7785789 Pulled By: ajkr fbshipit-source-id: 3bcdaeea7b7180529f7d9b496dc67b791a00bbf0 | 27 April 2018, 04:12:51 UTC |
db36f22 | Andrew Kryczka | 27 April 2018, 01:38:45 UTC | Allow options file in db_stress and db_crashtest Summary: - When options file is provided to db_stress, take supported options from the file instead of from flags - Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file - Allow options filename to be passed via `db_crashtest.py` Closes https://github.com/facebook/rocksdb/pull/3768 Differential Revision: D7755331 Pulled By: ajkr fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a | 27 April 2018, 01:42:07 UTC |
7004e45 | Andrew Kryczka | 26 April 2018, 21:33:02 UTC | Remove block-based table assertion for non-empty filter block Summary: 7a6353bd1c516fe3f7118248c77035697c5ac247 prevents empty filter blocks from being written for SST files containing range deletions only. However the assertion this PR removes is still a problem as we could be reading from a DB generated by a RocksDB build without the 7a6353bd1c516fe3f7118248c77035697c5ac247 patch. So remove the assertion. We already don't do this check when `cache_index_and_filter_blocks=false`, so it should be safe. Closes https://github.com/facebook/rocksdb/pull/3773 Differential Revision: D7769964 Pulled By: ajkr fbshipit-source-id: 7285762446f2cd2ccf16efd7a988a106fbb0d8d3 | 26 April 2018, 21:43:11 UTC |
63c965c | Siying Dong | 26 April 2018, 20:51:39 UTC | Sync parent directory after deleting a file in delete scheduler Summary: sync parent directory after deleting a file in delete scheduler. Otherwise, trim speed may not be as smooth as what we want. Closes https://github.com/facebook/rocksdb/pull/3767 Differential Revision: D7760136 Pulled By: siying fbshipit-source-id: ec131d53b61953f09c60d67e901e5eeb2716b05f | 26 April 2018, 20:58:20 UTC |
7e4e381 | Maysam Yabandeh | 26 April 2018, 20:22:08 UTC | Fix the bloom filter skipping empty prefixes Summary: bc0da4b5125ac4f43c88879522013814355338e7 optimized bloom filters by skipping duplicate entires when the whole key and prefixes are both added to the bloom. It however used empty string as the initial value of the last entry added to the bloom. This is incorrect since empty key/prefix are valid entires by themselves. This patch fixes that. Closes https://github.com/facebook/rocksdb/pull/3776 Differential Revision: D7778803 Pulled By: maysamyabandeh fbshipit-source-id: d5a065daebee17f9403cac51e9d5626aac87bfbc | 26 April 2018, 20:28:31 UTC |
e5a4dac | Maysam Yabandeh | 26 April 2018, 16:14:28 UTC | WritePrepared Txn: disable rollback in stress test Summary: WritePrepared rollback implementation is not ready to be invoked in the middle of workload. This is due the lack of synchronization to obtain the cf handle from db. Temporarily disabling this until the problem with rollback is fixed. Closes https://github.com/facebook/rocksdb/pull/3772 Differential Revision: D7769041 Pulled By: maysamyabandeh fbshipit-source-id: 0e3b0ce679bc2afba82e653a40afa3f045722754 | 26 April 2018, 16:27:55 UTC |
7c9f23e | Vincent Lee | 25 April 2018, 22:45:18 UTC | Rate limiter should be allowed to share between different rocksdb instances in C API Summary: Currently, the `rocksdb_options_set_ratelimiter` in `c.cc` will change the input to nil, which make it is not possible to use the shared rate limiter create by `rocksdb_ratelimiter_create` in different rocksdb option. In this pr, I changed it to shared ptr. Closes https://github.com/facebook/rocksdb/pull/3758 Differential Revision: D7749740 Pulled By: ajkr fbshipit-source-id: c6121f8ca75402afdb4b295ce63c2338d253a1b5 | 25 April 2018, 22:57:48 UTC |
406b951 | Anand Ananthabhotla | 25 April 2018, 22:29:36 UTC | Fix clang build failure with -Wgnu-redeclared-enum Summary: In include/rocksdb/db.h, enum EntryType is redeclared even though original declaration in types.h in included. Closes https://github.com/facebook/rocksdb/pull/3766 Differential Revision: D7765504 Pulled By: anand1976 fbshipit-source-id: 622a8ecb306993915be1b9dd5cdd79dbc6a4ea05 | 25 April 2018, 22:42:46 UTC |
13a0bd9 | Kefu Chai | 25 April 2018, 21:26:11 UTC | cmake: add options for enabling TBB and NUMA support Summary: see also https://github.com/facebook/rocksdb/issues/3036 Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes https://github.com/facebook/rocksdb/pull/3750 Differential Revision: D7765170 Pulled By: ajkr fbshipit-source-id: 455788b3131bf62a4987a65684b757e68473eed9 | 25 April 2018, 21:26:55 UTC |
dfc61e7 | Andrew Kryczka | 25 April 2018, 20:36:26 UTC | initialize local variable for UBSAN in PosixEnv function Summary: It seems clear to me that the variable is initialized before line 492, but it wasn't clear to UBSAN. The failure was: ``` In file included from ./env/io_posix.h:14:0, from env/env_posix.cc:44: ./include/rocksdb/env.h: In member function ‘virtual rocksdb::Status rocksdb::{anonymous}::PosixEnv::NewMemoryMappedFileBuffer(const string&, std::unique_ptr<rocksdb::MemoryMappedFileBuffer>*)’: ./include/rocksdb/env.h:822:36: error: ‘base’ may be used uninitialized in this function [-Werror=maybe-uninitialized] : base(_base), length(_length) {} ^ env/env_posix.cc:482:11: note: ‘base’ was declared here void* base; ``` We can just initialize to nullptr to keep UBSAN happy. Closes https://github.com/facebook/rocksdb/pull/3770 Differential Revision: D7756287 Pulled By: ajkr fbshipit-source-id: 0f2efb9594e2d3a30706a4ca7e1d4a6328031bf2 | 25 April 2018, 20:42:02 UTC |
9b89479 | Andrew Kryczka | 25 April 2018, 19:08:24 UTC | Pass -latomic to linker when using clang Summary: clang compilation is failing due to a4fb1f8c049ee9d61a9da8cf23b64d2c7d36a33f. In that commit I added a call to `std::atomic::is_lock_free` which was evidently relying on a compiler builtin only present in gcc. Drawbacks to this fix are: - users may need to install libatomic - there might be cases where clang is used even though USE_CLANG is unset (e.g., when clang is the only available compiler). I didn't figure out how to add -latomic in those cases... An alternative fix mentioned in http://lists.llvm.org/pipermail/llvm-bugs/2017-August/057263.html is using -stdlib=libc++ with clang. Closes https://github.com/facebook/rocksdb/pull/3769 Differential Revision: D7756261 Pulled By: ajkr fbshipit-source-id: 26888300683fa9970ab5950239d1aa217e8efd49 | 25 April 2018, 19:13:41 UTC |
a4fb1f8 | Andrew Kryczka | 24 April 2018, 22:46:41 UTC | Add crash-recovery correctness check to db_stress Summary: Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values. In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`. For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class. `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup. Closes https://github.com/facebook/rocksdb/pull/3629 Differential Revision: D7463144 Pulled By: ajkr fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b | 24 April 2018, 22:58:22 UTC |
bc0da4b | Maysam Yabandeh | 24 April 2018, 17:53:38 UTC | Skip duplicate bloom keys when whole_key and prefix are mixed Summary: Currently we rely on FilterBitsBuilder to skip the duplicate keys. It does that by comparing that hash of the key to the hash of the last added entry. This logic breaks however when we have whole_key_filtering mixed with prefix blooms as their addition to FilterBitsBuilder will be interleaved. The patch fixes that by comparing the last whole key and last prefix with the whole key and prefix of the new key respectively and skip the call to FilterBitsBuilder if it is a duplicate. Closes https://github.com/facebook/rocksdb/pull/3764 Differential Revision: D7744413 Pulled By: maysamyabandeh fbshipit-source-id: 15df73bbbafdfd754d4e1f42ea07f47b03bc5eb8 | 24 April 2018, 17:58:16 UTC |