Revision Author Date Message Commit Date
f23fed1 Delay verify compaction output table (#3979) Summary: Verifying a table loads the SST into `TableCache`, which occupies memory and `TableCache` capacity, but nothing uses the loaded tables at that point, so loading them early is unnecessary. We therefore verify the output tables after all subcompactions have finished. Closes https://github.com/facebook/rocksdb/pull/3979 Differential Revision: D8389946 Pulled By: ajkr fbshipit-source-id: 54bd4f474f9e7b3accf39c3068b1f36a27ec4c49 15 June 2018, 19:42:53 UTC
4faaab7 Benchmark sine wave write rate limit (#3914) Summary: As mentioned at the [dev forum.](https://www.facebook.com/groups/rocksdb.dev/1693425187422655/) Let me know if you would like me to do any changes! Closes https://github.com/facebook/rocksdb/pull/3914 Differential Revision: D8452824 Pulled By: siying fbshipit-source-id: 56439b3228ecdcc5a199d5198eff2fab553be961 15 June 2018, 19:12:03 UTC
f5281a5 tools/check_format_compatible.sh to cover forward option reading too (#3994) Summary: Make sure that some recent releases can read master's option files while ignoring unknown options. Also add two more recent release branches. Closes https://github.com/facebook/rocksdb/pull/3994 Differential Revision: D8409499 Pulled By: siying fbshipit-source-id: 1b025f19ba288da0517f6b4572797573e23e23c2 15 June 2018, 18:12:29 UTC
fbe3b9e Update db_universal_compaction_test according to PR #3970 (#3995) Summary: The SST file sizes changed slightly after the improvement of PR #3970 which reduces the size of the properties block. Before PR #3970 a size ratio compaction included all of the first four flushed files but it only includes two files after. We increase the size_ratio universal compaction option to make that compaction include all four files again. Closes https://github.com/facebook/rocksdb/pull/3995 Differential Revision: D8426925 Pulled By: fgwu fbshipit-source-id: 1429c38672e9f4fb4d4881fd4b06db45c4861d62 15 June 2018, 17:42:21 UTC
1f32dc7 Check with PosixEnv before opening LOCK file (#3993) Summary: Rebased and resubmitting #1831 on behalf of stevelittle. The problem is when a single process attempts to open the same DB twice, the second attempt fails due to LOCK file held. If the second attempt had opened the LOCK file, it'll now need to close it, and closing causes the file to be unlocked. Then, any subsequent attempt to open the DB will succeed, which is the wrong behavior. The solution was to track which files a process has locked in PosixEnv, and check those before opening a LOCK file. Fixes #1780. Closes https://github.com/facebook/rocksdb/pull/3993 Differential Revision: D8398984 Pulled By: ajkr fbshipit-source-id: 2755fe66950a0c9de63075f932f9e15768041918 14 June 2018, 00:32:04 UTC
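The bookkeeping described in 1f32dc7 can be sketched roughly as follows. This is a minimal illustration, not the PosixEnv code: `LockTable`, `TryAcquire` and `Release` are hypothetical names, and the sketch assumes a process-wide set of file names guarded by a mutex.

```cpp
#include <mutex>
#include <set>
#include <string>

// Hypothetical process-wide registry of LOCK files this process already holds.
class LockTable {
 public:
  // Returns true if `fname` was not yet held and is now recorded as locked.
  // A caller would only open and lock the file after this succeeds, so a
  // failed second attempt never opens (and later closes) the LOCK file.
  bool TryAcquire(const std::string& fname) {
    std::lock_guard<std::mutex> guard(mu_);
    return locked_.insert(fname).second;
  }

  // Forget `fname` once the real file lock has been released.
  void Release(const std::string& fname) {
    std::lock_guard<std::mutex> guard(mu_);
    locked_.erase(fname);
  }

 private:
  std::mutex mu_;
  std::set<std::string> locked_;
};
```

With this shape, a second `TryAcquire` on the same path fails until `Release` is called, which is the behavior the commit message asks for.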
7497f99 Run manual compaction in stress/crash tests (#3936) Summary: - Add support to `db_stress` for `CompactRange` - Enable `CompactRange` and `CompactFiles` in crash tests Closes https://github.com/facebook/rocksdb/pull/3936 Differential Revision: D8230953 Pulled By: ajkr fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8 13 June 2018, 23:45:28 UTC
dd216dd Choose unique keys faster in db_stress (#3990) Summary: db_stress initialization randomly chooses a set of keys to not overwrite. It was doing it separately for each column family. That caused 30+ second initialization times for the non-simple crash tests, which have 10 CFs. This PR: - reuses the same set of randomly chosen no-overwrite keys across all CFs - logs a couple more timestamps so we can more easily see initialization time Closes https://github.com/facebook/rocksdb/pull/3990 Differential Revision: D8393821 Pulled By: ajkr fbshipit-source-id: d0b263a298df607285ffdd8b0983ff6575cc6c34 13 June 2018, 20:43:23 UTC
a720401 Avoid acquiring SyncPoint mutex when it is disabled (#3991) Summary: In `db_stress` profile the vast majority of CPU time is spent acquiring the `SyncPoint` mutex. I mistakenly assumed #3939 had fixed this mutex contention problem by disabling `SyncPoint` processing. But actually the lock was still being acquired just to check whether processing is enabled. We can avoid that overhead by using an atomic to track whether it's enabled. Closes https://github.com/facebook/rocksdb/pull/3991 Differential Revision: D8393825 Pulled By: ajkr fbshipit-source-id: 5bc4e3c722ee7304e7a9c2439998c456b05a6897 13 June 2018, 20:13:18 UTC
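The idea in a720401 (check an atomic flag before touching the mutex) might look like the sketch below. `ToySyncPoint` and its methods are hypothetical stand-ins written for this illustration; they are not the real `SyncPoint` interface.

```cpp
#include <atomic>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a sync-point registry.
class ToySyncPoint {
 public:
  void Enable() { enabled_.store(true, std::memory_order_release); }
  void Disable() { enabled_.store(false, std::memory_order_release); }

  void SetCallback(const std::string& point, std::function<void()> cb) {
    std::lock_guard<std::mutex> guard(mu_);
    callbacks_[point] = std::move(cb);
  }

  // Fast path: when disabled, return after one atomic load, no mutex.
  void Process(const std::string& point) {
    if (!enabled_.load(std::memory_order_acquire)) return;
    std::function<void()> cb;
    {
      std::lock_guard<std::mutex> guard(mu_);
      auto it = callbacks_.find(point);
      if (it != callbacks_.end()) cb = it->second;
    }
    if (cb) cb();  // run the callback outside the lock
  }

 private:
  std::atomic<bool> enabled_{false};
  std::mutex mu_;
  std::map<std::string, std::function<void()>> callbacks_;
};
```

While the flag is false, `Process` never contends on `mu_`, which is the overhead the commit message says dominated the `db_stress` profile.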
d82f142 Fix regression bug of Prev() with upper bound (#3989) Summary: A recent change pushed down the upper bound checking to child iterators. However, this makes the logic of the following sequence wrong: Seek(key); if (!Valid()) SeekToLast(); Because !Valid() may be caused by upper bounds, rather than the end of the iterator. In this case SeekToLast() points to totally wrong places. This can cause wrong results, infinite loops, or segfault in some cases. This sequence is called when changing direction from forward to backward. And this also implicitly happens during reseeking optimization in Prev(). Fix this bug by using SeekForPrev() rather than this sequence, as is already done in the prefix extractor case. Closes https://github.com/facebook/rocksdb/pull/3989 Differential Revision: D8385422 Pulled By: siying fbshipit-source-id: 429e869990cfd2dc389421e0836fc496bed67bb4 12 June 2018, 23:57:36 UTC
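The failure mode in d82f142 can be reproduced on a toy iterator. The sketch below is an illustration only: `ToyIter` is hypothetical, and it assumes that `Seek()` enforces an exclusive upper bound while `SeekToLast()` does not, which is one simple way to model the bound check having been pushed down to child iterators.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy child iterator over sorted keys (hypothetical, for illustration).
class ToyIter {
 public:
  ToyIter(std::vector<std::string> keys, std::string upper_bound)
      : keys_(std::move(keys)),
        upper_(std::move(upper_bound)),
        pos_(keys_.size()) {}

  bool Valid() const { return pos_ < keys_.size(); }
  const std::string& key() const { return keys_[pos_]; }

  // First key >= target; invalid if that key is at or past the upper bound.
  void Seek(const std::string& target) {
    auto it = std::lower_bound(keys_.begin(), keys_.end(), target);
    pos_ = static_cast<std::size_t>(it - keys_.begin());
    if (Valid() && keys_[pos_] >= upper_) pos_ = keys_.size();
  }

  // Last key in the underlying data, regardless of the upper bound.
  void SeekToLast() { pos_ = keys_.empty() ? 0 : keys_.size() - 1; }

  // Last key <= target; invalid if no such key exists.
  void SeekForPrev(const std::string& target) {
    auto it = std::upper_bound(keys_.begin(), keys_.end(), target);
    pos_ = (it == keys_.begin())
               ? keys_.size()
               : static_cast<std::size_t>(it - keys_.begin()) - 1;
  }

 private:
  std::vector<std::string> keys_;
  std::string upper_;
  std::size_t pos_;
};
```

With keys `a, b, x, y` and upper bound `c`, `Seek("bb")` is invalid because of the bound, not because the data ended, so the `SeekToLast()` fallback lands on `y`, far past the target. `SeekForPrev("bb")` lands on `b`, the position a direction change actually needs.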
9d34733 Fix argument mismatch in BlockBasedTableBuilder (#3974) Summary: The sixth argument should be `key_includes_seq` bool, the seventh a `GetContext*`. We were mistakenly passing the `GetContext*` as the sixth argument and relying on the default (nullptr) for the seventh. This would make statistics inaccurate, at least. Blame: 402b7aa0 Closes https://github.com/facebook/rocksdb/pull/3974 Differential Revision: D8344907 Pulled By: ajkr fbshipit-source-id: 3ad865a0541d6d30f75dfc726352788118cfe12e 12 June 2018, 20:57:44 UTC
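The kind of mismatch described in 9d34733 compiles silently because a pointer converts implicitly to `bool`. A small sketch, using a hypothetical two-parameter function rather than the real seven-argument call:

```cpp
#include <string>

struct ToyGetContext {};  // hypothetical stand-in for GetContext

// Hypothetical signature mirroring the tail of the argument list described
// above: a bool followed by a defaulted pointer.
std::string Describe(bool key_includes_seq = true,
                     ToyGetContext* get_context = nullptr) {
  std::string out = key_includes_seq ? "seq=1" : "seq=0";
  out += get_context ? ",ctx=set" : ",ctx=null";
  return out;
}

// The mistaken call shape: the pointer silently converts to `true` and the
// context parameter keeps its nullptr default.
std::string MistakenCall(ToyGetContext* ctx) { return Describe(ctx); }

// The intended call shape: bool first, then the context pointer.
std::string IntendedCall(ToyGetContext* ctx) { return Describe(true, ctx); }
```

In the mistaken shape the callee never sees the context, which matches the commit's note that statistics would become inaccurate.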
9c7da96 Fix a crash in WinEnvIO::GetSectorSize (#3975) Summary: Fix a crash in `WinEnvIO::GetSectorSize` that happens on old Windows systems (e.g Windows 7). On old Windows systems that don't support querying StorageAccessAlignmentProperty using IOCTL_STORAGE_QUERY_PROPERTY, the flow calls a different DeviceIoControl with nullptr as lpBytesReturned. When the code reaches this point, we get an access violation. Closes https://github.com/facebook/rocksdb/pull/3975 Differential Revision: D8385186 Pulled By: ajkr fbshipit-source-id: fae4c9b4b0a52c8a10182e1b35bcaa30dc393bbb 12 June 2018, 20:45:18 UTC
3593275 Remove restart point from the properties_block (#3970) Summary: Property block will be read sequentially and cached in a heap located object, so there's no need for restart points. Thus we set the restart interval to infinity to save space. Closes https://github.com/facebook/rocksdb/pull/3970 Differential Revision: D8332586 Pulled By: fgwu fbshipit-source-id: 899c3267832a81d0f084ec2db6b387332f461134 12 June 2018, 19:57:37 UTC
f450294 Change db path for BlockBasedTableTest.BadOptions (#3965) Summary: The temporary db path created by the BadOptions test is changed to table_block_based_bad_options_test to avoid colliding with the one created by PrefixAndWholeKeyTest. Closes https://github.com/facebook/rocksdb/pull/3965 Differential Revision: D8316080 Pulled By: fgwu fbshipit-source-id: bb8e0fdfdb9abf0e5ce94494b4388cd1622ee032 08 June 2018, 19:57:14 UTC
3470c75 Fix build errors. Summary: Closes https://github.com/facebook/rocksdb/pull/3967 Differential Revision: D8322775 Pulled By: riversand963 fbshipit-source-id: bd73067bd5d3ed4627348f0685bc499359ad6442 07 June 2018, 22:43:09 UTC
23e1d23 Fixed the fprintf of uint64_t by using PRIu64 (#3963) Summary: Fixed the fprintf format of uint64_t by using PRIu64 in file tools/ldb_cmd.cc Closes https://github.com/facebook/rocksdb/pull/3963 Differential Revision: D8306179 Pulled By: zhichao-cao fbshipit-source-id: 597dcd55321576801bbf2cf4714736ebc4750a0c 07 June 2018, 18:44:48 UTC
0a0860a Refactoring db_stress.cc (#3902) Summary: We use `db_stress.cc` intensively to test and verify the behavior of RocksDB. Sometimes we need to add new tests for recently added features. The original `StressTest` class provides much general functionality that can be leveraged by other tests. Therefore, in this refactoring PR, I try to identify the general operations as well as operations that future tests most likely want to customize. Future tests can inherit from `StressTest` and override the virtual functions to test custom logic. Closes https://github.com/facebook/rocksdb/pull/3902 Differential Revision: D8284607 Pulled By: riversand963 fbshipit-source-id: 019302d04665a2b18334b6d05d04a477168c8ea4 07 June 2018, 17:43:00 UTC
45b6bcc ZSTD compression: should also expect type = kZSTDNotFinalCompression (#3964) Summary: Depending on the compression type, `CompressBlock` calls the compress method for each compression type. It calls ZSTD_Compress for both kZSTD and kZSTDNotFinalCompression (https://github.com/facebook/rocksdb/blob/master/table/block_based_table_builder.cc#L169). However currently ZSTD_Compress only expects the type to be kZSTD and this is causing assert failures and crashes. The same also applies to ZSTD_Uncompress. Closes https://github.com/facebook/rocksdb/pull/3964 Differential Revision: D8308715 Pulled By: miasantreble fbshipit-source-id: e5125f53edb829c9c33733167bec74e4793d0782 07 June 2018, 06:42:29 UTC
b736521 Extend format 3 to partitioned index/filters (#3958) Summary: format_version 3 changes the format of index blocks by storing user keys instead of the internal keys, which saves 8-bytes per key. This patch extends the format to top-level indexes in partitioned index/filters. Closes https://github.com/facebook/rocksdb/pull/3958 Differential Revision: D8294615 Pulled By: maysamyabandeh fbshipit-source-id: 17666cc16b8076c363972e2308e31547e835f0fe 06 June 2018, 23:58:16 UTC
5504a05 Adding advisor Rules and parser scripts with unit tests. (#3934) Summary: This adds some rules in the tools/advisor/advisor/rules.ini (refer this for more information) file and corresponding python parser scripts for parsing the rules file and the rocksdb LOG and OPTIONS files. This is WIP for adding rules depending on ODS. The starting point of the script is the rocksdb/tools/advisor/advisor/rule_parser.py file. Closes https://github.com/facebook/rocksdb/pull/3934 Reviewed By: maysamyabandeh Differential Revision: D8304059 Pulled By: poojam23 fbshipit-source-id: 47f2a50f04d46d40e225dd1cbf58ba490f79e239 06 June 2018, 21:42:59 UTC
4420df4 Check conflict at output level in CompactFiles (#3926) Summary: CompactFiles checked whether the existing files conflicted with the chosen compaction. But it missed checking whether future files would conflict, i.e., when another compaction was simultaneously writing new files to the same range at the same output level. Closes https://github.com/facebook/rocksdb/pull/3926 Differential Revision: D8218996 Pulled By: ajkr fbshipit-source-id: 21cb00a6fed4c8c62d3ed2ff810962e6bdc2fdfb 05 June 2018, 21:14:05 UTC
f1592a0 run make format for PR 3838 (#3954) Summary: PR https://github.com/facebook/rocksdb/pull/3838 made some changes that trigger lint warnings. Run `make format` to fix formatting as suggested by siying. Also piggyback two changes: 1) fix singleton destruction order for windows and posix env 2) fix two clang warnings Closes https://github.com/facebook/rocksdb/pull/3954 Differential Revision: D8272041 Pulled By: miasantreble fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f 05 June 2018, 19:58:02 UTC
812c737 Fix performance regression in Get() for block-based tables (#3953) Summary: This fixes a regression in one of myrocks regression tests (readwhilewriting), introduced in https://github.com/facebook/rocksdb/commit/8bf555f487d1de84a4fb19cb97b9ae1a8dbebc60 This PR changes two lines of code: one of them actually fixes the observed regression, the other is a mostly unrelated small fix that I'm piggy-backing here. EDIT: Nevermind, it fixes one line. More details in inline comments. Closes https://github.com/facebook/rocksdb/pull/3953 Differential Revision: D8270664 Pulled By: al13n321 fbshipit-source-id: a7d91e196807d1e816551591257c700f70e4ccac 05 June 2018, 18:43:16 UTC
d0c38c0 Extend some tests to format_version=3 (#3942) Summary: format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3. Closes https://github.com/facebook/rocksdb/pull/3942 Differential Revision: D8238413 Pulled By: maysamyabandeh fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035 05 June 2018, 03:13:00 UTC
2210152 Fix singleton destruction order of PosixEnv and SyncPoint (#3951) Summary: Ensure the PosixEnv singleton is destroyed first since its destructor waits for background threads to all complete. This ensures background threads cannot hit sync points after the SyncPoint singleton is destroyed, which was previously possible. Closes https://github.com/facebook/rocksdb/pull/3951 Differential Revision: D8265295 Pulled By: ajkr fbshipit-source-id: 7738dd458c5d993a78377dd0420e82badada81ab 04 June 2018, 22:58:46 UTC
ab2254b Fix clang analyze Summary: This fixes the errors as reported here: https://github.com/facebook/rocksdb/pull/3941#issuecomment-394424043 Closes https://github.com/facebook/rocksdb/pull/3950 Differential Revision: D8263086 Pulled By: lth fbshipit-source-id: 5e148d489cab2153e5846d16979a0a1f2d677d57 04 June 2018, 21:44:23 UTC
f4b72d7 Provide a way to override windows memory allocator with jemalloc for ZSTD Summary: Windows does not have an LD_PRELOAD mechanism to override all memory allocation functions and ZSTD makes use of C-runtime calloc. During flushes and compactions default system allocator fragments and the system slows down considerably. For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance. The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression. Closes https://github.com/facebook/rocksdb/pull/3838 Differential Revision: D8229794 Pulled By: miasantreble fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d 04 June 2018, 19:12:48 UTC
4f297ad Fix crash test check for direct I/O Summary: We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing. Closes https://github.com/facebook/rocksdb/pull/3946 Differential Revision: D8247998 Pulled By: ajkr fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3 04 June 2018, 04:42:12 UTC
50d7ac0 Fix test for rocksdb_lite: hide incompatible option kDirectIO Summary: Previous commit https://github.com/facebook/rocksdb/pull/3935 unhid a few test options, including kDirectIO. However, it is not supported by RocksDB lite, so this option needs to be hidden from the lite build. Closes https://github.com/facebook/rocksdb/pull/3943 Differential Revision: D8242757 Pulled By: miasantreble fbshipit-source-id: 1edfad3a5d01a46bfb7eedee765981ebe02c500a 02 June 2018, 03:42:36 UTC
fea2b1d Copy Get() result when file reads use mmap Summary: For iterator reads, a `SuperVersion` is pinned to preserve a snapshot of SST files, and `Block`s are pinned to allow `key()` and `value()` to return pointers directly into a RocksDB memory region. This works for both non-mmap reads, where the block owns the memory region, and mmap reads, where the file owns the memory region. For point reads with `PinnableSlice`, only the `Block` object is pinned. This works for non-mmap reads because the block owns the memory region, so even if the file is deleted after compaction, the memory region survives. However, for mmap reads, file deletion causes the memory region to which the `PinnableSlice` refers to be unmapped. The result is usually a segfault upon accessing the `PinnableSlice`, although sometimes it returned wrong results (I repro'd this a bunch of times with `db_stress`). This PR copies the value into the `PinnableSlice` when it comes from mmap'd memory. We can tell whether the `Block` owns its memory using `Block::cachable()`, which is unset when reads do not use the provided buffer as is the case with mmap file reads. When that is false we ensure the result of `Get()` is copied. This feels like a short-term solution as ideally we'd have the `PinnableSlice` pin the mmap'd memory so we can do zero-copy reads. It seemed hard so I chose this approach to fix correctness in the meantime. Closes https://github.com/facebook/rocksdb/pull/3881 Differential Revision: D8076288 Pulled By: ajkr fbshipit-source-id: 31d78ec010198723522323dbc6ea325122a46b08 01 June 2018, 23:57:58 UTC
88c3ee2 Configure direct I/O statically in db_stress Summary: Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested): - It's a DB option so SetDBOptions should've been called instead - It's not a dynamic option so even SetDBOptions would fail - It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU. In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT. Closes https://github.com/facebook/rocksdb/pull/3939 Differential Revision: D8238120 Pulled By: ajkr fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279 01 June 2018, 23:42:34 UTC
01e3c30 Extend existing unit tests to run with WriteUnprepared as well Summary: As titled. I have not extended the Compatibility tests because the new WAL markers are still unimplemented. Closes https://github.com/facebook/rocksdb/pull/3941 Differential Revision: D8238394 Pulled By: lth fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d 01 June 2018, 21:58:41 UTC
89b3708 add c api rocksdb_sstfilewriter_file_size Summary: Closes https://github.com/facebook/rocksdb/pull/3922 Differential Revision: D8208528 Pulled By: ajkr fbshipit-source-id: d384fe53cf526f2aadc7b79a423ce36dbd3ff224 01 June 2018, 16:43:59 UTC
2a0dfaa fix PrefixExtractorChanged: pass raw pointer instead shared_ptr Summary: This should resolve the performance regression caused by the unnecessary copying of the shared_ptr. Closes https://github.com/facebook/rocksdb/pull/3937 Differential Revision: D8232330 Pulled By: miasantreble fbshipit-source-id: 7885bf7cd190b6f87164c52d6edd328298c13f97 01 June 2018, 04:42:50 UTC
44cf849 Fix the bug of some test scenarios being put after kEnd Summary: DBTestBase::OptionConfig lists the scenarios that unit tests can iterate over by calling ChangeOptions(). Some of the options have been mistakenly put after kEnd, which makes them essentially invisible to ChangeOptions() callers. This patch fixes it except for kUniversalSubcompactions which is left as TODO since it would break some unit tests. Closes https://github.com/facebook/rocksdb/pull/3935 Differential Revision: D8230748 Pulled By: maysamyabandeh fbshipit-source-id: edddb8fffcd161af1809fef24798ce118f8593db 01 June 2018, 02:28:00 UTC
2807678 c api set bottommost level compaction Summary: Closes https://github.com/facebook/rocksdb/pull/3928 Differential Revision: D8224962 Pulled By: ajkr fbshipit-source-id: 3caf463509a935bff46530f27232a85ae7e4e484 01 June 2018, 00:30:50 UTC
82089d5 DBImpl::FindObsoleteFiles() not to call GetChildren() on the same path Summary: DBImpl::FindObsoleteFiles() may call GetChildren() multiple times if different CFs are on the same path. Fix it. Closes https://github.com/facebook/rocksdb/pull/3885 Differential Revision: D8084634 Pulled By: siying fbshipit-source-id: b471fbc251f6a05e9243304dc14c0831060cc0b0 31 May 2018, 19:58:33 UTC
a35451e fix deadlock with enable_pipelined_write=true and max_successive_merges > 0 Summary: fix this https://github.com/facebook/rocksdb/issues/3916 Closes https://github.com/facebook/rocksdb/pull/3923 Differential Revision: D8215192 Pulled By: yiwu-arbug fbshipit-source-id: a4c2f839a91d92dc70906d2b7c6de0fe014a2422 31 May 2018, 18:13:14 UTC
aaac6cd Add write unprepared classes by inheriting from write prepared Summary: Closes https://github.com/facebook/rocksdb/pull/3907 Differential Revision: D8218325 Pulled By: lth fbshipit-source-id: ff32d8dab4a159cd2762876cba4b15e3dc51ff3b 31 May 2018, 17:47:42 UTC
727eb88 Compile error in db bench tool Summary: Small format error below causes build to fail. I believe that this : ``` fprintf(stderr, "num reads to do %lu\n", reads_); ``` Can be changed to this: ``` fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_); ``` Successful build ``` CC utilities/blob_db/blob_dump_tool.o AR librocksdb_debug.a ar: creating archive librocksdb_debug.a /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols CC tools/db_bench.o CC tools/db_bench_tool.o tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] fprintf(stderr, "num reads to do %lu\n", reads_); ~~~ ^~~~~~ %lld 1 error generated. make: *** [tools/db_bench_tool.o] Error 1 ``` ``` $ cd rocksdb $ make all $ g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 9.1.0 (clang-902.0.39.1) Target: x86_64-apple-darwin17.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin ``` Closes https://github.com/facebook/rocksdb/pull/3909 Differential Revision: D8215710 Pulled By: siying fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751 31 May 2018, 01:01:36 UTC
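The portable formatting that 727eb88 switches to can be shown in isolation. This sketch uses a `uint64_t` argument so that `PRIu64` matches exactly; `FormatReads` is a hypothetical helper, and for a signed `int64_t` value the matching macro would be `PRId64`.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit count portably. PRIu64 expands to the correct length
// modifier for uint64_t on the current platform, so the same source builds
// cleanly where uint64_t is `unsigned long` and where it is
// `unsigned long long`.
std::string FormatReads(uint64_t reads) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "num reads to do %" PRIu64 "\n", reads);
  return std::string(buf);
}
```

A hard-coded `%lu` is only correct on platforms where the 64-bit type happens to be `unsigned long`, which is why the macOS build in the log above fails under `-Werror,-Wformat`.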
4dd80de Remove tests from ROCKSDB_VALGRIND_RUN Summary: In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind. Closes https://github.com/facebook/rocksdb/pull/3924 Differential Revision: D8210184 Pulled By: siying fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a 30 May 2018, 23:15:16 UTC
a736255 Delete triggered compaction for universal style Summary: This is still WIP, but I'm hoping for early feedback on the overall approach. This patch implements deletion triggered compaction, which till now only worked for leveled, for universal style. SST files are marked for compaction by the CompactOnDeletionCollector table property. This is expected to be used when free disk space is low and the user wants to reclaim space by deleting a bunch of keys. The deletions are expected to be dense. In such a situation, we want to avoid a full compaction due to its space overhead. The strategy used in this case is similar to leveled. We pick one file from the set of files marked for compaction. We then expand the inputs to a clean cut on the same level, and then pick overlapping files from the next non-empty level. Picking files from the next level can cause the key range to expand, and we opportunistically expand inputs in the source level to include files wholly in this key range. The main side effect of this is that it breaks the property of no time range overlap between levels. This shouldn't break any functionality. Closes https://github.com/facebook/rocksdb/pull/3860 Differential Revision: D8124397 Pulled By: anand1976 fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf 29 May 2018, 22:44:34 UTC
724855c Fix LRUCache missing null check on destruct Summary: Fix LRUCache missing null check on destruct. The check is needed if LRUCache::DisownData is called. Closes https://github.com/facebook/rocksdb/pull/3920 Differential Revision: D8191631 Pulled By: yiwu-arbug fbshipit-source-id: d5014f6e49b51692c18a25fb55ece935f5a023c4 29 May 2018, 22:13:09 UTC
cf826de Fix compilation error when OPT="-DROCKSDB_LITE". Summary: Closes https://github.com/facebook/rocksdb/pull/3917 Differential Revision: D8187733 Pulled By: riversand963 fbshipit-source-id: e4aa179cd0791ca77167e357f99de9afd4aef910 29 May 2018, 19:28:59 UTC
03cda53 Check for rep_->table_properties being nullptr Summary: The very old sst formats do not have table_properties and rep_->table_properties is thus nullptr. The recent patch in https://github.com/facebook/rocksdb/pull/3894 does not check for nullptr and hence makes it backward incompatible. This patch adds the check. Closes https://github.com/facebook/rocksdb/pull/3918 Differential Revision: D8188638 Pulled By: maysamyabandeh fbshipit-source-id: b1d986665ecf0b4d1c442adfa8a193b97707d47b 29 May 2018, 19:13:55 UTC
1c1bafa Fix VersionStorageInfo::EstimateLiveDataSize seg fault Summary: `HandleEstimateLiveDataSize`'s `need_out_of_mutex` is true https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L412-L413 so it will ref a `SuperVersion` https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/db_impl.cc#L1896-L1908 so the param `version` of `InternalStats::HandleEstimateLiveDataSize` is safe, but `cfd_->current()` is not safe! https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L790-L795 `cfd_->current()` may be invalid. Here's the mongo-rocks crash backtrace ``` mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x41) [0x7fe3a3137c51] mongod(+0x2152E89) [0x7fe3a3136e89] mongod(+0x21534F6) [0x7fe3a31374f6] libpthread.so.0(+0xF5E0) [0x7fe39f5e45e0] mongod(rocksdb::InternalKeyComparator::Compare(rocksdb::Slice const&, rocksdb::Slice const&) const+0x17) [0x7fe3a22375a7] mongod(rocksdb::VersionStorageInfo::EstimateLiveDataSize() const+0x3AA) [0x7fe3a228daba] mongod(rocksdb::InternalStats::HandleEstimateLiveDataSize(unsigned long*, rocksdb::DBImpl*, rocksdb::Version*)+0x20) [0x7fe3a2250d70] mongod(rocksdb::DBImpl::GetIntPropertyInternal(rocksdb::ColumnFamilyData*, rocksdb::DBPropertyInfo const&, bool, unsigned long*)+0xEF) [0x7fe3a21e3dbf] ``` Closes https://github.com/facebook/rocksdb/pull/3912 Differential Revision: D8179944 Pulled By: yiwu-arbug fbshipit-source-id: 26f314a8f98f4c2dc4348745d759f26f0e8d95e1 28 May 2018, 18:27:08 UTC
402b7aa Exclude seq from index keys Summary: Index blocks have the same format as data blocks. The keys are therefore, like the keys in the data blocks, internal keys, which means that in addition to the user key each also has 8 bytes that encode the sequence number and value type. These extra 8 bytes, however, are not necessary in index blocks since the index keys act as a separator between two data blocks. The only exception is when the last key of a block and the first key of the next block share the same user key, in which case the sequence number is required to act as a separator. The patch excludes the sequence from index keys only if the above special case does not happen for any of the index keys. It then records that in the property block. The reader looks at the property block to see if it should expect sequence numbers in the keys of the index block. Closes https://github.com/facebook/rocksdb/pull/3894 Differential Revision: D8118775 Pulled By: maysamyabandeh fbshipit-source-id: 915479f028b5799ca91671d67455ecdefbd873bd 26 May 2018, 01:42:43 UTC
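The condition in 402b7aa for dropping the sequence number can be written as a small predicate. This is a sketch under assumptions: `Boundary` and `CanDropSeqFromIndexKeys` are hypothetical names, and block boundaries are represented simply as pairs of user keys.

```cpp
#include <string>
#include <utility>
#include <vector>

// Each pair is (user key of a block's last entry, user key of the next
// block's first entry). Hypothetical representation for illustration.
using Boundary = std::pair<std::string, std::string>;

// Sequence numbers can be dropped from index keys only if no two adjacent
// blocks share a user key across their boundary.
bool CanDropSeqFromIndexKeys(const std::vector<Boundary>& boundaries) {
  for (const auto& b : boundaries) {
    if (b.first == b.second) return false;  // user key alone cannot separate
  }
  return true;
}
```

A writer following the commit's description would evaluate this over the whole file and record the outcome in the property block so that the reader knows which key format to expect.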
8c3bf08 Check status when reading HashIndexPrefixesMetadataBlock Summary: This was missed in a refactor of `ReadBlockContents` (2f1a3a4). Closes https://github.com/facebook/rocksdb/pull/3906 Differential Revision: D8172648 Pulled By: ajkr fbshipit-source-id: 27e453b19795fea974bfed4721105be6f3a12090 26 May 2018, 00:42:51 UTC
4543417 Fix an issue with unnecessary capture in lambda expressions Summary: Closes https://github.com/facebook/rocksdb/issues/3900 Replaces https://github.com/facebook/rocksdb/pull/3901 I needed this to build v5.12.4 on Mac OS X (10.13.3). Closes https://github.com/facebook/rocksdb/pull/3904 Differential Revision: D8169357 Pulled By: sagar0 fbshipit-source-id: 85faac42168796e7def9250d0c221a9a03b84476 25 May 2018, 22:12:44 UTC
aa53579 Fix segfault caused by object premature destruction Summary: Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609). There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change. To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed. Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised. Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion. Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler. Test plan ``` $ make -j16 $ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion ``` Expected behavior: All tests should pass. Closes https://github.com/facebook/rocksdb/pull/3898 Differential Revision: D8149421 Pulled By: riversand963 fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438 25 May 2018, 18:57:51 UTC
6e08916 Fix Fadvise on closed file when reads use mmap Summary: `PosixMmapReadableFile::fd_` is closed after being created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called. Closes https://github.com/facebook/rocksdb/pull/2764 Differential Revision: D8152515 Pulled By: ajkr fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65 25 May 2018, 17:57:57 UTC
070319f add flush_before_backup parameter to c api rocksdb_backup_engine_create_new_backup Summary: Add flush_before_backup to rocksdb_backup_engine_create_new_backup. make c api able to control the flush before backup behavior. Closes https://github.com/facebook/rocksdb/pull/3897 Differential Revision: D8157676 Pulled By: ajkr fbshipit-source-id: 88998c62f89f087bf8672398fd7ddafabbada505 25 May 2018, 05:28:52 UTC
bc7e8d4 LRUCache midpoint insertion Summary: Implement a midpoint insertion strategy where new blocks are inserted into the middle of the LRU list, then moved to the head on the first hit in cache. Closes https://github.com/facebook/rocksdb/pull/3877 Differential Revision: D8100895 Pulled By: yiwu-arbug fbshipit-source-id: f4bd83cb8be469e5d02072cfc8bd66011391f3da 24 May 2018, 22:57:33 UTC
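One way to picture the midpoint insertion in bc7e8d4 is as two lists, where the front of the second list is the "middle" of the overall order. The sketch below is a toy model with hypothetical names (`ToyMidpointLRU`, `hot_`, `cold_`); it counts entries rather than bytes and is not the `LRUCache` implementation.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Toy LRU with midpoint insertion. The overall order is hot_ followed by
// cold_, so the front of cold_ is the middle of the list.
class ToyMidpointLRU {
 public:
  explicit ToyMidpointLRU(std::size_t capacity) : capacity_(capacity) {}

  // New keys enter at the midpoint (front of cold_), not at the head.
  void Insert(const std::string& key) {
    if (capacity_ == 0 || index_.count(key) > 0) return;
    if (index_.size() == capacity_) EvictOne();
    cold_.push_front(key);
    index_[key] = {false, cold_.begin()};
  }

  // A hit promotes the key to the head of the whole list (front of hot_).
  bool Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    std::list<std::string>& from = it->second.hot ? hot_ : cold_;
    hot_.splice(hot_.begin(), from, it->second.pos);
    it->second.hot = true;
    return true;
  }

  bool Contains(const std::string& key) const { return index_.count(key) > 0; }

 private:
  struct Entry {
    bool hot;
    std::list<std::string>::iterator pos;
  };

  // Evict from the tail: never-hit entries first, then the oldest hot entry.
  void EvictOne() {
    std::list<std::string>& victims = cold_.empty() ? hot_ : cold_;
    index_.erase(victims.back());
    victims.pop_back();
  }

  std::size_t capacity_;
  std::list<std::string> hot_;
  std::list<std::string> cold_;
  std::unordered_map<std::string, Entry> index_;
};
```

The design consequence is scan resistance: a burst of keys that are inserted once and never read again only displaces other never-hit entries, while keys that have been hit at least once stay near the head.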
3db8504 Catchup with posix features Summary: Catch up with Posix features: NewWritableRWFile must fail when the file does not exist; implement Env::Truncate(); adjust Env options optimization functions; implement MemoryMappedBuffer on Windows. Closes https://github.com/facebook/rocksdb/pull/3857 Differential Revision: D8053610 Pulled By: ajkr fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e 24 May 2018, 22:13:04 UTC
c465509 port_posix: use posix_memalign() for aligned_alloc Summary: This works around the issue in http://tracker.ceph.com/issues/21422 . In tcmalloc, aligned_alloc and posix_memalign() are basically the same thing. The same applies to GNU glibc. Fixes #3175 Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes https://github.com/facebook/rocksdb/pull/3862 Differential Revision: D8147930 Pulled By: yiwu-arbug fbshipit-source-id: 355afe93c4dd0a96a0d711ef190e8b86fbe8d11d 24 May 2018, 19:13:16 UTC
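For readers unfamiliar with `posix_memalign()`, here is a small sketch of an aligned allocation built on it. The helper names `AlignedAllocSketch` and `IsAligned` are hypothetical and are not the functions changed in c465509; the sketch assumes a POSIX platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: returns nullptr on failure.
// posix_memalign requires the alignment to be a power of two and a
// multiple of sizeof(void*); the assert documents that precondition.
inline void* AlignedAllocSketch(std::size_t alignment, std::size_t size) {
  assert(alignment % sizeof(void*) == 0 &&
         (alignment & (alignment - 1)) == 0);
  void* ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;
  }
  return ptr;  // release with free()
}

// Checks that a pointer is a multiple of the requested alignment.
inline bool IsAligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Unlike C11 `aligned_alloc`, `posix_memalign` does not require the size to be a multiple of the alignment, which is one reason it is often the more convenient call.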
7a99c04 refactor constructor of LRUCacheShard Summary: Update the LRUCacheShard constructor so that adding new params to it doesn't require adding extra SetXXX() methods. Closes https://github.com/facebook/rocksdb/pull/3896 Differential Revision: D8128618 Pulled By: yiwu-arbug fbshipit-source-id: 6afa715de1493a50de413678761a765e3af9b83b 24 May 2018, 01:57:42 UTC
01bcc34 Introduce library-independent default compression level Summary: Previously we were using -1 as the default for every library, which was legacy from our zlib options. That worked for a while, but after zstd introduced https://github.com/facebook/zstd/commit/a146ee04ae5866b948be0c1911418e0436d80cb4, it started giving poor compression ratios by default in zstd. This PR adds a constant to RocksDB public API, `CompressionOptions::kDefaultCompressionLevel`, which will get translated to the default value specific to the compression library being used in "util/compression.h". The constant uses a number that appears to be larger than any library's maximum compression level. Closes https://github.com/facebook/rocksdb/pull/3895 Differential Revision: D8125780 Pulled By: ajkr fbshipit-source-id: 2db157a89118cd4f94577c2f4a0a5ff31c8391c6 24 May 2018, 01:42:08 UTC
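The sentinel approach in 01bcc34 can be sketched as follows. Everything here is illustrative: the sentinel value, the `Library` enum, and `ResolveLevel` are assumptions made for this example, and the per-library defaults shown (-1 for zlib's `Z_DEFAULT_COMPRESSION`, 3 for zstd's `ZSTD_CLEVEL_DEFAULT`) are hard-coded rather than taken from the libraries' headers.

```cpp
#include <cassert>

// Hypothetical names for illustration only.
enum class Library { kZlib, kZstd };

// Sentinel chosen to be larger than any real compression level, so it can
// never collide with a level the user asks for explicitly (assumed value).
const int kDefaultCompressionLevelSketch = 32767;

// Translates the library-independent sentinel into a per-library default;
// any explicitly requested level, including negative ones, passes through.
inline int ResolveLevel(Library lib, int requested) {
  if (requested != kDefaultCompressionLevelSketch) {
    return requested;
  }
  switch (lib) {
    case Library::kZlib:
      return -1;  // zlib's Z_DEFAULT_COMPRESSION
    case Library::kZstd:
      return 3;   // zstd's ZSTD_CLEVEL_DEFAULT
  }
  return requested;  // unreachable for the enumerators above
}
```

With this shape, -1 keeps meaning "default" for zlib while remaining available as a real, explicitly requested level for libraries that give negative levels their own meaning.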
4011012 Specify the underlying type of enums. Summary: Explicitly specifying the underlying type of enums helps developers understand the physical storage. Closes https://github.com/facebook/rocksdb/pull/3892 Differential Revision: D8107027 Pulled By: riversand963 fbshipit-source-id: a00efecbba46df4a3c8eed0994a2d4972ad1a1d3 23 May 2018, 23:12:59 UTC
6c73a46 Fix a backward compatibility problem with table_properties being nullptr Summary: Currently when ldb built from master tries to open a DB from version 2.2, there will be a segfault because table_properties didn't exist back then. Closes https://github.com/facebook/rocksdb/pull/3890 Differential Revision: D8100914 Pulled By: miasantreble fbshipit-source-id: b255e8aedc54695432be2e704839c857dabdd65a 22 May 2018, 20:57:17 UTC
4420cb4 Fix Issue #3771: Slice ctor checks for nullptr and creates empty string Summary: Fix Issue #3771: check for nullptr in the Slice constructor. The Slice ctor checks for nullptr and creates an empty string if the string does not exist. Closes https://github.com/facebook/rocksdb/pull/3887 Differential Revision: D8098852 Pulled By: ajkr fbshipit-source-id: 04471077defa9776ce7b8c389a61312ce31002fb 22 May 2018, 20:41:56 UTC
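A minimal sketch of a nullptr-tolerant constructor in the spirit of 4420cb4. `SliceSketch` is a hypothetical class, not `rocksdb::Slice`, and pointing `data_` at an empty literal is a choice made for this example rather than something the commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a non-owning string view.
class SliceSketch {
 public:
  // A null pointer yields an empty slice instead of calling strlen(nullptr),
  // which would be undefined behavior.
  explicit SliceSketch(const char* s)
      : data_(s != nullptr ? s : ""),
        size_(s != nullptr ? std::strlen(s) : 0) {}

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool empty() const { return size_ == 0; }

 private:
  const char* data_;
  std::size_t size_;
};
```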
7db721b Avoid sleep in DBTest.GroupCommitTest to fix flakiness Summary: DBTest.GroupCommitTest would often fail when run under valgrind because its sleeps were insufficient to guarantee a group commit had multiple entries. Instead we can use sync point to force a leader to wait until a non-leader thread has enqueued its work, thus guaranteeing a leader can do group commit work for multiple threads. Closes https://github.com/facebook/rocksdb/pull/3883 Differential Revision: D8079429 Pulled By: ajkr fbshipit-source-id: 61dc50fad29d2c85547842f681288de60fa29049 22 May 2018, 19:16:25 UTC
fcb3101 Avoid single-deleting merge operands in db_stress Summary: I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them. Closes https://github.com/facebook/rocksdb/pull/3878 Differential Revision: D8097346 Pulled By: ajkr fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1 22 May 2018, 17:58:36 UTC
3db1ada PersistRocksDBOptions() to use WritableFileWriter Summary: By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls into one write() file system call. Without this, each Append() is expensive for an underlying Env that lacks its own write buffering. Closes https://github.com/facebook/rocksdb/pull/3882 Differential Revision: D8080673 Pulled By: siying fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b 21 May 2018, 23:42:22 UTC
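The buffering benefit in 3db1ada can be sketched with a toy writer that coalesces small appends. `CountingFile` and `BufferedWriterSketch` are hypothetical names; the real `WritableFileWriter` has many more responsibilities (rate limiting, direct I/O alignment, syncing).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sink that counts how often the "file system" is called.
struct CountingFile {
  std::string contents;
  int write_calls = 0;
  void Write(const std::string& data) {
    contents += data;
    ++write_calls;
  }
};

// Collects small appends and forwards them to the file in one call.
class BufferedWriterSketch {
 public:
  BufferedWriterSketch(CountingFile* file, std::size_t buffer_size)
      : file_(file), buffer_size_(buffer_size) {}

  // Flushes first if the new data would overflow the buffer. For simplicity,
  // a single append larger than buffer_size_ is still held until Flush().
  void Append(const std::string& data) {
    if (buffer_.size() + data.size() > buffer_size_) {
      Flush();
    }
    buffer_ += data;
  }

  void Flush() {
    if (buffer_.empty()) return;
    file_->Write(buffer_);
    buffer_.clear();
  }

 private:
  CountingFile* file_;
  std::size_t buffer_size_;
  std::string buffer_;
};
```

Writing an options file typically means many short lines, so three or thirty appends collapse into a single call on the underlying file in this model.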
c3ebc75 Move prefix_extractor to MutableCFOptions Summary: Currently it is not possible to change the bloom filter config without restarting the DB, which causes a lot of operational complexity for users. This PR aims to make it possible to dynamically change the bloom filter config. Closes https://github.com/facebook/rocksdb/pull/3601 Differential Revision: D7253114 Pulled By: miasantreble fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c 21 May 2018, 21:43:11 UTC
263ef52 Update ColumnFamilyTest for multi-CF verification Summary: Change `keys_` from `set<string>` to `vector<set<string>>` so that each column family's keys are stored in one set. ajkr When you have a chance, can you PTAL? Thanks! Closes https://github.com/facebook/rocksdb/pull/3871 Differential Revision: D8056447 Pulled By: riversand963 fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386 21 May 2018, 18:57:42 UTC
508a09f Print histogram count and sum in statistics string Summary: Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression. This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best... Closes https://github.com/facebook/rocksdb/pull/3863 Differential Revision: D8038831 Pulled By: ajkr fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c 21 May 2018, 18:12:47 UTC
7b65521 Assert keys/values pinned by range deletion meta-block iterators Summary: `RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()`, so it requires that the data to which they point be pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the wrong TODO about increasing restart interval, which would enable key prefix encoding and break the assertion. Closes https://github.com/facebook/rocksdb/pull/3875 Differential Revision: D8063667 Pulled By: ajkr fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41 21 May 2018, 16:57:00 UTC
e410501 Add missing test files to src.mk Summary: We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary. Closes https://github.com/facebook/rocksdb/pull/3876 Differential Revision: D8065389 Pulled By: ajkr fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b 21 May 2018, 16:43:29 UTC
ed4d339 fix a division by zero bug Summary: fixes the failing clang_analyze contrun test Closes https://github.com/facebook/rocksdb/pull/3872 Differential Revision: D8059241 Pulled By: miasantreble fbshipit-source-id: e8fc1838004fe16a823456188386b8b39429803b 19 May 2018, 04:57:24 UTC
26da367 class Block to store num_restarts_ Summary: Right now, every Block::NewIterator() reads num_restarts_ from the block, which is already read in Block::Block(). This sometimes causes a CPU cache miss. Although fetching this cacheline can usually benefit follow-up block restart offset reading, as they are close to each other, it's almost free to get rid of this read by storing it in the Block class. Closes https://github.com/facebook/rocksdb/pull/3869 Differential Revision: D8052493 Pulled By: siying fbshipit-source-id: 9c72360f0c2d7329f3c198ce4eaedd2bc14b87c1 18 May 2018, 19:56:55 UTC
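The caching described in 26da367 amounts to decoding a value once in the constructor and reusing it. The sketch below assumes a simplified layout in which the restart count is a little-endian 32-bit integer in the last four bytes of the block; `BlockSketch` and `DecodeFixed32Sketch` are hypothetical names, not the real `Block` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Decodes a little-endian 32-bit integer from four bytes.
inline std::uint32_t DecodeFixed32Sketch(const char* p) {
  const unsigned char* u = reinterpret_cast<const unsigned char*>(p);
  return static_cast<std::uint32_t>(u[0]) |
         (static_cast<std::uint32_t>(u[1]) << 8) |
         (static_cast<std::uint32_t>(u[2]) << 16) |
         (static_cast<std::uint32_t>(u[3]) << 24);
}

// Hypothetical block: the restart count is read from the trailer once,
// in the constructor, and served from a member afterwards.
class BlockSketch {
 public:
  explicit BlockSketch(std::string contents)
      : contents_(std::move(contents)), num_restarts_(0) {
    if (contents_.size() >= sizeof(std::uint32_t)) {
      num_restarts_ = DecodeFixed32Sketch(
          contents_.data() + contents_.size() - sizeof(std::uint32_t));
    }
  }

  // No access to the block contents here, so creating an iterator
  // does not have to touch the trailer's cache line again.
  std::uint32_t num_restarts() const { return num_restarts_; }

 private:
  std::string contents_;
  std::uint32_t num_restarts_;
};
```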
a0c7b4d Set the default value of max_manifest_file_size. Summary: In the past, the default value of max_manifest_file_size was uint64_t::MAX, allowing a long-running RocksDB process to grow its MANIFEST file to take up the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB. siying miasantreble Please let me know whether this looks good to you. Thanks! Closes https://github.com/facebook/rocksdb/pull/3867 Differential Revision: D8051524 Pulled By: riversand963 fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72 18 May 2018, 15:11:55 UTC
17af09f Implement key shortening functions in ReverseBytewiseComparator Summary: Right now ReverseBytewiseComparator::FindShortestSeparator() doesn't really shorten the key, and ReverseBytewiseComparator::FindShortestSuccessor() seems to return wrong results. The code is confusing too, as it uses BytewiseComparatorImpl::FindShortestSeparator() but the function actually won't do anything if the first key is larger than the second. Implement ReverseBytewiseComparator::FindShortestSeparator() and override ReverseBytewiseComparator::FindShortestSuccessor() to be empty. Closes https://github.com/facebook/rocksdb/pull/3836 Differential Revision: D7959762 Pulled By: siying fbshipit-source-id: 93acb621c16ce6f23e087ae4e19f7d84d1254683 18 May 2018, 01:27:16 UTC
1d7ca20 add override to virtual functions Summary: this will fix the failing clang_check test Closes https://github.com/facebook/rocksdb/pull/3868 Differential Revision: D8050880 Pulled By: miasantreble fbshipit-source-id: 749932e2e4025f835c961c068d601e522a126da6 18 May 2018, 00:57:48 UTC
aed7abb Reorder field based on esan data Summary: Running. TEST_TMPDIR=/dev/shm ./buck-out/gen/rocks/tools/rocks_db_bench --benchmarks=readwhilewriting --num=5000000 -benchmark_write_rate_limit=2000000 --threads=32 Collected esan data and reorder field. Accesses to 4th and 6th fields take majority of the access. Group them. Overall, this struct takes 10%+ of the total accesses in the program. (637773011/6107964986) ==2433831== class rocksdb::InlineSkipList ==2433831== size = 48, count = 637773011, ratio = 112412, array access = 0 ==2433831== # 0: offset = 0, size = 2, count = 455137, type = i16 ==2433831== # 1: offset = 2, size = 2, count = 6, type = i16 ==2433831== # 2: offset = 4, size = 4, count = 182303, type = i32 ==2433831== # 3: offset = 8, size = 8, count = 263953900, type = %"class.rocksdb::MemTableRep::KeyComparator"* ==2433831== # 4: offset = 16, size = 8, count = 136409, type = %"class.rocksdb::Allocator"* ==2433831== # 5: offset = 24, size = 8, count = 366628820, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Node"* ==2433831== # 6: offset = 32, size = 4, count = 6280031, type = %"struct.std::atomic" = type { %"struct.std::__atomic_base" } ==2433831== # 7: offset = 40, size = 8, count = 136405, type = %"struct.rocksdb::InlineSkipList<const rocksdb::MemTableRep::KeyComparator &>::Splice"* ==2433831==EfficiencySanitizer: total struct field access count = 6107964986 Before re-ordering [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting without-ro.log readwhilewriting : 0.036 micros/op 27545605 ops/sec; 26.8 MB/s (45954 of 5000000 found) readwhilewriting : 0.036 micros/op 28024240 ops/sec; 27.2 MB/s (43158 of 5000000 found) readwhilewriting : 0.037 micros/op 27345145 ops/sec; 27.1 MB/s (46725 of 5000000 found) readwhilewriting : 0.037 micros/op 27072588 ops/sec; 27.3 MB/s (42605 of 5000000 found) readwhilewriting : 0.034 micros/op 29578781 ops/sec; 28.3 MB/s (44294 of 5000000 found) 
readwhilewriting : 0.035 micros/op 28528304 ops/sec; 27.7 MB/s (44176 of 5000000 found) readwhilewriting : 0.037 micros/op 27075497 ops/sec; 26.5 MB/s (43763 of 5000000 found) readwhilewriting : 0.036 micros/op 28024117 ops/sec; 27.1 MB/s (40622 of 5000000 found) readwhilewriting : 0.037 micros/op 27078709 ops/sec; 27.6 MB/s (47774 of 5000000 found) readwhilewriting : 0.034 micros/op 29020689 ops/sec; 28.1 MB/s (45066 of 5000000 found) AVERAGE()=27.37 MB/s After re-ordering [trentxintong@devbig460.frc2 ~/fbsource/fbcode]$ fgrep readwhilewriting ro.log readwhilewriting : 0.036 micros/op 27542409 ops/sec; 27.7 MB/s (46163 of 5000000 found) readwhilewriting : 0.036 micros/op 28021148 ops/sec; 28.2 MB/s (46155 of 5000000 found) readwhilewriting : 0.036 micros/op 28021035 ops/sec; 27.3 MB/s (44039 of 5000000 found) readwhilewriting : 0.036 micros/op 27538659 ops/sec; 27.5 MB/s (46781 of 5000000 found) readwhilewriting : 0.036 micros/op 28028604 ops/sec; 27.6 MB/s (44689 of 5000000 found) readwhilewriting : 0.036 micros/op 27541452 ops/sec; 27.3 MB/s (43156 of 5000000 found) readwhilewriting : 0.034 micros/op 29041338 ops/sec; 28.8 MB/s (44895 of 5000000 found) readwhilewriting : 0.036 micros/op 27784974 ops/sec; 26.3 MB/s (39963 of 5000000 found) readwhilewriting : 0.036 micros/op 27538892 ops/sec; 28.1 MB/s (46570 of 5000000 found) readwhilewriting : 0.038 micros/op 26622473 ops/sec; 27.0 MB/s (43236 of 5000000 found) AVERAGE()=27.58 MB/s Closes https://github.com/facebook/rocksdb/pull/3855 Reviewed By: siying Differential Revision: D8048781 Pulled By: trentxintong fbshipit-source-id: bc9807a9845e2a92cb171ce1ecb5a2c8a51f1481 18 May 2018, 00:57:48 UTC
fa43948 Update HISTORY and version for upcoming 5.14 Summary: Closes https://github.com/facebook/rocksdb/pull/3866 Differential Revision: D8043563 Pulled By: gfosco fbshipit-source-id: da4af20e604534602ac0e07943135513fd9a9f53 17 May 2018, 21:27:17 UTC
7ccb35f In instrumented mutex, take timing once for both of perf_context and statistics Summary: Closes https://github.com/facebook/rocksdb/pull/3427 Differential Revision: D6827236 Pulled By: siying fbshipit-source-id: d8a2cc525c90df625510565669f2659014259a8a 17 May 2018, 19:56:53 UTC
8bf555f Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs Summary: Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are: * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted. * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not. However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours). This PR changes the convention to: * If status() is not ok, Valid() always returns false. * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.) This does sacrifice the two use cases listed above, but siying said it's ok. Overview of the changes: * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario. * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed. * A stress-test for DBIter, to gain some confidence that I didn't break it. 
It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator. To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types: Iterators that didn't need changes: * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator. * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator. Iterators with changes (see inline comments for details): * DBIter - an overhaul: - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back. - It had a few code paths silently discarding subiterator's status. The stress test caught a few. - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example. - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption. - It used to not reset status on seek for some types of errors. - Some simplifications and better comments. - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer. 
* MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status. * ForwardIterator - changed to the new convention, also slightly simplified. * ForwardLevelIterator - fixed some bugs and simplified. * LevelIterator - simplified. * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_. * BlockBasedTableIterator - minor changes. * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid. * PlainTableIterator - some seeks used to not reset status. * CuckooTableIterator - tiny code cleanup. * ManagedIterator - fixed some bugs. * BaseDeltaIterator - changed to the new convention and fixed a bug. * BlobDBIterator - seeks used to not reset status. * KeyConvertingIterator - some small change. Closes https://github.com/facebook/rocksdb/pull/3810 Differential Revision: D7888019 Pulled By: al13n321 fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d 17 May 2018, 09:56:56 UTC
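The convention introduced by 8bf555f (a non-ok status implies `Valid()` is false, and any seek resets the status) can be shown with a toy iterator. `IterSketch` is hypothetical: it walks a vector and simulates a read error at one index, and its `status_ok()` stands in for a real `Status` object.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical iterator that fails whenever it lands on bad_index.
class IterSketch {
 public:
  IterSketch(std::vector<int> data, std::size_t bad_index)
      : data_(std::move(data)), bad_index_(bad_index) {}

  // Any seek resets the status before repositioning.
  void Seek(std::size_t target) {
    ok_ = true;
    pos_ = target;
    CheckForError();
  }

  void Next() {
    assert(Valid());
    ++pos_;
    CheckForError();
  }

  // A non-ok status always implies Valid() == false.
  bool Valid() const { return ok_ && pos_ < data_.size(); }
  bool status_ok() const { return ok_; }

  int value() const {
    assert(Valid());
    return data_[pos_];
  }

 private:
  // Simulated read error at one position.
  void CheckForError() {
    if (pos_ == bad_index_) ok_ = false;
  }

  std::vector<int> data_;
  std::size_t bad_index_;
  std::size_t pos_ = 0;
  bool ok_ = true;
};
```

A caller loop under this convention is `for (it.Seek(0); it.Valid(); it.Next()) { ... }` followed by a single check of the status, which distinguishes "reached the end" from "stopped on an error".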
46fde6b Fix race condition between log_.erase and log_.back Summary: log_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. log_.erase however does that with only holding mutex_. This causes a race condition with two_write_queues since logs_.back is read with holding only log_write_mutex_ (which is correct according to logs_ contract) but logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852 although I could not reproduce it. Fixes https://github.com/facebook/rocksdb/issues/3852 Closes https://github.com/facebook/rocksdb/pull/3859 Differential Revision: D8026103 Pulled By: maysamyabandeh fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28 16 May 2018, 20:01:33 UTC
42cb477 Fix geo_db may seek an error key when they have the same quadkey Summary: Closes https://github.com/facebook/rocksdb/pull/3832 Differential Revision: D7994326 Pulled By: miasantreble fbshipit-source-id: 84a81b35b97750360423a9d4eca5b5a14d002134 15 May 2018, 06:57:15 UTC
12ad711 Suppress tsan lock-order-inversion on FlushWAL Summary: TSAN reports a false alarm for lock-order-inversion in DBWriteTest.IOErrorOnWALWritePropagateToWriteThreadFollower but Open and FlushWAL are not run concurrently. Suppressing the error by skipping FlushWAL in the test until TSAN is fixed. The alternative would be to use ``` TSAN_OPTIONS="suppressions=tsan-suppressions.txt" ./db_write_test ``` but it does not seem straightforward to integrate it to our test infra. Closes https://github.com/facebook/rocksdb/pull/3854 Differential Revision: D8000202 Pulled By: maysamyabandeh fbshipit-source-id: fde33483d963a7ad84d3145123821f64960a4802 15 May 2018, 04:13:35 UTC
3d7dc75 Bottommost level-based compactions in bottom-pri pool Summary: This feature was introduced for universal compaction in cc01985d. At that point we thought it'd be used only to prevent long-running universal full compactions from blocking short-lived upper-level compactions. Now we have a level compaction user who could benefit from it since they use a more expensive compression algorithm in the bottom level. So enable it for level compaction. Closes https://github.com/facebook/rocksdb/pull/3835 Differential Revision: D7957179 Pulled By: ajkr fbshipit-source-id: 177285d2cef3b650b6a4d81dc5db84bc441c9fe4 14 May 2018, 21:57:15 UTC
ebb823f Fix db_stress build on mac Summary: I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`. ``` $ make db_stress -j4 ... tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *') status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size); ^~~~~ ./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0; ^ 1 error generated. make: *** [tools/db_stress.o] Error 1 make: *** Waiting for unfinished jobs.... ``` Closes https://github.com/facebook/rocksdb/pull/3839 Differential Revision: D7979236 Pulled By: sagar0 fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62 14 May 2018, 18:14:07 UTC
718c1c9 Pass manual_wal_flush also to the first wal file Summary: Currently manual_wal_flush, if set in the options, is used only for the wal files created during a wal switch. The configuration thus does not affect the first wal file. The patch fixes that and also updates the related unit tests. This PR is built on top of https://github.com/facebook/rocksdb/pull/3756 Closes https://github.com/facebook/rocksdb/pull/3824 Differential Revision: D7909153 Pulled By: maysamyabandeh fbshipit-source-id: 024ed99d2555db06bf096c902b998e432bb7b9ce 14 May 2018, 17:57:56 UTC
66c7aa3 Clarify the ownership of root db after TransactionDB::Open Summary: The patch clarifies the ownership of the root db after TransactionDB::Open. On success, ownership lies with the TransactionDB, and the root db will be deleted when the destructor of the base class, StackableDB, is called. On failure, the temporarily created root db will also be deleted properly. The patch also includes lots of useful formatting changes. Closes https://github.com/facebook/rocksdb/pull/3714 upon which this patch is built. Closes https://github.com/facebook/rocksdb/pull/3806 Differential Revision: D7878010 Pulled By: maysamyabandeh fbshipit-source-id: f54f3942e29434143ae5a2423ceec9c7072cd4c2 11 May 2018, 22:14:03 UTC
3272bc0 Fix formatting in log message Summary: Add missing space. Closes https://github.com/facebook/rocksdb/pull/3826 Differential Revision: D7956059 Pulled By: miasantreble fbshipit-source-id: 3aeba76385f8726399a3086c46de710636a31191 11 May 2018, 18:28:54 UTC
072ae67 Apply use_direct_io_for_flush_and_compaction to writes only Summary: Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used. This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of background direct writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously. Closes https://github.com/facebook/rocksdb/pull/3829 Differential Revision: D7915443 Pulled By: ajkr fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279 10 May 2018, 02:42:58 UTC
d19f568 Refactor argument handling in db_crashtest.py Summary: - Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option. - Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default - Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params. - Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication Closes https://github.com/facebook/rocksdb/pull/3809 Differential Revision: D7885779 Pulled By: ajkr fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8 09 May 2018, 20:42:41 UTC
3690276 Disallow to open RandomRW file if the file doesn't exist Summary: The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case. Closes https://github.com/facebook/rocksdb/pull/3827 Differential Revision: D7913719 Pulled By: siying fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492 09 May 2018, 17:27:26 UTC
ddfd252 Make BlockIter final Summary: Now BlockBasedTableIterator directly uses BlockIter. By making BlockIter final, we can prevent unintended virtual function overriding. Closes https://github.com/facebook/rocksdb/pull/3828 Differential Revision: D7933816 Pulled By: siying fbshipit-source-id: 026a08cb5c5b6d3d6f44743152b4251da4756f2c 09 May 2018, 17:27:26 UTC
f92cd2f Introduce and use the option to disable stall notifications structures Summary: and code. Removing this helps with insert performance. Closes https://github.com/facebook/rocksdb/pull/3830 Differential Revision: D7921030 Pulled By: siying fbshipit-source-id: 84e80d50a7ef96f5441c51c9a0d089c50217cce2 09 May 2018, 17:13:53 UTC
cee138c Add missing options in BuildColumnfamilyOptions Summary: soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit are added to BuildColumnfamilyOptions. Closes https://github.com/facebook/rocksdb/pull/3823 Differential Revision: D7909246 Pulled By: maysamyabandeh fbshipit-source-id: 89032efbf6b5bd302ea50cbd7a234977984a1fca 08 May 2018, 19:13:18 UTC
4bf169f Disable readahead when using mmap for reads Summary: `ReadaheadRandomAccessFile` had an unwritten assumption, which was that its wrapped file's `Read()` function always copies into the provided scratch buffer. Actually this was not true when the wrapped file was `PosixMmapReadableFile`, whose `Read()` implementation does no copying and instead returns a `Slice` pointing directly into the `mmap`'d memory region. This PR: - prevents `ReadaheadRandomAccessFile` from ever wrapping mmap readable files - adds an assert for the assumption `ReadaheadRandomAccessFile` makes about the wrapped file's use of scratch buffer Closes https://github.com/facebook/rocksdb/pull/3813 Differential Revision: D7891513 Pulled By: ajkr fbshipit-source-id: dc64a55222d6af280c39a1852ee39e9e9d7cde7d 08 May 2018, 19:13:18 UTC
1d9f24d Link jemalloc Summary: Fix undefined reference to `malloc_*` linking errors on Linux. Closes https://github.com/facebook/rocksdb/pull/3817 Differential Revision: D7899066 Pulled By: ajkr fbshipit-source-id: 18c46569a59608388d6240f1b8ec20c2d2557dec 07 May 2018, 21:28:36 UTC
9470ee4 Allows other cmake-specific "true" for USE_RTTI. Summary: People also use ON/OFF, TRUE/FALSE and other switch options that are allowed by cmake. Closes https://github.com/facebook/rocksdb/pull/3814 Differential Revision: D7899032 Pulled By: ajkr fbshipit-source-id: b71511af59e0a78eedafb639b5002c47050bf3c2 07 May 2018, 21:28:36 UTC
6d6e01c Search paths provided by intel's "tbbvars.sh". Summary: TBBROOT and LIBRARY_PATH are set in env by the script. With TBB 2018 the library path is $TBBROOT/lib/intel64/gcc4.7 for anything above gcc 4.7, which is both compiler and architecture related. We cannot simply do ${TBB_ROOT_DIR}/lib. Closes https://github.com/facebook/rocksdb/pull/3815 Differential Revision: D7899006 Pulled By: ajkr fbshipit-source-id: 159ab1f6a5c40452ed6aa8d79300206953d916c2 07 May 2018, 21:28:36 UTC
d72a51e Split FaultInjectionTest.FaultTest to avoid timeout Summary: The tsan flavor of this test occasionally times out in our test infra. The patch splits the test into two, each working on half of the option range. Before: [ OK ] FaultTest/FaultInjectionTest.FaultTest/0 (5918 ms) [ OK ] FaultTest/FaultInjectionTest.FaultTest/1 (5336 ms) After: [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/0 (2930 ms) [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/1 (2676 ms) [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/2 (2759 ms) [ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/3 (2546 ms) Closes https://github.com/facebook/rocksdb/pull/3819 Differential Revision: D7894975 Pulled By: maysamyabandeh fbshipit-source-id: 809f1411cbcc27f8aa71a6b29a16b039f51b67c9 07 May 2018, 19:29:58 UTC
72942ad Recommit "Avoid adding tombstones of the same file to RangeDelAggregator multiple times" Summary: The origin commit #3635 will hurt performance for users who aren't using range deletions, because unneeded std::set operations, so it was reverted by commit 44653c7b7aabe821e671946e732dda7ae6b43d1b. (see #3672) To fix this, move the set to and add a check in , i.e., file will be added only if is non-nullptr. The db_bench command which find the performance regression: > ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 > --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 > --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 > -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none Before and after the modification, I re-run this command on the machine, the results of are as follows: **fillrandom** Table | P50 | P75 | P99 | P99.9 | P99.99 | ---- | --- | --- | --- | ----- | ------ | before commit | 5.92 | 8.57 | 19.63 | 980.97 | 12196.00 | after commit | 5.91 | 8.55 | 19.34 | 965.56 | 13513.56 | **seekrandomwhilewriting** Table | P50 | P75 | P99 | P99.9 | P99.99 | ---- | --- | --- | --- | ----- | ------ | before commit | 1418.62 | 1867.01 | 3823.28 | 4980.99 | 9240.00 | after commit | 1450.54 | 1880.61 | 3962.87 | 5429.60 | 7542.86 | Closes https://github.com/facebook/rocksdb/pull/3800 Differential Revision: D7874245 Pulled By: ajkr fbshipit-source-id: 2e8bec781b3f7399246babd66395c88619534a17 04 May 2018, 23:45:15 UTC
4c5a323 Fix db_stress memory leak ASAN error Summary: In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed. Closes https://github.com/facebook/rocksdb/pull/3804 Differential Revision: D7874694 Pulled By: ajkr fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b 04 May 2018, 23:45:15 UTC
fc522bd Evenly split HarnessTest.Randomized Summary: HarnessTest.Randomized is already split, but some of the splits are faster than others. The reason is that each split takes a contiguous range of the generated args, and tests with later args take longer to finish. The patch splits the args evenly among the splits in a round-robin fashion.

Before:
```
[ OK ] HarnessTest.Randomized1n2 (2278 ms)
[ OK ] HarnessTest.Randomized3n4 (1095 ms)
[ OK ] HarnessTest.Randomized5 (658 ms)
[ OK ] HarnessTest.Randomized6 (1258 ms)
[ OK ] HarnessTest.Randomized7 (6476 ms)
[ OK ] HarnessTest.Randomized8 (8182 ms)
```
After:
```
[ OK ] HarnessTest.Randomized1 (2649 ms)
[ OK ] HarnessTest.Randomized2 (2645 ms)
[ OK ] HarnessTest.Randomized3 (2577 ms)
[ OK ] HarnessTest.Randomized4 (2490 ms)
[ OK ] HarnessTest.Randomized5 (2553 ms)
[ OK ] HarnessTest.Randomized6 (2560 ms)
[ OK ] HarnessTest.Randomized7 (2501 ms)
[ OK ] HarnessTest.Randomized8 (2574 ms)
```
Closes https://github.com/facebook/rocksdb/pull/3808 Differential Revision: D7882663 Pulled By: maysamyabandeh fbshipit-source-id: 09b749a9684b6d7d65466aa4b00c5334a49e833e 04 May 2018, 22:28:06 UTC
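The difference between the contiguous split and the round-robin split described in the HarnessTest.Randomized commit can be sketched roughly as follows. This is only an illustration, not the code from the patch: the function names `SplitContiguous`, `SplitRoundRobin`, and `TotalCost` are hypothetical, and each integer stands in for the assumed relative cost of one generated test argument, with later arguments costing more.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: split i receives one consecutive run of arguments.
// If later arguments are slower, later splits end up slower as well.
std::vector<std::vector<int>> SplitContiguous(const std::vector<int>& args,
                                              std::size_t num_splits) {
  assert(num_splits > 0);
  std::vector<std::vector<int>> splits(num_splits);
  // Ceiling division so that every argument lands in a valid split.
  const std::size_t chunk = (args.size() + num_splits - 1) / num_splits;
  for (std::size_t i = 0; i < args.size(); ++i) {
    splits[i / chunk].push_back(args[i]);
  }
  return splits;
}

// Hypothetical helper: argument i goes to split (i mod num_splits), so cheap
// and expensive arguments are interleaved across all splits.
std::vector<std::vector<int>> SplitRoundRobin(const std::vector<int>& args,
                                              std::size_t num_splits) {
  assert(num_splits > 0);
  std::vector<std::vector<int>> splits(num_splits);
  for (std::size_t i = 0; i < args.size(); ++i) {
    splits[i % num_splits].push_back(args[i]);
  }
  return splits;
}

// Sum of the assumed per-argument costs in one split.
int TotalCost(const std::vector<int>& split) {
  return std::accumulate(split.begin(), split.end(), 0);
}
```

With assumed costs 1 through 8 and four splits, the contiguous scheme yields per-split totals of 3, 7, 11, and 15, while round-robin yields 6, 8, 10, and 12. The spread narrows in the same direction as the before/after timings in the commit message.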
171f415 Rename vars to satisfy unity build Summary: Tested by "make unity_test" Closes https://github.com/facebook/rocksdb/pull/3807 Differential Revision: D7882657 Pulled By: maysamyabandeh fbshipit-source-id: 84862c18d7f2fc762bd96ad070eaeb6936e45159 04 May 2018, 22:28:06 UTC
4d40b10 Add USE_RTTI and default behavior to CMakeLists Summary: Proposed fix for #3701 Closes https://github.com/facebook/rocksdb/pull/3801 Differential Revision: D7868264 Pulled By: gfosco fbshipit-source-id: 013963ed3d172c8dc2abd1dd5982580082ca5d2d 04 May 2018, 22:13:03 UTC