swh:1:snp:5115096b921df712aeb2a08114fede57fb3331fb

Revision Author Date Message Commit Date
408205a use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899) Summary: Previously in https://github.com/facebook/rocksdb/pull/3601 the bloom filter was only checked if the `prefix_extractor` in the mutable_cf_options matched the one found in the SST file. This PR relaxes the requirement by checking whether all keys in the range [user_key, iterate_upper_bound) share the same prefix after being transformed with the prefix extractor stored in the SST file. If so, the bloom filter is considered compatible and will continue to be consulted. Closes https://github.com/facebook/rocksdb/pull/3899 Differential Revision: D8157459 Pulled By: miasantreble fbshipit-source-id: 18d17cba56a1005162f8d5db7a27aba277089c41 26 June 2018, 22:57:26 UTC
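The range check described in 408205a can be sketched roughly as follows. This is a conservative illustration, not the RocksDB implementation: it assumes a fixed-length prefix extractor and bytewise key ordering, and `FixedPrefix` and `RangeSharesPrefix` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the fixed-length prefix extractor recorded in an
// SST file (not a RocksDB class).
struct FixedPrefix {
  std::size_t len;

  bool InDomain(const std::string& key) const { return key.size() >= len; }

  std::string Transform(const std::string& key) const {
    assert(InDomain(key));
    return key.substr(0, len);
  }
};

// Returns true when every key in [user_key, upper_bound) is guaranteed to
// share one prefix under the file's extractor, assuming bytewise ordering.
// Conservative: answers false whenever it cannot prove compatibility.
bool RangeSharesPrefix(const FixedPrefix& file_extractor,
                       const std::string& user_key,
                       const std::string* upper_bound) {
  if (upper_bound == nullptr) return false;  // unbounded scan: cannot prove it
  if (!file_extractor.InDomain(user_key) ||
      !file_extractor.InDomain(*upper_bound)) {
    return false;
  }
  // If both ends of the range have the same prefix, so does everything
  // sorted between them.
  return file_extractor.Transform(user_key) ==
         file_extractor.Transform(*upper_bound);
}
```

With a 3-byte prefix, the range ["abca", "abcz") passes the check, while ["abca", "abd") does not, so the filter would be skipped for the latter.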
967aa81 Create lgtm.yml for LGTM.com C/C++ analysis (#4058) Summary: As discussed with thatsafunnyname [here](https://discuss.lgtm.com/t/c-c-lang-missing-for-facebook-rocksdb/1079): this configuration enables C/C++ analysis for RocksDB on LGTM.com. The initial commit will contain a build command (simple `make`) that previously resulted in a build error. The build log will then be available on LGTM.com for you to investigate (if you like). I'll immediately add a second commit to this PR to correct the build command to `make static_lib`, which worked when I tested it earlier today. If you like you can also enable automatic code review in pull requests. This will alert you to any new code issues before they actually get merged into `master`. Here's an example of how that works for the AMPHTML project: https://github.com/ampproject/amphtml/pull/13060. You can enable it yourself here: https://lgtm.com/projects/g/facebook/rocksdb/ci/. I'll also add a badge to your README.md in a separate commit — feel free to remove that from this PR if you don't like it. (Full disclosure: I'm part of the LGTM.com team :slightly_smiling_face:. Ping samlanning) Closes https://github.com/facebook/rocksdb/pull/4058 Differential Revision: D8648410 Pulled By: ajkr fbshipit-source-id: 98d55fc19cff1b07268ac8425b63e764806065aa 26 June 2018, 19:43:04 UTC
2694b6d Remove unused imports, from python scripts. (#4057) Summary: Also remove redefined variable. As reported on https://lgtm.com/projects/g/facebook/rocksdb/ Closes https://github.com/facebook/rocksdb/pull/4057 Differential Revision: D8648342 Pulled By: ajkr fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183 26 June 2018, 19:43:04 UTC
a8e503e Fix universal compaction scheduling conflict with CompactFiles (#4055) Summary: Universal size-amp-triggered compaction was pulling the final sorted run into the compaction without checking whether any of its files are already being compacted. When all compactions are automatic, it is safe since it verifies the second-last sorted run is not already being compacted, which implies the last sorted run is also not being compacted (in automatic compaction multiple sorted runs are always compacted together). But with manual compaction, files in the last sorted run can be compacted independently, so the last sorted run also must be checked. We were seeing the below assertion failure in `db_stress`. Also the test case included in this PR repros the failure. ``` db_universal_compaction_test: db/compaction.cc:312: void rocksdb::Compaction::MarkFilesBeingCompacted(bool): Assertion `mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted' failed. Aborted (core dumped) ``` Closes https://github.com/facebook/rocksdb/pull/4055 Differential Revision: D8630094 Pulled By: ajkr fbshipit-source-id: ac3b30a874678b76e113d4f6c42c1260411b08f8 26 June 2018, 17:44:56 UTC
346d106 Align StatisticsImpl / StatisticsData (#4036) Summary: Pinned the alignment of StatisticsData to the cacheline size rather than just extending its size (which could go over two cache lines) if unaligned in allocation. Avoid compile errors in the process as per individual commit messages. Strengthen static_assert to CACHELINE rather than the highest common multiple. Closes https://github.com/facebook/rocksdb/pull/4036 Differential Revision: D8582844 Pulled By: yiwu-arbug fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71 26 June 2018, 05:58:19 UTC
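A minimal sketch of pinning a structure's alignment to the cache line size, as 346d106 describes. The 64-byte line size and the `PerCoreStats` layout are assumptions made for illustration; the real `StatisticsData` differs.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; the real value is platform-specific.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical per-core statistics slot. alignas fixes the alignment itself,
// so an instance can never straddle two cache lines.
struct alignas(kCacheLineSize) PerCoreStats {
  std::atomic<std::uint64_t> tickers[4];
};

// Compile-time checks in the spirit of the strengthened static_assert.
static_assert(alignof(PerCoreStats) == kCacheLineSize,
              "PerCoreStats must be cache-line aligned");
static_assert(sizeof(PerCoreStats) % kCacheLineSize == 0,
              "PerCoreStats must occupy whole cache lines");
```

Padding the struct to a multiple of the line size without `alignas` would not be enough: an allocation starting mid-line would still span two lines.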
6d454d7 BlobDB: is_fifo=true also evict non-TTL blob files (#4049) Summary: Previously, with is_fifo=true, we only evicted TTL files. Change it to also evict non-TTL files, from oldest to newest, after the TTL files are exhausted. Closes https://github.com/facebook/rocksdb/pull/4049 Differential Revision: D8604597 Pulled By: yiwu-arbug fbshipit-source-id: bc4209ee27c1528ce4b72833e6f1e1bff80082c1 26 June 2018, 05:43:05 UTC
189f0c2 Make BlockBasedTableIterator compaction-aware (#4048) Summary: Pass in `for_compaction` to `BlockBasedTableIterator` via `BlockBasedTableReader::NewIterator`. In 7103559f49b46b3287973045f741c0679e3e9e44, `for_compaction` was set in `BlockBasedTable::Rep` via `BlockBasedTable::SetupForCompaction`. In hindsight it was not the right decision; it also caused TSAN to complain. Closes https://github.com/facebook/rocksdb/pull/4048 Differential Revision: D8601056 Pulled By: sagar0 fbshipit-source-id: 30127e898c15c38c1080d57710b8c5a6d64a0ab3 25 June 2018, 20:19:27 UTC
a71e467 Blob DB: enable readahead for garbage collection (#3648) Summary: Enable readahead for blob DB garbage collection, which should improve GC performance a little bit. Closes https://github.com/facebook/rocksdb/pull/3648 Differential Revision: D7383791 Pulled By: yiwu-arbug fbshipit-source-id: 642b3327f7105eca85986d3fb2d8f960a3d83cf1 24 June 2018, 06:12:00 UTC
2729dd7 Reclaim memory allocated to backup_engine. Summary: Closes https://github.com/facebook/rocksdb/pull/4045 Differential Revision: D8595609 Pulled By: riversand963 fbshipit-source-id: 5ba5954d804b82b0e7264b2e18e1da4c94103b53 24 June 2018, 00:12:14 UTC
80ade9a Pin top-level index on partitioned index/filter blocks (#4037) Summary: The top-level index in partitioned index/filter blocks is small and could be pinned in memory. So far we have achieved that by setting cache_index_and_filter_blocks to false. This, however, makes it difficult to keep account of the total memory usage. This patch introduces pin_top_level_index_and_filter, which in combination with cache_index_and_filter_blocks=true keeps the top-level index in the cache and yet pins it to avoid cache misses and cache lookup overhead. Closes https://github.com/facebook/rocksdb/pull/4037 Differential Revision: D8596218 Pulled By: maysamyabandeh fbshipit-source-id: 3a5f7f9ca6b4b525b03ff6bd82354881ae974ad2 22 June 2018, 22:27:46 UTC
c726f7f Fix dangling checkpoint pointer in db_stress (#4042) Summary: Fix db_stress failing to delete the checkpoint pointer. It was caught by the asan_crash test. Closes https://github.com/facebook/rocksdb/pull/4042 Differential Revision: D8592604 Pulled By: yiwu-arbug fbshipit-source-id: 7b2d67d5e3dfb05f71c33fcf320482303e97d3ef 22 June 2018, 18:43:50 UTC
64c85d0 Set DEBUG_LEVEL=0 for RocksJava Mac Release (#4040) Summary: Closes https://github.com/facebook/rocksdb/issues/2717 Closes https://github.com/facebook/rocksdb/pull/4040 Differential Revision: D8592058 Pulled By: sagar0 fbshipit-source-id: d01099a1067aa32659abb0b4bed641d919a3927e 22 June 2018, 17:57:48 UTC
795e663 option for timing measurement of non-blocking ops during compaction (#4029) Summary: For example calling CompactionFilter is always timed and gives the user no way to disable. This PR will disable the timer if `Statistics::stats_level_` (which is part of DBOptions) is `kExceptDetailedTimers` Closes https://github.com/facebook/rocksdb/pull/4029 Differential Revision: D8583670 Pulled By: miasantreble fbshipit-source-id: 913be9fe433ae0c06e88193b59d41920a532307f 22 June 2018, 04:28:05 UTC
0a5b16c Cleanup staging directory at start of checkpoint (#4035) Summary: - Attempt to clean the checkpoint staging directory before starting a checkpoint. It was already cleaned up at the end of checkpoint. But it wasn't cleaned up in the edge case where the process crashed while staging checkpoint files. - Attempt to clean the checkpoint directory before calling `Checkpoint::Create` in `db_stress`. This handles the case where checkpoint directory was created by a previous `db_stress` run but the process crashed before cleaning it up. - Use `DestroyDB` for cleaning checkpoint directory since a checkpoint is a DB. Closes https://github.com/facebook/rocksdb/pull/4035 Reviewed By: yiwu-arbug Differential Revision: D8580223 Pulled By: ajkr fbshipit-source-id: 28c667400e249fad0fdedc664b349031b7b61599 21 June 2018, 23:27:12 UTC
645e57c Assert for Direct IO at the beginning in PositionedRead (#3891) Summary: Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO. Closes https://github.com/facebook/rocksdb/pull/3891 Differential Revision: D8267972 Pulled By: sagar0 fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7 21 June 2018, 21:58:01 UTC
58c2214 Update TARGETS file (#4028) Summary: -Wshorten-64-to-32 is an invalid flag in fbcode. Changing it to -Wnarrowing. Closes https://github.com/facebook/rocksdb/pull/4028 Differential Revision: D8553694 Pulled By: yiwu-arbug fbshipit-source-id: 1523cbcb4c76cf1d2b10a4d28b5f58c78e6cb876 21 June 2018, 21:42:39 UTC
3974959 Fix a warning (treated as error) caused by type mismatch. Summary: Closes https://github.com/facebook/rocksdb/pull/4032 Differential Revision: D8573061 Pulled By: riversand963 fbshipit-source-id: 112324dcb35956d6b3ec891073f4f21493933c8b 21 June 2018, 18:13:09 UTC
7103559 Improve direct IO range scan performance with readahead (#3884) Summary: This PR extends the improvements in #3282 to also work when using Direct IO. We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash. **Description:** This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead and prefetching additional data on each disk IO, and storing in a local buffer. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan. **Implementation Details:** - Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead. - `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled. - `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer. - Made sure not to re-read partial chunks of data that were already available in the buffer, from device again. - Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date. **Constraints:** - Similar to #3282, this gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value). - Since the prefetched data is stored in a temporary buffer allocated on heap, this could increase the memory usage if you have many iterators doing long range scans simultaneously. - Enabled only for user reads, and disabled for compactions. 
Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them. **Benchmarks:** I used the same benchmark as used in #3282. Data fill: ``` TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes ``` Do a long range scan: Seekrandom with large number of nexts ``` TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram ``` ``` Before: seekrandom : 37939.906 micros/op 26 ops/sec; 29.2 MB/s (1636 of 1999 found) With this change: seekrandom : 8527.720 micros/op 117 ops/sec; 129.7 MB/s (6530 of 7999 found) ``` ~4.5X perf improvement. Taken on an average of 3 runs. Closes https://github.com/facebook/rocksdb/pull/3884 Differential Revision: D8082143 Pulled By: sagar0 fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb 21 June 2018, 18:13:08 UTC
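The readahead schedule described in 7103559 (off for the first two IOs, then 8 KB growing to a 256 KB cap) can be sketched as below. `ReadaheadPolicy` is a hypothetical class, and doubling is an assumed reading of "exponentially increased"; the commit message does not state the growth factor.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t kInitReadahead = 8 * 1024;    // 8 KB, from the commit text
constexpr std::size_t kMaxReadahead = 256 * 1024;   // 256 KB, from the commit text
constexpr int kIosBeforeReadahead = 2;              // enabled after more than 2 IOs

// Hypothetical per-file policy object; not a RocksDB class.
class ReadaheadPolicy {
 public:
  // Call once per sequential IO on the same table file.
  // Returns the number of bytes to prefetch, or 0 while readahead is off.
  std::size_t OnSequentialIo() {
    ++num_ios_;
    if (num_ios_ <= kIosBeforeReadahead) return 0;
    const std::size_t current = next_size_;
    // Assumed growth factor of 2, capped at the maximum.
    next_size_ = std::min(next_size_ * 2, kMaxReadahead);
    assert(current <= kMaxReadahead);
    return current;
  }

 private:
  int num_ios_ = 0;
  std::size_t next_size_ = kInitReadahead;
};
```

Under these assumptions the third IO prefetches 8 KB, the fourth 16 KB, and the eighth and later IOs prefetch 256 KB each.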
524c6e6 Add file name info to SequentialFileReader. (#4026) Summary: We potentially need this information for tracing, profiling and diagnosis. Closes https://github.com/facebook/rocksdb/pull/4026 Differential Revision: D8555214 Pulled By: riversand963 fbshipit-source-id: 4263e06c00b6d5410b46aa46eb4e358ff2161dd2 21 June 2018, 15:42:24 UTC
14cee19 Support file ingestion in stress test (#4018) Summary: Once per `ingest_external_file_one_in` operations, uses SstFileWriter to create a file containing `ingest_external_file_width` consecutive keys. The file is named containing the thread ID to avoid clashes. The file is then added to the DB using `IngestExternalFile`. We can't enable it by default in crash test because `nooverwritepercent` and `test_batches_snapshot` both must be zero for the DB's whole lifetime. Perhaps we should setup a separate test with that config as range deletion also requires it. Closes https://github.com/facebook/rocksdb/pull/4018 Differential Revision: D8507698 Pulled By: ajkr fbshipit-source-id: 1437ea26fd989349a9ce8b94117241c65e40f10f 21 June 2018, 05:27:45 UTC
61d69d4 Hide jemalloc aligned allocation functions into .cc (#4025) Summary: So that they can be overridden. Closes https://github.com/facebook/rocksdb/pull/4025 Differential Revision: D8526287 Pulled By: siying fbshipit-source-id: 9537b299dc907b4d1eeaf77a8784b13cb058280d 20 June 2018, 00:12:06 UTC
28a9d89 Fix the bug with duplicate prefix in partition filters (#4024) Summary: https://github.com/facebook/rocksdb/pull/3764 introduced an optimization feature to skip duplicate prefix entries in full bloom filters. Unfortunately it also introduced a bug in partitioned full filters, where the duplicate prefix should still be inserted if it is in a new partition. The patch fixes the bug by resetting the duplicate detection logic each time a partition is cut. This bug could result in false negatives, which means that the DB could skip an existing key. Closes https://github.com/facebook/rocksdb/pull/4024 Differential Revision: D8518866 Pulled By: maysamyabandeh fbshipit-source-id: 044f4d988e606a330ecafd8c79daceb68b8796bf 19 June 2018, 21:12:46 UTC
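The fix in 28a9d89 amounts to forgetting the last-seen prefix whenever a partition is cut. A minimal sketch, using a `std::set` as a stand-in for each partition's bloom filter; `PartitionedPrefixBuilder` and its methods are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical builder; each partition is modeled as an exact set instead of
// a bloom filter so that missing entries are easy to observe.
class PartitionedPrefixBuilder {
 public:
  void AddPrefix(const std::string& prefix) {
    // Optimization: skip a prefix identical to the previous one.
    if (has_last_ && prefix == last_prefix_) return;
    current_.insert(prefix);
    last_prefix_ = prefix;
    has_last_ = true;
  }

  void CutPartition() {
    partitions_.push_back(current_);
    current_.clear();
    // The fix: reset duplicate detection so the first prefix of the next
    // partition is always inserted, even if it repeats the previous one.
    has_last_ = false;
  }

  bool PartitionMayContain(std::size_t i, const std::string& prefix) const {
    assert(i < partitions_.size());
    return partitions_[i].count(prefix) > 0;
  }

 private:
  std::set<std::string> current_;
  std::vector<std::set<std::string>> partitions_;
  std::string last_prefix_;
  bool has_last_ = false;
};
```

Without the reset in `CutPartition`, a prefix that spans two partitions would be absent from the second one, which is the false negative the commit describes.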
92ee335 BlockBasedTableIterator to keep BlockIter after out of upper bound (#4004) Summary: b555ed30a4a93b80a3ac4781c6721ab988e03b5b makes the BlockBasedTableIterator invalid if the current position is over the upper bound. However, this can cause a performance regression when multiple Seek()s hit the same data block but all land beyond the upper bound. For example, suppose an SST file has a data block containing the keys {a, z}, the user sets the upper bound to "x", and then executes the following queries: Seek("b") Seek("c") Seek("d") Before the upper bound optimization, these queries always reused the iterator's current data block, but now each Seek() reads the data block from the block cache and then releases it again. To prevent this regression, we keep the current data block iterator even when it is out of the upper bound. Closes https://github.com/facebook/rocksdb/pull/4004 Differential Revision: D8463192 Pulled By: siying fbshipit-source-id: 8710628b30acde7063a097c3184d6c4333a8ef81 19 June 2018, 16:57:11 UTC
7f3a634 Support pipelined write in stress/crash tests Summary: Closes https://github.com/facebook/rocksdb/pull/4019 Differential Revision: D8508681 Pulled By: ajkr fbshipit-source-id: 23a3c07d642386446e322b02e69cdf70d12ef009 19 June 2018, 16:14:12 UTC
8585059 Support backup and checkpoint in db_stress (#4005) Summary: Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow. Closes https://github.com/facebook/rocksdb/pull/4005 Differential Revision: D8472275 Pulled By: ajkr fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5 19 June 2018, 02:28:18 UTC
de2c6fb Fix stderr processing in crash test (#4006) Summary: Fixed bug where `db_stress` output a line with a warning followed by a line with an error, and `db_crashtest.py` considered that a success. For example: ``` WARNING: prefix_size is non-zero but memtablerep != prefix_hash open error: Corruption: SST file is ahead of WALs ``` Closes https://github.com/facebook/rocksdb/pull/4006 Differential Revision: D8473463 Pulled By: ajkr fbshipit-source-id: 60461bdd7491d9d26c63f7d4ee522a0f88ba3de7 19 June 2018, 00:58:13 UTC
c766887 Fix ExternalSSTFileTest::OverlappingRanges test on Solaris Sparc (#4012) Summary: Fix of #4011 Closes https://github.com/facebook/rocksdb/pull/4012 Differential Revision: D8499173 Pulled By: sagar0 fbshipit-source-id: cbb2b90c544ed364a3640ea65835d577b2dbc5df 18 June 2018, 21:57:37 UTC
7b4b43f zLinux build error with gcc and IBM Java headers (#4013) Summary: `SetByteArrayRegion` does not take a const source buffer, hence the compilation error. I have handled it the same way as in the other JNI files (const_cast). It was missing for the new transaction functionality added recently. Closes https://github.com/facebook/rocksdb/pull/4013 Differential Revision: D8493290 Pulled By: sagar0 fbshipit-source-id: 14afedf365b111121bd11e68a8d546a1cae68b26 18 June 2018, 20:58:28 UTC
e5bee40 zLinux s390x support in JNI (#4009) Summary: Adding support for zLinux on s390x architecture in JNI. Closes https://github.com/facebook/rocksdb/pull/4009 Differential Revision: D8483750 Pulled By: siying fbshipit-source-id: e681657c27e7a28f1731e08e8570382de5deff44 18 June 2018, 16:57:02 UTC
e750dac Crash on Windows, because of shared_ptr reinterpret cast (#3999) Summary: For more details see #3998 Closes https://github.com/facebook/rocksdb/pull/3999 Differential Revision: D8458905 Pulled By: sagar0 fbshipit-source-id: d6e09182933253a08eaf81ac7cfe50ed3b6576c5 18 June 2018, 03:56:33 UTC
80bc359 Should only decode restart points for uncompressed blocks (#3996) Summary: The Block object assumes contents are uncompressed. Block's constructor tries to read the number of restarts, but does not get an accurate number when its contents are compressed, which is causing issues like https://github.com/facebook/rocksdb/issues/3843. This PR addresses this issue by skipping reconstruction of restart points when blocks are known to be compressed. Somehow the restart points can be read directly when Snappy is used, and some tests (for example https://github.com/facebook/rocksdb/blob/master/db/db_block_cache_test.cc#L196) expect blocks to be fully constructed even when Snappy compression is used, so here we keep the restart point logic for Snappy. Closes https://github.com/facebook/rocksdb/pull/3996 Differential Revision: D8416186 Pulled By: miasantreble fbshipit-source-id: 002c0b62b9e5d89fb7736563d354ce0023c8cb28 16 June 2018, 02:26:58 UTC
c48764b Don't generate a notification for a 0 size SST (#4003) Summary: Don't call the OnTableFileCreated listener callback when a 0 size SST file gets created by Flush. Doing so causes an assertion failure in db_stress. It is also not correct behavior as we call env->DeleteFile() for such files right before the notification. Closes https://github.com/facebook/rocksdb/pull/4003 Differential Revision: D8461385 Pulled By: anand1976 fbshipit-source-id: ae92d4f921c2e2cff981ad58f4929ed8b609f35d 16 June 2018, 00:57:24 UTC
3fbc865 Add kOptionsStatistics to GetProperty() (#3966) Summary: Add a new DB property to DB::GetProperty(), which returns the option.statistics. Test is updated to pass. Closes https://github.com/facebook/rocksdb/pull/3966 Differential Revision: D8311139 Pulled By: zhichao-cao fbshipit-source-id: ea78f4727358c807b0e5a0ea62e09defb10ad9ac 16 June 2018, 00:28:01 UTC
7b5f7ff travis: osx install zstd lz4 snappy xz (#3893) Summary: test osx against the brew libraries zstd, lz4, snappy, xz. Closes https://github.com/facebook/rocksdb/pull/3893 Differential Revision: D8461988 Pulled By: siying fbshipit-source-id: cc2a8487bcb1e98ca05bddd3a509a6896258ccf8 15 June 2018, 23:57:30 UTC
906a602 Build and tests fixes for Solaris Sparc (#4000) Summary: Here are some fixes for build on Solaris Sparc. It is also fixing CRC test on BigEndian platforms. Closes https://github.com/facebook/rocksdb/pull/4000 Differential Revision: D8455394 Pulled By: ajkr fbshipit-source-id: c9289a7b541a5628139c6b77e84368e14dc3d174 15 June 2018, 19:42:53 UTC
f23fed1 Delay verify compaction output table (#3979) Summary: Verifying a table loads the SST into `TableCache`, where it occupies memory and `TableCache` capacity even though nothing else uses the loaded table at that point, so it is unnecessary. Instead, we verify the output tables after all subcompactions have finished. Closes https://github.com/facebook/rocksdb/pull/3979 Differential Revision: D8389946 Pulled By: ajkr fbshipit-source-id: 54bd4f474f9e7b3accf39c3068b1f36a27ec4c49 15 June 2018, 19:42:53 UTC
4faaab7 Benchmark sine wave write rate limit (#3914) Summary: As mentioned at the [dev forum.](https://www.facebook.com/groups/rocksdb.dev/1693425187422655/) Let me know if you would like me to do any changes! Closes https://github.com/facebook/rocksdb/pull/3914 Differential Revision: D8452824 Pulled By: siying fbshipit-source-id: 56439b3228ecdcc5a199d5198eff2fab553be961 15 June 2018, 19:12:03 UTC
f5281a5 tools/check_format_compatible.sh to cover forward option reading too (#3994) Summary: Make sure that some recent releases can read master's option files while ignoring unknown options. Also add two more recent release branches. Closes https://github.com/facebook/rocksdb/pull/3994 Differential Revision: D8409499 Pulled By: siying fbshipit-source-id: 1b025f19ba288da0517f6b4572797573e23e23c2 15 June 2018, 18:12:29 UTC
fbe3b9e Update db_universal_compaction_test according to PR #3970 (#3995) Summary: The SST file sizes changed slightly after the improvement of PR #3970 which reduces the size of the properties block. Before PR #3970 a size ratio compaction included all of the first four flushed files but it only includes two files after. We increase the size_ratio universal compaction option to make that compaction include all four files again. Closes https://github.com/facebook/rocksdb/pull/3995 Differential Revision: D8426925 Pulled By: fgwu fbshipit-source-id: 1429c38672e9f4fb4d4881fd4b06db45c4861d62 15 June 2018, 17:42:21 UTC
1f32dc7 Check with PosixEnv before opening LOCK file (#3993) Summary: Rebased and resubmitting #1831 on behalf of stevelittle. The problem is when a single process attempts to open the same DB twice, the second attempt fails due to LOCK file held. If the second attempt had opened the LOCK file, it'll now need to close it, and closing causes the file to be unlocked. Then, any subsequent attempt to open the DB will succeed, which is the wrong behavior. The solution was to track which files a process has locked in PosixEnv, and check those before opening a LOCK file. Fixes #1780. Closes https://github.com/facebook/rocksdb/pull/3993 Differential Revision: D8398984 Pulled By: ajkr fbshipit-source-id: 2755fe66950a0c9de63075f932f9e15768041918 14 June 2018, 00:32:04 UTC
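The approach in 1f32dc7, tracking which files the process has already locked and consulting that record before opening a LOCK file, can be sketched as a small registry. `LockedFileRegistry` and its method names are hypothetical, and the actual OS-level locking is omitted:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical process-wide record of lock files currently held.
class LockedFileRegistry {
 public:
  // Returns false if this process already holds `path`. In that case the
  // caller should fail without opening the file, because opening and then
  // closing it would release the lock held by the first opener.
  bool TryRegister(const std::string& path) {
    std::lock_guard<std::mutex> guard(mu_);
    return locked_.insert(path).second;
  }

  // Call after the real lock has been released.
  void Unregister(const std::string& path) {
    std::lock_guard<std::mutex> guard(mu_);
    const std::size_t erased = locked_.erase(path);
    assert(erased == 1);
    (void)erased;  // silence unused-variable warnings in release builds
  }

 private:
  std::mutex mu_;
  std::set<std::string> locked_;
};
```

A second `TryRegister` on the same path fails until the first holder calls `Unregister`, which mirrors the intended behavior for opening the same DB twice from one process.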
7497f99 Run manual compaction in stress/crash tests (#3936) Summary: - Add support to `db_stress` for `CompactRange` - Enable `CompactRange` and `CompactFiles` in crash tests Closes https://github.com/facebook/rocksdb/pull/3936 Differential Revision: D8230953 Pulled By: ajkr fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8 13 June 2018, 23:45:28 UTC
dd216dd Choose unique keys faster in db_stress (#3990) Summary: db_stress initialization randomly chooses a set of keys to not overwrite. It was doing it separately for each column family. That caused 30+ second initialization times for the non-simple crash tests, which have 10 CFs. This PR: - reuses the same set of randomly chosen no-overwrite keys across all CFs - logs a couple more timestamps so we can more easily see initialization time Closes https://github.com/facebook/rocksdb/pull/3990 Differential Revision: D8393821 Pulled By: ajkr fbshipit-source-id: d0b263a298df607285ffdd8b0983ff6575cc6c34 13 June 2018, 20:43:23 UTC
a720401 Avoid acquiring SyncPoint mutex when it is disabled (#3991) Summary: In `db_stress` profile the vast majority of CPU time is spent acquiring the `SyncPoint` mutex. I mistakenly assumed #3939 had fixed this mutex contention problem by disabling `SyncPoint` processing. But actually the lock was still being acquired just to check whether processing is enabled. We can avoid that overhead by using an atomic to track whether it's enabled. Closes https://github.com/facebook/rocksdb/pull/3991 Differential Revision: D8393825 Pulled By: ajkr fbshipit-source-id: 5bc4e3c722ee7304e7a9c2439998c456b05a6897 13 June 2018, 20:13:18 UTC
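A minimal sketch of the idea in a720401: check an atomic flag before touching the mutex, so that disabled sync points cost one atomic load. `MiniSyncPoint` is a hypothetical class written for illustration and is much simpler than RocksDB's `SyncPoint`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified sync point registry.
class MiniSyncPoint {
 public:
  void SetCallback(const std::string& point, std::function<void()> cb) {
    std::lock_guard<std::mutex> guard(mu_);
    callbacks_[point] = std::move(cb);
  }

  void EnableProcessing() { enabled_.store(true, std::memory_order_release); }
  void DisableProcessing() { enabled_.store(false, std::memory_order_release); }

  void Process(const std::string& point) {
    // Fast path: no mutex acquisition while processing is disabled.
    if (!enabled_.load(std::memory_order_acquire)) return;

    std::function<void()> cb;
    {
      std::lock_guard<std::mutex> guard(mu_);
      auto it = callbacks_.find(point);
      if (it == callbacks_.end()) return;
      cb = it->second;
    }
    assert(cb);
    cb();  // run outside the lock so the callback may call back in
  }

 private:
  std::atomic<bool> enabled_{false};
  std::mutex mu_;
  std::unordered_map<std::string, std::function<void()>> callbacks_;
};
```

The mutex still protects the callback map; only the enabled check moves out from under it.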
d82f142 Fix regression bug of Prev() with upper bound (#3989) Summary: A recent change pushed down the upper bound checking to child iterators. However, this breaks the logic of the following sequence: Seek(key); if (!Valid()) SeekToLast(); because !Valid() may be caused by the upper bound rather than by reaching the end of the iterator. In this case SeekToLast() points to a totally wrong place. This can cause wrong results, infinite loops, or segfaults in some cases. This sequence is called when changing direction from forward to backward, and it also happens implicitly during the reseeking optimization in Prev(). Fix this bug by using SeekForPrev() rather than this sequence, as is already done in the prefix extractor case. Closes https://github.com/facebook/rocksdb/pull/3989 Differential Revision: D8385422 Pulled By: siying fbshipit-source-id: 429e869990cfd2dc389421e0836fc496bed67bb4 12 June 2018, 23:57:36 UTC
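The failure mode in d82f142 can be reproduced with a toy iterator over a sorted key list. `ChildIter`, `PositionOld`, and `PositionNew` are hypothetical names, and the model assumes, as the commit describes, that `Seek` honors the upper bound while `SeekToLast` goes to the physically last key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy child iterator over sorted keys with an exclusive upper bound.
class ChildIter {
 public:
  ChildIter(std::vector<std::string> sorted_keys, std::string upper_bound)
      : keys_(std::move(sorted_keys)), upper_bound_(std::move(upper_bound)) {}

  bool Valid() const { return valid_; }
  const std::string& key() const {
    assert(valid_);
    return keys_[pos_];
  }

  // First key >= target; invalid if that key is at or past the upper bound.
  void Seek(const std::string& target) {
    auto it = std::lower_bound(keys_.begin(), keys_.end(), target);
    pos_ = static_cast<std::size_t>(it - keys_.begin());
    valid_ = it != keys_.end() && *it < upper_bound_;
  }

  // Physically last key, regardless of the upper bound.
  void SeekToLast() {
    valid_ = !keys_.empty();
    pos_ = valid_ ? keys_.size() - 1 : 0;
  }

  // Last key <= target.
  void SeekForPrev(const std::string& target) {
    auto it = std::upper_bound(keys_.begin(), keys_.end(), target);
    valid_ = it != keys_.begin();
    pos_ = valid_ ? static_cast<std::size_t>(it - keys_.begin()) - 1 : 0;
  }

 private:
  std::vector<std::string> keys_;
  std::string upper_bound_;
  std::size_t pos_ = 0;
  bool valid_ = false;
};

// Old sequence: misreads "out of bound" as "past the end".
void PositionOld(ChildIter& it, const std::string& target) {
  it.Seek(target);
  if (!it.Valid()) it.SeekToLast();
}

// Fixed sequence: ask directly for the last key at or before the target.
void PositionNew(ChildIter& it, const std::string& target) {
  it.SeekForPrev(target);
}
```

With keys {a, c, y, z}, upper bound "x", and target "d", the old sequence lands on "z", well past the target, while `SeekForPrev` lands on "c".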
9d34733 Fix argument mismatch in BlockBasedTableBuilder (#3974) Summary: The sixth argument should be `key_includes_seq` bool, the seventh a `GetContext*`. We were mistakenly passing the `GetContext*` as the sixth argument and relying on the default (nullptr) for the seventh. This would make statistics inaccurate, at least. Blame: 402b7aa0 Closes https://github.com/facebook/rocksdb/pull/3974 Differential Revision: D8344907 Pulled By: ajkr fbshipit-source-id: 3ad865a0541d6d30f75dfc726352788118cfe12e 12 June 2018, 20:57:44 UTC
9c7da96 Fix a crash in WinEnvIO::GetSectorSize (#3975) Summary: Fix a crash in `WinEnvIO::GetSectorSize` that happens on old Windows systems (e.g Windows 7). On old Windows systems that don't support querying StorageAccessAlignmentProperty using IOCTL_STORAGE_QUERY_PROPERTY, the flow calls a different DeviceIoControl with nullptr as lpBytesReturned. When the code reaches this point, we get an access violation. Closes https://github.com/facebook/rocksdb/pull/3975 Differential Revision: D8385186 Pulled By: ajkr fbshipit-source-id: fae4c9b4b0a52c8a10182e1b35bcaa30dc393bbb 12 June 2018, 20:45:18 UTC
3593275 Remove restart point from the properties_block (#3970) Summary: Property block will be read sequentially and cached in a heap located object, so there's no need for restart points. Thus we set the restart interval to infinity to save space. Closes https://github.com/facebook/rocksdb/pull/3970 Differential Revision: D8332586 Pulled By: fgwu fbshipit-source-id: 899c3267832a81d0f084ec2db6b387332f461134 12 June 2018, 19:57:37 UTC
f450294 Change db path for BlockBasedTableTest.BadOptions (#3965) Summary: The temporary db path created by the BadOptions test is changed to table_block_based_bad_options_test to avoid colliding with the one created by PrefixAndWholeKeyTest. Closes https://github.com/facebook/rocksdb/pull/3965 Differential Revision: D8316080 Pulled By: fgwu fbshipit-source-id: bb8e0fdfdb9abf0e5ce94494b4388cd1622ee032 08 June 2018, 19:57:14 UTC
3470c75 Fix build errors. Summary: Closes https://github.com/facebook/rocksdb/pull/3967 Differential Revision: D8322775 Pulled By: riversand963 fbshipit-source-id: bd73067bd5d3ed4627348f0685bc499359ad6442 07 June 2018, 22:43:09 UTC
23e1d23 Fixed the fprintf of uint64_t by using PRIu64 (#3963) Summary: Fixed the fprintf format of uint64_t by using PRIu64 in file tools/ldb_cmd.cc Closes https://github.com/facebook/rocksdb/pull/3963 Differential Revision: D8306179 Pulled By: zhichao-cao fbshipit-source-id: 597dcd55321576801bbf2cf4714736ebc4750a0c 07 June 2018, 18:44:48 UTC
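For reference, a small sketch of printing a `uint64_t` portably with `PRIu64`, the technique named in 23e1d23. `FormatCount` is a hypothetical helper and is not taken from tools/ldb_cmd.cc.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit unsigned value without assuming it is `unsigned long` or
// `unsigned long long`; PRIu64 expands to the right conversion per platform.
std::string FormatCount(std::uint64_t n) {
  char buf[32];  // the largest uint64_t has 20 decimal digits
  const int written = std::snprintf(buf, sizeof(buf), "%" PRIu64, n);
  assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buf));
  (void)written;  // silence unused-variable warnings in release builds
  return std::string(buf);
}
```

Hard-coding `%lu` or `%llu` instead compiles cleanly on some platforms and triggers format warnings, often treated as errors, on others.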
0a0860a Refactoring db_stress.cc (#3902) Summary: We use `db_stress.cc` intensively to test and verify the behavior of RocksDB. Sometimes we need to add new tests for recently added features. The original `StressTest` class provides a lot of general functionality that can be leveraged by other tests. Therefore, in this refactoring PR, I try to identify the general operations as well as the operations that future tests will most likely want to customize. Future tests can inherit from `StressTest` and override the virtual functions to test custom logic. Closes https://github.com/facebook/rocksdb/pull/3902 Differential Revision: D8284607 Pulled By: riversand963 fbshipit-source-id: 019302d04665a2b18334b6d05d04a477168c8ea4 07 June 2018, 17:43:00 UTC
45b6bcc ZSTD compression: should also expect type = kZSTDNotFinalCompression (#3964) Summary: Depending on the compression type, `CompressBlock` calls the compress method for each compression type. It calls ZSTD_Compress for both kZSTD and kZSTDNotFinalCompression (https://github.com/facebook/rocksdb/blob/master/table/block_based_table_builder.cc#L169). However currently ZSTD_Compress only expects the type to be kZSTD and this is causing assert failures and crashes. The same also applies to ZSTD_Uncompress. Closes https://github.com/facebook/rocksdb/pull/3964 Differential Revision: D8308715 Pulled By: miasantreble fbshipit-source-id: e5125f53edb829c9c33733167bec74e4793d0782 07 June 2018, 06:42:29 UTC
b736521 Extend format 3 to partitioned index/filters (#3958) Summary: format_version 3 changes the format of index blocks by storing user keys instead of the internal keys, which saves 8-bytes per key. This patch extends the format to top-level indexes in partitioned index/filters. Closes https://github.com/facebook/rocksdb/pull/3958 Differential Revision: D8294615 Pulled By: maysamyabandeh fbshipit-source-id: 17666cc16b8076c363972e2308e31547e835f0fe 06 June 2018, 23:58:16 UTC
5504a05 Adding advisor Rules and parser scripts with unit tests. (#3934) Summary: This adds some rules in the tools/advisor/advisor/rules.ini (refer this for more information) file and corresponding python parser scripts for parsing the rules file and the rocksdb LOG and OPTIONS files. This is WIP for adding rules depending on ODS. The starting point of the script is the rocksdb/tools/advisor/advisor/rule_parser.py file. Closes https://github.com/facebook/rocksdb/pull/3934 Reviewed By: maysamyabandeh Differential Revision: D8304059 Pulled By: poojam23 fbshipit-source-id: 47f2a50f04d46d40e225dd1cbf58ba490f79e239 06 June 2018, 21:42:59 UTC
4420df4 Check conflict at output level in CompactFiles (#3926) Summary: CompactFiles checked whether the existing files conflicted with the chosen compaction. But it missed checking whether future files would conflict, i.e., when another compaction was simultaneously writing new files to the same range at the same output level. Closes https://github.com/facebook/rocksdb/pull/3926 Differential Revision: D8218996 Pulled By: ajkr fbshipit-source-id: 21cb00a6fed4c8c62d3ed2ff810962e6bdc2fdfb 05 June 2018, 21:14:05 UTC
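The additional check from 4420df4 can be pictured as a range-overlap test against compactions that are still writing to the same output level. The types and function names below are hypothetical and assume inclusive key ranges compared bytewise.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Inclusive key range [smallest, largest].
struct KeyRange {
  std::string smallest;
  std::string largest;
};

// Hypothetical record of a compaction that is currently producing files.
struct OutputInProgress {
  int level;
  KeyRange range;
};

bool Overlaps(const KeyRange& a, const KeyRange& b) {
  assert(a.smallest <= a.largest && b.smallest <= b.largest);
  return !(a.largest < b.smallest || b.largest < a.smallest);
}

// True if a new compaction writing `range` to `output_level` would collide
// with files another compaction is about to create there.
bool ConflictsWithRunningOutput(const std::vector<OutputInProgress>& running,
                                int output_level, const KeyRange& range) {
  for (const auto& r : running) {
    if (r.level == output_level && Overlaps(r.range, range)) return true;
  }
  return false;
}
```

Checking only the files that already exist at the output level misses this case, because the conflicting files have not been installed yet.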
f1592a0 run make format for PR 3838 (#3954) Summary: PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings. Run `make format` to fix formatting as suggested by siying . Also piggyback two changes: 1) fix singleton destruction order for windows and posix env 2) fix two clang warnings Closes https://github.com/facebook/rocksdb/pull/3954 Differential Revision: D8272041 Pulled By: miasantreble fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f 05 June 2018, 19:58:02 UTC
812c737 Fix performance regression in Get() for block-based tables (#3953) Summary: This fixes a regression in one of myrocks regression tests (readwhilewriting), introduced in https://github.com/facebook/rocksdb/commit/8bf555f487d1de84a4fb19cb97b9ae1a8dbebc60 This PR changes two lines of code: one of them actually fixes the observed regression, the other is a mostly unrelated small fix that I'm piggy-backing here. EDIT: Nevermind, it fixes one line. More details in inline comments. Closes https://github.com/facebook/rocksdb/pull/3953 Differential Revision: D8270664 Pulled By: al13n321 fbshipit-source-id: a7d91e196807d1e816551591257c700f70e4ccac 05 June 2018, 18:43:16 UTC
d0c38c0 Extend some tests to format_version=3 (#3942) Summary: format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3. Closes https://github.com/facebook/rocksdb/pull/3942 Differential Revision: D8238413 Pulled By: maysamyabandeh fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035 05 June 2018, 03:13:00 UTC
2210152 Fix singleton destruction order of PosixEnv and SyncPoint (#3951) Summary: Ensure the PosixEnv singleton is destroyed first since its destructor waits for background threads to all complete. This ensures background threads cannot hit sync points after the SyncPoint singleton is destroyed, which was previously possible. Closes https://github.com/facebook/rocksdb/pull/3951 Differential Revision: D8265295 Pulled By: ajkr fbshipit-source-id: 7738dd458c5d993a78377dd0420e82badada81ab 04 June 2018, 22:58:46 UTC
ab2254b Fix clang analyze Summary: This fixes the errors as reported here: https://github.com/facebook/rocksdb/pull/3941#issuecomment-394424043 Closes https://github.com/facebook/rocksdb/pull/3950 Differential Revision: D8263086 Pulled By: lth fbshipit-source-id: 5e148d489cab2153e5846d16979a0a1f2d677d57 04 June 2018, 21:44:23 UTC
f4b72d7 Provide a way to override windows memory allocator with jemalloc for ZSTD Summary: Windows does not have an LD_PRELOAD mechanism to override all memory allocation functions, and ZSTD makes use of C-runtime calloc. During flushes and compactions the default system allocator fragments and the system slows down considerably. For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache the ZSTD context within the block based table builder while a new SST file is being built; this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance. The change does not address random reads, and currently on Windows reads with ZSTD regress as compared with SNAPPY compression. Closes https://github.com/facebook/rocksdb/pull/3838 Differential Revision: D8229794 Pulled By: miasantreble fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d 04 June 2018, 19:12:48 UTC
4f297ad Fix crash test check for direct I/O Summary: We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing. Closes https://github.com/facebook/rocksdb/pull/3946 Differential Revision: D8247998 Pulled By: ajkr fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3 04 June 2018, 04:42:12 UTC
50d7ac0 Fix test for rocksdb_lite: hide incompatible option kDirectIO Summary: Previous commit https://github.com/facebook/rocksdb/pull/3935 unhid a few test options, including kDirectIO. However, it's not supported by RocksDB lite. Need to hide this option from the lite build. Closes https://github.com/facebook/rocksdb/pull/3943 Differential Revision: D8242757 Pulled By: miasantreble fbshipit-source-id: 1edfad3a5d01a46bfb7eedee765981ebe02c500a 02 June 2018, 03:42:36 UTC
fea2b1d Copy Get() result when file reads use mmap Summary: For iterator reads, a `SuperVersion` is pinned to preserve a snapshot of SST files, and `Block`s are pinned to allow `key()` and `value()` to return pointers directly into a RocksDB memory region. This works for both non-mmap reads, where the block owns the memory region, and mmap reads, where the file owns the memory region. For point reads with `PinnableSlice`, only the `Block` object is pinned. This works for non-mmap reads because the block owns the memory region, so even if the file is deleted after compaction, the memory region survives. However, for mmap reads, file deletion causes the memory region to which the `PinnableSlice` refers to be unmapped. The result is usually a segfault upon accessing the `PinnableSlice`, although sometimes it returned wrong results (I repro'd this a bunch of times with `db_stress`). This PR copies the value into the `PinnableSlice` when it comes from mmap'd memory. We can tell whether the `Block` owns its memory using `Block::cachable()`, which is unset when reads do not use the provided buffer as is the case with mmap file reads. When that is false we ensure the result of `Get()` is copied. This feels like a short-term solution as ideally we'd have the `PinnableSlice` pin the mmap'd memory so we can do zero-copy reads. It seemed hard so I chose this approach to fix correctness in the meantime. Closes https://github.com/facebook/rocksdb/pull/3881 Differential Revision: D8076288 Pulled By: ajkr fbshipit-source-id: 31d78ec010198723522323dbc6ea325122a46b08 01 June 2018, 23:57:58 UTC
88c3ee2 Configure direct I/O statically in db_stress Summary: Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested): - It's a DB option so SetDBOptions should've been called instead - It's not a dynamic option so even SetDBOptions would fail - It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU. In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT. Closes https://github.com/facebook/rocksdb/pull/3939 Differential Revision: D8238120 Pulled By: ajkr fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279 01 June 2018, 23:42:34 UTC
01e3c30 Extend existing unit tests to run with WriteUnprepared as well Summary: As titled. I have not extended the Compatibility tests because the new WAL markers are still unimplemented. Closes https://github.com/facebook/rocksdb/pull/3941 Differential Revision: D8238394 Pulled By: lth fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d 01 June 2018, 21:58:41 UTC
89b3708 add c api rocksdb_sstfilewriter_file_size Summary: Closes https://github.com/facebook/rocksdb/pull/3922 Differential Revision: D8208528 Pulled By: ajkr fbshipit-source-id: d384fe53cf526f2aadc7b79a423ce36dbd3ff224 01 June 2018, 16:43:59 UTC
2a0dfaa fix PrefixExtractorChanged: pass raw pointer instead shared_ptr Summary: This should resolve the performance regression caused by the unnecessary copying of the shared_ptr. Closes https://github.com/facebook/rocksdb/pull/3937 Differential Revision: D8232330 Pulled By: miasantreble fbshipit-source-id: 7885bf7cd190b6f87164c52d6edd328298c13f97 01 June 2018, 04:42:50 UTC
44cf849 Fix the bug of some test scenarios being put after kEnd Summary: DBTestBase::OptionConfig includes the scenarios that unit tests could iterate over them by calling ChangeOptions(). Some of the options have been mistakenly put after kEnd which makes them essentially invisible to ChangeOptions() caller. This patch fixes it except for kUniversalSubcompactions which is left as TODO since it would break some unit tests. Closes https://github.com/facebook/rocksdb/pull/3935 Differential Revision: D8230748 Pulled By: maysamyabandeh fbshipit-source-id: edddb8fffcd161af1809fef24798ce118f8593db 01 June 2018, 02:28:00 UTC
2807678 c api set bottommost level compaction Summary: Closes https://github.com/facebook/rocksdb/pull/3928 Differential Revision: D8224962 Pulled By: ajkr fbshipit-source-id: 3caf463509a935bff46530f27232a85ae7e4e484 01 June 2018, 00:30:50 UTC
82089d5 DBImpl::FindObsoleteFiles() not to call GetChildren() on the same path Summary: DBImpl::FindObsoleteFiles() may call GetChildren() multiple times if different CFs are on the same path. Fix it. Closes https://github.com/facebook/rocksdb/pull/3885 Differential Revision: D8084634 Pulled By: siying fbshipit-source-id: b471fbc251f6a05e9243304dc14c0831060cc0b0 31 May 2018, 19:58:33 UTC
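The deduplication described in 82089d5 can be sketched in isolation. This is a minimal illustration, not the actual `DBImpl::FindObsoleteFiles()` code: `ScanDistinctPaths` and its `list_dir` callback are hypothetical names standing in for the per-path `GetChildren()` call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: invokes list_dir once per distinct path, even when
// several column families share the same path. Returns the number of calls.
std::size_t ScanDistinctPaths(
    const std::vector<std::string>& cf_paths,
    const std::function<void(const std::string&)>& list_dir) {
  std::set<std::string> seen;
  std::size_t calls = 0;
  for (const std::string& path : cf_paths) {
    // insert().second is true only the first time a path is seen.
    if (seen.insert(path).second) {
      list_dir(path);
      ++calls;
    }
  }
  return calls;
}
```

With three column families of which two share a path, the callback runs twice rather than three times.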
a35451e fix deadlock with enable_pipelined_write=true and max_successive_merges > 0 Summary: fix this https://github.com/facebook/rocksdb/issues/3916 Closes https://github.com/facebook/rocksdb/pull/3923 Differential Revision: D8215192 Pulled By: yiwu-arbug fbshipit-source-id: a4c2f839a91d92dc70906d2b7c6de0fe014a2422 31 May 2018, 18:13:14 UTC
aaac6cd Add write unprepared classes by inheriting from write prepared Summary: Closes https://github.com/facebook/rocksdb/pull/3907 Differential Revision: D8218325 Pulled By: lth fbshipit-source-id: ff32d8dab4a159cd2762876cba4b15e3dc51ff3b 31 May 2018, 17:47:42 UTC
727eb88 Compile error in db bench tool Summary: Small format error below causes build to fail. I believe that this : ``` fprintf(stderr, "num reads to do %lu\n", reads_); ``` Can be changed to this: ``` fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_); ``` Successful build ``` CC utilities/blob_db/blob_dump_tool.o AR librocksdb_debug.a ar: creating archive librocksdb_debug.a /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols CC tools/db_bench.o CC tools/db_bench_tool.o tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat] fprintf(stderr, "num reads to do %lu\n", reads_); ~~~ ^~~~~~ %lld 1 error generated. make: *** [tools/db_bench_tool.o] Error 1 ``` ``` $ cd rocksdb $ make all $ g++ --version Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1 Apple LLVM version 9.1.0 (clang-902.0.39.1) Target: x86_64-apple-darwin17.5.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin ``` Closes https://github.com/facebook/rocksdb/pull/3909 Differential Revision: D8215710 Pulled By: siying fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751 31 May 2018, 01:01:36 UTC
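The format-specifier mismatch reported in 727eb88 can be reproduced and avoided outside `db_bench`. The sketch below is an assumption-laden illustration: `FormatReads` is a hypothetical helper, and it uses `PRId64` because the compiler output above reports the argument as `int64_t`; the commit message itself proposes `PRIu64`.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of db_bench): formats a 64-bit read count.
// PRId64 expands to the right length modifier for int64_t on each platform,
// whereas "%lu" assumes the argument is an 'unsigned long'.
std::string FormatReads(int64_t reads) {
  char buf[64];
  int n = std::snprintf(buf, sizeof(buf), "num reads to do %" PRId64 "\n", reads);
  // The longest possible output (prefix + 20 digits + newline) fits in buf.
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
  return std::string(buf, static_cast<std::size_t>(n));
}
```

The `<cinttypes>` macros keep the format string portable between platforms where `int64_t` is `long` and those where it is `long long`.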
4dd80de Remove tests from ROCKSDB_VALGRIND_RUN Summary: In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind. Closes https://github.com/facebook/rocksdb/pull/3924 Differential Revision: D8210184 Pulled By: siying fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a 30 May 2018, 23:15:16 UTC
a736255 Delete triggered compaction for universal style Summary: This is still WIP, but I'm hoping for early feedback on the overall approach. This patch implements deletion triggered compaction, which till now only worked for leveled, for universal style. SST files are marked for compaction by the CompactOnDeletionCollector table property. This is expected to be used when free disk space is low and the user wants to reclaim space by deleting a bunch of keys. The deletions are expected to be dense. In such a situation, we want to avoid a full compaction due to its space overhead. The strategy used in this case is similar to leveled. We pick one file from the set of files marked for compaction. We then expand the inputs to a clean cut on the same level, and then pick overlapping files from the next non-empty level. Picking files from the next level can cause the key range to expand, and we opportunistically expand inputs in the source level to include files wholly in this key range. The main side effect of this is that it breaks the property of no time range overlap between levels. This shouldn't break any functionality. Closes https://github.com/facebook/rocksdb/pull/3860 Differential Revision: D8124397 Pulled By: anand1976 fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf 29 May 2018, 22:44:34 UTC
724855c Fix LRUCache missing null check on destruct Summary: Fix LRUCache missing null check on destruct. The check is needed if LRUCache::DisownData is called. Closes https://github.com/facebook/rocksdb/pull/3920 Differential Revision: D8191631 Pulled By: yiwu-arbug fbshipit-source-id: d5014f6e49b51692c18a25fb55ece935f5a023c4 29 May 2018, 22:13:09 UTC
cf826de Fix compilation error when OPT="-DROCKSDB_LITE". Summary: Closes https://github.com/facebook/rocksdb/pull/3917 Differential Revision: D8187733 Pulled By: riversand963 fbshipit-source-id: e4aa179cd0791ca77167e357f99de9afd4aef910 29 May 2018, 19:28:59 UTC
03cda53 Check for rep_->table_properties being nullptr Summary: The very old sst formats do not have table_properties and rep_->table_properties is thus nullptr. The recent patch in https://github.com/facebook/rocksdb/pull/3894 does not check for nullptr and hence makes it backward incompatible. This patch adds the check. Closes https://github.com/facebook/rocksdb/pull/3918 Differential Revision: D8188638 Pulled By: maysamyabandeh fbshipit-source-id: b1d986665ecf0b4d1c442adfa8a193b97707d47b 29 May 2018, 19:13:55 UTC
1c1bafa Fix VersionStorageInfo::EstimateLiveDataSize seg fault Summary: `HandleEstimateLiveDataSize`'s `need_out_of_mutex` is true https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L412-L413 so it will ref a `SuperVersion` https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/db_impl.cc#L1896-L1908 so the param `version` of `InternalStats::HandleEstimateLiveDataSize` is safe, but `cfd_->current()` is not safe! https://github.com/facebook/rocksdb/blob/402b7aa07f0e6da4c1f0216ff2b2e50fd2e5eaac/db/internal_stats.cc#L790-L795 The `cfd_->current()` may be invalid ... here's the mongo-rocks crash backtrace ``` mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x41) [0x7fe3a3137c51] mongod(+0x2152E89) [0x7fe3a3136e89] mongod(+0x21534F6) [0x7fe3a31374f6] libpthread.so.0(+0xF5E0) [0x7fe39f5e45e0] mongod(rocksdb::InternalKeyComparator::Compare(rocksdb::Slice const&, rocksdb::Slice const&) const+0x17) [0x7fe3a22375a7] mongod(rocksdb::VersionStorageInfo::EstimateLiveDataSize() const+0x3AA) [0x7fe3a228daba] mongod(rocksdb::InternalStats::HandleEstimateLiveDataSize(unsigned long*, rocksdb::DBImpl*, rocksdb::Version*)+0x20) [0x7fe3a2250d70] mongod(rocksdb::DBImpl::GetIntPropertyInternal(rocksdb::ColumnFamilyData*, rocksdb::DBPropertyInfo const&, bool, unsigned long*)+0xEF) [0x7fe3a21e3dbf] ``` Closes https://github.com/facebook/rocksdb/pull/3912 Differential Revision: D8179944 Pulled By: yiwu-arbug fbshipit-source-id: 26f314a8f98f4c2dc4348745d759f26f0e8d95e1 28 May 2018, 18:27:08 UTC
402b7aa Exclude seq from index keys Summary: Index blocks have the same format as data blocks. The keys are therefore, like the keys in the data blocks, internal keys: in addition to the user key, each also has 8 bytes that encode the sequence number and value type. These extra 8 bytes are, however, not necessary in index blocks since the index keys act as a separator between two data blocks. The only exception is when the last key of a block and the first key of the next block share the same user key, in which case the sequence number is required to act as a separator. The patch excludes the sequence from index keys only if the above special case does not happen for any of the index keys. It then records that in the property block. The reader looks at the property block to see if it should expect sequence numbers in the keys of the index blocks. Closes https://github.com/facebook/rocksdb/pull/3894 Differential Revision: D8118775 Pulled By: maysamyabandeh fbshipit-source-id: 915479f028b5799ca91671d67455ecdefbd873bd 26 May 2018, 01:42:43 UTC
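The condition described in 402b7aa (sequence numbers may be dropped from index keys only if no two adjacent data blocks meet on the same user key) can be sketched as a small predicate. `BlockBounds` and `CanExcludeSeqFromIndex` are hypothetical names for illustration; the real builder tracks this incrementally while writing blocks rather than over a finished vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a data block: only its boundary user keys matter here.
struct BlockBounds {
  std::string first_user_key;
  std::string last_user_key;
};

// Returns true if every index key can be stored as a bare user key, i.e. no
// block ends on the same user key that the next block starts with.
bool CanExcludeSeqFromIndex(const std::vector<BlockBounds>& blocks) {
  for (std::size_t i = 0; i + 1 < blocks.size(); ++i) {
    if (blocks[i].last_user_key == blocks[i + 1].first_user_key) {
      // Same user key on both sides: only the sequence number separates them.
      return false;
    }
  }
  return true;
}
```

A single `false` result applies to the whole file, which matches the all-or-nothing flag the summary says is recorded in the property block.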
8c3bf08 Check status when reading HashIndexPrefixesMetadataBlock Summary: This was missed in a refactor of `ReadBlockContents` (2f1a3a4). Closes https://github.com/facebook/rocksdb/pull/3906 Differential Revision: D8172648 Pulled By: ajkr fbshipit-source-id: 27e453b19795fea974bfed4721105be6f3a12090 26 May 2018, 00:42:51 UTC
4543417 Fix an issue with unnecessary capture in lambda expressions Summary: Closes https://github.com/facebook/rocksdb/issues/3900 Replaces https://github.com/facebook/rocksdb/pull/3901 I needed this to build v5.12.4 on Mac OS X (10.13.3). Closes https://github.com/facebook/rocksdb/pull/3904 Differential Revision: D8169357 Pulled By: sagar0 fbshipit-source-id: 85faac42168796e7def9250d0c221a9a03b84476 25 May 2018, 22:12:44 UTC
aa53579 Fix segfault caused by object premature destruction Summary: Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609). There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change. To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed. Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised. Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion. Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler. Test plan ``` $ make -j16 $ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion ``` Expected behavior: All tests should pass. Closes https://github.com/facebook/rocksdb/pull/3898 Differential Revision: D8149421 Pulled By: riversand963 fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438 25 May 2018, 18:57:51 UTC
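The ownership change in aa53579 can be sketched with simplified stand-in types. This is only an illustration under stated assumptions: `TableOptions`, `TableFactory` and `TableReader` below are hypothetical miniatures, not the RocksDB classes, and the single `block_size` field is invented for the example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the real option and factory types.
struct TableOptions {
  int block_size = 4096;
};

struct TableFactory {
  TableOptions table_options;
};

// The reader keeps its own copy of the options instead of a reference into
// the factory, so it stays valid after the factory has been destroyed.
class TableReader {
 public:
  explicit TableReader(const TableFactory& factory)
      : table_options_(factory.table_options) {
    assert(table_options_.block_size > 0);
  }

  int block_size() const { return table_options_.block_size; }

 private:
  TableOptions table_options_;  // owned copy, not a pointer or reference
};
```

The cost is one extra copy of the options per reader, which is the trade-off the summary describes.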
6e08916 Fix Fadvise on closed file when reads use mmap Summary: ```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called. Closes https://github.com/facebook/rocksdb/pull/2764 Differential Revision: D8152515 Pulled By: ajkr fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65 25 May 2018, 17:57:57 UTC
070319f add flush_before_backup parameter to c api rocksdb_backup_engine_create_new_backup Summary: Add flush_before_backup to rocksdb_backup_engine_create_new_backup. make c api able to control the flush before backup behavior. Closes https://github.com/facebook/rocksdb/pull/3897 Differential Revision: D8157676 Pulled By: ajkr fbshipit-source-id: 88998c62f89f087bf8672398fd7ddafabbada505 25 May 2018, 05:28:52 UTC
bc7e8d4 LRUCache midpoint insertion Summary: Implement midpoint insertion strategy where new blocks will be inserted into the middle of the LRU list, then moved to the head on the first hit in cache. Closes https://github.com/facebook/rocksdb/pull/3877 Differential Revision: D8100895 Pulled By: yiwu-arbug fbshipit-source-id: f4bd83cb8be469e5d02072cfc8bd66011391f3da 24 May 2018, 22:57:33 UTC
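The midpoint insertion strategy from bc7e8d4 can be illustrated with a toy cache. This sketch is not the `LRUCache` implementation: `MidpointLru` is a hypothetical class that models the list as two segments (hot in front, cold behind) and, as a simplifying assumption, does not bound the size of the hot segment.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

class MidpointLru {
 public:
  explicit MidpointLru(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // New keys enter at the head of the cold segment, which is the midpoint of
  // the combined hot + cold list.
  void Insert(const std::string& key) {
    if (Touch(key)) return;  // already cached: treat as a hit
    if (index_.size() == capacity_) EvictOne();
    cold_.push_front(key);
    index_[key] = {false, cold_.begin()};
  }

  // A hit moves the key to the head of the hot segment.
  bool Touch(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    Entry& e = it->second;
    // splice relinks the node without invalidating the stored iterator.
    hot_.splice(hot_.begin(), e.hot ? hot_ : cold_, e.pos);
    e.hot = true;
    return true;
  }

  bool Contains(const std::string& key) const { return index_.count(key) != 0; }

 private:
  struct Entry {
    bool hot;
    std::list<std::string>::iterator pos;
  };

  // Evict from the cold tail first; fall back to the hot tail if cold is empty.
  void EvictOne() {
    std::list<std::string>& victims = cold_.empty() ? hot_ : cold_;
    index_.erase(victims.back());
    victims.pop_back();
  }

  std::size_t capacity_;
  std::list<std::string> hot_;
  std::list<std::string> cold_;
  std::unordered_map<std::string, Entry> index_;
};
```

The effect is that a block read once (for example by a scan) is evicted before blocks that have been hit at least once since insertion.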
3db8504 Catchup with posix features Summary: Catch up with Posix features: NewWritableRWFile must fail when the file does not exist; implement Env::Truncate(); adjust Env options optimization functions; implement MemoryMappedBuffer on Windows. Closes https://github.com/facebook/rocksdb/pull/3857 Differential Revision: D8053610 Pulled By: ajkr fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e 24 May 2018, 22:13:04 UTC
c465509 port_posix: use posix_memalign() for aligned_alloc Summary: to workaround issue of http://tracker.ceph.com/issues/21422 . and in tcmalloc aligned_alloc and posix_memalign() are basically the same thing. the same applies to GNU glibc. fixes #3175 Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes https://github.com/facebook/rocksdb/pull/3862 Differential Revision: D8147930 Pulled By: yiwu-arbug fbshipit-source-id: 355afe93c4dd0a96a0d711ef190e8b86fbe8d11d 24 May 2018, 19:13:16 UTC
7a99c04 refactor constructor of LRUCacheShard Summary: Update LRUCacheShard constructor so that adding new params to it don't need to add extra SetXXX() methods. Closes https://github.com/facebook/rocksdb/pull/3896 Differential Revision: D8128618 Pulled By: yiwu-arbug fbshipit-source-id: 6afa715de1493a50de413678761a765e3af9b83b 24 May 2018, 01:57:42 UTC
01bcc34 Introduce library-independent default compression level Summary: Previously we were using -1 as the default for every library, which was legacy from our zlib options. That worked for a while, but after zstd introduced https://github.com/facebook/zstd/commit/a146ee04ae5866b948be0c1911418e0436d80cb4, it started giving poor compression ratios by default in zstd. This PR adds a constant to RocksDB public API, `CompressionOptions::kDefaultCompressionLevel`, which will get translated to the default value specific to the compression library being used in "util/compression.h". The constant uses a number that appears to be larger than any library's maximum compression level. Closes https://github.com/facebook/rocksdb/pull/3895 Differential Revision: D8125780 Pulled By: ajkr fbshipit-source-id: 2db157a89118cd4f94577c2f4a0a5ff31c8391c6 24 May 2018, 01:42:08 UTC
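The translation step described in 01bcc34 can be sketched as follows. The names `Library`, `kDefaultLevelSentinel` and `ResolveLevel` are hypothetical, and the sentinel value 32767 is an assumption chosen only to satisfy the "larger than any library's maximum level" property from the summary. The per-library defaults used are zlib's `Z_DEFAULT_COMPRESSION` (-1) and zstd's `ZSTD_CLEVEL_DEFAULT` (3).

```cpp
#include <cassert>

// Hypothetical subset of supported compression libraries.
enum class Library { kZlib, kZstd };

// Assumed sentinel: larger than any real compression level, so it cannot be
// confused with a level the user actually requested.
constexpr int kDefaultLevelSentinel = 32767;

// Maps the library-independent sentinel to each library's own default and
// passes explicit levels through unchanged.
int ResolveLevel(Library lib, int requested) {
  if (requested != kDefaultLevelSentinel) return requested;
  switch (lib) {
    case Library::kZlib:
      return -1;  // zlib: Z_DEFAULT_COMPRESSION
    case Library::kZstd:
      return 3;   // zstd: ZSTD_CLEVEL_DEFAULT
  }
  return requested;  // unreachable for the enumerators above
}
```

Using a sentinel rather than -1 matters because -1 is a meaningful value for one library's API but may select a real (and poorly compressing) level in another.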
4011012 Specify the underlying type of enums. Summary: Explicitly specify the underlying type of enums help developers understand the physical storage. Closes https://github.com/facebook/rocksdb/pull/3892 Differential Revision: D8107027 Pulled By: riversand963 fbshipit-source-id: a00efecbba46df4a3c8eed0994a2d4972ad1a1d3 23 May 2018, 23:12:59 UTC
6c73a46 Fix a backward compatibility problem with table_properties being nullptr Summary: Currently when ldb built from master tries to open a DB from version 2.2, there will be a segfault because table_properties didn't exist back then. Closes https://github.com/facebook/rocksdb/pull/3890 Differential Revision: D8100914 Pulled By: miasantreble fbshipit-source-id: b255e8aedc54695432be2e704839c857dabdd65a 22 May 2018, 20:57:17 UTC
4420cb4 Fix Issue #3771: Slice ctor checks for nullptr and creates empty string Summary: Fix Issue #3771 : Check for nullptr in Slice constructor Slice ctor checks for nullptr and creates empty string if the string does not exist Closes https://github.com/facebook/rocksdb/pull/3887 Differential Revision: D8098852 Pulled By: ajkr fbshipit-source-id: 04471077defa9776ce7b8c389a61312ce31002fb 22 May 2018, 20:41:56 UTC
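The null check described in 4420cb4 can be illustrated with a miniature slice type. `MiniSlice` is a hypothetical stand-in, not `rocksdb::Slice`; in particular, pointing `data` at an empty literal for a null input is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical miniature of a slice: a pointer plus a length.
struct MiniSlice {
  const char* data;
  std::size_t size;

  // Calling strlen on a null pointer is undefined behavior, so a null input
  // is mapped to an empty string before the length is computed.
  explicit MiniSlice(const char* s)
      : data(s == nullptr ? "" : s), size(std::strlen(data)) {}

  std::string ToString() const { return std::string(data, size); }
};
```

Members are initialized in declaration order, so `data` is already non-null when `size` is computed from it.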
7db721b Avoid sleep in DBTest.GroupCommitTest to fix flakiness Summary: DBTest.GroupCommitTest would often fail when run under valgrind because its sleeps were insufficient to guarantee a group commit had multiple entries. Instead we can use sync point to force a leader to wait until a non-leader thread has enqueued its work, thus guaranteeing a leader can do group commit work for multiple threads. Closes https://github.com/facebook/rocksdb/pull/3883 Differential Revision: D8079429 Pulled By: ajkr fbshipit-source-id: 61dc50fad29d2c85547842f681288de60fa29049 22 May 2018, 19:16:25 UTC
fcb3101 Avoid single-deleting merge operands in db_stress Summary: I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them. Closes https://github.com/facebook/rocksdb/pull/3878 Differential Revision: D8097346 Pulled By: ajkr fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1 22 May 2018, 17:58:36 UTC
3db1ada PersistRocksDBOptions() to use WritableFileWriter Summary: By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls to one write() file system call, which will be expensive to underlying Env without its own write buffering. Closes https://github.com/facebook/rocksdb/pull/3882 Differential Revision: D8080673 Pulled By: siying fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b 21 May 2018, 23:42:22 UTC
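The buffering benefit described in 3db1ada can be sketched with a toy writer. `BufferedWriter` is a hypothetical class, not `WritableFileWriter`; the `Sink` callback stands in for the underlying file's write call so that the number of writes can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

class BufferedWriter {
 public:
  using Sink = std::function<void(const std::string&)>;

  BufferedWriter(std::size_t capacity, Sink sink)
      : capacity_(capacity), sink_(std::move(sink)) {
    assert(sink_ != nullptr);
  }

  // Accumulates data in memory; the sink is only called once the buffer
  // reaches capacity.
  void Append(const std::string& data) {
    buffer_ += data;
    if (buffer_.size() >= capacity_) Flush();
  }

  // Hands everything buffered so far to the sink in a single call.
  void Flush() {
    if (buffer_.empty()) return;
    sink_(buffer_);
    buffer_.clear();
  }

 private:
  std::size_t capacity_;
  Sink sink_;
  std::string buffer_;
};
```

Several small `Append()` calls, such as one per option line, then reach the sink as one write instead of one write each.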
c3ebc75 Move prefix_extractor to MutableCFOptions Summary: Currently it is not possible to change the bloom filter config without restarting the DB, which is causing a lot of operational complexity for users. This PR aims to make it possible to dynamically change the bloom filter config. Closes https://github.com/facebook/rocksdb/pull/3601 Differential Revision: D7253114 Pulled By: miasantreble fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c 21 May 2018, 21:43:11 UTC
263ef52 Update ColumnFamilyTest for multi-CF verification Summary: Change `keys_` from `set<string>` to `vector<set<string>>` so that each column family's keys are stored in one set. ajkr When you have a chance, can you PTAL? Thanks! Closes https://github.com/facebook/rocksdb/pull/3871 Differential Revision: D8056447 Pulled By: riversand963 fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386 21 May 2018, 18:57:42 UTC
508a09f Print histogram count and sum in statistics string Summary: Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression. This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best... Closes https://github.com/facebook/rocksdb/pull/3863 Differential Revision: D8038831 Pulled By: ajkr fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c 21 May 2018, 18:12:47 UTC