https://github.com/facebook/rocksdb

Revision Author Date Message Commit Date
f438b98 Update history for 5.17.2 release 12 November 2018, 19:57:32 UTC
fb40637 Bump version 02 November 2018, 22:23:50 UTC
05c9d53 Update TARGET file after changing buck template 02 November 2018, 22:21:10 UTC
832ab14 Change BUCK template files (#4624) Summary: Slightly changes the format of generated BUCK files for Facebook consumption. Generated targets end up looking like this: ``` cpp_library( name = "rocksdb_tools_lib", srcs = [ "tools/db_bench_tool.cc", "tools/trace_analyzer_tool.cc", "util/testutil.cc", ], auto_headers = AutoHeaders.RECURSIVE_GLOB, arch_preprocessor_flags = rocksdb_arch_preprocessor_flags, compiler_flags = rocksdb_compiler_flags, preprocessor_flags = rocksdb_preprocessor_flags, deps = [":rocksdb_lib"], external_deps = rocksdb_external_deps, ) ``` Instead of ``` cpp_library( name = "rocksdb_tools_lib", srcs = [ "tools/db_bench_tool.cc", "tools/trace_analyzer_tool.cc", "util/testutil.cc", ], headers = AutoHeaders.RECURSIVE_GLOB, arch_preprocessor_flags = rocksdb_arch_preprocessor_flags, compiler_flags = rocksdb_compiler_flags, preprocessor_flags = rocksdb_preprocessor_flags, deps = [":rocksdb_lib"], external_deps = rocksdb_external_deps, ) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4624 Reviewed By: riversand963 Differential Revision: D12906711 Pulled By: philipjameson fbshipit-source-id: 32ab64a3390cdcf2c4043ff77517ac1ad58a5e2b 02 November 2018, 22:20:35 UTC
516c7df Revert "Introduce CacheAllocator, a custom allocator for cache blocks (#4437)" This reverts commit bb3b2eb960a162e10d7730f4e0a08fd8801d6184. 31 October 2018, 21:55:11 UTC
bb3b2eb Introduce CacheAllocator, a custom allocator for cache blocks (#4437) Summary: This is a conceptually simple change, but it touches many files to pass the allocator through function calls. We introduce CacheAllocator, which can be used by clients to configure custom allocator for cache blocks. Our motivation is to hook this up with folly's `JemallocNodumpAllocator` (https://github.com/facebook/folly/blob/f43ce6d6866b7b994b3019df561109afae050ebc/folly/experimental/JemallocNodumpAllocator.h), but there are many other possible use cases. Additionally, this commit cleans up memory allocation in `util/compression.h`, making sure that all allocations are wrapped in a unique_ptr as soon as possible. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437 Differential Revision: D10132814 Pulled By: yiwu-arbug fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564 31 October 2018, 20:02:53 UTC
ae5305e Fix compile error with jemalloc (#4488) Summary: The "je_" prefix of jemalloc APIs is present only when the macro `JEMALLOC_NO_RENAME` from jemalloc.h is defined. With the patch I'm also adding the -DROCKSDB_JEMALLOC flag in buck TARGETS. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4488 Differential Revision: D10355971 Pulled By: yiwu-arbug fbshipit-source-id: 03a2d69790a44ac89219c7525763fa937a63d95a 30 October 2018, 19:37:06 UTC
f37ea82 Update HISTORY.md 19 October 2018, 23:38:03 UTC
c954445 Handle mixed slowdown/no_slowdown writer properly (#4475) Summary: There is a bug when the write queue leader is blocked on a write delay/stop, and the queue has writers with WriteOptions::no_slowdown set to true. They are not woken up until the write stall is cleared. The fix introduces a dummy writer inserted at the tail to indicate a write stall and prevent further inserts into the queue, and a condition variable that writers who can tolerate slowdown wait on before adding themselves to the queue. The leader calls WriteThread::BeginWriteStall() to add the dummy writer and then walk the queue to fail any writers with no_slowdown set. Once the stall clears, the leader calls WriteThread::EndWriteStall() to remove the dummy writer and signal the condition variable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4475 Differential Revision: D10285827 Pulled By: anand1976 fbshipit-source-id: 747465e5e7f07a829b1fb0bc1afcd7b93f4ab1a9 19 October 2018, 23:29:57 UTC
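The stall handling described in c954445 can be sketched with a flag and a condition variable. This is a simplified illustration, not the RocksDB implementation: `StallGate`, `BeginStall`, `EndStall`, and `Enter` are hypothetical names, and the dummy writer placed at the queue tail is reduced here to a boolean.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified stand-in for the stall handling described above.
class StallGate {
 public:
  // Called by the leader when a write delay/stop begins.
  void BeginStall() {
    std::lock_guard<std::mutex> lock(mu_);
    stalled_ = true;
  }

  // Called by the leader once the stall clears; wakes waiting writers.
  void EndStall() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stalled_ = false;
    }
    cv_.notify_all();
  }

  // Returns false right away if a stall is in progress and the writer set
  // no_slowdown; otherwise blocks until no stall is in progress.
  bool Enter(bool no_slowdown) {
    std::unique_lock<std::mutex> lock(mu_);
    if (stalled_ && no_slowdown) {
      return false;
    }
    cv_.wait(lock, [this] { return !stalled_; });
    return true;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool stalled_ = false;
};
```

A writer with `no_slowdown` set gets an immediate `false` from `Enter(true)` while a stall is active, instead of sleeping until the stall clears.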
619f754 Fix WriteBatchWithIndex's SeekForPrev() (#4559) Summary: WriteBatchWithIndex's SeekForPrev() has a bug: internally, we place the position just before the seek key rather than after. This makes the iterator miss a result that is equal to the seek key. Fix it by positioning the iterator at the entry equal to or smaller than the seek key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4559 Differential Revision: D10468534 Pulled By: siying fbshipit-source-id: 2fb371ae809c561b60a1c11cef71e1c66fea1f19 19 October 2018, 22:31:49 UTC
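The intended SeekForPrev() semantics from 619f754 (land on the largest key that is equal to or smaller than the seek key) can be illustrated on a plain `std::map`. This is only a sketch of the positioning rule; the `Index` alias and the free function `SeekForPrev` are hypothetical and not the WriteBatchWithIndex code.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

using Index = std::map<std::string, std::string>;

// Returns an iterator to the largest key <= target, or index.end() if every
// key is greater than target.
Index::const_iterator SeekForPrev(const Index& index,
                                  const std::string& target) {
  auto it = index.upper_bound(target);  // first key strictly greater
  if (it == index.begin()) {
    return index.end();  // nothing at or before target
  }
  return std::prev(it);  // step back onto the key that is <= target
}
```

Searching with `upper_bound` and stepping back once is what keeps an exact match visible; searching for the first key not less than the target and stepping back would skip it.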
f81fe96 update HISTORY.md and version number 16 October 2018, 22:43:44 UTC
7d3eb00 Properly determine a truncated CompactRange stop key (#4496) Summary: When a CompactRange() call for a level is truncated before the end key is reached, because it exceeds max_compaction_bytes, we need to properly set the compaction_end parameter to indicate the stop key. The next CompactRange will use that as the begin key. We set it to the smallest key of the next file in the level after expanding inputs to get a clean cut. Previously, we were setting it before expanding inputs. So we could end up recompacting some files. In a pathological case, where a single key has many entries spanning all the files in the level (possibly due to merge operands without a partial merge operator, thus resulting in compaction output identical to the input), this would result in an endless loop over the same set of files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4496 Differential Revision: D10395026 Pulled By: anand1976 fbshipit-source-id: f0c2f89fee29b4b3be53b6467b53abba8e9146a9 16 October 2018, 22:42:06 UTC
02d47c1 Avoid per-key linear scan over snapshots in compaction (#4495) Summary: `CompactionIterator::snapshots_` is ordered by ascending seqnum, just like `DBImpl`'s linked list of snapshots from which it was copied. This PR exploits this ordering to make `findEarliestVisibleSnapshot` do binary search rather than linear scan. This can make flush/compaction significantly faster when many snapshots exist since that function is called on every single key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4495 Differential Revision: D10386470 Pulled By: ajkr fbshipit-source-id: 29734991631227b6b7b677e156ac567690118a8b 16 October 2018, 20:11:32 UTC
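The change in 02d47c1 relies on the snapshot list being sorted by ascending seqnum. A minimal sketch of the binary search, assuming a plain sorted vector; `EarliestVisibleSnapshot` and `kNoSnapshot` are hypothetical names for this illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using SequenceNumber = uint64_t;

// Hypothetical sentinel meaning "no snapshot can see this entry".
constexpr SequenceNumber kNoSnapshot = UINT64_MAX;

// `snapshots` must be sorted in ascending order. Returns the smallest
// snapshot seqnum that is >= `seq`, or kNoSnapshot if there is none.
SequenceNumber EarliestVisibleSnapshot(
    const std::vector<SequenceNumber>& snapshots, SequenceNumber seq) {
  assert(std::is_sorted(snapshots.begin(), snapshots.end()));
  auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
  return it == snapshots.end() ? kNoSnapshot : *it;
}
```

With n snapshots this costs O(log n) per key instead of O(n), which matters because the lookup runs for every key processed.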
0f17fb9 Update version macro for 5.17 (#4472) Summary: Forgot this in previous commit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4472 Differential Revision: D10244227 Pulled By: gfosco fbshipit-source-id: ba0cf7a2f5271f0d9f9443004e2620887cd5fd11 08 October 2018, 23:26:12 UTC
9552660 Update HISTORY.md to current status (#4471) Summary: 5.16.x status wasn't tracked, and also updated for pending 5.17 release. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4471 Differential Revision: D10240925 Pulled By: gfosco fbshipit-source-id: 95ab368a04a65b201d2518097af69edf2402f544 08 October 2018, 18:21:55 UTC
016437d rocksdb: put `#pragma once` before `#ifdef` Summary: Work around upstream bug with modules: https://bugs.llvm.org/show_bug.cgi?id=39184. Reviewed By: yiwu-arbug Differential Revision: D10209569 fbshipit-source-id: 696853a02a3869e9c33d0e61168ad4b0436fa3c0 05 October 2018, 17:30:11 UTC
b6b7268 Revert "Introduce CacheAllocator, a custom allocator for cache blocks (#4437)" This reverts commit 1cf5deb8fdecb7f63ce5ce1a0e942222a95f881e. 04 October 2018, 22:17:30 UTC
a1f6142 VersionSet: GetOverlappingInputs() fix overflow and optimize. (#4385) Summary: This fix is for `level == 0` in `GetOverlappingInputs()`: - In `GetOverlappingInputs()`, if `level == 0`, there is a potential risk of overflow if `i == 0`. - Optimize the process when `expand = true`; the expected complexity can be reduced to O(n). Signed-off-by: JiYou <jiyou09@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/4385 Differential Revision: D10181001 Pulled By: riversand963 fbshipit-source-id: 46eef8a1d1605c9329c164e6471cd5c5b6de16b5 04 October 2018, 01:40:59 UTC
1cf5deb Introduce CacheAllocator, a custom allocator for cache blocks (#4437) Summary: This is a conceptually simple change, but it touches many files to pass the allocator through function calls. We introduce CacheAllocator, which can be used by clients to configure custom allocator for cache blocks. Our motivation is to hook this up with folly's `JemallocNodumpAllocator` (https://github.com/facebook/folly/blob/f43ce6d6866b7b994b3019df561109afae050ebc/folly/experimental/JemallocNodumpAllocator.h), but there are many other possible use cases. Additionally, this commit cleans up memory allocation in `util/compression.h`, making sure that all allocations are wrapped in a unique_ptr as soon as possible. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437 Differential Revision: D10132814 Pulled By: yiwu-arbug fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564 03 October 2018, 00:24:58 UTC
4e58b2e Check for compression lib support before test exec (#4443) Summary: Before running CompactFilesTest.SentinelCompressionType, we should check whether zlib and snappy are supported. CompactFilesTest.SentinelCompressionType is a newly added test. Compilation and linking with different options, e.g. COMPILE_WITH_TSAN, COMPILE_WITH_ASAN, etc. lead to generation of different binaries. On the one hand, it's not clear why zlib or snappy is present under ASAN, but not under TSAN. On the other hand, changing the compilation flags for TSAN or ASAN seems a bigger change worth much more attention. To unblock the cont-runs, I suggest that we simply add these two checks at the beginning of the test, as we did for GeneralTableTest.ApproximateOffsetOfCompressed in table/table_test.cc. Future actions include investigating the absence of zlib and snappy when compiling with TSAN, i.e. COMPILE_WITH_TSAN=1, if necessary. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4443 Differential Revision: D10140935 Pulled By: riversand963 fbshipit-source-id: 62f96d1e685386accd2ef0b98f6f754d3fd67b3e 02 October 2018, 17:42:01 UTC
d78b289 Adding IOTA Foundation to USERS.MD (#4436) Summary: Adding IOTA Foundation to USERS.MD Pull Request resolved: https://github.com/facebook/rocksdb/pull/4436 Differential Revision: D10108142 Pulled By: sagar0 fbshipit-source-id: 948dc9f7169cec5c113ae347f1af765a41355aae 02 October 2018, 17:03:46 UTC
477107d Add proper newline markdown (#4434) Summary: Add newline for readability Pull Request resolved: https://github.com/facebook/rocksdb/pull/4434 Differential Revision: D10127684 Pulled By: riversand963 fbshipit-source-id: 39f3ed7eaea655b6ff83474bc9f7616c6ad59107 02 October 2018, 00:27:13 UTC
be5cc4c Remove a race condition between lsdir and rm (#4440) Summary: In DBCompactionTestWithParam::ManualLevelCompactionOutputPathId, there is a race condition between `DBTestBase::GetSstFileCount` and `DBImpl::PurgeObsoleteFiles`. The following graph explains why. ``` Timeline db_compact_test_t bg_flush_t bg_compact_t | [initiate bg flush and | start waiting] | flush | DeleteObsoleteFiles | [waken up by bg_flush_t which | signaled in DeleteObsoleteFiles] | | [initiate compaction and | start waiting] | | [compact, | set manual.done to true] | [signal at the end of | BackgroundCallFlush] | | [waken up by bg_flush_t | which signaled before | returning from | BackgroundCallFlush] | | Check manual.done is true | | GetSstFileCount <-- race condition --> PurgeObsoleteFiles V ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4440 Differential Revision: D10122628 Pulled By: riversand963 fbshipit-source-id: 3ede73c39fee6ad804dc6ac1ed84759c7e63977f 01 October 2018, 18:57:55 UTC
ac6f435 Fix CompactFiles support for kDisableCompressionOption (#4438) Summary: Previously `CompactFiles` with `CompressionType::kDisableCompressionOption` caused program to crash on assertion failure. This PR fixes the crash by adding support for that setting. Now, that setting will cause RocksDB to choose compression according to the column family's options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4438 Differential Revision: D10115761 Pulled By: ajkr fbshipit-source-id: a553c6fa76fa5b6f73b0d165d95640da6f454122 01 October 2018, 08:18:10 UTC
d6f2ecf Utility to run task periodically in a thread (#4423) Summary: Introduce `RepeatableThread` utility to run task periodically in a separate thread. It is basically the same as the same class in fbcode, and in addition provides a helper method to let tests mock time and trigger execution one at a time. We can use this class to replace `TimerQueue` in #4382 and `BlobDB`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4423 Differential Revision: D10020932 Pulled By: yiwu-arbug fbshipit-source-id: 3616bef108c39a33c92eedb1256de424b7c04087 27 September 2018, 22:28:00 UTC
75ca138 FindFile: use std::lower_bound to reduce the repeated code. (#4372) Summary: `FindFile()` and `FindFileInRange()` actually work the same way as `std::lower_bound()`. Use `std::lower_bound()` to reduce the repeated code. - change `FindFile()` and `FindFileInRange()` to use `std::lower_bound()` Signed-off-by: JiYou <jiyou09@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/4372 Differential Revision: D9919677 Pulled By: ajkr fbshipit-source-id: f74aaa30e2f80e410e299c5a5bca4eaf2a7a26de 27 September 2018, 17:35:00 UTC
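The idea in 75ca138 is that locating a file by key is a lower-bound search over files sorted by their largest key. A minimal sketch under that assumption; `FileMeta` is a hypothetical pared-down struct and plain string comparison stands in for the internal key comparator:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, pared-down file metadata: only the largest key is kept.
struct FileMeta {
  std::string largest_key;
};

// `files` is sorted by largest_key. Returns the index of the first file
// whose largest key is >= `key`, or files.size() if there is no such file.
size_t FindFile(const std::vector<FileMeta>& files, const std::string& key) {
  auto it = std::lower_bound(
      files.begin(), files.end(), key,
      [](const FileMeta& f, const std::string& k) {
        return f.largest_key < k;
      });
  return static_cast<size_t>(it - files.begin());
}
```

The heterogeneous comparator (file on the left, key on the right) is what lets `std::lower_bound` replace a hand-written binary search.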
b1dad4c assert in PosixEnv::FileExists should be based on errno (#4427) Summary: The assert in PosixEnv::FileExists is currently based on the return value of `access` syscall. Instead it should be based on errno. Initially I wanted to remove this assert as [`access`](https://linux.die.net/man/2/access) can error out in a few other cases (like EROFS). But on thinking more it feels like the assert is doing the right thing ... it's good to crash on EROFS, EFAULT, EINVAL, and other major filesystem related problems so that the user is immediately aware of the problems while testing. (I think it might be ok to crash on EIO as well, but there might be a specific reason why it was decided not to crash for EIO, and I don't have that context. So I am letting the assert checks remain as is for now). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4427 Differential Revision: D10037200 Pulled By: sagar0 fbshipit-source-id: 5cc96116a2e53cef701f444a8b5290576f311e51 26 September 2018, 20:25:15 UTC
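The point of b1dad4c is that `access` returns -1 for every failure, so the decision has to be made on `errno`. A POSIX-only sketch of that pattern; the `ExistsResult` enum is hypothetical, and the exact sets of errno values treated as "not found" and tolerated by the assert are assumptions for illustration rather than a copy of the RocksDB code:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <unistd.h>

// Hypothetical result type standing in for a status object.
enum class ExistsResult { kFound, kNotFound, kIOError };

ExistsResult FileExists(const std::string& path) {
  if (access(path.c_str(), F_OK) == 0) {
    return ExistsResult::kFound;
  }
  const int err = errno;  // capture immediately; later calls may overwrite it
  switch (err) {
    case EACCES:
    case ELOOP:
    case ENAMETOOLONG:
    case ENOENT:
    case ENOTDIR:
      return ExistsResult::kNotFound;
    default:
      // Assumption: only EIO and ENOMEM are tolerated; anything else
      // (EROFS, EFAULT, EINVAL, ...) trips the assert in debug builds.
      assert(err == EIO || err == ENOMEM);
      return ExistsResult::kIOError;
  }
}
```

Checking the return value alone cannot distinguish a missing file from a filesystem fault, which is why the assert is keyed on the saved `errno`.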
d56070d Fix benchmark script with vector memtable (#4428) Summary: I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default. Fixes #4413. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428 Differential Revision: D10036452 Pulled By: ajkr fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183 26 September 2018, 20:22:45 UTC
dc813e4 Improve log handling when recovering without flush (#4405) Summary: Improve log handling when avoid_flush_during_recovery=true. 1. restore total_log_size_ after recovery, by summing up existing log sizes. Fixes #4253. 2. truncate the last existing log, since this log can contain preallocated space and it will be a waste to keep the space. It avoids the case where a crash loop of the user application causes many logs of non-trivial size to be created, ultimately taking up all disk space. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4405 Differential Revision: D9953933 Pulled By: yiwu-arbug fbshipit-source-id: 967780fee8acec7f358b6eb65190fb4684f82e56 26 September 2018, 17:37:48 UTC
17edc82 Handle tombstones at the same seqno in the CollapsedRangeDelMap (#4424) Summary: The CollapsedRangeDelMap was entirely mishandling tombstones at the same sequence number when the tombstones did not have identical start and end keys. Such tombstones are common since 90fc40690, which causes tombstones to be split during compactions. For example, if the tombstone [a, c) @ 1 lies across a compaction boundary at b, it will be split into [a, b) @ 1 and [b, c) @ 1. Without this patch, the collapsed range deletion map would look like this: a -> 1 b -> 1 c -> 0 Notice how the b -> 1 entry is redundant. When the tombstones overlap, the problem is even worse. Consider tombstones [a, c) @ 1 and [b, d) @ 1, which produces this map without this patch: a -> 1 b -> 1 c -> 0 d -> 0 This map is corrupt, as a map can never contain adjacent sentinel (zero) entries. When the iterator advances from b to c, it will notice that c is a sentinel entry and skip to d--but d is also a sentinel entry! Asking what tombstone this iterator points to will trigger an assertion, as it is not pointing to a valid tombstone. /cc ajkr Pull Request resolved: https://github.com/facebook/rocksdb/pull/4424 Differential Revision: D10039248 Pulled By: abhimadan fbshipit-source-id: 6d737c1e88d60e80cf27286726627ba44463e7f4 25 September 2018, 21:50:31 UTC
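The invariant described in 17edc82 (no redundant entries and no adjacent sentinel entries) can be illustrated with a brute-force builder. This is not the incremental algorithm RocksDB uses; `Tombstone` and `Collapse` are hypothetical names, seqnum 0 is used as the sentinel as in the commit message, and every tombstone is assumed to have `start < end` and a nonzero seqnum.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Tombstone {
  std::string start;  // inclusive
  std::string end;    // exclusive
  uint64_t seq;
};

// Maps each boundary key to the highest seqnum covering the interval that
// starts at that key. 0 is the sentinel meaning "no tombstone".
std::map<std::string, uint64_t> Collapse(const std::vector<Tombstone>& ts) {
  std::set<std::string> bounds;
  for (const auto& t : ts) {
    bounds.insert(t.start);
    bounds.insert(t.end);
  }
  std::map<std::string, uint64_t> out;
  uint64_t prev = 0;
  for (const auto& key : bounds) {
    uint64_t seq = 0;
    for (const auto& t : ts) {
      if (t.start <= key && key < t.end && t.seq > seq) {
        seq = t.seq;
      }
    }
    if (seq != prev) {  // skip entries that repeat the previous seqnum
      out[key] = seq;
      prev = seq;
    }
  }
  return out;
}
```

For the split tombstones [a, b) @ 1 and [b, c) @ 1 this sketch yields a -> 1, c -> 0, and for the overlapping pair [a, c) @ 1 and [b, d) @ 1 it yields a -> 1, d -> 0, so neither map contains a redundant or doubled sentinel entry.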
31d4699 Update TARGETS file template (#4426) Summary: Update template of TARGETS file according to recent changes in #4371 , #4363 and https://github.com/facebook/rocksdb/commit/dbf44c314b4adf3276afc1ca797b88944ca3162c. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4426 Differential Revision: D10025053 Pulled By: yiwu-arbug fbshipit-source-id: e6a0a702bfd401fc1af240ee446f5690f0bcd85d 25 September 2018, 21:14:01 UTC
3c350a7 Improve RangeDelAggregator benchmarks (#4395) Summary: Improve time measurements for AddTombstones to only include the call and not the VectorIterator setup. Also add a new add_tombstones_per_run flag to call AddTombstones multiple times per aggregator, which will help simulate more realistic workloads. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4395 Differential Revision: D9996811 Pulled By: abhimadan fbshipit-source-id: 5865a95c323fbd9b3606493013664b4890fe5a02 21 September 2018, 23:13:08 UTC
04d373b BlobDB: handle IO error on read (#4410) Summary: Fix an IO error on read not being handled and crashing the DB. With the fix we properly return the error. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4410 Differential Revision: D9979246 Pulled By: yiwu-arbug fbshipit-source-id: 111a85675067a29c03cb60e9a34103f4ff636694 20 September 2018, 23:58:45 UTC
72712f4 Allow dynamic modification of window size and deletion trigger (#4403) Summary: Make the CompactOnDeletionCollectorFactory class public, and provide methods to update the window size and deletion trigger params. These will take effect on subsequent created SST files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4403 Differential Revision: D9976857 Pulled By: anand1976 fbshipit-source-id: 31dbf0511c12fa2bb9b2a7ba620079e0ee09cf48 20 September 2018, 22:15:28 UTC
02dc074 add GetAggregatedLongProperty for Java API (#4379) Summary: Add Java API `getAggregatedLongProperty(final String property)` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4379 Differential Revision: D9921463 Pulled By: sagar0 fbshipit-source-id: a02512e1b2aff4765a10b77de9a7bf7b1909d954 20 September 2018, 00:46:59 UTC
519f8b1 Generate appropriate number of keys in db_bench (#4404) Summary: If range tombstones are generated every few writes, the KeyGenerator's limit is now extended to account for the additional Next() calls. This is primarily important for `filluniquerandom` benchmarks that enforce the call limit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404 Differential Revision: D9949326 Pulled By: abhimadan fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38 19 September 2018, 23:28:21 UTC
9b3cf90 add missing range in random.choice argument (#4397) Summary: This will fix the broken asan crash test: > Traceback (most recent call last): File "tools/db_crashtest.py", line 384, in <module> main() File "tools/db_crashtest.py", line 368, in main parser.add_argument("--" + k, type=type(v() if callable(v) else v)) File "tools/db_crashtest.py", line 59, in <lambda> "index_block_restart_interval": lambda: random.choice(1, 16), TypeError: choice() takes exactly 2 arguments (3 given) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397 Differential Revision: D9933041 Pulled By: miasantreble fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639 19 September 2018, 19:13:20 UTC
a0ebec3 Extend crash test with index_block_restart_interval (#4383) Summary: The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383 Differential Revision: D9887304 Pulled By: maysamyabandeh fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f 18 September 2018, 22:43:29 UTC
886766c Fix issue with docs/feed.xml validation (#4392) Summary: Per #4387 this should address the validation error with the link tag. This is a quick fix, a future iteration could significantly upgrade the jekyll integration. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4392 Differential Revision: D9923643 Pulled By: gfosco fbshipit-source-id: e7ed478e55c907add8319290326540e6e44fc0d6 18 September 2018, 20:43:32 UTC
990b52e Unit test for custom comparator RangeDelAggregator (#4388) Summary: Add a unit test for range collapsing when non-default comparator is used. This exposes the bug fixed in #4386. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4388 Differential Revision: D9918252 Pulled By: ajkr fbshipit-source-id: 99501b96b251eab41791a7e33b27055ee36c5c39 18 September 2018, 19:13:20 UTC
27221b0 use specified comparator in CollapsedRangeDelMap (#4386) Summary: The Comparator passed to CollapsedRangeDelMap was not used for operator less of the std::map `rep_` object contained in CollapsedRangeDelMap. So the map was always sorted using the default ByteWiseComparator, which seems wrong. Passing the specified Comparator through for usage in that map object fixes actual problems we were seeing with RangeDelete operations that do not delete keys as expected when using a custom Comparator. I found that the tests in current master crash when I run them locally, both with and without my patch, at the very same location. I therefore don't know if the patch breaks something else, but it seems to fix RangeDeletion issues in our product that uses RocksDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4386 Differential Revision: D9916506 Pulled By: ajkr fbshipit-source-id: 27bff8c775831f089dde8c5289df7343d88b2d66 18 September 2018, 16:28:30 UTC
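The bug class fixed in 27221b0 is a `std::map` that silently falls back to `std::less` instead of the user's comparator. A minimal sketch of passing a comparator through; `KeyComparator`, `ReverseBytewise`, `LessByComparator`, `RangeMap`, and `MakeRangeMap` are hypothetical names, loosely modeled on a three-way `Compare()` interface:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical minimal comparator interface with a three-way Compare().
struct KeyComparator {
  virtual ~KeyComparator() = default;
  virtual int Compare(const std::string& a, const std::string& b) const = 0;
};

// Orders keys in reverse byte order.
struct ReverseBytewise : KeyComparator {
  int Compare(const std::string& a, const std::string& b) const override {
    return b.compare(a);
  }
};

// Adapter that lets std::map order its keys with a KeyComparator.
struct LessByComparator {
  const KeyComparator* cmp;
  bool operator()(const std::string& a, const std::string& b) const {
    return cmp->Compare(a, b) < 0;
  }
};

using RangeMap = std::map<std::string, uint64_t, LessByComparator>;

// The comparator must be handed to the map's constructor; a map declared
// without it would sort with std::less<std::string>.
RangeMap MakeRangeMap(const KeyComparator* cmp) {
  return RangeMap(LessByComparator{cmp});
}
```

With the reverse comparator, inserting "a", "b", "c" leaves "c" at `begin()`, so range lookups follow the same order the rest of the database uses.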
65ac72e Fix bug in partition filters with format_version=4 (#4381) Summary: Value delta encoding in format_version 4 requires the differences between the size of two consecutive handles to be sent to BlockBuilder::Add. This applies not only to indexes on blocks but also the indexes on indexes and filters in partitioned indexes and filters respectively. The patch fixes a bug where the partitioned filters would encode the entire size of the handle rather than the difference of the size with the last size. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4381 Differential Revision: D9879505 Pulled By: maysamyabandeh fbshipit-source-id: 27a22e49b482b927fbd5629dc310c46d63d4b6d1 18 September 2018, 00:28:15 UTC
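The requirement in 65ac72e is that each handle size is stored as a difference from the previous size. A minimal sketch of that idea only; it is not the on-disk encoding RocksDB uses, the function names are hypothetical, and encoding the first entry relative to 0 is an assumption made for the illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes each size as the signed difference from the previous size.
// The first entry is encoded relative to 0 (assumption for this sketch).
std::vector<int64_t> EncodeSizeDeltas(const std::vector<uint64_t>& sizes) {
  std::vector<int64_t> deltas;
  deltas.reserve(sizes.size());
  uint64_t prev = 0;
  for (uint64_t size : sizes) {
    deltas.push_back(static_cast<int64_t>(size) - static_cast<int64_t>(prev));
    prev = size;
  }
  return deltas;
}

// Reverses EncodeSizeDeltas by keeping a running sum.
std::vector<uint64_t> DecodeSizeDeltas(const std::vector<int64_t>& deltas) {
  std::vector<uint64_t> sizes;
  sizes.reserve(deltas.size());
  int64_t current = 0;
  for (int64_t delta : deltas) {
    current += delta;
    sizes.push_back(static_cast<uint64_t>(current));
  }
  return sizes;
}
```

If a writer stored the full size where the reader expects a delta, the running sum on the decode side would drift, which is the kind of mismatch the commit describes for partitioned filters.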
1626f6a Add RangeDelAggregator microbenchmarks (#4363) Summary: To measure the results of upcoming DeleteRange v2 work, this commit adds simple benchmarks for RangeDelAggregator. It measures the average time for AddTombstones and ShouldDelete calls. Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results: Before #4014: ``` ======================= Results: ======================= AddTombstones: 1356.28 us ShouldDelete: 0.401732 us ``` Latest master: ``` ======================= Results: ======================= AddTombstones: 740.82 us ShouldDelete: 0.383271 us ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363 Differential Revision: D9881676 Pulled By: abhimadan fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442 17 September 2018, 21:58:31 UTC
30c21df Fix regression test failures introduced by PR #4164 (#4375) Summary: 1. Add override keyword to overridden virtual functions in EventListener 2. Fix a memory corruption that can happen during DB shutdown when in read-only mode due to a background write error 3. Fix uninitialized buffers in error_handler_test.cc that cause valgrind to complain Pull Request resolved: https://github.com/facebook/rocksdb/pull/4375 Differential Revision: D9875779 Pulled By: anand1976 fbshipit-source-id: 022ede1edc01a9f7e21ecf4c61ef7d46545d0640 17 September 2018, 20:14:07 UTC
8c25204 Support manual flush in stress/crash tests (#4368) Summary: - Made stress test call `Flush()` periodically according to `--flush_one_in` flag. - Enabled by default in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368 Differential Revision: D9838593 Pulled By: ajkr fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc 17 September 2018, 19:27:55 UTC
ac46790 Fix sync-point comment in Block destructor (#4380) Summary: This is a follow up to #4370. The earlier comment is not correct. Thanks to ajkr for pointing this out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4380 Differential Revision: D9874667 Pulled By: sagar0 fbshipit-source-id: f4e092d86b29c715258210b770643d367e38caae 17 September 2018, 18:58:11 UTC
dfda910 Remove trace_analyzer_tool.cc from rocksdb_lib buck target (#4371) Summary: Including tools/trace_analyzer_tool.cc in rocksdb_lib was causing conflicts in dependent binaries due to duplicate gflag (other_prefix). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4371 Differential Revision: D9846953 Pulled By: anand1976 fbshipit-source-id: 80b4aa36ab8428b8f6dceb896c45532684102709 16 September 2018, 02:58:13 UTC
a27fce4 Auto recovery from out of space errors (#4164) Summary: This commit implements automatic recovery from a Status::NoSpace() error during background operations such as write callback, flush and compaction. The broad design is as follows - 1. Compaction errors are treated as soft errors and don't put the database in read-only mode. A compaction is delayed until enough free disk space is available to accommodate the compaction outputs, which is estimated based on the input size. This means that users can continue to write, and we rely on the WriteController to delay or stop writes if the compaction debt becomes too high due to persistent low disk space condition 2. Errors during write callback and flush are treated as hard errors, i.e. the database is put in read-only mode and goes back to read-write only after certain recovery actions are taken. 3. Both types of recovery rely on the SstFileManagerImpl to poll for sufficient disk space. We assume that there is a 1-1 mapping between an SFM and the underlying OS storage container. For cases where multiple DBs are hosted on a single storage container, the user is expected to allocate a single SFM instance and use the same one for all the DBs. If no SFM is specified by the user, DBImpl::Open() will allocate one, but this will be one per DB and each DB will recover independently. The recovery implemented by SFM is as follows - a) On the first occurrence of an out of space error during compaction, subsequent compactions will be delayed until the disk free space check indicates enough available space. The required space is computed as the sum of input sizes. b) The free space check requirement will be removed once the amount of free space is greater than the size reserved by in progress compactions when the first error occurred c) If the out of space error is a hard error, a background thread in SFM will poll for sufficient headroom before triggering the recovery of the database and putting it in read-write mode. The headroom is calculated as the sum of the write_buffer_size of all the DB instances associated with the SFM 4. EventListener callbacks will be called at the start and completion of automatic recovery. Users can disable the auto recovery in the start callback, and later initiate it manually by calling DB::Resume() Todo: 1. More extensive testing 2. Add disk full condition to db_stress (follow-on PR) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164 Differential Revision: D9846378 Pulled By: anand1976 fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a 15 September 2018, 20:43:04 UTC
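The two space estimates named in a27fce4 (required compaction space as the sum of input sizes, and recovery headroom as the sum of write_buffer_size across DB instances) reduce to simple sums. A sketch with hypothetical function names; the exact comparison operators are assumptions, since the commit message does not spell them out:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Adds up a list of byte counts.
uint64_t SumBytes(const std::vector<uint64_t>& values) {
  return std::accumulate(values.begin(), values.end(), uint64_t{0});
}

// Soft-error path: a compaction may run once free space covers the
// estimated output size, taken here as the sum of the input file sizes.
bool CanRunCompaction(uint64_t free_bytes,
                      const std::vector<uint64_t>& input_file_sizes) {
  return free_bytes >= SumBytes(input_file_sizes);
}

// Hard-error path: recovery may start once free space exceeds the headroom,
// taken here as the sum of write_buffer_size over all DBs sharing the SFM.
bool CanStartRecovery(uint64_t free_bytes,
                      const std::vector<uint64_t>& write_buffer_sizes) {
  return free_bytes > SumBytes(write_buffer_sizes);
}
```

Summing over all DB instances is what makes a single shared SFM meaningful when several databases live on one storage container.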
3db5840 Remove sync point from Block destructor (#4370) Summary: ``` AddressSanitizer: heap-use-after-free in std::__atomic_base<bool>::load(std::memory_order) const ==1798517==ABORTING ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4370 Differential Revision: D9844146 Pulled By: sagar0 fbshipit-source-id: 18a2970b1d504b4f6c8fb04857f26e0f32124dd1 15 September 2018, 07:12:57 UTC
879998b Adjust c test and fix windows compilation issues Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4369 Differential Revision: D9844200 Pulled By: sagar0 fbshipit-source-id: 0d9f5f73b28234eaac55d3551ce4e2dc177af138 15 September 2018, 03:57:22 UTC
82e8e9e VersionBuilder: optimize SaveTo() to linear time. (#4366) Summary: Because `base_files` and `added_files` both are sorted, using a merge operation on these two sorted arrays is more effective. The complexity is reduced to linear time. - optimize the merge complexity. - move the `NDEBUG` of sorted `added_files` out of merge process. Signed-off-by: JiYou <jiyou09@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/4366 Differential Revision: D9833592 Pulled By: ajkr fbshipit-source-id: dd32b67ebdca4c20e5e9546ab8082cecefe99fd0 15 September 2018, 02:43:04 UTC
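The linear-time step in 82e8e9e is a standard two-way merge of sorted inputs. A minimal sketch using `std::merge`, with integers standing in for file metadata; `MergeSorted` is a hypothetical name, and the handling of deleted files in the real SaveTo() is left out:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Merges two already-sorted sequences in O(base.size() + added.size()).
std::vector<int> MergeSorted(const std::vector<int>& base,
                             const std::vector<int>& added) {
  // Sortedness is a precondition, checked once up front in debug builds
  // rather than inside the merge loop.
  assert(std::is_sorted(base.begin(), base.end()));
  assert(std::is_sorted(added.begin(), added.end()));
  std::vector<int> merged;
  merged.reserve(base.size() + added.size());
  std::merge(base.begin(), base.end(), added.begin(), added.end(),
             std::back_inserter(merged));
  return merged;
}
```

Each element is visited once, in contrast to inserting every added element into the base sequence by searching for its position.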
8959063 Store the return value of Fsync for check Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4361 Differential Revision: D9803723 Pulled By: riversand963 fbshipit-source-id: 5a0d4cd3e57fd195571dcd5822895ee00547fa6a 14 September 2018, 20:29:56 UTC
82057b0 Improve type conversion (#4367) Summary: Use `static_cast<type>(var)` instead of `(type)var`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4367 Differential Revision: D9833391 Pulled By: riversand963 fbshipit-source-id: 3d33fc2c290e7e0f3d1d45b256a881d1bc5a7df2 14 September 2018, 18:12:52 UTC
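The style change in 82057b0 replaces `(type)var` with `static_cast<type>(var)`. A small sketch of the pattern; `ToUint32` and `Ratio` are hypothetical helpers chosen only to show the two most common uses:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Narrowing conversion spelled with static_cast; the range check makes the
// intent explicit in debug builds.
uint32_t ToUint32(size_t n) {
  assert(n <= UINT32_MAX);
  return static_cast<uint32_t>(n);
}

// Convert before dividing so the division is done in floating point.
double Ratio(int64_t part, int64_t whole) {
  return static_cast<double>(part) / static_cast<double>(whole);
}
```

Unlike a C-style cast, `static_cast` cannot drop `const` or reinterpret unrelated pointer types, so an accidental dangerous conversion fails to compile instead of passing silently.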
2353c5c Fix cross-filesystem checkpoint on Windows (#4365) Summary: Now port/win_env.cc does check for errors on cross-device link creation. Fixes #4364 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4365 Differential Revision: D9833144 Pulled By: ajkr fbshipit-source-id: be7555e510f4b8d2196d843841606a6cfada7644 14 September 2018, 17:28:39 UTC
c94523e Delete code for WAL reader to start at nonzero offset (#4362) Summary: The code is dead in RocksDB as `log::Reader::initial_offset_` is always zero. We should delete it so we don't have to maintain it like in #4359. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4362 Differential Revision: D9817829 Pulled By: ajkr fbshipit-source-id: 474a2c679e5bd273b40608f3a5332931d9eefe6d 14 September 2018, 00:13:03 UTC
9022615 correct mistyped msg. (#4341) Summary: corrected the mistyped message. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4341 Differential Revision: D9816571 Pulled By: ajkr fbshipit-source-id: 1df0424e981a01470a638a37b925c4133d59a48b 13 September 2018, 21:57:38 UTC
0bd2ede Memory usage stats in C API (#4340) Summary: Please consider this small PR providing access to the `MemoryUsage::GetApproximateMemoryUsageByType` function in plain C API. Actually I'm working on Go application and now trying to investigate the reasons of high memory consumption (#4313). Go [wrappers](https://github.com/tecbot/gorocksdb) are built on the top of Rocksdb C API. According to the #706, `MemoryUsage::GetApproximateMemoryUsageByType` is considered as the best option to get database internal memory usage stats, but it wasn't supported in C API yet. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4340 Differential Revision: D9655135 Pulled By: ajkr fbshipit-source-id: a3d2f3f47c143ae75862fbcca2f571ea1b49e14a 13 September 2018, 21:27:31 UTC
9ea9007 Reduce IndexBlockIter size (#4358) Summary: With #3983 the size of IndexBlockIter was increased. This resulted in a regression in P50 latencies in one of our benchmarks. The patch reduces IndexBlockIter size by eliminating the active_comparator_ field from the class. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4358 Differential Revision: D9781737 Pulled By: maysamyabandeh fbshipit-source-id: 71e2b28d90ff0813db9e04b737ae73e185583c52 12 September 2018, 17:03:35 UTC
ca92fc7 Initialize uninitialized std::atomic variables Summary: Initialize uninitialized std::atomic variables Reviewed By: yfeldblum Differential Revision: D9758050 fbshipit-source-id: 865d89eddafc81f3cab6f11e2ebb669f7ff70d04 12 September 2018, 15:58:05 UTC
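The kind of fix described in ca92fc7 can be sketched as follows. The struct and member names are hypothetical, not taken from the commit. Before C++20, a default-constructed `std::atomic<T>` holds an indeterminate value, so an explicit initializer is needed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical example (names are not from RocksDB). Without the brace
// initializers, these members would start with indeterminate values when
// compiled as C++11/14/17.
struct Counters {
  std::atomic<uint64_t> bytes_written{0};
  std::atomic<bool> shutting_down{false};
};

// Reads both members; only well-defined because they were initialized above.
inline bool IsIdle(const Counters& c) {
  return c.bytes_written.load() == 0 && !c.shutting_down.load();
}
```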
3ba3b15 Fix Makefile target 'jtest' on PowerPC (#4357) Summary: Before the fix: On a PowerPC machine, run the following ``` $ make jtest ``` The command will fail due to "undefined symbol: crc32c_ppc". It was caused by 'rocksdbjava' Makefile target not including crc32c_ppc object files when generating the shared lib. The fix is simple. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357 Differential Revision: D9779474 Pulled By: riversand963 fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51 11 September 2018, 23:37:23 UTC
dbf44c3 Lint TARGETS files with buildifier Summary: Build file formatting Reviewed By: mzlee Differential Revision: D9728238 fbshipit-source-id: 99a266d5d2260eabfd63a200b2994c6850b59cf4 11 September 2018, 21:58:19 UTC
c86a22a Restrict RangeDelAggregator's tombstone end-key truncation (#4356) Summary: `RangeDelAggregator::AddTombstones` contained an assertion which stated that, if a range tombstone extended past the largest key in the sstable, then `FileMetaData::largest` must have a sentinel sequence number of `kMaxSequenceNumber`, which implies that the tombstone's end key is safe to truncate. However, `largest` will not be a sentinel key when the next sstable in the level's smallest key is equal to the current sstable's largest key, which caused the assertion to fail. The assertion must hold for the truncation to be safe, so it has been moved to an additional check on end-key truncation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4356 Differential Revision: D9760891 Pulled By: abhimadan fbshipit-source-id: 7c20c3885cd919dcd14f291f88fd27aa33defebc 11 September 2018, 00:42:43 UTC
3f52822 Skip concurrency control during recovery of pessimistic txn (#4346) Summary: TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This can be used as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery, assuming (i) the application guarantees no conflict between prepared transactions in the WAL and (ii) the application guarantees that recovered transactions will be rolled back or committed before new transactions start. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346 Differential Revision: D9759149 Pulled By: maysamyabandeh fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b 10 September 2018, 23:57:53 UTC
faf529f env_librados.h: drop redundant #endif (#4354) Summary: without this change, rocksdb_env_librados_test fails to build. it's a regression introduced by 64324e32 Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/4354 Differential Revision: D9702665 Pulled By: riversand963 fbshipit-source-id: 65134eaff0543733210edfc77f89c96709da7a3f 07 September 2018, 18:12:44 UTC
655ef7d Inline doc for format_version 4 (#4350) Summary: Fixes #4337 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4350 Differential Revision: D9700871 Pulled By: maysamyabandeh fbshipit-source-id: fe1e07803783f34588dc14aba66d51117ca4a180 07 September 2018, 14:57:30 UTC
ced618c Fix a lint error due to unspecified move evaluation order (#4348) Summary: In C++ 11, the order of argument and move evaluation in a statement such as below is unspecified - foo(a.b).bar(std::move(a)) The compiler is free to evaluate std::move(a) first, and then a.b is unspecified. In C++ 17, this will be safe if a draft proposal around function chaining rules is accepted. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4348 Differential Revision: D9688810 Pulled By: anand1976 fbshipit-source-id: e4651d0ca03dcf007e50371a0fc72c0d1e710fb4 06 September 2018, 21:42:57 UTC
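The hazard in ced618c can be avoided by splitting the chained call into two statements. The sketch below uses hypothetical types (`Request`, `Builder`) and a hypothetical `foo` that mirror the shape of `foo(a.b).bar(std::move(a))`; none of them are RocksDB code. For reference, C++17 did adopt evaluation-order rules that sequence the `foo(a.b)` part before the arguments of `.bar(...)`, but code that must also build as C++11/14 cannot rely on that.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical types for illustration only.
struct Request {
  std::string name;
  std::string payload;
};

struct Builder {
  std::string label;
  std::string body;
  // Takes the request by value, so the caller's object is moved from when
  // this parameter is initialized.
  Builder& bar(Request r) {
    body = std::move(r.payload);
    return *this;
  }
};

inline Builder foo(const std::string& name) {
  return Builder{name, std::string()};
}

// Risky in C++11/14: foo(a.name).bar(std::move(a));
// The parameter of bar may be initialized (moving from a) before a.name is read.
// Safe form: read a.name in its own statement, then move a.
inline Builder SafeChain(Request a) {
  assert(!a.name.empty());
  Builder b = foo(a.name);  // a is still intact here
  b.bar(std::move(a));      // a is moved from only after a.name was used
  return b;
}
```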
2c14662 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347) Summary: Reverting is needed to unblock a user building against master, who is blocked for multiple days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so reverting. Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347 Differential Revision: D9668365 Pulled By: ajkr fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd 06 September 2018, 16:58:34 UTC
64324e3 Support pragma once in all header files and cleanup some warnings (#4339) Summary: Almost all compilers support the `#pragma once` directive as an alternative to include guards. To keep header files consistent, all of them are edited to use it. This change also fixes some warnings about loss of data. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339 Differential Revision: D9654990 Pulled By: ajkr fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848 06 September 2018, 01:13:31 UTC
90f5048 Remove warnings caused by unused variables in jni (#4345) Summary: Test plan ``` $make clean jclean $make -j32 rocksdbjavastatic $make -j32 rocksdbjava ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4345 Differential Revision: D9661256 Pulled By: riversand963 fbshipit-source-id: aed316c53b29d02fbdd3fa1063a3e832b8a66469 05 September 2018, 20:42:34 UTC
1a88c43 Reduce empty SST creation/deletion in compaction (#4336) Summary: This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug. Also fixed an uninitialized variable in db_stress. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336 Differential Revision: D9600080 Pulled By: ajkr fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7 31 August 2018, 19:28:52 UTC
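The underflow described in 1a88c43 is the usual unsigned wraparound on an empty container. The sketch below is an assumption about the general shape of such a bug, not the actual `CollapsedRangeDelMap` code: the class `BoundaryMap` and its members are hypothetical, modeling a map in which n boundary keys describe n - 1 ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a collapsed tombstone map (not RocksDB code).
class BoundaryMap {
 public:
  void AddBoundary(const std::string& key, unsigned long long seq) {
    rep_[key] = seq;
  }

  // Buggy form: with an empty map, 0 - 1 wraps around to the largest size_t,
  // so an "is it empty?" check built on top of it reports a non-empty map.
  size_t BuggySize() const { return rep_.size() - 1; }

  // Fixed form: handle the empty case before subtracting.
  size_t Size() const { return rep_.empty() ? 0 : rep_.size() - 1; }

  bool IsEmpty() const { return Size() == 0; }

 private:
  std::map<std::string, unsigned long long> rep_;
};
```

With the buggy size, a caller that checks `!IsEmpty()` before opening an output file would proceed even when there are no tombstones, which matches the symptom the commit describes.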
462ed70 BlobDB: GetLiveFiles and GetLiveFilesMetadata return relative path (#4326) Summary: `GetLiveFiles` and `GetLiveFilesMetadata` should return paths relative to the db path. How to return a relative path when `path_relative` is false is a separate issue, but `DBImpl::GetLiveFiles` doesn't handle that case either when there are multiple `db_paths`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4326 Differential Revision: D9545904 Pulled By: yiwu-arbug fbshipit-source-id: 6762d879fcb561df2b612e6fdfb4a6b51db03f5d 31 August 2018, 19:12:49 UTC
1cf17ba Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323) Summary: Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc defined `DecodeCFAndKey` under anonymous namespace. It is supposed to be fine except unity test will dump all source files together and now we have a conflict. Another issue with trace_analyzer_tool.cc is that it is using some utility functions from ldb_cmd which is not included in Makefile for unity_test, I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323 Differential Revision: D9599170 Pulled By: miasantreble fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029 31 August 2018, 01:42:51 UTC
3e801e5 BlobDB: Improve info log (#4324) Summary: Improve BlobDB info logs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4324 Differential Revision: D9545074 Pulled By: yiwu-arbug fbshipit-source-id: 678ab8820a78758fee451be3b123b0680c1081df 30 August 2018, 18:57:46 UTC
f46dd5c Remove trace_analyzer_tool from LIB_SOURCES (#4331) Summary: trace_analyzer_tool should only be in ANALYZER_LIB_SOURCES and not in LIB_SOURCES. This fixes java_test travis build failures seen in jtest. Blame: a6d3de4e7a29a19d9e5ef58a31d645f336258a75 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4331 Differential Revision: D9560377 Pulled By: sagar0 fbshipit-source-id: 6b9636201a920b56ee0f61e367fee5d3dca692b0 30 August 2018, 04:28:40 UTC
d00e5de use atomic O_CLOEXEC when available (#4328) Summary: In our application we spawn helper child processes concurrently with opening rocksdb. In one situation I observed that the child process had inherited the rocksdb lock file as well as directory handles to the rocksdb storage location. The code in env_posix takes care to set CLOEXEC but doesn't use `O_CLOEXEC` at the time that the files are opened which means that there is a window of opportunity to leak the descriptors across a fork/exec boundary. This diff introduces a helper that can conditionally set the `O_CLOEXEC` bit for the open call using the same logic as that in the existing helper for setting that flag post-open. I've preserved the post-open logic for systems that don't have `O_CLOEXEC`. I've introduced setting `O_CLOEXEC` for what appears to be a number of temporary or transient files and directory handles; I suspect that none of the files opened by Rocks are intended to be inherited by a forked child process. In one case, `fopen` is used to open a file. I've added the use of the glibc-specific `e` mode to turn on `O_CLOEXEC` for this case. While this doesn't cover all posix systems, it is an improvement for our common deployment system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4328 Reviewed By: ajkr Differential Revision: D9553046 Pulled By: wez fbshipit-source-id: acdb89f7a85ca649b22fe3c3bd76f82142bec2bf 30 August 2018, 03:27:43 UTC
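The approach described in d00e5de can be sketched with plain POSIX calls. The helper names below (`WithCloexec`, `SetCloexecPostOpen`, `OpenCloexec`) are hypothetical and are not the helpers in env_posix; the sketch only shows the idea of setting the flag atomically at `open` time when `O_CLOEXEC` exists, and falling back to `fcntl` after the open otherwise.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: add O_CLOEXEC to the open flags when the platform
// defines it, so there is no window between open() and setting the flag.
inline int WithCloexec(int flags) {
#ifdef O_CLOEXEC
  return flags | O_CLOEXEC;
#else
  return flags;
#endif
}

// Hypothetical fallback for platforms without O_CLOEXEC: set FD_CLOEXEC after
// the descriptor exists. A fork/exec in another thread can still race with this.
inline void SetCloexecPostOpen(int fd) {
  if (fd < 0) return;
  int cur = fcntl(fd, F_GETFD);
  if (cur != -1) {
    fcntl(fd, F_SETFD, cur | FD_CLOEXEC);
  }
}

// Opens path so that the descriptor is not inherited across exec.
inline int OpenCloexec(const char* path, int flags, mode_t mode) {
  assert(path != nullptr);
  int fd = open(path, WithCloexec(flags), mode);
#ifndef O_CLOEXEC
  SetCloexecPostOpen(fd);
#endif
  return fd;
}
```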
927f274 Avoiding write stall caused by manual flushes (#4297) Summary: At the moment it seems possible to cause a write stall by calling flush (either manually via DB::Flush(), or from Backup Engine directly calling FlushMemTable()) while a background flush may already be happening. One way to fix it: DBImpl::CompactRange() already checks for a possible stall and delays the flush if needed before it proceeds to call FlushMemTable(). We can simply move this delay logic to a separate method and call it from FlushMemTable(). This is a draft patch for a first look; we still need to check tests, update SyncPoints, and most certainly add an allow_write_stall method to FlushOptions(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297 Differential Revision: D9420705 Pulled By: mikhail-antonov fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc 29 August 2018, 19:12:55 UTC
5f63a89 data block hash index blog post Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4309 Differential Revision: D9557843 Pulled By: sagar0 fbshipit-source-id: 190e4ccedfaeaacd96d945610de843f97c307540 29 August 2018, 17:58:10 UTC
a876995 Grab straggler files to explicitly import AutoHeaders Summary: There were a few files that were missed when AutoHeaders were moved to their own file. Add explicit loads Reviewed By: yfeldblum Differential Revision: D9499942 fbshipit-source-id: 942bf3a683b8961e1b6244136f6337477dcc45af 29 August 2018, 04:28:55 UTC
4273363 Sync CURRENT file during checkpoint (#4322) Summary: For the CURRENT file forged during checkpoint, we were forgetting to `fsync` or `fdatasync` it after its creation. This PR fixes it. Differential Revision: D9525939 Pulled By: ajkr fbshipit-source-id: a505483644026ee3f501cfc0dcbe74832165b2e3 28 August 2018, 19:43:18 UTC
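The pattern behind 4273363 is to write the file and call `fsync` before closing it. The sketch below uses plain POSIX calls and a hypothetical helper name, `WriteFileSynced`; it is not the checkpoint code itself, and it does not retry on `EINTR` or sync the parent directory, which a production version would likely also consider.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: writes len bytes to path and forces them to stable
// storage with fsync() before closing. Returns true only if every step succeeds.
inline bool WriteFileSynced(const char* path, const char* data, size_t len) {
  assert(path != nullptr && data != nullptr);
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;

  size_t off = 0;
  while (off < len) {
    ssize_t n = write(fd, data + off, len - off);
    if (n < 0) {
      close(fd);
      return false;
    }
    off += static_cast<size_t>(n);
  }

  // Without this call the data may still sit in the page cache when a crash occurs.
  bool synced = (fsync(fd) == 0);
  bool closed = (close(fd) == 0);
  return synced && closed;
}
```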
38ad3c9 BlobDB: Avoid returning garbage value on key not found (#4321) Summary: When reading an expired key using the `Get(..., std::string* value)` API, BlobDB first reads the index entry and decodes the expiration from it. In this case, although BlobDB resets the PinnableSlice, the index entry is stored in the user-provided string `value`. The index entry is then returned as a garbage value, despite the status being NotFound. Fix it by using a different PinnableSlice to read the index entry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4321 Differential Revision: D9519042 Pulled By: yiwu-arbug fbshipit-source-id: f054c951a1fa98265228be94f931904ed7056677 27 August 2018, 23:28:39 UTC
6ed7f14 cmake: allow opting out debug runtime (#4317) Summary: Projects built in debug profile don't always link to debug runtime. Allowing opting out the debug runtime to make rocksdb get along well with other projects. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4317 Differential Revision: D9518038 Pulled By: sagar0 fbshipit-source-id: 384901a0d12b8de20759756e8a19b4888a27c399 27 August 2018, 22:58:59 UTC
a6d3de4 BlobDB: Implement DisableFileDeletions (#4314) Summary: `DB::DisableFileDeletions` and `DB::EnableFileDeletions` are used by applications to stop RocksDB background jobs from deleting files while they are doing replication. Implement these methods for BlobDB. `DeleteObsoleteFiles` now needs to check `disable_file_deletions_` before starting, and will hold `delete_file_mutex_` the whole time while it is running. `DisableFileDeletions` needs to wait on `delete_file_mutex_` for a running `DeleteObsoleteFiles` job and set the `disable_file_deletions_` flag. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4314 Differential Revision: D9501373 Pulled By: yiwu-arbug fbshipit-source-id: 81064c1228f1724eff46da22b50ff765b16292cd 27 August 2018, 17:58:29 UTC
2f871bc Download bzip2 packages from Internet Archive (#4306) Summary: Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the internet archive until we figure out a more credible source. Fixes issue: #4305 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4306 Differential Revision: D9514868 Pulled By: sagar0 fbshipit-source-id: 57c6a141a62e652f94377efc7ca9916b458e68d5 27 August 2018, 16:58:24 UTC
198459c Fix an inaccurate comment (#4315) Summary: According to https://github.com/facebook/rocksdb/blob/4848bd0c4e98713bf5ae72a36057e188c53206f8/db/log_reader.cc#L355, the original text is misleading when describing the layout of RecyclableLogHeader. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4315 Differential Revision: D9505284 Pulled By: riversand963 fbshipit-source-id: 79994c37a69e7003f03453e7efc0186feeafa609 25 August 2018, 01:13:20 UTC
4848bd0 Drop unnecessary deletion markers during compaction (issue - 3842) (#4289) Summary: This PR fixes issue 3842. We drop deletion markers iff 1. We are at the bottommost level AND 2. All other occurrences of the key are in the same snapshot range as the delete. I've also enhanced db_stress_test to add an option that does a full compare of the keys. This is done by a single thread (thread # 0). For tests I've run (so far) make check -j64 db_stress db_stress --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify that new code doesn't break existing tests */ ./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify new test code */ Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289 Differential Revision: D9491165 Pulled By: shrikanthshankar fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2 24 August 2018, 22:17:54 UTC
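The two-part condition in 4848bd0 can be written as a small predicate. This is one reading of the rule, not the compaction iterator's actual code: the function names are hypothetical, and "same snapshot range" is assumed to mean that no live snapshot separates the two sequence numbers, where a snapshot taken at sequence number s sees every entry with sequence number at most s.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: index of the snapshot range that seq falls into, given
// snapshot sequence numbers sorted in ascending order. Two sequence numbers
// get the same index exactly when no snapshot sees one but not the other.
inline size_t SnapshotStripe(const std::vector<uint64_t>& sorted_snapshots,
                             uint64_t seq) {
  assert(std::is_sorted(sorted_snapshots.begin(), sorted_snapshots.end()));
  return static_cast<size_t>(
      std::lower_bound(sorted_snapshots.begin(), sorted_snapshots.end(), seq) -
      sorted_snapshots.begin());
}

// Hypothetical predicate mirroring the rule: drop the deletion marker only if
// (1) the compaction output is the bottommost level and (2) every other entry
// for the key lies in the same snapshot range as the delete.
inline bool CanDropDeletionMarker(bool bottommost_level, uint64_t delete_seq,
                                  const std::vector<uint64_t>& other_seqs_of_key,
                                  const std::vector<uint64_t>& sorted_snapshots) {
  if (!bottommost_level) return false;
  const size_t stripe = SnapshotStripe(sorted_snapshots, delete_seq);
  for (uint64_t seq : other_seqs_of_key) {
    if (SnapshotStripe(sorted_snapshots, seq) != stripe) return false;
  }
  return true;
}
```

For example, with snapshots at 10 and 20, a delete at sequence 15 covering a put at 12 can be dropped, while one covering a put at 5 cannot, because the snapshot at 10 still needs to see that put.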
8022500 Add compatibility test of SST ingestion (#4310) Summary: Test plan ``` $cd rocksdb/ $./tools/check_format_compatible.sh ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310 Differential Revision: D9498125 Pulled By: riversand963 fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24 24 August 2018, 21:27:43 UTC
7daae51 Refactor flush request queueing and processing (#3952) Summary: RocksDB currently queues individual column family for flushing. This is not sufficient to support the needs of some applications that want to enforce order/dependency between column families, given that multiple foreground and background activities can trigger flushing in RocksDB. This PR aims to address this limitation. Each flush request is described as a `FlushRequest` that can contain multiple column families. A background flushing thread pops one flush request from the queue at a time and processes it. This PR does not enable atomic_flush yet, but is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752). Pull Request resolved: https://github.com/facebook/rocksdb/pull/3952 Differential Revision: D8529933 Pulled By: riversand963 fbshipit-source-id: 78908a21e389a3a3f7de2a79bae0cd13af5f3539 24 August 2018, 20:27:35 UTC
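The queueing model described in 7daae51 can be sketched as a FIFO of requests, each naming several column families. The types below are simplified assumptions for illustration (`ColumnFamilyId`, the `FlushRequest` alias as a plain vector of ids, and the `FlushQueue` class are all hypothetical); the real `FlushRequest` in RocksDB carries more information and the real queue is protected by the DB mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical simplified types, not the RocksDB definitions.
using ColumnFamilyId = uint32_t;
using FlushRequest = std::vector<ColumnFamilyId>;

// A FIFO of flush requests. Each request can name several column families, so
// they are handed to the background thread together and in a fixed order.
class FlushQueue {
 public:
  // Enqueues a request; empty requests are ignored.
  void Schedule(FlushRequest req) {
    if (!req.empty()) queue_.push_back(std::move(req));
  }

  // Pops the oldest request into *out. Returns false if the queue is empty.
  bool PopFirst(FlushRequest* out) {
    assert(out != nullptr);
    if (queue_.empty()) return false;
    *out = std::move(queue_.front());
    queue_.pop_front();
    return true;
  }

  size_t size() const { return queue_.size(); }

 private:
  std::deque<FlushRequest> queue_;
};
```

A background thread would call `PopFirst` in a loop and flush every column family in the returned request before taking the next one.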
17f9a18 Reduce empty SST creation/deletion during compaction (#4311) Summary: I have a PR to start calling `OnTableFileCreated` for empty SSTs: #4307. However, it is a behavior change so should not go into a patch release. This PR adds back a check to make sure range deletions at least exist before starting file creation. This PR should be safe to backport to earlier versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4311 Differential Revision: D9493734 Pulled By: ajkr fbshipit-source-id: f0d43cda4cfd904f133cfe3a6eb622f52a9ccbe8 24 August 2018, 19:27:57 UTC
e7bb8e9 Fix clang build of db_stress (#4312) Summary: Blame: #4307 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312 Differential Revision: D9494093 Pulled By: ajkr fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42 24 August 2018, 04:57:57 UTC
6c40806 Digest ZSTD compression dictionary once per SST file (#4251) Summary: In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file. ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary. There are a couple of other changes included in this PR: - Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether a dictionary should be used. I felt that mutation was error-prone so eliminated it. - Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251 Differential Revision: D9257078 Pulled By: ajkr fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438 24 August 2018, 02:28:18 UTC
ee234e8 Invoke OnTableFileCreated for empty SSTs (#4307) Summary: The API comment on `OnTableFileCreationStarted` (https://github.com/facebook/rocksdb/blob/b6280d01f9f9c4305c536dfb804775fce3956280/include/rocksdb/listener.h#L331-L333) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data. This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307 Differential Revision: D9485201 Pulled By: ajkr fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6 24 August 2018, 01:27:30 UTC
cf7150a Add the unit test of Iterator to trace_analyzer_test (#4282) Summary: Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282 Differential Revision: D9436758 Pulled By: zhichao-cao fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2 24 August 2018, 00:28:32 UTC
ad789e4 Adding a method for memtable class for memtable getting flushed. (#4304) Summary: Memtables are selected for flushing by the flush job. Currently we have a listener which is invoked when memtables for a column family are flushed. That listener does not indicate in the notification which memtable was flushed. If clients want to know whether particular data in the memtable was retired, there is no straightforward way to find out. This method will help users who implement a memtablerep factory and extend the memtablerep interface to know whether the data in the memtable was retired. Another option that was tried was to depend on the memtable destructor being called after flush to mark that data was persisted. This always works, but sometimes there can be huge delays between the actual flush and the memtable getting destroyed. Hence, anyone waiting for data to persist would have to wait that much longer. Anyone implementing this method is expected to return quickly, as it blocks RocksDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304 Reviewed By: riversand963 Differential Revision: D9472312 Pulled By: gdrane fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d 24 August 2018, 00:14:25 UTC
da40d45 DataBlockHashIndex: avoiding expensive iiter->Next when handling hash kNoEntry (#4296) Summary: When returning `kNoEntry` from HashIndex lookup, previously we invalidated the `biter` by setting `current_=restarts_`, so that the search could continue to the next block in case the search result resides in the next block. There is one problem: when we are searching for a missing key, if the search finds a `kNoEntry` and continues to the next block, there is also a non-trivial possibility that the HashIndex returns `kNoEntry` there too, and the expensive index iterator `Next()` will happen several times for nothing. The solution is that if the hash table returns `kNoEntry`, `SeekForGetImpl()` just searches the last restart interval for the key. It will stop at the first key that is larger than the seek_key, or at the end of the block, and each case will be handled correctly. Microbenchmark script: ``` TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq,readtocache,readmissing \ --cache_size=20000000000 --use_data_block_hash_index={true|false} ``` `readmissing` performance (lower is better): ``` binary: 3.6098 micros/op hash (before applying diff): 4.1048 micros/op hash (after applying diff): 3.3502 micros/op ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4296 Differential Revision: D9419159 Pulled By: fgwu fbshipit-source-id: 21e3eedcccbc47a249aa8eb4bf405c9def0b8a05 23 August 2018, 17:12:58 UTC
bb5dcea Add path to WritableFileWriter. (#4039) Summary: We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039 Differential Revision: D8670178 Pulled By: riversand963 fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a 23 August 2018, 17:12:58 UTC
f1f5ba0 add missing counters in readonly mode (#4260) Summary: User reported (https://github.com/facebook/rocksdb/issues/4168) that when opening RocksDB in read-only mode, some statistics are not correctly reported. After some investigation, we believe the following counters are indeed not reported during Get() call in a read-only DB: rocksdb.memtable.hit rocksdb.memtable.miss rocksdb.number.keys.read rocksdb.bytes.read As well as histogram rocksdb.bytes.per.read and perf context get_read_bytes This PR will add the necessary counter reporting logic in the Get() call path Pull Request resolved: https://github.com/facebook/rocksdb/pull/4260 Differential Revision: D9476431 Pulled By: miasantreble fbshipit-source-id: 7ab409d4e59df05d09ae8b69fe75554e5aa240d6 23 August 2018, 05:43:13 UTC
b6280d0 Require ZSTD 1.1.3+ to use dictionary trainer (#4295) Summary: ZSTD's dynamic library exports `ZDICT_trainFromBuffer` symbol since v1.1.3, and its static library exports it since v0.6.1. We don't know whether linkage is static or dynamic, so just require v1.1.3 to use dictionary trainer. Fixes the issue reported here: https://jira.mariadb.org/browse/MDEV-16525. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4295 Differential Revision: D9417183 Pulled By: ajkr fbshipit-source-id: 0e89d2f48d9e7f6eee73e7f4572660a9f7122db8 23 August 2018, 01:27:52 UTC
640cfa7 DataBlockHashIndex: fix comment in NumRestarts() (#4286) Summary: Improve the description of the backward compatibility check in NumRestarts() Pull Request resolved: https://github.com/facebook/rocksdb/pull/4286 Differential Revision: D9412490 Pulled By: fgwu fbshipit-source-id: ea7dd5c61d8ff8eacef623b729d4e4fd53cca066 22 August 2018, 00:12:45 UTC
4f12d49 Suppress clang analyzer error (#4299) Summary: Suppress multiple clang-analyzer errors. All of them are clang false positives. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4299 Differential Revision: D9430740 Pulled By: yiwu-arbug fbshipit-source-id: fbdd575bdc214d124826d61d35a117995c509279 21 August 2018, 23:43:05 UTC
c9a0419 Release 5.16 (#4298) Summary: Update HISTORY.md for 5.16. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4298 Differential Revision: D9433868 Pulled By: anand1976 fbshipit-source-id: e7880a1c952210b1e9d7466eed72a6cb5018096b 21 August 2018, 21:43:08 UTC