https://github.com/facebook/rocksdb

Revision Author Date Message Commit Date
628a7fd History and version updated to 6.0.2 for Java compilation fixes. 23 April 2019, 22:20:22 UTC
fa1c26d Fix compilation on db_bench_tool.cc on Windows (#5227) Summary: I needed this change to be able to build the v6.0.1 release on Windows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5227 Differential Revision: D15033815 Pulled By: sagar0 fbshipit-source-id: 579f3b8e694c34c0d43527eb2fa37175e37f5911 23 April 2019, 22:16:09 UTC
f9fc8c6 Fix build failures due to missing JEMALLOC_CXX_THROW macro (#5053) Summary: JEMALLOC_CXX_THROW is not defined for earlier versions of jemalloc (e.g. 3.6), causing builds to fail on some platforms. Fixing it. Closes #4869 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5053 Differential Revision: D14390034 Pulled By: sagar0 fbshipit-source-id: b2b7a03cd377201ef385eb521f65bae85c558055 23 April 2019, 22:16:09 UTC
662b8fe Update version and history for 6.0.1 26 March 2019, 18:55:31 UTC
37eb632 BlobDB::Open() should put all existing trash files to delete scheduler (#5103) Summary: Right now, BlobDB::Open() fails to put all trash files to delete scheduler, which causes some trash files permanently untracked. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5103 Differential Revision: D14606095 Pulled By: siying fbshipit-source-id: 41a9437a2948abb235c0ed85f9a04612d0e50183 26 March 2019, 18:35:42 UTC
bb2eb06 When closing BlobDB, should first wait for all background tasks (#5005) Summary: When closing a BlobDB, it only waits for background tasks to finish as the last thing, but the background task may access some variables that are destroyed. The fix is to introduce a shutdown function in the timer queue and call the function as the first thing when destroying BlobDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5005 Differential Revision: D14170342 Pulled By: siying fbshipit-source-id: 081e6a2d99b9765d5956cf6cdfc290c07270c233 26 March 2019, 18:35:26 UTC
e36748c Disable getApproximateSizes test (#5035) Summary: Disabling `org.rocksdb.RocksDBTest.getApproximateSizes` test as it is frequently crashing on travis (#5020). It will be re-enabled once the root-cause is found and fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5035 Differential Revision: D14294736 Pulled By: sagar0 fbshipit-source-id: e28bff0d143a58ad6c82991fec3d4cf8c0209995 25 March 2019, 18:42:24 UTC
524609d Fix DefaultEnvTest.incBackgroundThreadsIfNeeded test (#5021) Summary: `DefaultEnvTest.incBackgroundThreadsIfNeeded` jtest should assert that the number of threads is greater than or equal to the minimum number of threads. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5021 Differential Revision: D14268311 Pulled By: sagar0 fbshipit-source-id: 01fb32b5b3ce636451d162fa1a2bbc5bd1974682 25 March 2019, 18:42:07 UTC
a1f9583 Add missing functionality to RocksJava (#4833) Summary: This is my latest round of changes to add missing items to RocksJava. More to come in future PRs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4833 Differential Revision: D14152266 Pulled By: sagar0 fbshipit-source-id: d6cff67e26da06c131491b5cf6911a8cd0db0775 25 March 2019, 18:41:29 UTC
ffbcc0d Revert "Remove PlainTable's feature store_index_in_file (#4914)" (#5034) Summary: This reverts commit ee1818081ff4ca2a49a48cb4ca5b97665b8dcddf. We are not ready to deprecate this feature. revert it for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5034 Differential Revision: D14287246 Pulled By: siying fbshipit-source-id: e4beafdeaee1c94364fdaa6ba198218d158339f7 04 March 2019, 18:37:59 UTC
42c61ae [sync fix] Add defs.bzl 01 March 2019, 18:30:34 UTC
50eea98 Merge branch '6.0.fb' of github.com:facebook/rocksdb into 6.0.fb 28 February 2019, 19:32:29 UTC
c4f5d0a add GetStatsHistory to retrieve stats snapshots (#4748) Summary: This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0. (This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748 Differential Revision: D13961471 Pulled By: miasantreble fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04 20 February 2019, 23:52:54 UTC
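The map-of-maps shape described in the commit above can be modeled in a few lines. This is a minimal sketch, not RocksDB's implementation: `StatsHistorySketch` and `AddSnapshot` are hypothetical names, and evicting the oldest snapshot once the cap is reached is an assumption (the commit message only says the cap bounds how many snapshots are kept in memory).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical model of the described shape; not RocksDB's implementation.
using StatsMap = std::map<std::string, std::string>;   // stats name -> stats value
using StatsHistoryMap = std::map<uint64_t, StatsMap>;  // timestamp (micros) -> snapshot

class StatsHistorySketch {
 public:
  explicit StatsHistorySketch(std::size_t max_count) : max_count_(max_count) {}

  // Records one snapshot. Assumption: the oldest snapshot is evicted first.
  void AddSnapshot(uint64_t timestamp_micros, StatsMap stats) {
    history_[timestamp_micros] = std::move(stats);
    while (history_.size() > max_count_) {
      // std::map is ordered by key, so begin() is the oldest timestamp.
      history_.erase(history_.begin());
    }
  }

  const StatsHistoryMap& history() const { return history_; }

 private:
  std::size_t max_count_;
  StatsHistoryMap history_;
};
```

With a cap of 2, adding snapshots at timestamps 100, 200 and 300 leaves only the entries for 200 and 300.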
f1be6ae Update version and history for 6.0 20 February 2019, 18:11:27 UTC
48c8d84 Update version and history for 6.0 20 February 2019, 18:10:11 UTC
cf98df3 Change random seed for txn stress tests on each run (#5004) Summary: Currently the transaction stress tests use thread id as the seed. Since the thread ids are likely to be the same across multiple runs, the seed is thus going to be the same. The patch includes time in calculating the seed to help covering a very different part of state space in each run of the stress tests. To be able to reproduce the bug in case the stress tests failed, it also prints out the time that was used to calculate the seed value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5004 Differential Revision: D14144356 Pulled By: maysamyabandeh fbshipit-source-id: 728ed522f550fc8b4f5f9f373259c05fe9a54556 20 February 2019, 03:58:55 UTC
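The seeding idea in the commit above can be sketched as follows. The function names and the mixing formula are illustrative assumptions, not the code from the patch; the point is only that the seed depends on both the thread id and a time value, and that the time value is printed so a failing run can be replayed.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: mixes a thread id with a time value. The multiplier is
// an arbitrary odd constant chosen for illustration.
uint64_t MakeSeed(uint64_t thread_id, uint64_t time_value) {
  return (thread_id * 0x9E3779B97F4A7C15ULL) ^ time_value;
}

// Logs the time component so the same seed can be recomputed later.
uint64_t MakeAndLogSeed(uint64_t thread_id, uint64_t time_value) {
  std::printf("seed time component: %llu\n",
              static_cast<unsigned long long>(time_value));
  return MakeSeed(thread_id, time_value);
}
```

Feeding the logged time value back into `MakeSeed` with the same thread id reproduces the seed of the failing run.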
0f4244f WritePrepared: Improve stress tests with slow threads (#4974) Summary: The transaction stress tests, stress a high concurrency scenario. In WritePrepared/WriteUnPrepared we need to also stress the scenarios where an inserting/reading transaction is very slow. This would stress the corner cases that the caching is not sufficient and other slower data structures are engaged. To emulate such cases we make use of slow inserter/verifier threads and also reduce the size of cache data structures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4974 Differential Revision: D14143070 Pulled By: maysamyabandeh fbshipit-source-id: 81eb674678faf9fae0f654cd60ebcc74e26aeee7 20 February 2019, 00:56:49 UTC
bcdc8c8 WritePrepared: max_evicted_seq_ update during commit cache lookup (#4955) Summary: max_evicted_seq_ could be updated in the middle of the read in ::IsInSnapshot. The code to be correct in presence of this update would be complicated. The patch simplifies it by checking the value of max_evicted_seq_ before and after looking into commit_cache_ and retries in the unlucky case that it was changed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4955 Differential Revision: D13999556 Pulled By: maysamyabandeh fbshipit-source-id: 7a1bdfa95ea8b5d8d73ddff3263ed31d7297b39c 20 February 2019, 00:14:08 UTC
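The check-before-and-after retry described in the commit above follows a common optimistic-read pattern. The sketch below is a generic illustration under assumptions: `ReadWithRetry` is a hypothetical helper, and the memory ordering is a guess rather than what the patch uses.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: runs `lookup` and retries if the guarded value changed
// while the lookup was in progress.
template <typename Lookup>
auto ReadWithRetry(const std::atomic<uint64_t>& max_evicted_seq, Lookup lookup) {
  while (true) {
    const uint64_t before = max_evicted_seq.load(std::memory_order_acquire);
    auto result = lookup(before);  // stand-in for reading the commit cache
    const uint64_t after = max_evicted_seq.load(std::memory_order_acquire);
    if (before == after) {
      return result;  // no concurrent update observed; the result is consistent
    }
    // The value changed mid-read: discard the result and try again.
  }
}
```

The retry costs one extra lookup only in the unlucky case where the value changes during the read.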
93f7e7a Temporarily Disable DBTest2.PresetCompressionDict (#5003) Summary: DBTest2.PresetCompressionDict is flaky. Temporarily disable it for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5003 Differential Revision: D14139505 Pulled By: siying fbshipit-source-id: ebf1872d364b76b2cb021b489ea2f17ee997116a 19 February 2019, 22:44:12 UTC
7d23210 Separate crash test with atomic flush (#4945) Summary: Currently crash test covers cases with and without atomic flush, but takes too long to finish. Therefore it may be a better idea to put crash test with atomic flush in a separate set of tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4945 Differential Revision: D13947548 Pulled By: riversand963 fbshipit-source-id: 177c6de865290fd650b0103408339eaa3f801d8c 19 February 2019, 22:08:39 UTC
3c5d1b1 Apply modernize-use-override (3) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. bypass-lint drop-conflicts Reviewed By: igorsugak Differential Revision: D14131816 fbshipit-source-id: f20e7f7cecf2e699d70f5fa036f72c0e3f59b50e 19 February 2019, 21:39:49 UTC
ed995c6 add whole key bloom filter support in memtables (#4985) Summary: MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985 Differential Revision: D14118257 Pulled By: miasantreble fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea 19 February 2019, 20:15:39 UTC
c2affcc Header logger should call LogHeader() (#4980) Summary: The info log header feature never worked well, because log level Header was not translated to Logger::LogHeader() call. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4980 Differential Revision: D14087283 Pulled By: siying fbshipit-source-id: 7e7d03ce35fa8d13d4ee549f46f7326f7bc0006d 16 February 2019, 00:59:36 UTC
26a33ee flush_job logs data size too (#4979) Summary: Right now when a flush is triggered, the memory consumption is logged but data size is not. It's useful to log both when we debug unexpected small flushed file size. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4979 Differential Revision: D14071979 Pulled By: siying fbshipit-source-id: 0cd60449c5205eb00e0fbc299084418f609904ed 16 February 2019, 00:33:19 UTC
4db46aa Fix LITE Build (#4989) Summary: LITE mode has EventListener to be an empty class. However in db_bench, it is used. When "override" is added to the functions, the build breaks. Fix it by keeping the listener empty in LITE mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4989 Differential Revision: D14108132 Pulled By: siying fbshipit-source-id: 80121aab35b1120e502b37b782301dd700692697 16 February 2019, 00:13:11 UTC
3231a2e Deprecate ttl option from CompactionOptionsFIFO (#4965) Summary: We introduced ttl option in CompactionOptionsFIFO when ttl-based file deletion (compaction) was supported only as part of FIFO Compaction. But with the extension of ttl semantics even to Level compaction, CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start using ColumnFamilyOptions.ttl for FIFO compaction as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965 Differential Revision: D14072960 Pulled By: sagar0 fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e 15 February 2019, 17:51:41 UTC
ca89ac2 Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. Reviewed By: Orvid Differential Revision: D14090024 fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a 14 February 2019, 22:41:36 UTC
c8c8104 Dictionary compression for files written by SstFileWriter (#4978) Summary: If `CompressionOptions::max_dict_bytes` and/or `CompressionOptions::zstd_max_train_bytes` are set, `SstFileWriter` will now generate files respecting those options. I refactored the logic a bit for deciding when to use dictionary compression. Previously we plumbed `is_bottommost_level` down to the table builder and used that. However it was kind of confusing in `SstFileWriter`'s context since we don't know what level the file will be ingested to. Instead, now the higher-level callers (e.g., flush, compaction, file writer) are responsible for building the right `CompressionOptions` to give the table builder. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4978 Differential Revision: D14060763 Pulled By: ajkr fbshipit-source-id: dc802c327896df2b319dc162d6acc82b9cdb452a 14 February 2019, 19:23:55 UTC
4fc4420 Avoid using kInAtomicGroup tag for single-cf op (#4981) Summary: if an operation just involves a single column family, then we do not have to set the kInAtomicGroup tag when writing to MANIFEST. This change can fix a compatibility test failure, i.e. 5.15 and earlier cannot recognize kInAtomicGroup tag. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4981 Differential Revision: D14072687 Pulled By: riversand963 fbshipit-source-id: 46b0c61e399f16c6b7169de0b33430d0ed90d6d4 14 February 2019, 02:33:42 UTC
34b55dd Fix no compression CI test config (#4982) Summary: We should strip `-DZSTD` to prevent ZSTD from being used in the no compression tests, similarly to how we prevent all other compression libraries from being used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4982 Differential Revision: D14075349 Pulled By: ajkr fbshipit-source-id: 8bd861516cf28a568c2b701ad33d0bb658db93b2 14 February 2019, 00:47:01 UTC
51a9041 Add load statements to rocksdb TARGETS files Reviewed By: siying Differential Revision: D13993686 fbshipit-source-id: 0c55e8952307bcf457c1d78d527a0c86b59628e8 13 February 2019, 22:08:21 UTC
5af9446 Remove Lua compaction filter from RocksDB main repo (#4971) Summary: as title. For people who continue to need Lua compaction filter, you can copy the include/rocksdb/utilities/rocks_lua/lua_compaction_filter.h and utilities/lua/rocks_lua_compaction_filter.cc to your own codebase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4971 Differential Revision: D14047468 Pulled By: riversand963 fbshipit-source-id: 9ad1a6484a7c94e478f1e108127a3184e4069f70 13 February 2019, 20:42:44 UTC
a69d4de Atomic ingest (#4895) Summary: Make file ingestion atomic. as title. Ingesting external SST files into multiple column families should be atomic. If a crash occurs and db reopens, either all column families have successfully ingested the files before the crash, or none of the ingestions have any effect on the state of the db. Also add unit tests for atomic ingestion. Note that the unit test here does not cover the case of incomplete atomic group in the MANIFEST, which is covered in VersionSetTest already. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4895 Differential Revision: D13718245 Pulled By: riversand963 fbshipit-source-id: 7df97cc483af73ad44dd6993008f99b083852198 13 February 2019, 03:16:17 UTC
33b3323 Add Java multiGet API for returning List<byte[]> (#1570) Summary: Closes https://github.com/facebook/rocksdb/pull/1570 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4797 Differential Revision: D13961770 Pulled By: sagar0 fbshipit-source-id: e34fd6250d0cd3ebb0bd688e8801fe8947fd464d 13 February 2019, 01:04:48 UTC
49ddd7e Stats should be logged in INFO level (#4977) Summary: Previously, stats were logged in warning level. This was done in that way because people reported that it wasn't logged in MyRocks. However, later we learned that it turns out to be due to a bug in MyRocks, which is fixed in https://github.com/facebook/mysql-5.6/commit/79bb705e74b239d7030b724ea6bbd635eceec531 Now we revert the stats logging to INFO level, so that it doesn't pollute the warning level logging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4977 Differential Revision: D14058485 Pulled By: siying fbshipit-source-id: 19fab323c19d9bc88184287f209551f9a77ca0e6 13 February 2019, 00:54:55 UTC
eafb09a Fix issues found by Clang Analyzer (#4976) Summary: Fix issues found by Clang Analyzer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4976 Differential Revision: D14054211 Pulled By: sagar0 fbshipit-source-id: ec2053bae43af3b2ff3425306824c677e3ba70c2 12 February 2019, 21:59:44 UTC
c5a64cf Avoid fsync on the same directory in atomic flush (#4817) Summary: In `DBImpl::AtomicFlushMemTablesToOutputFiles`, we need to call fsync only once on the same data directory. If two column families share a common directory for their data, we call fsync only once. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4817 Differential Revision: D13543689 Pulled By: riversand963 fbshipit-source-id: 4701d77c96a47802fbf6cb9f3337ee65d46b95f5 12 February 2019, 20:28:36 UTC
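The "fsync once per shared directory" idea in the commit above amounts to de-duplicating directories before syncing. A minimal sketch, assuming directories are identified by their path strings; `DirsToSync` is a hypothetical helper, not a function from `DBImpl`.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: given the data directory of each column family, returns
// each distinct directory once, in first-seen order, so that the caller issues
// a single fsync per directory.
std::vector<std::string> DirsToSync(const std::vector<std::string>& cf_data_dirs) {
  std::set<std::string> seen;
  std::vector<std::string> result;
  for (const auto& dir : cf_data_dirs) {
    if (seen.insert(dir).second) {  // .second is true only on first insertion
      result.push_back(dir);
    }
  }
  return result;
}
```

Two column families that share `/data/a` therefore contribute only one entry to the list of directories to sync.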
62f70f6 Reduce scope of compression dictionary to single SST (#4952) Summary: Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio. So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include: - The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called. - After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up. - Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952 Differential Revision: D13967980 Pulled By: ajkr fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f 12 February 2019, 03:47:32 UTC
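The two-state behavior described in the commit above can be illustrated with a toy builder. This is a sketch under assumptions: `BufferedBuilderSketch` is a hypothetical class, blocks are plain strings, and dictionary training and compression are replaced by simple bookkeeping. Only the state names and the transition triggers come from the commit message.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of the kBuffered -> kUnbuffered transition; not the real builder.
class BufferedBuilderSketch {
 public:
  enum class State { kBuffered, kUnbuffered };

  explicit BufferedBuilderSketch(std::size_t target_bytes)
      : target_bytes_(target_bytes) {}

  void Add(const std::string& block) {
    if (state_ == State::kBuffered) {
      buffered_.push_back(block);  // keep uncompressed data in memory
      buffered_bytes_ += block.size();
      if (buffered_bytes_ >= target_bytes_) EnterUnbuffered();
    } else {
      written_.push_back(block);  // stand-in for compress + write immediately
    }
  }

  void Finish() {
    if (state_ == State::kBuffered) EnterUnbuffered();
  }

  State state() const { return state_; }
  std::size_t written_blocks() const { return written_.size(); }
  std::size_t sampled_blocks() const { return sampled_blocks_; }

 private:
  void EnterUnbuffered() {
    sampled_blocks_ = buffered_.size();  // stand-in for sampling + training
    for (auto& b : buffered_) written_.push_back(std::move(b));
    buffered_.clear();
    state_ = State::kUnbuffered;
  }

  std::size_t target_bytes_;
  std::size_t buffered_bytes_ = 0;
  std::size_t sampled_blocks_ = 0;
  State state_ = State::kBuffered;
  std::vector<std::string> buffered_;
  std::vector<std::string> written_;
};
```

Nothing is written while the builder is in `kBuffered`; the transition flushes everything buffered so far, and later blocks go straight through.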
79496d7 Increment NUMBER_BLOCK_NOT_COMPRESSED when !GoodCompressionRatio (#4929) Summary: See https://github.com/facebook/rocksdb/issues/4884 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4929 Differential Revision: D14028333 Pulled By: sagar0 fbshipit-source-id: eed12bceae85385a34aaa6dd303bf0f53c4c7b06 12 February 2019, 01:56:23 UTC
d6b9b3b Enhance transaction_test_util with delays (#4970) Summary: Enhance ::Insert and ::Verify test functions to add artificial delay between prepare and commit, and take snapshot and reads respectively. A future PR will make use of these to improve stress tests to test against long-running transactions as well as long-running backup jobs. Also randomly sets set_snapshot to false for inserters to skip setting the snapshot in the initialization phase and let the snapshot be taken later explicitly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4970 Differential Revision: D14031342 Pulled By: maysamyabandeh fbshipit-source-id: b52b453751f0b25b81b23c48892bc1d152464cab 12 February 2019, 00:02:37 UTC
576d2d6 WritePrepared: relax assert in compaction iterator (#4969) Summary: If IsInSnapshot(seq2, snapshot) determines that the snapshot is released, the future queries IsInSnapshot(seq1, snapshot) could still return a definitive answer of true if for example seq1 is too old that is determined visible in all snapshots. This violates a recently added assert statement to compaction iterator. The patch relaxes the assert. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4969 Differential Revision: D14030998 Pulled By: maysamyabandeh fbshipit-source-id: 6db53db0e37d0a20e8997ef2c1004b8627614ab9 11 February 2019, 23:01:46 UTC
1218704 Fix `compression_zstd_max_train_bytes` coverage in stress test (#4957) Summary: Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957 Differential Revision: D13994302 Pulled By: ajkr fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65 11 February 2019, 22:56:39 UTC
9144d1f WritePrepared: add private options to TransactionDBOptions (#4966) Summary: WritePreparedTransactionDB operates with more options which should not be configurable to avoid complicating it for the users. For testing purposes however we need to change the default value of this parameters. This patch makes these parameters private fields in TransactionDBOptions so that the existing ::Open API could use them seamlessly without however exposing them to the users. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4966 Differential Revision: D14015986 Pulled By: maysamyabandeh fbshipit-source-id: 13037efa7dfdd6f73ec7a19414b66571e044c633 11 February 2019, 22:44:02 UTC
2d049ab Checksum properties block for block-based table (#4956) Summary: Always enable properties block checksum verification for block-based table. For external SST file ingested with 'write_global_seqno==true', we use 'DecodeEntrySlow' to parse its blocks' contents so that the process will not die upon failing the assertion possibly caused by corruption. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4956 Differential Revision: D14012741 Pulled By: riversand963 fbshipit-source-id: 8b766e6f54b36f8f9e074c0e19e0926ec3cce186 11 February 2019, 19:50:01 UTC
5d9a623 Add a unit test to Ignorable manifest record (#4964) Summary: https://github.com/facebook/rocksdb/pull/4960 introduced ignorable manifest record. Adding a test to it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4964 Differential Revision: D14012667 Pulled By: siying fbshipit-source-id: e5f10ecc68dec2716e178d44f0fe2b76c3d857ef 11 February 2019, 19:20:24 UTC
08809f5 Implement trace sampling (#4963) Summary: Implement trace sampling to allow user to specify the sampling frequency, i.e. save one per how many requests, so that a user does not need to log all if he/she is interested in only a sampled set. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4963 Differential Revision: D14011190 Pulled By: tang-jianfeng fbshipit-source-id: 078b631d9319b67cb089dd2c30e21d0df8dc406a 09 February 2019, 02:08:18 UTC
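"Save one per how many requests" can be sketched with a counter. The class below is a hypothetical illustration, not the tracer's actual code; treating a frequency of 0 as 1 is an assumption made to avoid division by zero.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sampler: ShouldTrace() returns true for one request out of
// every `sampling_frequency` requests.
class TraceSamplerSketch {
 public:
  explicit TraceSamplerSketch(uint64_t sampling_frequency)
      : frequency_(sampling_frequency == 0 ? 1 : sampling_frequency) {}

  bool ShouldTrace() {
    // fetch_add returns the previous count, so requests 0, N, 2N, ... are traced.
    const uint64_t n = seen_.fetch_add(1, std::memory_order_relaxed);
    return n % frequency_ == 0;
  }

 private:
  const uint64_t frequency_;
  std::atomic<uint64_t> seen_{0};
};
```

With a frequency of 3, nine consecutive requests produce three trace records.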
10d1469 WritePrepared: fix ValidateSnapshot with long-running txn (#4961) Summary: ValidateSnapshot checks if another txn has committed a value to about-to-be-locked key since a particular snapshot. It applies an optimization of looking into only the memtable if snapshot seq is larger than the earliest seq in the memtables. With a long-running txn in WritePrepared, the prepared value might be flushed out to the disk and yet it commits after the snapshot, which breaks this optimization. The patch fixes that by disabling this optimization when the min_uncomitted seq at the time the snapshot was taken is lower than earliest seq in the memtables. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4961 Differential Revision: D14009947 Pulled By: maysamyabandeh fbshipit-source-id: 1d11679950326f7c4094b433e6b821b729f08850 09 February 2019, 02:01:25 UTC
39fb88f Reset size_ to 0 in PinnableSlice::Reset (#4962) Summary: It would avoid bugs if the reused PinnableSlice is not actually reassigned and yet the programmer makes conclusions based on the size of the Slice. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4962 Differential Revision: D14012710 Pulled By: maysamyabandeh fbshipit-source-id: 23f4e173386b5461fd5650f44cde470805f4e816 09 February 2019, 00:51:17 UTC
1a761e6 Add a placeholder in manifest indicating ignorable record (#4960) Summary: We want to reserve some right that some extra information added manifest in the future can be forward compatible by previous versions. Now we create a place holder for that. A bit in tag is added to indicate that a field can be safely ignored. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4960 Differential Revision: D14000484 Pulled By: siying fbshipit-source-id: cbf5bad3f9d5ec798f789806f244d1c20d3b66d6 08 February 2019, 19:33:11 UTC
f48758e Deprecate CompactionFilter::IgnoreSnapshots() = false (#4954) Summary: We found that the behavior of CompactionFilter::IgnoreSnapshots() = false isn't what we have expected. We thought that snapshot will always be preserved. However, we just realized that, if no snapshot is created while compaction starts, and a snapshot is created after that, the data seen from the snapshot can successfully be dropped by the compaction. This creates a strange behavior to the feature, which is hard to explain. Like what is documented in code comment, this feature is not very useful with snapshot anyway. The decision is to deprecate the feature. We keep the function to avoid breaking users' code. However, we will fail compactions if false is returned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4954 Differential Revision: D13981900 Pulled By: siying fbshipit-source-id: 2db8c2c3865acd86a28dca625945d1481b1d1e36 08 February 2019, 00:57:33 UTC
cf3a671 Remove cuckoo hash memtable (#4953) Summary: Cuckoo Hash is less useful than we initially expected. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953 Differential Revision: D13979264 Pulled By: siying fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed 08 February 2019, 00:15:27 UTC
199fabc WritePrepared: non-atomic commit of delayed prepared (#4947) Summary: Commit of delayed prepared has two non-atomic steps: add to commit cache, remove from delayed_prepared_. Similarly in ::IsInSnapshot we read from commit cache first and then look into delayed_prepared_. Due to non-atomicity thus the reader might not find the prep_seq that is just committed neither in commit cache nor in delayed_prepared_. To fix that i) we check if there was any delayed prepared BEFORE looking into commit cache, ii) if there was, we complete the search steps to be these: i) commit cache, ii) delayed prepared, commit cache again. In this way if the first query to commit cache missed the commit, the 2nd will catch it. The cost of the redundant read from commit cache is paid only if delayed_prepared_ is nonempty which should be a very rare scenario. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4947 Differential Revision: D13952754 Pulled By: maysamyabandeh fbshipit-source-id: 8f47826b13f8ce154398d842028342423f4ca2b2 06 February 2019, 16:48:06 UTC
d9c9f3c db_bench: fix "micros/op" reporting (#4949) Summary: https://github.com/facebook/rocksdb/commit/4985a9f73b9fb8a0323fbbb06222ae1f758a6b1d#diff-e5276985b26a0551957144f4420a594bR511 changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why. Considering that this is a counter-intuitive reporting, Reverting the change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949 Differential Revision: D13964684 Pulled By: siying fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc 06 February 2019, 01:20:02 UTC
71cae59 exclude test CompactFilesShouldTriggerAutoCompaction from ROCKSDB_LITE (#4950) Summary: This will fix the following build error: > db/db_test.cc: In member function ‘virtual void rocksdb::DBTest_CompactFilesShouldTriggerAutoCompaction_Test::TestBody()’: > db/db_test.cc:5462:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’ > db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data); > db/db_test.cc:5490:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’ > db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data); > db/db_test.cc:5499:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’ > db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data); Pull Request resolved: https://github.com/facebook/rocksdb/pull/4950 Differential Revision: D13965378 Pulled By: miasantreble fbshipit-source-id: a975435476fe555b1cd9d5da263ee3da3acdea56 06 February 2019, 01:01:11 UTC
00ed41d Allow copy for PerfContext objects (#4919) Summary: Existing implementation of PerfContext does not define copy constructor or assignment operator, which could potentially cause problems when a user creates copies and resets the built-in one. This PR addresses the issue by providing these two constructors with deep copy semantics. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4919 Differential Revision: D13960406 Pulled By: miasantreble fbshipit-source-id: 36aab5aaee65d4480f537e4e22148faa45e8e334 05 February 2019, 22:29:08 UTC
c9a52cb Fix potential DB hang while using CompactFiles (#4940) Summary: CompactFiles() may block auto compaction which could cause DB hang when it reaches level0_stop_writes_trigger. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4940 Differential Revision: D13929648 Pulled By: cooldoger fbshipit-source-id: 10842df38df3bebf862cd1a120a88ce961fdd381 05 February 2019, 19:23:38 UTC
8fe0733 BYTES_READ stats miscount for NotFound cases (#4938) Summary: In NotFound cases, stats BYTES_READ and perf_context.get_read_bytes are still increased. The amount increased will be whatever size of the string or PinnableSlice that users passed in as the output data structure. This is wrong. Fix this by not increasing these two counters. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4938 Differential Revision: D13908963 Pulled By: siying fbshipit-source-id: 60bce42e4fbb9862bba3da36dbc27b2963ea6162 05 February 2019, 18:53:35 UTC
31221bb Properly set upper bound of subcompaction output (#4879) (#4898) Summary: Fix the output overlap bug when using subcompactions: the upper bound of the output file was extended incorrectly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4898 Differential Revision: D13736107 Pulled By: ajkr fbshipit-source-id: 21dca09f81d5f07bf2766bf566f9b50dcab7d8e3 05 February 2019, 18:20:16 UTC
dcb73e7 WritePrepared: release snapshot equal to max (#4944) Summary: WritePrepared maintains a list of snapshots that are <= max_evicted_seq_. Based on this list, old_commit_map_ is updated if an evicted commit entry overlaps with such snapshot. Such lists are garbage collected when the release of snapshot is reported to WritePreparedTxnDB, which is the next time max_evicted_seq_ is updated and yet the snapshot is not found in the list returned from DB. This logic was broken since ReleaseSnapshotInternal was using "< max_evicted_seq_" to cleanup old_commit_map_, which would leave a snapshot uncleaned if it "= max_evicted_seq_". The patch fixes that and adds a unit test to check for the bug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4944 Differential Revision: D13945000 Pulled By: maysamyabandeh fbshipit-source-id: 0c904294f735911f52348a148bf1f945282fc17c 04 February 2019, 20:57:23 UTC
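The off-by-one in the commit above comes down to the comparison operator. The two predicates below are hypothetical stand-ins for the condition inside ReleaseSnapshotInternal, shown only to contrast the boundary case.

```cpp
#include <cstdint>

// Hypothetical predicate with the old comparison: misses the boundary case
// where the snapshot sequence equals max_evicted_seq.
bool ShouldCleanupBuggy(uint64_t snapshot_seq, uint64_t max_evicted_seq) {
  return snapshot_seq < max_evicted_seq;
}

// Hypothetical predicate with the inclusive comparison: matches the set of
// tracked snapshots, which the commit message describes as "<= max_evicted_seq_".
bool ShouldCleanupFixed(uint64_t snapshot_seq, uint64_t max_evicted_seq) {
  return snapshot_seq <= max_evicted_seq;
}
```

A snapshot whose sequence number equals the maximum is skipped by the first predicate and cleaned up by the second.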
30468d8 Fix analyze error on possible un-initialized value (#4937) Summary: The patch fixes the following analyze error by checking the return status of ParseInternalKey. ``` db/merge_helper.cc:306:23: warning: The right operand of '==' is a garbage value assert(kTypeMerge == orig_ikey.type); ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4937 Differential Revision: D13908506 Pulled By: maysamyabandeh fbshipit-source-id: 68d7771e75519da3d4bd807fd231675ec12093f6 01 February 2019, 17:41:27 UTC
5924444 Zero seqnum of final key / drop final tombstone when compacting to bottommost level Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4927 Differential Revision: D13889458 Pulled By: mzhaom fbshipit-source-id: d6b66db85901a9eb90748fba6a9dc4e7457b9c5e 01 February 2019, 17:21:57 UTC
4091597 fix for nvme device path (#4866) Summary: An NVMe device path does not contain "block"; it looks like "nvme/nvme0/nvme0n1" or "nvme/nvme0/nvme0n1/nvme0n1p1". The last directory, such as "nvme0n1p1", should be removed if the NVMe drive is partitioned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866 Differential Revision: D13627824 Pulled By: riversand963 fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317 01 February 2019, 03:08:37 UTC
842cdc1 Use correct FileMeta for atomic flush result install (#4932) Summary: 1. this commit fixes our handling of a combination of two separate edge cases. If a flush job does not pick any memtable to flush (because another flush job has already picked the same memtables), and the column family assigned to the flush job is dropped right before RocksDB calls rocksdb::InstallMemtableAtomicFlushResults, our original code passes a FileMetaData object whose file number is 0, failing the assertion in rocksdb::InstallMemtableAtomicFlushResults (assert(m->GetFileNumber() > 0)). 2. Also piggyback a small change: since we already create a local copy of column family's mutable CF options to eliminate potential race condition with `SetOptions` call, we might as well use the local copy in other function calls in the same scope. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4932 Differential Revision: D13901322 Pulled By: riversand963 fbshipit-source-id: b936580af7c127ea0c6c19ea10cd5fcede9fb0f9 31 January 2019, 22:49:51 UTC
0ea5711 Fix `WriteBatchBase::DeleteRange` API comment (#4935) Summary: The `DeleteRange` end key is exclusive, not inclusive. Updated API comment accordingly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4935 Differential Revision: D13905406 Pulled By: ajkr fbshipit-source-id: f577db841a279427991ecf9005cd56b30c8eb3c7 31 January 2019, 22:43:40 UTC
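The half-open range semantics noted in the 0ea5711 entry above (end key exclusive) can be sketched on an ordinary `std::map`. `DeleteRangeSketch` is a hypothetical helper, not the RocksDB API, and treating an empty or inverted range as a no-op is an assumption of this sketch.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>

// Deletes every key in the half-open range [begin, end) and returns the
// number of keys removed. The key equal to `end` is NOT deleted.
std::size_t DeleteRangeSketch(std::map<std::string, std::string>& kv,
                              const std::string& begin,
                              const std::string& end) {
  if (begin >= end) {
    return 0;  // assumption: empty or inverted range is a no-op
  }
  auto first = kv.lower_bound(begin);  // first key >= begin
  auto last = kv.lower_bound(end);     // first key >= end (kept)
  const auto removed = static_cast<std::size_t>(std::distance(first, last));
  kv.erase(first, last);
  return removed;
}
```

Deleting the range from "b" to "d" in a map holding "a" through "d" removes "b" and "c" and leaves "d" in place.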
35e5689 Take snapshots once for all cf flushes (#4934) Summary: FlushMemTablesToOutputFiles calls FlushMemTableToOutputFile for each column family. The patch moves the take-snapshot logic to outside FlushMemTableToOutputFile so that it does it once for all the flushes. This also addresses a deadlock issue for resetting the managed snapshot of job_snapshot in the 2nd call to FlushMemTableToOutputFile. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4934 Differential Revision: D13900747 Pulled By: maysamyabandeh fbshipit-source-id: f3cd650c5fff24cf95c1aaf8a10c149d42bf042c 31 January 2019, 20:21:59 UTC
32a6dd9 Add a new CPU time counter to compaction report (#4889) Summary: Measure CPU time consumed for a compaction and report it in the stats report. Enable NowCPUNanos() to work for MacOS. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889 Differential Revision: D13701276 Pulled By: zinoale fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055 30 January 2019, 01:24:00 UTC
158da7a Verify checksum before ingestion (#4916) Summary: before file ingestion (in preparation phase), verify the checksums of the blocks of the external SST file, including properties block with global seqno. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4916 Differential Revision: D13863501 Pulled By: riversand963 fbshipit-source-id: dc54697f970e3807832e2460f7228fcc7efe81ee 30 January 2019, 01:17:29 UTC
d0d484b Always delete Blob DB files in the background (#4928) Summary: Blob DB files are not tracked by the SFM, so they currently don't get deleted in the background. Force them to be deleted in background so rate limiting can be applied Pull Request resolved: https://github.com/facebook/rocksdb/pull/4928 Differential Revision: D13854649 Pulled By: anand1976 fbshipit-source-id: 8031ce66842ff0af440c715d886b377983dad7d8 29 January 2019, 23:50:03 UTC
95604d1 Change the command to invoke parallel tests (#4922) Summary: We used to call `printf $(t_run)` and later feed the result to GNU parallel in the recipe of target `check_0`. However, this approach is problematic when the length of $(t_run) exceeds the maximum length of a command and the `printf` command cannot be executed. Instead we use 'find -print' to avoid generating an overly long command. **This PR is actually the last commit of #4916. Prefer to merge this PR separately.** Pull Request resolved: https://github.com/facebook/rocksdb/pull/4922 Differential Revision: D13845883 Pulled By: riversand963 fbshipit-source-id: b56de7f7af43337c6ec89b931de843c9667cb679 28 January 2019, 23:02:26 UTC
4978caa Remove a redundant call to TableFileName in CompactionJob::FinishCompactionOutputFile (#4925) Summary: While stepping through the code I noticed that there is a redundant call to TableFileName. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4925 Differential Revision: D13845749 Pulled By: sagar0 fbshipit-source-id: 31db45716b4d720e0e0350dd457b49d6f1848e7d 28 January 2019, 21:33:23 UTC
ee18180 Remove PlainTable's feature store_index_in_file (#4914) Summary: Store_index_in_file is a less useful feature. To simplify the code to maintain, we are dropping the feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4914 Differential Revision: D13791883 Pulled By: siying fbshipit-source-id: d187c5d662584866103e4b77d09dfb925509ae2e 28 January 2019, 20:50:22 UTC
e254710 Fix the build error caused by the dynamic array (#4918) Summary: In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256. Tested with make check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918 Differential Revision: D13844298 Pulled By: sagar0 fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab 28 January 2019, 20:39:57 UTC
e242fa4 Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923) Summary: - When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1` - It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090 - I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923 Differential Revision: D13827226 Pulled By: ajkr fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947 28 January 2019, 19:26:32 UTC
bc7d166 Fix test name typo in PlainTableDBTest Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4926 Differential Revision: D13830196 Pulled By: siying fbshipit-source-id: e06bf2a6cd273b5eb18dfd82bdd35ffce197d021 26 January 2019, 02:14:26 UTC
f184bee PlainTable should avoid copying Get() results from immortal source. (#4924) Summary: https://github.com/facebook/rocksdb/pull/4053 avoids a memcpy for Get() results if files are immortal (read-only DB, max_open_files=-1) and the file is mmapped. The same optimization is being applied to PlainTable here. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4924 Differential Revision: D13827749 Pulled By: siying fbshipit-source-id: 1f2cbfc530b40ce08ccd53f95f6e78de4d1c2f96 26 January 2019, 01:12:19 UTC
e1de88c Escape '.' by adding a '\' to avoid matching any char Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4912 Differential Revision: D13789449 Pulled By: riversand963 fbshipit-source-id: 0639dae82049b7ac977c8f81851f1c9fdc346705 24 January 2019, 19:25:27 UTC
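The effect of escaping '.' mentioned in the e1de88c entry above can be shown with a short sketch. It uses C++ `std::regex` (ECMAScript syntax) purely for illustration; the commit itself concerns a pattern in the build scripts, whose regex tool is not stated here. `MatchesWhole` and the sample strings are hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true if the whole of `text` matches `pattern`.
bool MatchesWhole(const std::string& text, const std::string& pattern) {
  return std::regex_match(text, std::regex(pattern));
}
```

An unescaped pattern such as `db.test` matches `db_test` because '.' matches any character, whereas `db\.test` matches only the literal string `db.test`.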
fc53839 Disallow customized hash function in DynamicBloom (#4915) Summary: I didn't find where customized hash function is used in DynamicBloom. This can only reduce performance. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4915 Differential Revision: D13794452 Pulled By: siying fbshipit-source-id: e38669b11e01444d2d782da11c7decabbd851819 24 January 2019, 18:34:30 UTC
e07aa86 Allow full merge when root of history for a key is reached (#4909) Summary: Previously compaction was not collapsing operands for the first key on a layer, even in cases when it was its root of history. Some tests (CompactionJobTest.NonAssocMerge) were actually accounting for that bug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4909 Differential Revision: D13781169 Pulled By: finik fbshipit-source-id: d2de353ecf05bec39b942cd8d5b97a8dc445f336 24 January 2019, 05:46:10 UTC
8ec3e72 Cache dictionary used for decompressing data blocks (#4881) Summary: - If block cache disabled or not used for meta-blocks, `BlockBasedTableReader::Rep::uncompression_dict` owns the `UncompressionDict`. It is preloaded during `PrefetchIndexAndFilterBlocks`. - If block cache is enabled and used for meta-blocks, block cache owns the `UncompressionDict`, which holds dictionary and digested dictionary when needed. It is never prefetched though there is a TODO for this in the code. The cache key is simply the compression dictionary block handle. - New stats for compression dictionary accesses in block cache: "BLOCK_CACHE_COMPRESSION_DICT_*" and "compression_dict_block_read_count" Pull Request resolved: https://github.com/facebook/rocksdb/pull/4881 Differential Revision: D13663801 Pulled By: ajkr fbshipit-source-id: bdcc54044e180855cdcc57639b493b0e016c9a3f 24 January 2019, 02:15:47 UTC
43defe9 Correct the code comment in Compaction::KeyNotExistsBeyondOutputLevel (#4902) Summary: Even if a key falls in a file's range, we cannot infer that it definitely exists in this file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4902 Differential Revision: D13795018 Pulled By: siying fbshipit-source-id: 590956f727e9440fcdee55ad9541ace934c64914 24 January 2019, 02:00:56 UTC
d94aa2f Make compaction_pri = kMinOverlappingRatio to be default (#4911) Summary: compaction_pri = kMinOverlappingRatio usually provides much better write amplification than the default. https://github.com/facebook/rocksdb/pull/4907 fixes one shortcoming of this option. Make it the default. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4911 Differential Revision: D13789262 Pulled By: siying fbshipit-source-id: d90acf8c4dede44f00d183ca4c7a210259378269 24 January 2019, 00:47:38 UTC
27054d8 Call NewDataBlockIterator with correct arguments in DB::Get (#4913) Summary: The pointer `get_context` was passed as the value for the boolean argument `index_key_is_full`. Luckily the pointer was always non-null so evaluated to true which is the correct value for the boolean argument. But we were missing out on batch updates to stats since we were not passing anything for the `GetContext*` argument and it defaults to `nullptr`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4913 Differential Revision: D13791449 Pulled By: ajkr fbshipit-source-id: dbe40bf406c64d34cb5298604145d18b9e0ca9be 23 January 2019, 23:39:05 UTC
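The kind of mistake described in the 27054d8 entry above compiles silently because a pointer converts implicitly to `bool`. The sketch below reproduces the pattern with invented names: `GetContextSketch`, `CallRecord`, and `NewDataBlockIteratorSketch` are hypothetical and do not reflect the real RocksDB signature.

```cpp
// Hypothetical stats holder standing in for GetContext.
struct GetContextSketch {
  int block_reads = 0;
};

// Records what the callee actually received.
struct CallRecord {
  bool index_key_is_full;
  bool stats_updated;
};

// Hypothetical function with a bool parameter followed by a defaulted
// pointer parameter, the shape that makes the bug possible.
CallRecord NewDataBlockIteratorSketch(bool index_key_is_full = true,
                                      GetContextSketch* get_context = nullptr) {
  if (get_context != nullptr) {
    ++get_context->block_reads;  // stats are only updated with a context
  }
  return CallRecord{index_key_is_full, get_context != nullptr};
}
```

Calling `NewDataBlockIteratorSketch(ptr)` with a non-null pointer binds it to the `bool` parameter (evaluating to `true`) and leaves `get_context` at its `nullptr` default, so no stats are recorded. Calling `NewDataBlockIteratorSketch(true, ptr)` passes both arguments as intended.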
0cead31 Fix Clang static analyzer warning in db_bench (#4910) Summary: Fixed clang static analyzer warning about division by 0. ``` ar: creating librocksdb_debug.a tools/db_bench_tool.cc:4650:43: warning: Division by zero int pos = static_cast<int>(rand_num % range_); ~~~~~~~~~^~~~~~~~ 1 warning generated. make: *** [analyze] Error 1 ``` This is from the new code I recently merged in ce8e88d2d7a62e2a08c4109aac84cb9e95ed359b. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910 Differential Revision: D13788037 Pulled By: sagar0 fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea 23 January 2019, 21:33:02 UTC
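One common way to address a division-by-zero warning like the one quoted in the 0cead31 entry above is to guard the modulus operand. The sketch below is an assumption about the general technique, not the change actually made in the commit; `PositionInRange` and its fallback value of 0 are hypothetical.

```cpp
#include <cstdint>

// Reduces rand_num into [0, range). Returns 0 when range is not positive,
// so the modulus is never evaluated with a zero divisor.
// Assumes range fits in an int, as in the original expression.
int PositionInRange(std::uint64_t rand_num, std::int64_t range) {
  if (range <= 0) {
    return 0;  // hypothetical fallback for an empty range
  }
  return static_cast<int>(rand_num % static_cast<std::uint64_t>(range));
}
```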
5bf9419 CompactionPri = kMinOverlappingRatio also uses compensated file size (#4907) Summary: Right now, CompactionPri = kMinOverlappingRatio provides best write amplification, but it doesn't prioritize files with more tombstones. We combine the two good features: make kMinOverlappingRatio to boost files with lots of tombstones too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4907 Differential Revision: D13788774 Pulled By: siying fbshipit-source-id: 1991cbb495fb76c8b529de69896e38d81ed9d9b3 23 January 2019, 21:21:01 UTC
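The idea in the 5bf9419 entry above can be sketched as follows, under two assumptions that are not spelled out in the commit message: that the priority ratio is the bytes overlapping the next level divided by the file's own size, and that the compensated size is larger than the raw size for tombstone-heavy files. `FileSketch` and `PickMinOverlapRatio` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct FileSketch {
  std::uint64_t overlapping_bytes;      // bytes overlapping the next level
  std::uint64_t file_size;              // raw file size
  std::uint64_t compensated_file_size;  // assumed larger with many tombstones
};

// Returns the index of the file with the smallest overlap ratio, or 0 if no
// file has a non-zero denominator. Ties keep the earliest file.
std::size_t PickMinOverlapRatio(const std::vector<FileSketch>& files,
                                bool use_compensated) {
  std::size_t best = 0;
  double best_ratio = -1.0;  // negative means "nothing picked yet"
  for (std::size_t i = 0; i < files.size(); ++i) {
    const std::uint64_t denom = use_compensated
                                    ? files[i].compensated_file_size
                                    : files[i].file_size;
    if (denom == 0) {
      continue;  // skip degenerate entries
    }
    const double ratio = static_cast<double>(files[i].overlapping_bytes) /
                         static_cast<double>(denom);
    if (best_ratio < 0.0 || ratio < best_ratio) {
      best = i;
      best_ratio = ratio;
    }
  }
  return best;
}
```

With two files of equal raw size and equal overlap, the raw-size ratio cannot tell them apart, while the compensated-size ratio prefers the file whose compensated size is larger.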
1eded07 Bug in Regular Expression in Makefile (#4682) Summary: False-negative about path not existing. The regex is ignoring the "." in front of a path. Example: "./path/to/file" Pull Request resolved: https://github.com/facebook/rocksdb/pull/4682 Differential Revision: D13777110 Pulled By: sagar0 fbshipit-source-id: 9f8173b7581407555fdc055580732aeab37d4ade 23 January 2019, 18:24:10 UTC
cbe0239 add cast to avoid loss of precision error (#4906) Summary: this PR address the following error: > tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32] s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size)); Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906 Differential Revision: D13780185 Pulled By: miasantreble fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3 23 January 2019, 06:44:17 UTC
08b8cea Deleting Blob files also goes through SstFileManager (#4904) Summary: Right now, deleting blob files is not rate limited, even if SstFileManager is specified. On the other hand, rate limiting blob deletion is not supported. With this change, Blob file deletion will go through SstFileManager too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4904 Differential Revision: D13772545 Pulled By: siying fbshipit-source-id: bd1b1d0beb26d5167385e00b7ecb8b94b879de84 23 January 2019, 01:00:29 UTC
b2ba068 Add load() statements to TARGETS files Reviewed By: luciang Differential Revision: D13733578 fbshipit-source-id: 556c115935aa42c1da85ec0e91199b9f198fc467 22 January 2019, 23:24:51 UTC
8189c18 Remove unused Blob WAL filter (#4896) Summary: Remove unused blob WAL filter so that users are not confused. I was initially under the impression that we have WAL Filter support in BlobDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4896 Differential Revision: D13725709 Pulled By: sagar0 fbshipit-source-id: f997d7546e138a474036e88b957907cc714327f1 22 January 2019, 19:13:49 UTC
ce8e88d Generate mixed workload with Get, Put, Seek in db_bench (#4788) Summary: Based on specific workload models (key access distribution, value size distribution, iterator scan length distribution, and QPS variation), the MixGraph benchmark generates a synthetic workload according to these distributions, which can reflect real-world workload characteristics. After users enable the tracing function, they get a trace file. By analyzing the trace file with the trace_analyzer tool, users can generate a set of statistics files. The *_accessed_key_stats.txt, *-accessed_value_size_distribution.txt, *-iterator_length_distribution.txt, and *-qps_stats.txt files are mainly used for model fitting in Matlab. After that, users can obtain the parameters of the workload distributions (the modeling details are described [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer)). The key access distribution follows the two-term power model. The probability density function is `f(x) = ax^{b}+c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench. The value size distribution and the iterator scan length distribution both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)(1+k*(x-theta)/sigma)^{-1-1/k}`. The parameters are value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, see the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and the [Matlab page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html). The QPS follows a diurnal pattern, so a sine function is a good model to fit it: `F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d`. The trace_analyzer reports the average QPS in its printed results, which is sine_d.
After fitting "*-qps_stats.txt" to the Matlab model, users can obtain sine_a, sine_b, and sine_c. With these 4 parameters, users can control the QPS variation, including its period, its average, and the size of its changes. To use the benchmark, users can specify the following parameters, for example: ``` -benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788 Differential Revision: D13573940 Pulled By: sagar0 fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222 22 January 2019, 18:44:26 UTC
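The three model formulas named in the ce8e88d entry above can be written out directly. This is a minimal sketch of the formulas only, not the db_bench implementation; the function names are hypothetical, and the Generalized Pareto density is evaluated under the usual assumptions `k != 0`, `sigma > 0`, and `1 + k*(x - theta)/sigma > 0`.

```cpp
#include <cmath>

// Two-term power model: f(x) = a * x^b + c.
double TwoTermPower(double x, double a, double b, double c) {
  return a * std::pow(x, b) + c;
}

// Generalized Pareto density:
// f(x) = (1/sigma) * (1 + k*(x - theta)/sigma)^(-1 - 1/k).
// Assumes k != 0, sigma > 0 and a positive base.
double GeneralizedParetoPdf(double x, double k, double theta, double sigma) {
  const double base = 1.0 + k * (x - theta) / sigma;
  return (1.0 / sigma) * std::pow(base, -1.0 - 1.0 / k);
}

// Sine QPS model: F(x) = a * sin(b*x + c) + d, where d is the average QPS.
double SineQps(double x, double a, double b, double c, double d) {
  return a * std::sin(b * x + c) + d;
}
```

For instance, `SineQps(0.0, 15000, 1, 0, 20000)` evaluates to the average term `d` because `sin(0)` is zero, which matches the role of sine_d as the average QPS.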
16a5ac5 Update HISTORY.md with new use of ZSTD_CDict (#4901) Summary: Mention feature introduced by #4849 in HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4901 Differential Revision: D13746430 Pulled By: ajkr fbshipit-source-id: f7bdea6f0522ed55428cbc521f8a9f3cd0002d4e 20 January 2019, 03:17:50 UTC
01013ae Digest ZSTD compression dictionary once when writing SST file (#4849) Summary: This is essentially a re-submission of #4251 with a few improvements: - Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict` - Eliminated `Init` functions. Instead do all initialization work in constructors. - Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849 Differential Revision: D13606039 Pulled By: ajkr fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447 19 January 2019, 03:12:57 UTC
b1ad6eb WritePrepared: fix two versions in compaction see different status for released snapshots (#4890) Summary: Fix how CompactionIterator::findEarliestVisibleSnapshots handles released snapshots. It fixes the following two scenarios: Scenario 1: key1 has two values v1 and v2. There are two snapshots s1 and s2 taken after v1 and v2 are committed. Right after compaction outputs v2, s1 is released. Now findEarliestVisibleSnapshot may see s1 being released, and return the next snapshot, which is s2. That's larger than v2's earliest visible snapshot, which was s1. The fix: the only place we check against the last snapshot and the current key snapshot is when we decide whether to compact out a value if it is hidden by a later value. In the check, if we see that the current snapshot is even larger than the last snapshot, we know the last snapshot is released, and we are safe to compact out the current key. Scenario 2: key1 has two values v1 and v2. There are two snapshots s1 and s2 taken after v1 and v2 are committed. During compaction, before we process the key, s1 is released. When compaction processes v2, the snapshot checker may return kSnapshotReleased, and the earliest visible snapshot for v2 becomes s2. When compaction processes v1, the snapshot checker may return kIsInSnapshot (for a WritePrepared transaction, it could be because v1 is still in the commit cache). The result will become inconsistent here. The fix: remember the set of released snapshots ever reported by the snapshot checker, and ignore them when finding the result for findEarliestVisibleSnapshot. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4890 Differential Revision: D13705538 Pulled By: maysamyabandeh fbshipit-source-id: e577f0d9ee1ff5a6035f26859e56902ecc85a5a4 19 January 2019, 01:24:06 UTC
7fd9813 WritePrepared: commit of delayed prepared entries (#4894) Summary: Here is the order of ops in a commit: 1) update commit cache, 2) publish seq, 3) RemovePrepared. In case of a delayed prepared, there will be a gap between when the commit is visible to snapshots and when delayed_prepared_ is cleaned up. To tell this case apart from a delayed uncommitted txn, the commit entry of a delayed prepared is also stored in delayed_prepared_commits_, which is updated before publishing the commit. Also, the logic in GetSnapshotInternal that ensures that each new snapshot is always larger than max_evicted_seq_ is updated to check against the upcoming value of max_evicted_seq_ rather than its current one. This is because AdvanceMaxEvictedSeq gets the list of snapshots lower than the new max before updating max_evicted_seq_. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4894 Differential Revision: D13726988 Pulled By: maysamyabandeh fbshipit-source-id: 1e70d78061b50c944c9816bf4b6dac405ab4ccd3 18 January 2019, 19:36:36 UTC
73ff15c WritePrepared: fix typo in comments Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4891 Differential Revision: D13718016 Pulled By: miasantreble fbshipit-source-id: 90bd372cff453a1c2d104c1cf49731d5dd770c14 17 January 2019, 20:36:36 UTC
dd9eca1 Remove unused variable to fix clang compilation err (#4893) Summary: as title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4893 Differential Revision: D13716733 Pulled By: riversand963 fbshipit-source-id: 6811d6a99fe2094d5344f854e8939f01238b2adb 17 January 2019, 19:57:31 UTC
3cfc751 Remove an unused option (#4888) Summary: Remove `garbage_collection_deletion_size_threshold` as it is not used anywhere. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4888 Differential Revision: D13685982 Pulled By: sagar0 fbshipit-source-id: e08d3017b9a0c8fa99bc332b595ee4ed9db70c87 16 January 2019, 19:48:43 UTC
128f532 WritePrepared: fix issue with snapshot released during compaction (#4858) Summary: The compaction iterator keeps a copy of the list of live snapshots at the beginning of compaction, and then queries the snapshot checker to verify whether values of a sequence number are visible to these snapshots. However, when a snapshot is released in the middle of compaction, the snapshot checker implementation (i.e. WritePreparedSnapshotChecker) may remove info associated with the snapshot and may report an incorrect result, which leads to values being compacted out when they shouldn't be. This patch conservatively keeps the values if the snapshot checker determines that the snapshot is released. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4858 Differential Revision: D13617146 Pulled By: maysamyabandeh fbshipit-source-id: cf18a94f6f61a94bcff73c280f117b224af5fbc3 16 January 2019, 17:55:32 UTC
e79df37 Use chrono::time_point instead of time_t (#4868) Summary: By convention, time_t almost always stores the integral number of seconds since 00:00 hours, Jan 1, 1970 UTC, according to http://www.cplusplus.com/reference/ctime/time_t/. We surely want more precision than seconds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4868 Differential Revision: D13633046 Pulled By: riversand963 fbshipit-source-id: 4e01e23a22e8838023c51a91247a286dbf3a5396 16 January 2019, 17:51:05 UTC
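The precision argument in the e79df37 entry above can be made concrete with a small sketch. The helper names are hypothetical; the code only contrasts whole-second resolution (what a seconds-based `time_t` would carry) with the sub-second resolution available from a `std::chrono::system_clock::time_point`.

```cpp
#include <chrono>
#include <cstdint>

using TimePoint = std::chrono::system_clock::time_point;

// Whole seconds since the clock's epoch (duration_cast truncates toward zero),
// mirroring the resolution of a seconds-based time_t.
std::int64_t WholeSecondsSinceEpoch(TimePoint tp) {
  return std::chrono::duration_cast<std::chrono::seconds>(
             tp.time_since_epoch())
      .count();
}

// Milliseconds since the clock's epoch, retaining sub-second information.
std::int64_t MillisSinceEpoch(TimePoint tp) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(
             tp.time_since_epoch())
      .count();
}
```

Two time points 700 ms apart within the same second are indistinguishable at second resolution but remain distinct in milliseconds.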
5d4fddf WritePrepared: Fix visible key compacted out by compaction (#4883) Summary: With WritePrepared transactions, flush/compaction can contain uncommitted keys, and those keys can get committed during compaction. If a snapshot is taken before the key is committed, it should not see the key. On the other hand, compaction grabs the list of snapshots at its beginning, and only considers those snapshots to dedup keys. Consider the case: ``` seq = 1: put "foo" = "bar" seq = 2: transaction T: delete "foo", prepare seq = 3: compaction start seq = 4: take snapshot S seq = 5: transaction T: commit. ... seq = N: compaction iterator reached key "foo". ``` When compaction starts, the list of snapshots is empty. Compaction doesn't take snapshot S into account. When it reaches "foo", transaction T is committed. Compaction may think the value "foo=bar" is not visible to any snapshot (which is wrong), and compact the value out. The fix is to explicitly take a snapshot before compaction grabs the list of snapshots. Compaction will then have to keep keys visible to this snapshot. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4883 Differential Revision: D13668775 Pulled By: maysamyabandeh fbshipit-source-id: 1cab9615f94b7d3e8522cc3d44c3a14c7d4720e4 16 January 2019, 05:34:38 UTC
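The scenario in the 5d4fddf entry above can be modeled with a deliberately simplified sketch. It assumes that an older version of a key must be kept exactly when some snapshot in compaction's list can see the older version but not the newer one that hides it; the real logic also involves the snapshot checker and commit cache, which are ignored here. `MustKeepOlderVersion` is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

using SequenceNumber = std::uint64_t;

// Returns true if the older version must be kept: some snapshot s satisfies
// older_commit_seq <= s < newer_commit_seq, so it sees the older version
// but not the newer one.
bool MustKeepOlderVersion(SequenceNumber older_commit_seq,
                          SequenceNumber newer_commit_seq,
                          const std::vector<SequenceNumber>& snapshots) {
  for (SequenceNumber s : snapshots) {
    if (older_commit_seq <= s && s < newer_commit_seq) {
      return true;
    }
  }
  return false;
}
```

In the example from the commit message, the put is at seq 1 and the delete commits at seq 5. With the empty snapshot list grabbed at compaction start, the put looks droppable even though snapshot S at seq 4 still needs it. With an extra snapshot taken at seq 3 before grabbing the list, the put is kept.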