https://github.com/facebook/rocksdb

Revision Author Date Message Commit Date
94e3429 make FileSize() one line instead of using unnecessary pointer 16 December 2016, 23:43:04 UTC
8460fd7 Init filesize in constructor 16 December 2016, 05:55:55 UTC
362cf32 Add the line that deleted by mistake 15 December 2016, 23:24:16 UTC
1ff82aa Expose SST file size in SstFileWriter 15 December 2016, 23:18:39 UTC
36d42e6 Disable test to unblock travis build Summary: The two tests keep failing in travis. Disable them and will fix later. Closes https://github.com/facebook/rocksdb/pull/1648 Differential Revision: D4316389 Pulled By: yiwu-arbug fbshipit-source-id: 0a370e7 13 December 2016, 19:54:14 UTC
b57dd92 C API: support writebatch delete range Summary: Seem that writebatch delete range can work now, so I add C API for later use. Btw, can we use this feature in production now? Closes https://github.com/facebook/rocksdb/pull/1647 Differential Revision: D4314534 Pulled By: ajkr fbshipit-source-id: e835165 13 December 2016, 19:24:18 UTC
2ba59b5 Disallow ingesting files into dropped CFs Summary: This PR updates IngestExternalFile to return an error if we try to ingest a file into a dropped CF. Right now, if IngestExternalFile wants to flush a memtable while ingesting a file into a dropped CF, it will wait forever, since flushing is not possible for the dropped CF. Closes https://github.com/facebook/rocksdb/pull/1657 Differential Revision: D4318657 Pulled By: IslamAbdelRahman fbshipit-source-id: ed6ea2b 13 December 2016, 08:54:14 UTC
1f6f7e3 cast to signed char in ldb_cmd_test for ppc64le Summary: char is unsigned on Power by default, causing this test to fail with the FF case: ppc64 returns 255 while x86 returns -1. Casting works on both platforms. Closes https://github.com/facebook/rocksdb/pull/1500 Differential Revision: D4308775 Pulled By: yiwu-arbug fbshipit-source-id: db3e6e0 12 December 2016, 22:39:18 UTC
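The signedness difference described in this commit can be shown in a few lines. This is a minimal sketch, not the actual change to ldb_cmd_test; the helper name `ByteAsSigned` is hypothetical.

```cpp
// Whether plain `char` is signed is implementation-defined. Per the commit
// message, the byte 0xFF reads as -1 on x86 but as 255 on ppc64le.
// Converting through `signed char` first yields -1 on both platforms.
int ByteAsSigned(char c) {
  return static_cast<signed char>(c);
}
```

Calling `ByteAsSigned(static_cast<char>(0xFF))` gives the same value regardless of the platform's default for `char`.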
243975d More accurate error status for BackupEngine::Open Summary: Some users are assuming NotFound means the backup does not exist at the provided path, which is a reasonable assumption. We need to stop returning NotFound for system errors. Depends on #1644 Closes https://github.com/facebook/rocksdb/pull/1645 Differential Revision: D4312233 Pulled By: ajkr fbshipit-source-id: 5343c10 12 December 2016, 21:24:21 UTC
f0c509e Return finer-granularity status from Env::GetChildren* Summary: It'd be nice to use the error status type to distinguish between user error and system error. For example, GetChildren can fail listing a backup directory's contents either because a bad path was provided (user error) or because an operation failed, e.g., a remote storage service call failed (system error). In the former case, we want to continue and treat the backup directory as empty; in the latter case, we want to immediately propagate the error to the caller. This diff uses NotFound to indicate user error and IOError to indicate system error. Previously IOError indicated both. Closes https://github.com/facebook/rocksdb/pull/1644 Differential Revision: D4312157 Pulled By: ajkr fbshipit-source-id: 51b4f24 12 December 2016, 20:54:13 UTC
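The caller-side distinction described here might look roughly like the following. This is a sketch under stated assumptions: `Code` and `HandleListing` are hypothetical stand-ins and not the real `rocksdb::Status` or `Env` API.

```cpp
#include <string>
#include <vector>

// Hypothetical status codes; a stand-in for the real Status class.
enum class Code { kOk, kNotFound, kIOError };

// Reacts to the result of listing a backup directory.
// kNotFound (bad path, user error): treat the directory as empty and continue.
// kIOError (system error): propagate to the caller unchanged.
Code HandleListing(Code listing_status, std::vector<std::string>* children) {
  if (listing_status == Code::kNotFound) {
    children->clear();
    return Code::kOk;
  }
  return listing_status;
}
```

Before this change, both cases were reported as IOError, so a caller could not make this decision from the status alone.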
dc64f46 Add db_bench option for stderr logging Summary: The info log file ("LOG") is stored in the db directory by default. When the db is on a distributed env, this is unnecessarily slow. So, I added an option to db_bench to just print the info log messages to stderr. Closes https://github.com/facebook/rocksdb/pull/1641 Differential Revision: D4309348 Pulled By: ajkr fbshipit-source-id: 1e6f851 10 December 2016, 00:54:14 UTC
2cabdb8 Increase buffer size Summary: When compiling with GCC>=7.0.0, "db/internal_stats.cc" fails to compile as the data being written to the buffer potentially exceeds its size. This fix simply doubles the size of the buffer, thus accommodating the max possible data size. Closes https://github.com/facebook/rocksdb/pull/1635 Differential Revision: D4302162 Pulled By: yiwu-arbug fbshipit-source-id: c76ad59 09 December 2016, 19:54:22 UTC
4a17b47 Remove unnecessary header include Summary: Remove "util/testharness.h" from list of includes for "db/db_filesnapshot.cc", as it wasn't being used and thus caused an extraneous dependency on gtest. Closes https://github.com/facebook/rocksdb/pull/1634 Differential Revision: D4302146 Pulled By: yiwu-arbug fbshipit-source-id: e900c0b 09 December 2016, 19:54:21 UTC
8c2b921 Fixed a crash in debug build in flush_job.cc Summary: It was doing `&range_del_iters[0]` on an empty vector. Even though the resulting pointer is never dereferenced, it's still bad for two reasons: * the practical reason: it crashes with `std::out_of_range` exception in our debug build, * the "C++ standard lawyer" reason: it's undefined behavior because, in `std::vector` implementation, it probably "dereferences" a null pointer, which is invalid even though it doesn't actually read the pointed memory, just converts a pointer into a reference (and then flush_job.cc converts it back to pointer); nullptr references are undefined behavior. Closes https://github.com/facebook/rocksdb/pull/1612 Differential Revision: D4265625 Pulled By: al13n321 fbshipit-source-id: db26fb9 09 December 2016, 18:39:12 UTC
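One common way to avoid `&v[0]` on a possibly empty vector is shown below. This is an illustrative sketch, not the actual patch to flush_job.cc; the helper name `FirstOrNull` is hypothetical.

```cpp
#include <vector>

// Returns a pointer to the first element, or nullptr for an empty vector.
// Unlike `&v[0]`, this never indexes into an empty vector, so it is safe
// in checked debug builds and does not rely on undefined behavior.
const int* FirstOrNull(const std::vector<int>& v) {
  return v.empty() ? nullptr : v.data();
}
```

`std::vector::data()` is itself valid to call on an empty vector; the explicit check only makes the null result deterministic.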
20ce081 Fix issue where IngestExternalFile insert blocks in block cache with g_seqno=0 Summary: When we ingest an external file, we open it to read some metadata and the first/last key. While doing that, we insert blocks into the block cache with global_seqno = 0. If we move the file into the DB (rather than copying it), we will use these blocks with the wrong seqno in the read path. Closes https://github.com/facebook/rocksdb/pull/1627 Differential Revision: D4293332 Pulled By: yiwu-arbug fbshipit-source-id: 3ce5523 08 December 2016, 21:39:18 UTC
5241e0d fix db_bench argument type Summary: Closes https://github.com/facebook/rocksdb/pull/1633 Differential Revision: D4298161 Pulled By: yiwu-arbug fbshipit-source-id: 2c7af35 08 December 2016, 19:09:14 UTC
c04f6a0 Specify shell in makefile Summary: The second variable "SHELL" simply tells make explicitly which shell to use, instead of allowing it to default to "/bin/sh", which may or may not be Bash. However, simply defining the second variable by itself causes make to throw an error concerning a circular definition, as it would be attempting to use the "shell" command while simultaneously trying to set which shell to use. Thus, the first variable "BASH_EXISTS" is defined such that make already knows about "/path/to/bash" before trying to use it to set "SHELL". A more technically correct solution would be to edit the makefile itself to make it compatible with non-bash shells (see the original Issue discussion for details). However, as it seems very few of the people working on this project were building with non-bash shells, I figured this solution would be good enough. Closes https://github.com/facebook/rocksdb/pull/1631 Differential Revision: D4295689 Pulled By: yiwu-arbug fbshipit-source-id: e4f9532 08 December 2016, 06:24:15 UTC
45c7ce1 CompactRangeOptions C API Summary: Add C API for CompactRangeOptions. Closes https://github.com/facebook/rocksdb/pull/1596 Differential Revision: D4252339 Pulled By: yiwu-arbug fbshipit-source-id: f768f93 08 December 2016, 01:54:14 UTC
2c2ba68 db_stress support for range deletions Summary: made db_stress capable of adding range deletions to its db and verifying their correctness. i'll make db_crashtest.py use this option later once the collapsing optimization (https://github.com/facebook/rocksdb/pull/1614) is committed because currently it slows down the test too much. Closes https://github.com/facebook/rocksdb/pull/1625 Differential Revision: D4293939 Pulled By: ajkr fbshipit-source-id: d3beb3a 07 December 2016, 21:09:24 UTC
b821984 DeleteRange read path end-to-end tests Summary: Closes https://github.com/facebook/rocksdb/pull/1592 Differential Revision: D4246260 Pulled By: ajkr fbshipit-source-id: ce03fa2 07 December 2016, 20:54:17 UTC
2f4fc53 Compaction::IsTrivialMove relaxing Summary: IsTrivialMove returns true if no input file overlaps with output_level+1 with more than max_compaction_bytes_ bytes. Closes https://github.com/facebook/rocksdb/pull/1619 Differential Revision: D4278338 Pulled By: yiwu-arbug fbshipit-source-id: 994c001 07 December 2016, 19:54:11 UTC
1dce75b Update USERS.md Summary: Closes https://github.com/facebook/rocksdb/pull/1630 Differential Revision: D4293200 Pulled By: IslamAbdelRahman fbshipit-source-id: 9c81f85 07 December 2016, 19:24:14 UTC
304b3c7 Update USERS.md Summary: Closes https://github.com/facebook/rocksdb/pull/1629 Differential Revision: D4293183 Pulled By: IslamAbdelRahman fbshipit-source-id: 759ea92 07 December 2016, 19:24:14 UTC
fa50fff Option to expand range tombstones in db_bench Summary: When enabled, this option replaces range tombstones with a sequence of point tombstones covering the same range. This can be used to A/B test perf of range tombstones vs sequential point tombstones, and help us find the cross-over point, i.e., the size of the range above which range tombstones outperform point tombstones. Closes https://github.com/facebook/rocksdb/pull/1594 Differential Revision: D4246312 Pulled By: ajkr fbshipit-source-id: 3b00b23 07 December 2016, 02:09:14 UTC
c26a4d8 Fix compile error in trasaction_lock_mgr.cc Summary: Fix error on mac/windows build since they don't recognize `uint`. Closes https://github.com/facebook/rocksdb/pull/1624 Differential Revision: D4287139 Pulled By: yiwu-arbug fbshipit-source-id: b7cc88f 06 December 2016, 22:39:16 UTC
ed8fbdb Add EventListener::OnExternalFileIngested() event Summary: Add EventListener::OnExternalFileIngested() to allow user to subscribe to external file ingestion events Closes https://github.com/facebook/rocksdb/pull/1623 Differential Revision: D4285844 Pulled By: IslamAbdelRahman fbshipit-source-id: 0b95a88 06 December 2016, 22:09:17 UTC
2005c88 Implement non-exclusive locks Summary: This is an implementation of non-exclusive locks for pessimistic transactions. It is relatively simple and does not prevent starvation (i.e., it's possible that a request for exclusive access will never be granted if there are always threads holding shared access). It is done by changing `KeyLockInfo` to hold a set of transaction ids, instead of just one, and adding a flag specifying whether this lock is currently held with exclusive access or not. Some implementation notes: - Some lock diagnostic functions had to be updated to return a set of transaction ids for a given lock, e.g., `GetWaitingTxn` and `GetLockStatusData`. - Deadlock detection is a bit more complicated since a transaction can now wait on multiple other transactions. A BFS is done in this case, and deadlock detection depth is now just a limit on the number of transactions we visit. - Expirable transactions do not work efficiently with shared locks at the moment, but that's okay for now. Closes https://github.com/facebook/rocksdb/pull/1573 Differential Revision: D4239097 Pulled By: lth fbshipit-source-id: da7c074 06 December 2016, 01:39:17 UTC
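The BFS-based deadlock check mentioned in the implementation notes can be sketched as follows. All names (`WaitGraph`, `WouldDeadlock`, `max_visits`) are hypothetical, and treating an exhausted visit budget as a deadlock is an assumption of this sketch rather than something the commit message states.

```cpp
#include <cstddef>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using TxnId = int;
// waits_for[t] lists every transaction that t is currently blocked on.
// With shared locks a transaction may wait on several others at once.
using WaitGraph = std::unordered_map<TxnId, std::vector<TxnId>>;

// Returns true if letting `waiter` wait on `blockers` would close a cycle.
// Assumed policy: if more than `max_visits` transactions are visited,
// give up and conservatively report a deadlock.
bool WouldDeadlock(const WaitGraph& waits_for, TxnId waiter,
                   const std::vector<TxnId>& blockers,
                   std::size_t max_visits) {
  std::queue<TxnId> frontier;
  std::unordered_set<TxnId> seen;
  for (TxnId b : blockers) {
    if (seen.insert(b).second) frontier.push(b);
  }
  std::size_t visited = 0;
  while (!frontier.empty()) {
    const TxnId cur = frontier.front();
    frontier.pop();
    if (cur == waiter) return true;           // cycle back to the waiter
    if (++visited > max_visits) return true;  // budget exhausted
    const auto it = waits_for.find(cur);
    if (it == waits_for.end()) continue;      // cur is not waiting on anyone
    for (TxnId next : it->second) {
      if (seen.insert(next).second) frontier.push(next);
    }
  }
  return false;
}
```

The `seen` set keeps the search linear in the number of edges even when many transactions share the same blockers.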
0b0f235 Mention IngestExternalFile changes in HISTORY.md Summary: I hit the land button too fast and didn't include the line. Closes https://github.com/facebook/rocksdb/pull/1622 Differential Revision: D4281316 Pulled By: yiwu-arbug fbshipit-source-id: c7b38e0 06 December 2016, 00:09:11 UTC
23db48e Update HISTORY.md for 5.0 branch Summary: These changes are included in the new branch-cut. Closes https://github.com/facebook/rocksdb/pull/1621 Differential Revision: D4281015 Pulled By: yiwu-arbug fbshipit-source-id: d88858b 05 December 2016, 23:39:11 UTC
beb36d9 Fixed CompactionFilter::Decision::kRemoveAndSkipUntil Summary: Embarrassingly enough, the first time I tried to use my new feature in logdevice it crashed with this assertion failure: db/pinned_iterators_manager.h:30: void rocksdb::PinnedIteratorsManager::StartPinning(): Assertion `pinning_enabled == false' failed The issue was that `pinned_iters_mgr_.StartPinning()` was called but `pinned_iters_mgr_.ReleasePinnedData()` wasn't. Closes https://github.com/facebook/rocksdb/pull/1611 Differential Revision: D4265622 Pulled By: al13n321 fbshipit-source-id: 747b10f 05 December 2016, 23:24:11 UTC
67f37cf Allow user to specify a CF for SST files generated by SstFileWriter Summary: Allow the user to explicitly specify that the file generated by SstFileWriter will be ingested into a specific CF. This allows us to persist the CF id in the generated file. Closes https://github.com/facebook/rocksdb/pull/1615 Differential Revision: D4270422 Pulled By: IslamAbdelRahman fbshipit-source-id: 7fb954e 05 December 2016, 22:24:16 UTC
9053fe2 Made delete_obsolete_files_period_micros option dynamic Summary: Made delete_obsolete_files_period_micros option dynamic. It can be updated using DB::SetDBOptions(). Closes https://github.com/facebook/rocksdb/pull/1595 Differential Revision: D4246569 Pulled By: tonek fbshipit-source-id: d23f560 05 December 2016, 22:24:16 UTC
edde954 fix clang build Summary: override is missing for FilterV2 Closes https://github.com/facebook/rocksdb/pull/1606 Differential Revision: D4263832 Pulled By: IslamAbdelRahman fbshipit-source-id: d8b337a 02 December 2016, 02:39:10 UTC
56281f3 Add memtable_insert_with_hint_prefix_size option to db_bench Summary: Add memtable_insert_with_hint_prefix_size option to db_bench Closes https://github.com/facebook/rocksdb/pull/1604 Differential Revision: D4260549 Pulled By: yiwu-arbug fbshipit-source-id: cee5ef7 02 December 2016, 00:54:16 UTC
4a21b14 Cache heap::downheap() root comparison (optimize heap cmp call) Summary: Reduce number of comparisons in heap by caching which child node in the first level is smallest (left_child or right_child) So next time we can compare directly against the smallest child I see that the total number of calls to comparator drops significantly when using this optimization Before caching (~2mil key comparison for iterating the DB) ``` $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2 readseq : 0.338 micros/op 2959201 ops/sec; 327.4 MB/s user_key_comparison_count = 2000008 ``` After caching (~1mil key comparison for iterating the DB) ``` $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2 readseq : 0.309 micros/op 3236801 ops/sec; 358.1 MB/s user_key_comparison_count = 1000011 ``` It also improves Closes https://github.com/facebook/rocksdb/pull/1600 Differential Revision: D4256027 Pulled By: IslamAbdelRahman fbshipit-source-id: 76fcc66 01 December 2016, 21:39:14 UTC
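The idea of caching the root's smaller child can be sketched with a small binary min-heap over `int`. This is an illustrative reimplementation under stated assumptions, not the RocksDB heap: `CachedMinHeap`, `ReplaceTop`, and the comparison counter are hypothetical names added to make the saving observable.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class CachedMinHeap {
 public:
  void Push(int v) {
    data_.push_back(v);
    std::size_t i = data_.size() - 1;
    while (i > 0) {
      const std::size_t parent = (i - 1) / 2;
      if (!Less(data_[i], data_[parent])) break;
      std::swap(data_[i], data_[parent]);
      i = parent;
    }
    cached_child_ = kNone;  // the root's children may have changed
  }

  int Top() const { return data_.front(); }

  // Overwrites the smallest element and restores heap order.
  void ReplaceTop(int v) {
    assert(!data_.empty());
    data_[0] = v;
    DownHeap();
  }

  std::size_t comparisons() const { return comparisons_; }

 private:
  static constexpr std::size_t kNone = static_cast<std::size_t>(-1);

  bool Less(int a, int b) {
    ++comparisons_;
    return a < b;
  }

  void DownHeap() {
    std::size_t i = 0;
    std::size_t picked = kNone;
    while (true) {
      const std::size_t left = 2 * i + 1;
      if (left >= data_.size()) break;
      const std::size_t right = left + 1;
      if (i == 0 && cached_child_ < data_.size()) {
        picked = cached_child_;  // skip the left-vs-right comparison
      } else {
        picked = left;
        if (right < data_.size() && Less(data_[right], data_[left])) {
          picked = right;
        }
      }
      if (!Less(data_[picked], data_[i])) break;
      std::swap(data_[i], data_[picked]);
      i = picked;
    }
    // If the new value stayed at the root, both children are untouched and
    // the answer can be reused; otherwise the cache is stale.
    cached_child_ = (i == 0) ? picked : kNone;
  }

  std::vector<int> data_;
  std::size_t cached_child_ = kNone;
  std::size_t comparisons_ = 0;
};
```

The cache pays off when consecutive `ReplaceTop` calls leave the new value at the root: each such call then needs one comparison instead of two.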
e39d080 Fix travis (compile for clang < 3.9) Summary: Travis fails because it uses clang 3.6, which doesn't recognize `__attribute__((__no_sanitize__("undefined")))` Closes https://github.com/facebook/rocksdb/pull/1601 Differential Revision: D4257175 Pulled By: IslamAbdelRahman fbshipit-source-id: fb4d1ab 01 December 2016, 18:09:22 UTC
3f407b0 Kill flashcache code in RocksDB Summary: Now that we have userspace persisted cache, we don't need flashcache anymore. Closes https://github.com/facebook/rocksdb/pull/1588 Differential Revision: D4245114 Pulled By: igorcanadi fbshipit-source-id: e2c1c72 01 December 2016, 18:09:22 UTC
b77007d Bug: paralle_group status updated in WriteThread::CompleteParallelWorker Summary: Multiple write threads may update the status of the parallel_group in WriteThread::CompleteParallelWorker if the status of a Writer is not ok. When copying the write status to the parallel_group, the write thread only holds the mutex of the writer it is processing itself, which is useless. The thread should hold the mutex of the leader of the parallel_group instead. Closes https://github.com/facebook/rocksdb/pull/1598 Differential Revision: D4252335 Pulled By: siying fbshipit-source-id: 3864cf7 01 December 2016, 17:54:11 UTC
247d097 Support for range skips in compaction filter Summary: This adds the ability for compaction filter to say "drop this key-value, and also drop everything up to key x". This will cause the compaction to seek input iterator to x, without reading the data. This can make compaction much faster when large consecutive chunks of data are filtered out. See the changes in include/rocksdb/compaction_filter.h for the new API. Along the way this diff also adds ability for compaction filter changing merge operands, similar to how it can change values; we're not going to use this feature, it just seemed easier and cleaner to implement it than to document that it's not implemented :) The diff is not as big as it may seem, about half of the lines are a test. Closes https://github.com/facebook/rocksdb/pull/1599 Differential Revision: D4252092 Pulled By: al13n321 fbshipit-source-id: 41e1e48 01 December 2016, 15:09:15 UTC
96fcefb c api: expose option for dynamic level size target Summary: Closes https://github.com/facebook/rocksdb/pull/1587 Differential Revision: D4245923 Pulled By: yiwu-arbug fbshipit-source-id: 6ee7291 30 November 2016, 19:24:14 UTC
00197cf Add C API to set base_backgroud_compactions Summary: Add C API to set base_backgroud_compactions Closes https://github.com/facebook/rocksdb/pull/1571 Differential Revision: D4245709 Pulled By: yiwu-arbug fbshipit-source-id: 792c6b8 30 November 2016, 19:09:13 UTC
5b219ec deleterange end-to-end test improvements for lite/robustness Summary: Closes https://github.com/facebook/rocksdb/pull/1591 Differential Revision: D4246019 Pulled By: ajkr fbshipit-source-id: 0c4aa37 29 November 2016, 20:24:13 UTC
aad1191 pass rocksdb oncall to mysql_mtr_filter otherwise tasks get created w… Summary: …rong owner mysql_mtr_filter script needs proper oncall Closes https://github.com/facebook/rocksdb/pull/1586 Differential Revision: D4245150 Pulled By: anirbanr-fb fbshipit-source-id: fd8577c 29 November 2016, 20:09:12 UTC
e333528 DeleteRange write path end-to-end tests Summary: Closes https://github.com/facebook/rocksdb/pull/1578 Differential Revision: D4241171 Pulled By: ajkr fbshipit-source-id: ce5fd83 29 November 2016, 19:09:22 UTC
7784980 Fix mis-reporting of compaction read bytes to the base level Summary: In dynamic leveled compaction, when calculating read bytes, output level bytes may be wrongly calculated as input level inputs. Fix it. Closes https://github.com/facebook/rocksdb/pull/1475 Differential Revision: D4148412 Pulled By: siying fbshipit-source-id: f2f475a 29 November 2016, 19:09:22 UTC
3c6b49e Fix implicit conversion between int64_t to int Summary: Make conversion explicit, implicit conversion breaks the build Closes https://github.com/facebook/rocksdb/pull/1589 Differential Revision: D4245158 Pulled By: IslamAbdelRahman fbshipit-source-id: aaec00d 29 November 2016, 18:54:15 UTC
b3b8756 Remove unused assignment in db/db_iter.cc Summary: "make analyze" complains the assignment is not useful. Remove it. Closes https://github.com/facebook/rocksdb/pull/1581 Differential Revision: D4241697 Pulled By: siying fbshipit-source-id: 178f67a 29 November 2016, 17:09:14 UTC
4f6e89b Fix range deletion covering key in same SST file Summary: AddTombstones() needs to be before t->Get(), oops :'( Closes https://github.com/facebook/rocksdb/pull/1576 Differential Revision: D4241041 Pulled By: ajkr fbshipit-source-id: 781ceea 29 November 2016, 06:54:13 UTC
a2bf265 Avoid intentional overflow in GetL0ThresholdSpeedupCompaction Summary: https://github.com/facebook/rocksdb/commit/99c052a34f93d119b75eccdcd489ecd581d48ee9 fixes integer overflow in GetL0ThresholdSpeedupCompaction() by checking if int become -ve. UBSAN will complain about that since this is still an overflow, we can fix the issue by simply using int64_t Closes https://github.com/facebook/rocksdb/pull/1582 Differential Revision: D4241525 Pulled By: IslamAbdelRahman fbshipit-source-id: b3ae21f 29 November 2016, 02:39:13 UTC
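The general pattern of widening to `int64_t` instead of detecting overflow after the fact can be sketched as below. `SaturatingDouble` is a hypothetical helper for illustration and is not the body of GetL0ThresholdSpeedupCompaction().

```cpp
#include <cstdint>
#include <limits>

// Doubles `x` without signed overflow: the multiplication happens in
// int64_t, and the result is clamped back into the range of int.
// Checking "did the int go negative?" afterwards would still be an
// overflow, which is what UBSAN reports.
int SaturatingDouble(int x) {
  const std::int64_t wide = static_cast<std::int64_t>(x) * 2;
  if (wide > std::numeric_limits<int>::max()) {
    return std::numeric_limits<int>::max();
  }
  if (wide < std::numeric_limits<int>::min()) {
    return std::numeric_limits<int>::min();
  }
  return static_cast<int>(wide);
}
```

This assumes `int` is narrower than 64 bits, which holds on the platforms RocksDB commonly targets.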
52fd1ff disable UBSAN for functions with intentional -ve shift / overflow Summary: disable UBSAN for functions with intentional left shift on -ve number / overflow These functions are rocksdb:: Hash FixedLengthColBufEncoder::Append FaultInjectionTest:: Key Closes https://github.com/facebook/rocksdb/pull/1577 Differential Revision: D4240801 Pulled By: IslamAbdelRahman fbshipit-source-id: 3e1caf6 29 November 2016, 01:54:12 UTC
1886c43 Fix CompactionJob::Install division by zero Summary: Fix CompactionJob::Install division by zero Closes https://github.com/facebook/rocksdb/pull/1580 Differential Revision: D4240794 Pulled By: IslamAbdelRahman fbshipit-source-id: 7286721 29 November 2016, 00:54:16 UTC
63c30de fix options_test ubsan Summary: Having -ve value for max_write_buffer_number does not make sense and cause us to do a left shift on a -ve value number Closes https://github.com/facebook/rocksdb/pull/1579 Differential Revision: D4240798 Pulled By: IslamAbdelRahman fbshipit-source-id: bd6267e 29 November 2016, 00:39:14 UTC
13e66a8 Fix compaction_job.cc division by zero Summary: Fix division by zero in compaction_job.cc Closes https://github.com/facebook/rocksdb/pull/1575 Differential Revision: D4240818 Pulled By: IslamAbdelRahman fbshipit-source-id: a8bc757 29 November 2016, 00:39:13 UTC
01eabf7 Fix double-counted deletion stat Summary: Both the single deletion and the value are included in compaction outputs, so no need to update the stat for the value's deletion yet, otherwise it'd be double-counted. Closes https://github.com/facebook/rocksdb/pull/1574 Differential Revision: D4241181 Pulled By: ajkr fbshipit-source-id: c9aaa15 28 November 2016, 23:54:12 UTC
7ffb10f DeleteRange compaction statistics Summary: - "rocksdb.compaction.key.drop.range_del" - number of keys dropped during compaction due to a range tombstone covering them - "rocksdb.compaction.range_del.drop.obsolete" - number of range tombstones dropped due to compaction to bottom level and no snapshot saving them - s/CompactionIteratorStats/CompactionIterationStats/g since this class is no longer specific to CompactionIterator -- it's also updated for range tombstone iteration during compaction - Move the above class into a separate .h file to avoid circular dependency. Closes https://github.com/facebook/rocksdb/pull/1520 Differential Revision: D4187179 Pulled By: ajkr fbshipit-source-id: 10c2103 28 November 2016, 19:54:12 UTC
236d4c6 Less linear search in DBIter::Seek() when keys are overwritten a lot Summary: In one deployment we saw high latencies (presumably from slow iterator operations) and a lot of CPU time reported by perf with this stack: ``` rocksdb::MergingIterator::Next rocksdb::DBIter::FindNextUserEntryInternal rocksdb::DBIter::Seek ``` I think what's happening is: 1. we create a snapshot iterator, 2. we do lots of Put()s for the same key x; this creates lots of entries in memtable, 3. we seek the iterator to a key slightly smaller than x, 4. the seek walks over lots of entries in memtable for key x, skipping them because of high sequence numbers. CC IslamAbdelRahman Closes https://github.com/facebook/rocksdb/pull/1413 Differential Revision: D4083879 Pulled By: IslamAbdelRahman fbshipit-source-id: a83ddae 28 November 2016, 18:24:11 UTC
cd7c414 Improve Write Stalling System Summary: The current write stalling system lacks positive feedback when the restricted rate is already too low, so users sometimes get stuck at a very low slowdown value. With this diff, we add positive feedback (increasing the slowdown value) when we recover from the slowdown state back to normal. To keep the positive feedback from pushing the slowdown value too high, we issue negative feedback every time we are close to the stop condition. Experiments show it is easier to reach a relative balance than before. Also increase the level0_stop_writes_trigger default from 24 to 32. Since the level0_slowdown_writes_trigger default is 20, a stop trigger of 24 leaves only four files of buffer in which to slow down writes. To avoid stopping within four files once 20 files have accumulated, the slowdown value must be very low, which is almost the same as stopping. It also doesn't give the slowdown value enough time to converge. Increasing it to 32 will smooth out the system. Closes https://github.com/facebook/rocksdb/pull/1562 Differential Revision: D4218519 Pulled By: siying fbshipit-source-id: 95e4088 23 November 2016, 17:24:15 UTC
dfb6fe6 Unified InlineSkipList::Insert algorithm with hinting Summary: This PR is based on nbronson's diff with small modifications to wire it up with existing interface. Comparing to previous version, this approach works better for inserting keys in decreasing order or updating the same key, and impose less restriction to the prefix extractor. ---- Summary from original diff ---- This diff introduces a single InlineSkipList::Insert that unifies the existing sequential insert optimization (prev_), concurrent insertion, and insertion using externally-managed insertion point hints. There's a deep symmetry between insertion hints (cursors) and the concurrent algorithm. In both cases we have partial information from the recent past that is likely but not certain to be accurate. This diff introduces the struct InlineSkipList::Splice, which encodes predecessor and successor information in the same form that was previously only used within a single call to InsertConcurrently. Splice holds information about an insertion point that can be used to levera Closes https://github.com/facebook/rocksdb/pull/1561 Differential Revision: D4217283 Pulled By: yiwu-arbug fbshipit-source-id: 33ee437 22 November 2016, 22:09:13 UTC
3068870 Making persistent cache more resilient to filesystem failures Summary: The persistent cache is designed to hop over errors and return key not found. So far, it has shown resilience to write errors, encoding errors, data corruption etc. It is not resilient against disappearing files/directories. This was exposed during testing when multiple instances of the persistent cache were started sharing the same directory, simulating an unpredictable filesystem environment. This patch - makes the write code path more resilient to errors while creating files - makes the read code path more resilient to situations where files are not found - adds a test that does negative write/read testing by removing the directory while writes are in progress Closes https://github.com/facebook/rocksdb/pull/1472 Differential Revision: D4143413 Pulled By: kradhakrishnan fbshipit-source-id: fd25e9b 22 November 2016, 18:39:10 UTC
734e4ac Eliminate redundant cache lookup with range deletion Summary: When we introduced range deletion block, TableCache::Get() and TableCache::NewIterator() each did two table cache lookups, one for range deletion block iterator and another for getting the table reader to which the Get()/NewIterator() is delegated. This extra cache lookup was very CPU-intensive (about 10% overhead in a read-heavy benchmark). We can avoid it by reusing the Cache::Handle created for range deletion block iterator to get the file reader. Closes https://github.com/facebook/rocksdb/pull/1537 Differential Revision: D4201167 Pulled By: ajkr fbshipit-source-id: d33ffd8 22 November 2016, 05:24:11 UTC
182b940 Add WriteOptions.no_slowdown Summary: If the WriteOptions.no_slowdown flag is set AND we need to wait or sleep for the write request, then fail immediately with Status::Incomplete(). Closes https://github.com/facebook/rocksdb/pull/1527 Differential Revision: D4191405 Pulled By: maysamyabandeh fbshipit-source-id: 7f3ce3f 22 November 2016, 02:09:13 UTC
4118e13 Persistent Cache: Expose stats to user via public API Summary: Exposing persistent cache stats (counters) to the user via public API. Closes https://github.com/facebook/rocksdb/pull/1485 Differential Revision: D4155274 Pulled By: siying fbshipit-source-id: 30a9f50 22 November 2016, 01:39:13 UTC
f2a8f92 rocks_lua_compaction_filter: add unused attribute to a variable Summary: Release build shows warning without this fix. Closes https://github.com/facebook/rocksdb/pull/1558 Differential Revision: D4215831 Pulled By: yiwu-arbug fbshipit-source-id: 888a755 21 November 2016, 22:54:14 UTC
4444256 Remove use of deprecated LZ4 function Summary: LZ4 1.7.3 emits warnings when calling the deprecated function `LZ4_compress_limitedOutput_continue()`. Starting in r129, LZ4 introduces `LZ4_compress_fast_continue()` as a replacement, and the two functions calls are [exactly equivalent](https://github.com/lz4/lz4/blob/dev/lib/lz4.c#L1408). Closes https://github.com/facebook/rocksdb/pull/1532 Differential Revision: D4199240 Pulled By: siying fbshipit-source-id: 138c2bc 21 November 2016, 20:24:14 UTC
548d7fb Fix fd leak when using direct IOs Summary: We should close the fd, before overriding it. This bug was introduced by f89caa127baa086cb100976b14da1a531cf0e823 Closes https://github.com/facebook/rocksdb/pull/1553 Differential Revision: D4214101 Pulled By: siying fbshipit-source-id: 0d65de0 21 November 2016, 20:24:13 UTC
fd43ee0 Range deletion microoptimizations Summary: - Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter. - Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly. Closes https://github.com/facebook/rocksdb/pull/1548 Differential Revision: D4208169 Pulled By: ajkr fbshipit-source-id: 2fd65cf 21 November 2016, 20:24:13 UTC
23a18ca Reword support a little bit to more clear and concise Summary: I tried to do this in #1556, but it landed before the change could be imported. Closes https://github.com/facebook/rocksdb/pull/1557 Differential Revision: D4214572 Pulled By: siying fbshipit-source-id: 718d4a4 21 November 2016, 19:39:13 UTC
481856a Update support to separate code issues with general questions Summary: Closes https://github.com/facebook/rocksdb/pull/1556 Differential Revision: D4214184 Pulled By: siying fbshipit-source-id: c1abf47 21 November 2016, 18:54:12 UTC
a0deec9 Fix deadlock when calling getMergedHistogram Summary: When calling StatisticsImpl::HistogramInfo::getMergedHistogram(), if there is a dying thread that is calling ThreadLocalPtr::StaticMeta::OnThreadExit() to merge its thread values into HistogramInfo, a deadlock will occur, because the former tries to hold merge_lock and then ThreadMeta::mutex_, while the latter tries to hold ThreadMeta::mutex_ and then merge_lock. In short, the locking order isn't the same. This patch addresses the issue by releasing merge_lock before folding thread values. Closes https://github.com/facebook/rocksdb/pull/1552 Differential Revision: D4211942 Pulled By: ajkr fbshipit-source-id: ef89bcb 21 November 2016, 02:24:12 UTC
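The fix pattern described here, releasing one lock before acquiring the other so that no thread ever holds both, can be sketched as follows. The types and the function `MergedTotal` are hypothetical simplifications and do not mirror the real StatisticsImpl or ThreadLocalPtr code.

```cpp
#include <mutex>
#include <vector>

struct Histogram {
  std::mutex merge_lock;  // guards `merged`
  long merged = 0;
};

struct ThreadValues {
  std::mutex mutex;  // guards `values`
  std::vector<long> values;
};

// Reads the merged total, then folds in per-thread values.
// merge_lock is released before ThreadValues::mutex is taken, so this
// function never holds both locks and cannot take part in a lock-order
// cycle with a thread that acquires them in the opposite order.
long MergedTotal(Histogram& h, ThreadValues& t) {
  long sum = 0;
  {
    std::lock_guard<std::mutex> guard(h.merge_lock);
    sum = h.merged;
  }  // merge_lock released here
  std::lock_guard<std::mutex> guard(t.mutex);
  for (long v : t.values) sum += v;
  return sum;
}
```

The trade-off is that the two reads are no longer one atomic snapshot, which is usually acceptable for statistics.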
fe349db Remove Arena in RangeDelAggregator Summary: The Arena construction/destruction introduced significant overhead to read-heavy workload just by creating empty vectors for its blocks, so avoid it in RangeDelAggregator. Closes https://github.com/facebook/rocksdb/pull/1547 Differential Revision: D4207781 Pulled By: ajkr fbshipit-source-id: 9d1c130 19 November 2016, 22:24:12 UTC
e63350e Use more efficient hash map for deadlock detection Summary: Currently, deadlock cycles are held in std::unordered_map. The problem with it is that it allocates/deallocates memory on every insertion/deletion. This limits throughput since we're doing this expensive operation while holding a global mutex. Fix this by using a vector which caches memory instead. Running the deadlock stress test, this change increased throughput from 39k txns/s -> 49k txns/s. The effect is more noticeable in MyRocks. Closes https://github.com/facebook/rocksdb/pull/1545 Differential Revision: D4205662 Pulled By: lth fbshipit-source-id: ff990e4 19 November 2016, 19:39:15 UTC
a13bde3 Skip ldb test in Travis Summary: Travis now is building for ldb tests. Disable for now to unblock other tests while we are investigating. Closes https://github.com/facebook/rocksdb/pull/1546 Differential Revision: D4209404 Pulled By: siying fbshipit-source-id: 47edd97 19 November 2016, 03:24:13 UTC
73843aa Direct I/O Reads Handle the last sector correctly. Summary: Currently, in the Direct I/O read mode, the last sector of the file, if not full, is not handled correctly. If the return value of pread is not a multiple of kSectorSize, we still go ahead and continue reading, even if the buffer is not aligned. With this commit, if the return value is not a multiple of kSectorSize, and all but the last sector have been read, we simply return. Closes https://github.com/facebook/rocksdb/pull/1550 Differential Revision: D4209609 Pulled By: lightmark fbshipit-source-id: cb0b439 19 November 2016, 03:24:13 UTC
9d60151 Implement PositionedAppend for PosixWritableFile Summary: This patch clarifies the contract of PositionedAppend with some unit tests and also implements it for PosixWritableFile. (Tasks: 14524071) Closes https://github.com/facebook/rocksdb/pull/1514 Differential Revision: D4204907 Pulled By: maysamyabandeh fbshipit-source-id: 06eabd2 19 November 2016, 01:24:13 UTC
3f62215 Lazily initialize RangeDelAggregator's map and pinning manager Summary: Since a RangeDelAggregator is created for each read request, these heap-allocating member variables were consuming significant CPU (~3% total), which slowed down request throughput. The map and pinning manager are only necessary when range deletions exist, so we can defer their initialization until the first range deletion is encountered. Currently lazy initialization is done for reads only, since reads pass us a single snapshot, which is easier to store on the stack for later insertion into the map than the vector passed to us by flush or compaction. Note the Arena member variable is still expensive; I will figure out what to do with it in a subsequent diff. It cannot be lazily initialized because we currently use this arena even to allocate empty iterators, which is necessary even when no range deletions exist. Closes https://github.com/facebook/rocksdb/pull/1539 Differential Revision: D4203488 Pulled By: ajkr fbshipit-source-id: 3b36279 19 November 2016, 01:09:11 UTC
41e77b8 cmake: s/STEQUAL/STREQUAL/ Summary: Signed-off-by: Kefu Chai <tchaikov@gmail.com> Closes https://github.com/facebook/rocksdb/pull/1540 Differential Revision: D4207564 Pulled By: siying fbshipit-source-id: 567415b 18 November 2016, 22:54:14 UTC
c1038d2 Release RocksDB 5.0 Summary: Update HISTORY.md and version.h Closes https://github.com/facebook/rocksdb/pull/1536 Differential Revision: D4202987 Pulled By: IslamAbdelRahman fbshipit-source-id: 94985e3 18 November 2016, 02:39:15 UTC
635a7bd refactor TableCache Get/NewIterator for single exit points Summary: these functions were too complicated to change with exit points everywhere, so refactored them. btw, please review urgently, this is a prereq to fix the 5.0 perf regression Closes https://github.com/facebook/rocksdb/pull/1534 Differential Revision: D4198972 Pulled By: ajkr fbshipit-source-id: 04ebfb7 17 November 2016, 22:39:13 UTC
f39452e Fix heap use after free ASAN/Valgrind Summary: Don't use c_str() of a temporary std::string in RocksLuaCompactionFilter::Name() Closes https://github.com/facebook/rocksdb/pull/1535 Differential Revision: D4199094 Pulled By: IslamAbdelRahman fbshipit-source-id: e56ce62 17 November 2016, 20:24:12 UTC
a4eb738 Allow plain table to store index on file with bloom filter disabled Summary: Currently plain table bloom filter is required if storing metadata on file. Remove the constraint. Closes https://github.com/facebook/rocksdb/pull/1525 Differential Revision: D4190977 Pulled By: siying fbshipit-source-id: be60442 17 November 2016, 19:09:13 UTC
36e4762 Remove Ticker::SEQUENCE_NUMBER Summary: Remove the ticker count because: * Having to reset the ticker count in WriteImpl is inefficient; * It doesn't make sense to have it as a ticker count if multiple DB instances share a statistics object. Closes https://github.com/facebook/rocksdb/pull/1531 Differential Revision: D4194442 Pulled By: yiwu-arbug fbshipit-source-id: e2110a9 17 November 2016, 06:39:09 UTC
86eb2b9 Fix src.mk 17 November 2016, 02:05:19 UTC
0765bab Remove LATEST_BACKUP file Summary: This has been unused since D42069 but kept around for backward compatibility. I think it is unlikely anyone will use a much older version of RocksDB for restore than they use for backup, so I propose removing it. It is also causing recurring confusion, e.g., https://www.facebook.com/groups/rocksdb.dev/permalink/980454015386446/ Ported from https://reviews.facebook.net/D60735 Closes https://github.com/facebook/rocksdb/pull/1529 Differential Revision: D4194199 Pulled By: ajkr fbshipit-source-id: 82f9bf4 17 November 2016, 01:24:15 UTC
647eafd Introduce Lua Extension: RocksLuaCompactionFilter Summary: This diff includes an implementation of CompactionFilter that allows users to write a CompactionFilter in Lua. With this ability, users can dynamically change compaction filter logic without having to rebuild the rocksdb binary and restart the database. To compile, WITH_LUA_PATH must be set to the base directory of Lua. Closes https://github.com/facebook/rocksdb/pull/1478 Differential Revision: D4150138 Pulled By: yhchiang fbshipit-source-id: ed84222 16 November 2016, 23:39:12 UTC
760ef68 fix deleterange asan issue Summary: pinned_iters_mgr_ pins iterators allocated with arena_, so we should order the instance variable declarations such that the pinned iterators have their destructors executed before the arena is destroyed. Closes https://github.com/facebook/rocksdb/pull/1528 Differential Revision: D4191984 Pulled By: ajkr fbshipit-source-id: 1386f20 16 November 2016, 22:09:07 UTC
327085b fix valgrind Summary: Closes https://github.com/facebook/rocksdb/pull/1526 Differential Revision: D4191257 Pulled By: ajkr fbshipit-source-id: d09dc76 16 November 2016, 20:09:11 UTC
715591b Ask travis to use JDK 7 Summary: yhchiang This may or may not help Closes https://github.com/facebook/rocksdb/pull/1385 Differential Revision: D4098424 Pulled By: yhchiang fbshipit-source-id: 9f9782e 16 November 2016, 18:54:12 UTC
972e3ff Enable allow_concurrent_memtable_write and enable_write_thread_adaptive_yield by default Summary: Closes https://github.com/facebook/rocksdb/pull/1496 Differential Revision: D4168080 Pulled By: siying fbshipit-source-id: 056ae62 16 November 2016, 17:39:09 UTC
420bdb4 option_change_migration_test: force full compaction when needed Summary: When option_change_migration_test decides to go with a full compaction, we don't force a compaction but allow trivial move. This can cause assert failure if the destination is level 0. Fix it by forcing the full compaction to skip trivial move if the destination level is L0. Closes https://github.com/facebook/rocksdb/pull/1518 Differential Revision: D4183610 Pulled By: siying fbshipit-source-id: dea482b 16 November 2016, 06:09:34 UTC
1543d5d Report memory usage by memtable insert hints map. Summary: It is hard to measure actual memory usage by std containers. Even providing a custom allocator will miscount some of the usage. Here we only make a rough guess at its memory usage. Closes https://github.com/facebook/rocksdb/pull/1511 Differential Revision: D4179945 Pulled By: yiwu-arbug fbshipit-source-id: 32ab929 16 November 2016, 04:24:13 UTC
018bb2e DeleteRange support for db_bench Summary: Added a few options to configure when to add range tombstones during any benchmark involving writes. Closes https://github.com/facebook/rocksdb/pull/1522 Differential Revision: D4187388 Pulled By: ajkr fbshipit-source-id: 2c8a473 16 November 2016, 01:39:47 UTC
dc51bd7 CMakeLists.txt: FreeBSD has jemalloc as default malloc Summary: This will allow reference to `malloc_stats_print` Closes https://github.com/facebook/rocksdb/pull/1516 Differential Revision: D4187258 Pulled By: siying fbshipit-source-id: 34ae9f9 16 November 2016, 01:39:47 UTC
48e8bae Decouple data iterator and range deletion iterator in TableCache Summary: Previously we used TableCache::NewIterator() for multiple purposes (data block iterator and range deletion iterator), and returned non-ok status in the data block iterator. In one case where the caller only used the range deletion block iterator (https://github.com/facebook/rocksdb/blob/9e7cf3469bc626b092ec48366d12873ecab22b4e/db/version_set.cc#L965-L973), we didn't check/free the data block iterator containing non-ok status, which caused a valgrind error. So, this diff decouples creation of data block and range deletion block iterators, and updates the callers accordingly. Both functions can return non-ok status in an InternalIterator. Since the non-ok status is returned in an iterator that the callers will definitely use, it should be more usable/less error-prone. Closes https://github.com/facebook/rocksdb/pull/1513 Differential Revision: D4181423 Pulled By: ajkr fbshipit-source-id: 835b8f5 16 November 2016, 01:24:28 UTC
4b0aa3c Fix failed compaction_filter_example and add it into make all Summary: Simple patch as title Closes https://github.com/facebook/rocksdb/pull/1512 Differential Revision: D4186994 Pulled By: siying fbshipit-source-id: 880f9b8 16 November 2016, 01:09:10 UTC
53b693f ldb support for range delete Summary: Add a subcommand to ldb with which we can delete a range of keys. Closes https://github.com/facebook/rocksdb/pull/1521 Differential Revision: D4186338 Pulled By: ajkr fbshipit-source-id: b8e9861 15 November 2016, 23:54:20 UTC
661e4c9 DeleteRange unsupported in non-block-based tables Summary: Return an error from DeleteRange() (or Write() if the user is using the low-level WriteBatch API) if an unsupported table type is configured. Closes https://github.com/facebook/rocksdb/pull/1519 Differential Revision: D4185933 Pulled By: ajkr fbshipit-source-id: abcdf84 15 November 2016, 23:24:16 UTC
489d142 DeleteRange interface Summary: Expose DeleteRange() interface since we think the implementation is functionally correct now. Closes https://github.com/facebook/rocksdb/pull/1503 Differential Revision: D4171921 Pulled By: ajkr fbshipit-source-id: 5e21c98 15 November 2016, 23:24:16 UTC
eba99c2 Fix min_write_buffer_number_to_merge = 0 bug Summary: It's possible that we set min_write_buffer_number_to_merge to 0. This should never happen Closes https://github.com/facebook/rocksdb/pull/1515 Differential Revision: D4183356 Pulled By: yiwu-arbug fbshipit-source-id: c9d39d7 15 November 2016, 21:54:08 UTC
2ef92fe Remove all instances of relative_url until GitHub pages problem is fixed. I am in email thread with GitHub support about what is happening here. 15 November 2016, 15:40:18 UTC
91300d0 Dynamic max_total_wal_size option Summary: Closes https://github.com/facebook/rocksdb/pull/1509 Differential Revision: D4176426 Pulled By: yiwu-arbug fbshipit-source-id: b57689d 15 November 2016, 06:54:17 UTC
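Commit 760ef68 above relies on a C++ rule: non-static data members are destroyed in reverse order of declaration. The following is a minimal sketch of that rule only, not RocksDB's actual code. The types ToyArena, ToyPinnedItersMgr, and ToyAggregator and the helper functions are hypothetical stand-ins invented for illustration; only the member names arena_ and pinned_iters_mgr_ come from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global log of destruction events, used only to make the order observable.
std::vector<std::string>& DestructionLog() {
  static std::vector<std::string> log;
  return log;
}

// Hypothetical stand-in for an arena that owns the memory iterators live in.
struct ToyArena {
  ~ToyArena() { DestructionLog().push_back("arena"); }
};

// Hypothetical stand-in for a manager that pins arena-allocated iterators.
struct ToyPinnedItersMgr {
  ~ToyPinnedItersMgr() { DestructionLog().push_back("pinned_iters_mgr"); }
};

// Members are destroyed in reverse declaration order, so declaring the arena
// first guarantees it is destroyed last, after everything that points into it.
struct ToyAggregator {
  ToyArena arena_;
  ToyPinnedItersMgr pinned_iters_mgr_;
};

// Constructs and destroys one aggregator, returning the recorded order.
std::vector<std::string> DestroyOneAggregator() {
  DestructionLog().clear();
  { ToyAggregator agg; }  // agg goes out of scope here
  return DestructionLog();
}

// Usage example: the pinning manager must be torn down before the arena.
void CheckDestructionOrder() {
  const std::vector<std::string> order = DestroyOneAggregator();
  assert(order.size() == 2);
  assert(order[0] == "pinned_iters_mgr");
  assert(order[1] == "arena");
}
```

Swapping the two member declarations in ToyAggregator would reverse the logged order, which is the kind of ordering the commit message describes as needing a fix.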