https://github.com/facebook/rocksdb

Revision Author Date Message Commit Date
8e68ffb Fix deadlock when calling getMergedHistogram Summary: When calling StatisticsImpl::HistogramInfo::getMergedHistogram(), if there is a dying thread that is calling ThreadLocalPtr::StaticMeta::OnThreadExit() to merge its thread values into HistogramInfo, a deadlock will occur, because the former tries to hold merge_lock then ThreadMeta::mutex_, but the latter tries to hold ThreadMeta::mutex_ then merge_lock. In short, the locking order isn't the same. This patch addresses the issue by releasing merge_lock before folding thread values. Closes https://github.com/facebook/rocksdb/pull/1552 Differential Revision: D4211942 Pulled By: ajkr fbshipit-source-id: ef89bcb 09 December 2016, 20:59:51 UTC
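The lock-order fix described in 8e68ffb can be sketched as follows. This is a minimal illustration under stated assumptions, not RocksDB's code: `ThreadValues`, `MergedCounter`, and their members are hypothetical names, and a plain counter stands in for the histogram.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical per-thread state guarded by its own mutex.
struct ThreadValues {
  std::mutex mutex;
  std::uint64_t value = 0;
};

// Hypothetical stand-in for the merged histogram.
class MergedCounter {
 public:
  // Exit path: per-thread mutex first, then merge_lock_.
  void OnThreadExit(ThreadValues& t) {
    std::lock_guard<std::mutex> thread_guard(t.mutex);
    std::lock_guard<std::mutex> merge_guard(merge_lock_);
    merged_ += t.value;
    t.value = 0;
  }

  // Read path: merge_lock_ is released before any per-thread mutex is
  // taken, so the two paths never hold both locks in opposite orders.
  std::uint64_t GetMerged(const std::vector<ThreadValues*>& threads) {
    std::uint64_t total = 0;
    {
      std::lock_guard<std::mutex> merge_guard(merge_lock_);
      total = merged_;
    }  // merge_lock_ is released here
    for (ThreadValues* t : threads) {
      std::lock_guard<std::mutex> thread_guard(t->mutex);
      total += t->value;
    }
    return total;
  }

 private:
  std::mutex merge_lock_;
  std::uint64_t merged_ = 0;
};
```

One trade-off of this sketch: a thread that exits between the snapshot of `merged_` and the fold loop is missed by that particular read. Whether the real patch tolerates or avoids this is not stated in the commit message.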
41526f4 Remove Ticker::SEQUENCE_NUMBER Summary: Remove the ticker count because: * Having to reset the ticker count in WriteImpl is inefficient; * It doesn't make sense to have it as a ticker count if multiple DB instances share a statistics object. Closes https://github.com/facebook/rocksdb/pull/1531 Differential Revision: D4194442 Pulled By: yiwu-arbug fbshipit-source-id: e2110a9 09 December 2016, 20:55:55 UTC
1412fd9 Bump version to 4.13.4 09 December 2016, 03:06:47 UTC
bda9def Use skiplist rep for range tombstone memtable Summary: somehow missed committing this update in D62217 Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65361 09 December 2016, 03:03:02 UTC
5467d7a Kill flashcache code in RocksDB Summary: Now that we have userspace persisted cache, we don't need flashcache anymore. Closes https://github.com/facebook/rocksdb/pull/1588 Differential Revision: D4245114 Pulled By: igorcanadi fbshipit-source-id: e2c1c72 01 December 2016, 21:29:14 UTC
28b01eb Fix implicit conversion between int64_t to int Summary: Make conversion explicit, implicit conversion breaks the build Closes https://github.com/facebook/rocksdb/pull/1589 Differential Revision: D4245158 Pulled By: IslamAbdelRahman fbshipit-source-id: aaec00d 29 November 2016, 20:05:21 UTC
82fb020 Bump version to 4.13.3 29 November 2016, 02:49:33 UTC
c4d49a7 Avoid intentional overflow in GetL0ThresholdSpeedupCompaction Summary: https://github.com/facebook/rocksdb/commit/99c052a34f93d119b75eccdcd489ecd581d48ee9 fixes integer overflow in GetL0ThresholdSpeedupCompaction() by checking if int become -ve. UBSAN will complain about that since this is still an overflow, we can fix the issue by simply using int64_t Closes https://github.com/facebook/rocksdb/pull/1582 Differential Revision: D4241525 Pulled By: IslamAbdelRahman fbshipit-source-id: b3ae21f 29 November 2016, 02:48:57 UTC
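The idea in c4d49a7 (do the arithmetic in `int64_t` instead of detecting a wrapped `int` afterwards) can be sketched like this. `DoubleTriggerClamped` is a hypothetical helper written for illustration; it is not the body of GetL0ThresholdSpeedupCompaction().

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper: doubles an int option value without ever
// overflowing a signed int. The multiplication happens in 64 bits,
// and the result is clamped back into the int range.
int DoubleTriggerClamped(int trigger) {
  const std::int64_t doubled = static_cast<std::int64_t>(trigger) * 2;
  if (doubled > INT_MAX) return INT_MAX;
  if (doubled < INT_MIN) return INT_MIN;
  return static_cast<int>(doubled);
}
```

Because no signed `int` operation overflows, UBSAN has nothing to report, which is the point the commit message makes about the earlier "check whether the result went negative" approach.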
60f60c8 disable UBSAN for functions with intentional -ve shift / overflow Summary: disable UBSAN for functions with intentional left shift on -ve number / overflow These functions are rocksdb:: Hash FixedLengthColBufEncoder::Append FaultInjectionTest:: Key Closes https://github.com/facebook/rocksdb/pull/1577 Differential Revision: D4240801 Pulled By: IslamAbdelRahman fbshipit-source-id: 3e1caf6 29 November 2016, 02:20:38 UTC
826b501 Fix integer overflow in GetL0ThresholdSpeedupCompaction (#1378) 29 November 2016, 02:20:29 UTC
b91318f Fix CompactionJob::Install division by zero Summary: Fix CompactionJob::Install division by zero Closes https://github.com/facebook/rocksdb/pull/1580 Differential Revision: D4240794 Pulled By: IslamAbdelRahman fbshipit-source-id: 7286721 29 November 2016, 01:11:35 UTC
55cb17e fix options_test ubsan Summary: Having -ve value for max_write_buffer_number does not make sense and cause us to do a left shift on a -ve value number Closes https://github.com/facebook/rocksdb/pull/1579 Differential Revision: D4240798 Pulled By: IslamAbdelRahman fbshipit-source-id: bd6267e 29 November 2016, 01:11:29 UTC
71cd1c2 Fix compaction_job.cc division by zero Summary: Fix division by zero in compaction_job.cc Closes https://github.com/facebook/rocksdb/pull/1575 Differential Revision: D4240818 Pulled By: IslamAbdelRahman fbshipit-source-id: a8bc757 29 November 2016, 01:11:18 UTC
d800ab7 option_change_migration_test: force full compaction when needed Summary: When option_change_migration_test decides to go with a full compaction, we don't force a compaction but allow trivial move. This can cause assert failure if the destination is level 0. Fix it by forcing the full compaction to skip trivial move if the destination level is L0. Closes https://github.com/facebook/rocksdb/pull/1518 Differential Revision: D4183610 Pulled By: siying fbshipit-source-id: dea482b 18 November 2016, 22:25:46 UTC
0418aab Bump version to 4.13.2 18 November 2016, 06:19:41 UTC
753499f Allow plain table to store index on file with bloom filter disabled Summary: Currently plain table bloom filter is required if storing metadata on file. Remove the constraint. Closes https://github.com/facebook/rocksdb/pull/1525 Differential Revision: D4190977 Pulled By: siying fbshipit-source-id: be60442 17 November 2016, 21:41:45 UTC
2a21b04 Fix min_write_buffer_number_to_merge = 0 bug Summary: It's possible that we set min_write_buffer_number_to_merge to 0. This should never happen Closes https://github.com/facebook/rocksdb/pull/1515 Differential Revision: D4183356 Pulled By: yiwu-arbug fbshipit-source-id: c9d39d7 17 November 2016, 21:41:24 UTC
35b5a76 Fix SstFileWriter destructor Summary: If user did not call SstFileWriter::Finish() or called Finish() but it failed. We need to abandon the builder, to avoid destructing it while it's open Closes https://github.com/facebook/rocksdb/pull/1502 Differential Revision: D4171660 Pulled By: IslamAbdelRahman fbshipit-source-id: ab6f434 14 November 2016, 23:46:39 UTC
749cc74 Fix Forward Iterator Seek()/SeekToFirst() Summary: In ForwardIterator::SeekInternal(), we may end up passing empty Slice representing an internal key to InternalKeyComparator::Compare. and when we try to extract the user key from this empty Slice, we will create a slice with size = 0 - 8 ( which will overflow and cause us to read invalid memory as well ) Scenarios to reproduce these issues are in the unit tests Closes https://github.com/facebook/rocksdb/pull/1467 Differential Revision: D4136660 Pulled By: lightmark fbshipit-source-id: 151e128 12 November 2016, 00:36:43 UTC
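The "size = 0 - 8" overflow mentioned in 749cc74 is ordinary unsigned wraparound. The sketch below shows it, assuming the 8-byte trailer that follows the user key in an internal key; the function names are hypothetical and are not the RocksDB helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// An internal key is the user key followed by an 8-byte trailer.
constexpr std::size_t kTrailerSize = 8;

// Unchecked version: for an empty key this computes 0 - 8 in an
// unsigned type and wraps around to a huge size.
std::size_t UncheckedUserKeySize(const std::string& internal_key) {
  return internal_key.size() - kTrailerSize;
}

// Checked version: refuses keys that are too short to hold a trailer.
bool CheckedUserKeySize(const std::string& internal_key, std::size_t* out) {
  if (internal_key.size() < kTrailerSize) return false;
  *out = internal_key.size() - kTrailerSize;
  return true;
}
```

A slice built with the wrapped size would span nearly the whole address space, which is why the commit message mentions reading invalid memory.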
42b82e7 fix open failure with empty wal Summary: Closes https://github.com/facebook/rocksdb/pull/1490 Differential Revision: D4158821 Pulled By: IslamAbdelRahman fbshipit-source-id: 59b73f4 10 November 2016, 06:27:19 UTC
a71e393 Fix compile 10 November 2016, 00:16:58 UTC
f74f512 Fix deadlock between (WriterThread/Compaction/IngestExternalFile) Summary: A deadlock is possible if the following happens: (1) Writer thread is stopped because it's waiting for compaction to finish (2) Compaction is waiting for current IngestExternalFile() calls to finish (3) IngestExternalFile() is waiting to be able to acquire the writer thread (4) WriterThread is held by stopped writes that are waiting for compactions to finish This patch fixes the issue by not incrementing num_running_ingest_file_ except when we acquire the writer thread. This patch includes a unit test to reproduce the described scenario Closes https://github.com/facebook/rocksdb/pull/1480 Differential Revision: D4151646 Pulled By: IslamAbdelRahman fbshipit-source-id: 09b39db 10 November 2016, 00:10:22 UTC
b2b06e5 Bump version to 4.13.1 10 November 2016, 00:07:54 UTC
7faa39b Revert "forbid merge during recovery" This reverts commit 715256338ae21bc7cdfa21b720ed73143c61e5aa. 10 November 2016, 00:07:20 UTC
f201a44 Support IngestExternalFile (remove AddFile restrictions) Summary: Changes in the diff API changes: - Introduce IngestExternalFile to replace AddFile (I think this makes the API clearer) - Introduce IngestExternalFileOptions (This struct will encapsulate the options for ingesting the external file) - Deprecate AddFile() API Logic changes: - If our file overlaps with the memtable we will flush the memtable - We will find the first level in the LSM tree whose keys overlap with our file's key range - We will find the lowest level in the LSM tree above the level we found in step 2 that our file can fit in and ingest our file in it - We will assign a global sequence number to our new file - Remove AddFile restrictions by using global sequence numbers Other changes: - Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob Test Plan: unit tests (still need to add more) addfile_stress (https://reviews.facebook.net/D65037) Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong Reviewed By: sdong Subscribers: jkedgar, hcz, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65061 25 October 2016, 17:52:06 UTC
21fb8c0 OptionChangeMigration() to support FIFO compaction Summary: OptionChangeMigration() to support FIFO compaction. If the DB before migration is using FIFO compaction, nothing should be done. If the destination option is FIFO options, compact to one single L0 file if the source has more than one level. Test Plan: Run option_change_migration_test Reviewers: andrewkr, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65289 25 October 2016, 17:23:58 UTC
74bcb5e revert Prev() in MergingIterator to use previous code in non-prefix-seek mode Summary: Siying suggested to keep old code for normal mode prev() for safety Test Plan: make check -j64 Reviewers: yiwu, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65439 24 October 2016, 22:29:30 UTC
4edd39f Implement deadlock detection Summary: Implement deadlock detection. This is done by maintaining a TxnID -> TxnID map which represents the edges in the wait for graph (this is named `wait_txn_map_`). Test Plan: transaction_test Reviewers: IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64491 20 October 2016, 02:45:57 UTC
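The wait-for graph from 4edd39f can be sketched as a walk along a TxnID -> TxnID map. Only the map idea and the name `wait_txn_map` come from the commit message; `WouldDeadlock`, its signature, and the hop bound are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using TxnID = std::uint64_t;

// Returns true if making `waiter` wait on `blocker` would close a cycle.
// Each transaction waits on at most one other transaction, so the graph
// is a set of chains and the check is a simple pointer chase.
bool WouldDeadlock(const std::unordered_map<TxnID, TxnID>& wait_txn_map,
                   TxnID waiter, TxnID blocker) {
  TxnID current = blocker;
  for (std::size_t hop = 0; hop <= wait_txn_map.size(); ++hop) {
    if (current == waiter) return true;  // chain leads back to the waiter
    auto it = wait_txn_map.find(current);
    if (it == wait_txn_map.end()) return false;  // chain ends: no cycle
    current = it->second;
  }
  // More hops than edges means the map already contains a cycle;
  // report it conservatively.
  return true;
}
```

For example, with edges 1 -> 2 and 2 -> 3 already recorded, letting transaction 3 wait on transaction 1 would close the cycle 3 -> 1 -> 2 -> 3, so the check returns true before the new edge is added.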
48fd619 Minor fixes to RocksJava Native Library initialization (#1287) * [bugfix] Make sure the Native Library is initialized. Closes https://github.com/facebook/rocksdb/issues/989 * [bugfix] Just load the native libraries once 20 October 2016, 01:21:22 UTC
48e4e84 Disable auto compactions in memory_test and re-enable the test (#1408) Summary: Auto-compactions will change memory usage of DB but memory_test didn't take it into account. This PR disables auto compactions in the test and hopefully it fixes its flakiness. Test Plan: UBSAN build used to catch the flakiness. Run `make ubsan_check` and it passes. 20 October 2016, 01:18:42 UTC
fb2e412 column_family_test: disable some tests in LITE Summary: Some tests in column_family_test depend on functions that are not available in LITE build, which sometimes cause flakiness. Disable them. Test Plan: Run those tests in LITE build. Reviewers: yiwu, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65271 19 October 2016, 22:55:56 UTC
5af651d fix data race in compact_files_test Summary: fix data race Test Plan: compact_files_test Reviewers: sdong, yiwu, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65259 19 October 2016, 20:37:51 UTC
a0ba0aa Fix uninitialized variable gcc error for MyRocks Summary: make sure seq_ is properly initialized even if ParseInternalKey() fails. Test Plan: run myrocks release tests Reviewers: lightmark, mung, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65199 19 October 2016, 17:59:46 UTC
b88f8e8 Support SST files with Global sequence numbers [reland] Summary: reland https://reviews.facebook.net/D62523 - Update SstFileWriter to include a property for a global sequence number in the SST file `rocksdb.external_sst_file.global_seqno` - Update TableProperties to be aware of the offset of each property in the file - Update BlockBasedTableReader and Block to be able to honor the sequence number in `rocksdb.external_sst_file.global_seqno` property and use it to overwrite all sequence number in the file Something worth mentioning is that we don't update the seqno in the index block since and when doing a binary search, the reason for that is that it's guaranteed that SST files with global seqno will have only one user_key and each key will have seqno=0 encoded in it, This mean that this key is greater than any other key with seqno> 0. That mean that we can actually keep the current logic for these blocks Test Plan: unit tests Reviewers: sdong, yhchiang Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65211 18 October 2016, 23:59:37 UTC
08616b4 [db_bench] add filldeterministic (Universal+level compaction) Summary: in db_bench, we can dynamically create a rocksdb database that guarantees the shape of its LSM. universal + level compaction no fifo compaction no multi db support Test Plan: ./db_bench -benchmarks=fillseqdeterministic -compaction_style=1 -num_levels=3 --disable_auto_compactions -num=1000000 -value_size=1000 ``` ---------------------- LSM --------------------- Level[0]: /000480.sst(size: 35060275 bytes) Level[0]: /000479.sst(size: 70443197 bytes) Level[0]: /000478.sst(size: 141600383 bytes) Level[1]: /000341.sst - /000475.sst(total size: 284726629 bytes) Level[2]: /000071.sst - /000340.sst(total size: 568649806 bytes) fillseqdeterministic : 60.447 micros/op 16543 ops/sec; 16.0 MB/s ``` Reviewers: sdong, andrewkr, IslamAbdelRahman, yhchiang Reviewed By: yhchiang Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63111 18 October 2016, 23:30:57 UTC
52c9808 not split file in compaction on level 0 Summary: we should not split file on level 0 in compaction because it will fail the following verification of seqno order on level 0 Test Plan: check with filldeterministic in db_bench Reviewers: yhchiang, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65193 18 October 2016, 23:30:34 UTC
5e0d6b4 fix db_stress assertion failure Summary: in rocksdb::DBIter::FindValueForCurrentKey(), last_not_merge_type could also be SingleDelete() which is omitted Test Plan: db_iter_test Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65187 18 October 2016, 23:07:10 UTC
ab53998 Bump RocksDB version to 4.13 (#1405) Summary: Bump RocksDB version to 4.13 Test Plan: unit tests Reviewers: sdong, IslamAbdelRahman, andrewkr, lightmark Subscribers: leveldb 18 October 2016, 22:39:10 UTC
b4d0712 SamePrefixTest.InDomainTest to clear the test directory before testing Summary: SamePrefixTest.InDomainTest may fail if the previous run of some test cases in prefix_test fail. Test Plan: Run the test Reviewers: lightmark, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65163 18 October 2016, 21:01:10 UTC
aa09d03 Avoid calling GetDBOptions() inside GetFromBatchAndDB() Summary: MyRocks hit a regression, @mung generated perf reports showing that the reason is the cost of calling `GetDBOptions()` inside `GetFromBatchAndDB()` This diff avoid calling `GetDBOptions` and use the `ImmutableDBOptions` instead Test Plan: make check -j64 Reviewers: sdong, yiwu Reviewed By: yiwu Subscribers: andrewkr, dhruba, mung Differential Revision: https://reviews.facebook.net/D65151 18 October 2016, 20:19:26 UTC
6fbe96b Compaction Support for Range Deletion Summary: This diff introduces RangeDelAggregator, which takes ownership of iterators provided to it via AddTombstones(). The tombstones are organized in a two-level map (snapshot stripe -> begin key -> tombstone). Tombstone creation avoids data copy by holding Slices returned by the iterator, which remain valid thanks to pinning. For compaction, we create a hierarchical range tombstone iterator with structure matching the iterator over compaction input data. An aggregator based on that iterator is used by CompactionIterator to determine which keys are covered by range tombstones. In case of merge operand, the same aggregator is used by MergeHelper. Upon finishing each file in the compaction, relevant range tombstones are added to the output file's range tombstone metablock and file boundaries are updated accordingly. To check whether a key is covered by range tombstone, RangeDelAggregator::ShouldDelete() considers tombstones in the key's snapshot stripe. When this function is used outside of compaction, it also checks newer stripes, which can contain covering tombstones. Currently the intra-stripe check involves a linear scan; however, in the future we plan to collapse ranges within a stripe such that binary search can be used. RangeDelAggregator::AddToBuilder() adds all range tombstones in the table's key-range to a new table's range tombstone meta-block. Since range tombstones may fall in the gap between files, we may need to extend some files' key-ranges. The strategy is (1) first file extends as far left as possible and other files do not extend left, (2) all files extend right until either the start of the next file or the end of the last range tombstone in the gap, whichever comes first. One other notable change is adding release/move semantics to ScopedArenaIterator such that it can be used to transfer ownership of an arena-allocated iterator, similar to how unique_ptr is used for malloc'd data. 
Depends on D61473 Test Plan: compaction_iterator_test, mock_table, end-to-end tests in D63927 Reviewers: sdong, IslamAbdelRahman, wanning, yhchiang, lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D62205 18 October 2016, 19:04:56 UTC
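The intra-stripe coverage check described in 6fbe96b (a linear scan over tombstones ordered by begin key) can be sketched as below. This models a single snapshot stripe only; `RangeTombstone` and `StripeTombstones` are hypothetical types, keys are compared bytewise via `std::string`, and the end key is treated as exclusive.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical tombstone: deletes keys in [begin, end) written before seq.
struct RangeTombstone {
  std::string begin;
  std::string end;
  std::uint64_t seq;
};

// Hypothetical model of one snapshot stripe: begin key -> tombstone.
class StripeTombstones {
 public:
  void Add(RangeTombstone t) {
    std::string key = t.begin;
    by_begin_.emplace(std::move(key), std::move(t));
  }

  // Linear scan over every tombstone whose begin key is <= user_key.
  bool ShouldDelete(const std::string& user_key, std::uint64_t key_seq) const {
    const auto last = by_begin_.upper_bound(user_key);
    for (auto it = by_begin_.begin(); it != last; ++it) {
      const RangeTombstone& t = it->second;
      if (user_key < t.end && key_seq < t.seq) return true;
    }
    return false;
  }

 private:
  std::multimap<std::string, RangeTombstone> by_begin_;
};
```

The commit message notes that collapsing overlapping ranges within a stripe would later allow a binary search; this sketch keeps the simpler scan.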
257de78 Remove "-Xcheck:jni" from Java tests (#1402) Summary: Junit and our code generate lots of warning if "-Xcheck:jni" is on and force Travis to fail as the logs are too long. Test Plan: "make jtest" and see the warnings go away. 18 October 2016, 13:18:24 UTC
d88dff4 add seeforprev in history Summary: update new feature in history and avoid breaking mongorocks Test Plan: make check Reviewers: sdong, yiwu, andrewkr Reviewed By: andrewkr Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64611 17 October 2016, 22:34:13 UTC
5027dd1 Fix a minor bug in the ldb tool that was not selecting the specified (#1399) column family for compaction. 17 October 2016, 17:40:30 UTC
fea6fdd Fix @see in two Java functions (#1396) 15 October 2016, 06:03:17 UTC
b1031d6 Remove function local statics that interfere with memory pooling (#1392) 14 October 2016, 20:09:18 UTC
f470540 Handle WAL deletion when using avoid_flush_during_recovery Summary: Previously the WAL files that were avoided during recovery would never be considered for deletion. That was because alive_log_files_ was only populated when log files are created. This diff further populates alive_log_files_ with existing log files that aren't flushed during recovery, such that FindObsoleteFiles() can find them later. Depends on D64053. Test Plan: new unit test, verifies it fails before this change and passes after Reviewers: sdong, IslamAbdelRahman, yiwu Reviewed By: yiwu Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64059 14 October 2016, 19:59:51 UTC
e29d3b6 Make max_background_compactions and base_background_compactions dynamic changeable Summary: Add DB::SetDBOptions to dynamic change max_background_compactions and base_background_compactions. I'll add more dynamic changeable options soon. Test Plan: unit test. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64749 14 October 2016, 19:25:39 UTC
21e8dac fix assertion failure in Prev() Summary: fix assertion failure in db_stress. It happens because of prefix seek key is larger than merge iterator key when they have the same user key Test Plan: ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --log2_keys_per_lock=10 --block_size=16384 --allow_concurrent_memtable_write=0 --test_batches_snapshots=0 --max_bytes_for_level_base=67108864 --progress_reports=0 --mmap_read=0 --writepercent=35 --disable_data_sync=0 --readpercent=50 --subcompactions=4 --ops_per_thread=20000000 --memtablerep=skip_list --prefix_size=0 --target_file_size_multiplier=1 --column_families=1 --threads=32 --disable_wal=0 --open_files=500000 --destroy_db_initially=0 --target_file_size_base=16777216 --nooverwritepercent=1 --iterpercent=10 --max_key=100000000 --prefixpercent=0 --use_clock_cache=false --kill_random_test=888887 --cache_size=1048576 --verify_checksum=1 Reviewers: sdong, andrewkr, yiwu, yhchiang Reviewed By: yhchiang Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D65025 14 October 2016, 00:36:48 UTC
b9311aa Implement WinRandomRW file and improve code reuse (#1388) 13 October 2016, 23:36:34 UTC
a249a0b check_format_compatible.sh to use some branch which allows to run with GCC 4.8 (#1393) Summary: Some older tags don't run GCC 4.8 with FB internal setting. Fixed them and created branches. Change the format compatible script accordingly. Also add more releases to check format compatibility. 13 October 2016, 23:15:55 UTC
040328a Remove an assertion for single-delete in MergeHelper::MergeUntil Summary: Previously we had an assertion which triggered when we issued Merges after a single delete. However, merges after a single delete are unrelated to that single delete. Thus this behavior should be allowed. This addresses flakiness in db_stress. Test Plan: db_stress Reviewers: IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64923 13 October 2016, 21:26:57 UTC
8cbe3e1 Relax the acceptable bias RateLimiterTest::Rate test be 25% Summary: In the current implementation of RateLimiter, the difference between the configured rate and the actual rate might be more than 20%, while our test only allows 15% difference. This diff relaxes the acceptable bias RateLimiterTest::Rate test be 25% to make the test less flaky. Test Plan: rate_limiter_test Reviewers: IslamAbdelRahman, andrewkr, yiwu, lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64941 13 October 2016, 21:26:12 UTC
f26a139 Log successful AddFile Summary: Log successful AddFile Test Plan: visually check LOG file Reviewers: yiwu, andrewkr, lightmark, sdong Reviewed By: sdong Subscribers: andrewkr, jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D65019 13 October 2016, 18:56:27 UTC
5691a1d Fix compaction conflict with running compaction Summary: Issue scenario: (1) We have 3 files in L1 and we issue a compaction that will compact them into 1 file in L2 (2) While compaction (1) is running, we flush a file into L0 and trigger another compaction that decide to move this file to L1 and then move it again to L2 (this file don't overlap with any other files) (3) compaction (1) finishes and install the file it generated in L2, but this file overlap with the file we generated in (2) so we break the LSM consistency Looks like this issue can be triggered by using non-exclusive manual compaction or AddFile() Test Plan: unit tests Reviewers: sdong Reviewed By: sdong Subscribers: hermanlee4, jkedgar, andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D64947 13 October 2016, 17:49:06 UTC
017de66 fixup commit Summary: I accidentally left out these changes from my commit of D64053 due to messing up the merge conflict resolution. Test Plan: ./db_wal_test Reviewers: Subscribers: Tasks: Blame Revision: D64053 13 October 2016, 15:48:40 UTC
1b7af5f Redo handling of recycled logs in full purge Summary: This reverts commit https://github.com/facebook/rocksdb/commit/9e4aa798c3d47c6be64324bd9d38f0813c8ead7b, which doesn't handle all cases (see inline comment). I reimplemented the logic as suggested in the initial PR: https://github.com/facebook/rocksdb/pull/1313. This approach has two benefits: - All the parsing/filtering of full_scan_candidate_files is kept together in PurgeObsoleteFiles. - We only need to check whether log file is recycled in one place where we've already determined it's a log file Test Plan: new unit test, verified fails before the original fix, still passes now. Reviewers: IslamAbdelRahman, yiwu, sdong Reviewed By: yiwu, sdong Subscribers: leveldb, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64053 13 October 2016, 06:13:09 UTC
27bfe32 Editorial change to README.md 13 October 2016, 03:24:50 UTC
89cc404 A bit of doc restructuring 13 October 2016, 03:23:00 UTC
9e7fda8 Fix arcanist Summary: Set no_proxy to fix arcanist Test Plan: will check if tests are triggered Reviewers: arahut, yiwu, lightmark, andrewkr, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D65001 13 October 2016, 03:11:30 UTC
2e4b5ca Add missing RateLimiter class to the Windows build (#1382) 12 October 2016, 22:00:37 UTC
ce4963f [doc] Document that Visual Studio 2015+ is now required for Windows builds (#1389) Closes https://github.com/facebook/rocksdb/issues/1377 12 October 2016, 20:40:20 UTC
e489270 Fix scoped arena iterator (#1387) 12 October 2016, 18:16:16 UTC
f8d8cf5 Fix log_write_bench -bytes_per_sync option. (#1375) Hello and thanks for RocksDB, When log_write_bench is run with the -bytes_per_sync option, the option does not influence any *sync* behaviour.
> strace -e trace=write,sync_file_range ./log_write_bench -record_interval 0 -record_size 1048576 -num_records 11 -bytes_per_sync 2097152 2>&1 | egrep '^(sync|write.*XXXX)'
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
I suspect that this is because the bytes_per_sync option now needs to be using a `WritableFileWriter` and not a `WritableFile`.
With the diff below applied, it changes to:
> strace -e trace=write,sync_file_range ./log_write_bench -record_interval 0 -record_size 1048576 -num_records 11 -bytes_per_sync 2097152 2>&1 | egrep '^(sync|write.*XXXX)'
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
sync_file_range(0x3, 0, 0x200000, 0x2) = 0
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
sync_file_range(0x3, 0x200000, 0x200000, 0x2) = 0
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
sync_file_range(0x3, 0x400000, 0x200000, 0x2) = 0
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
sync_file_range(0x3, 0x600000, 0x200000, 0x2) = 0
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
sync_file_range(0x3, 0x800000, 0x200000, 0x2) = 0
( Note that the first 1MB is not synced as mentioned in util/file_reader_writer.cc::WritableFileWriter::Flush() )
This diff also includes the fix from https://github.com/facebook/rocksdb/pull/1373
> diff -du util/log_write_bench.cc.orig util/log_write_bench.cc
--- util/log_write_bench.cc.orig 2016-10-04 12:06:29.115122580 -0400
+++ util/log_write_bench.cc 2016-10-05 07:24:09.677037576 -0400
@@ -14,6 +14,7 @@
 #include <gflags/gflags.h>
 #include "rocksdb/env.h"
+#include "util/file_reader_writer.h"
 #include "util/histogram.h"
 #include "util/testharness.h"
 #include "util/testutil.h"
@@ -38,19 +39,21 @@
 env_options.bytes_per_sync = FLAGS_bytes_per_sync;
 unique_ptr<WritableFile> file;
 env->NewWritableFile(file_name, &file, env_options);
+ unique_ptr<WritableFileWriter> writer;
+ writer.reset(new WritableFileWriter(std::move(file), env_options));
 std::string record;
- record.assign('X', FLAGS_record_size);
+ record.assign(FLAGS_record_size, 'X');
 HistogramImpl hist;
 uint64_t start_time = env->NowMicros();
 for (int i = 0; i < FLAGS_num_records; i++) {
 uint64_t start_nanos = env->NowNanos();
- file->Append(record);
- file->Flush();
+ writer->Append(record);
+ writer->Flush();
 if (FLAGS_enable_sync) {
- file->Sync();
+ writer->Sync(false);
 }
 hist.Add(env->NowNanos() - start_nanos);
11 October 2016, 23:45:51 UTC
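The `record.assign` change in the diff above fixes swapped arguments: `std::string::assign(count, ch)` takes the count first. The sketch below reproduces both forms; `BuggyRecord` and `FixedRecord` are hypothetical wrappers written for illustration, and the "88" assumes an ASCII execution character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Swapped arguments: 'X' (88 in ASCII) is converted to the count and
// record_size is narrowed to a char, so the record always has 88 bytes.
std::string BuggyRecord(int record_size) {
  std::string record;
  record.assign('X', record_size);
  return record;
}

// Correct order: record_size copies of 'X'.
std::string FixedRecord(int record_size) {
  std::string record;
  record.assign(static_cast<std::size_t>(record_size), 'X');
  return record;
}
```

With the swapped form the benchmark would write 88-byte records regardless of `-record_size`, which is consistent with the commit bundling this fix alongside the `-bytes_per_sync` change.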
02b3e39 Make txn->GetState() const Summary: makes Transaction::GetState() a const function. Test Plan: compiles. Reviewers: mung Reviewed By: mung Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64929 11 October 2016, 22:48:50 UTC
447f171 new Prev() prefix support using SeekForPrev() Summary: 1) The previous solution for Prev() prefix support is not clean. Since I add api SeekForPrev(), now the Prev() can be symmetric to Next(). and we do not need SeekToLast() to be called in Prev() any more. Also, Next() will Seek(prefix_seek_key_) to solve the problem of possible inconsistency between db_iter and merge_iter when there is merge_operator. And prefix_seek_key is only refreshed when change direction to forward. 2) This diff also solves the bug of Iterator::SeekToLast() with iterate_upper_bound_ with prefix extractor. add test cases for the above two cases. There are some tests for the SeekToLast() in Prev(), I will clean them later. Test Plan: make all check Reviewers: IslamAbdelRahman, andrewkr, yiwu, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63933 11 October 2016, 20:54:26 UTC
991b585 More block cache tickers Summary: Adding several missing block cache tickers. Test Plan: make all check Reviewers: IslamAbdelRahman, yhchiang, lightmark Reviewed By: lightmark Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64881 11 October 2016, 18:59:05 UTC
d6ae6de Add Statistics::getAndResetTickerCount(). Summary: A convenience method to atomically get and reset ticker count. I want to use it to have a thin wrapper to the statistics object to export ticker counts to ODS for LogDevice (since they don't even use fb303). Test Plan: test in LogDevice shadow cluster. https://fburl.com/461868822 Reviewers: andrewkr, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64869 11 October 2016, 17:54:11 UTC
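An atomic get-and-reset, as described in d6ae6de, can be sketched with `std::atomic::exchange`. `TickerSketch` is a hypothetical single-counter class; the real Statistics object tracks many tickers and may aggregate per-thread values, which this sketch does not model.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single ticker backed by one atomic counter.
class TickerSketch {
 public:
  void Record(std::uint64_t count) {
    value_.fetch_add(count, std::memory_order_relaxed);
  }

  std::uint64_t Get() const { return value_.load(std::memory_order_relaxed); }

  // Reads the current value and resets it to zero in one atomic step,
  // so no increment can be lost between the read and the reset.
  std::uint64_t GetAndReset() {
    return value_.exchange(0, std::memory_order_relaxed);
  }

 private:
  std::atomic<std::uint64_t> value_{0};
};
```

A separate `Get()` followed by a reset would drop any `Record()` call that lands between the two, which is the gap the combined operation closes for exporters that report deltas.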
aea3ce4 Avoid string CONCAT which is not supported in cmake 2.6 (#1383) Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com> 11 October 2016, 00:32:04 UTC
2ad68b9 Support running consistency checks in release mode Summary: We always run consistency checks when compiling in debug mode allow users to set Options::force_consistency_checks to true to be able to run such checks even when compiling in release mode Test Plan: make check -j64 make release Reviewers: lightmark, sdong, yiwu Reviewed By: yiwu Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D64701 08 October 2016, 00:21:45 UTC
67501cf Fix -ve std::string::resize Summary: I saw this exception thrown because sometimes we may resize with -ve value if we have empty max_bytes_for_level_multiplier_additional vector Test Plan: run the tests Reviewers: yiwu Reviewed By: yiwu Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64791 08 October 2016, 00:16:13 UTC
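
The commit does not show the failing code, so the following is only a sketch of one common way a "negative" resize arises. It assumes a join-then-trim pattern in which a trailing separator is removed with `resize(size() - 1)`. `JoinInts` is a hypothetical helper, not a RocksDB function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: joins integers with ':' and trims the trailing
// separator. Without the empty() guard, an empty input would compute
// out.size() - 1 on an unsigned zero, wrap to a huge value, and make
// std::string::resize throw std::length_error.
std::string JoinInts(const std::vector<int>& values) {
  std::string out;
  for (int v : values) {
    out += std::to_string(v);
    out += ':';
  }
  if (!out.empty()) {
    out.resize(out.size() - 1);  // safe: size() is at least 1 here
  }
  return out;
}

inline void DemoJoinInts() {
  assert(JoinInts({}).empty());
  assert(JoinInts({1, 2, 3}) == "1:2:3");
}
```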
04b02dd Testing asset links after config change 07 October 2016, 23:28:44 UTC
8c55bb8 Make Lock Info test multiple column families Summary: Modifies the lock info export test to test multiple column families after I was experiencing a bug while developing the MyRocks front-end for this. Test Plan: is test. Reviewers: mung Reviewed By: mung Subscribers: andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64725 07 October 2016, 22:04:05 UTC
d062328 Revert "Support SST files with Global sequence numbers" This reverts commit ab01da5437385e3142689077c647a3b13ba3402f. 07 October 2016, 21:05:12 UTC
5cd2883 [RocksJava] Adjusted RateLimiter to 3.10.0 (#1368) Summary: - Deprecated RateLimiterConfig and GenericRateLimiterConfig - Introduced RateLimiter It is now possible to use all C++ related methods also in RocksJava. A notable method is setBytesPerSecond which can change the allowed number of bytes per second at runtime. Test Plan: make rocksdbjava make jtest Reviewers: adamretter, yhchiang, ankgup87 Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35715 07 October 2016, 19:32:21 UTC
37737c3 Expose Transaction State Publicly Summary: This exposes a transactions state through a public api rather than through a public member variable. I also do some name refactoring. ExecutionStatus => TransactionState exec_status_ => trx_state_ Test Plan: It compiles and transaction_test passes. Reviewers: IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, mung, dhruba, sdong Differential Revision: https://reviews.facebook.net/D64689 07 October 2016, 18:58:53 UTC
2c1f952 Add facility to write only a portion of WriteBatch to WAL Summary: When constructing a write batch a client may now call MarkWalTerminationPoint() on that batch. No batch operations after this call will be written to the WAL, but they will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement. ``` RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628 RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054 ``` Test Plan: Two unit tests added. Reviewers: sdong, yiwu, IslamAbdelRahman Reviewed By: yiwu Subscribers: hermanlee4, dhruba, andrewkr Differential Revision: https://reviews.facebook.net/D64599 07 October 2016, 18:32:10 UTC
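
The split described above (a prefix of the batch goes to the WAL, the whole batch goes to the memtable) can be modelled with a toy class. This is a sketch only: `ToyBatch`, `WalOps`, and `MemtableOps` are hypothetical names, and the real WriteBatch stores a serialized byte buffer rather than a vector of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy batch; it models only the prefix/full split.
class ToyBatch {
 public:
  void Put(const std::string& op) { ops_.push_back(op); }

  // Operations added after this call are excluded from the WAL view.
  void MarkWalTerminationPoint() {
    wal_end_ = ops_.size();
    marked_ = true;
  }

  // The portion of the batch that would be logged.
  std::vector<std::string> WalOps() const {
    const std::size_t end = marked_ ? wal_end_ : ops_.size();
    return std::vector<std::string>(
        ops_.begin(), ops_.begin() + static_cast<std::ptrdiff_t>(end));
  }

  // The portion of the batch that would be inserted into the memtable.
  const std::vector<std::string>& MemtableOps() const { return ops_; }

 private:
  std::vector<std::string> ops_;
  std::size_t wal_end_ = 0;
  bool marked_ = false;
};

inline void DemoWalTermination() {
  ToyBatch batch;
  batch.Put("put k1");
  batch.Put("put k2");
  batch.MarkWalTerminationPoint();
  batch.Put("put k3");  // memtable only
  assert(batch.WalOps().size() == 2);
  assert(batch.MemtableOps().size() == 3);
}
```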
043cb62 Fix record_size in log_write_bench, swap args to std::string::assign. (#1373) Hello and thank you for RocksDB, I noticed when using log_write_bench that writes were always 88 bytes: > strace -e trace=write ./log_write_bench -num_records 2 2>&1 | head -n 2 write(3, "\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371"..., 88) = 88 write(3, "\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371"..., 88) = 88 > strace -e trace=write ./log_write_bench -record_size 4096 -num_records 2 2>&1 | head -n 2 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88) = 88 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88) = 88 I think this should be: << record.assign('X', FLAGS_record_size); >> record.assign(FLAGS_record_size, 'X'); So fill and not buffer. Otherwise I always see writes of size 88 (the decimal value for chr "X"). string& assign (const char* s, size_t n); buffer - Copies the first n characters from the array of characters pointed by s. string& assign (size_t n, char c); fill - Replaces the current value by n consecutive copies of character c. perl -le 'print ord "X"' 88 With the change: > strace -e trace=write ./log_write_bench -record_size 4096 -num_records 2 2>&1 | head -n 2 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096 > strace -e trace=write ./log_write_bench -num_records 2 2>&1 | head -n 2 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 249) = 249 write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 249) = 249 Thanks. https://github.com/facebook/rocksdb/commit/01c27be5fb42524c5052b4b4a23e05501e1d1421 https://reviews.facebook.net/D16239 06 October 2016, 17:45:31 UTC
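
The strace output above can be reproduced in isolation. In the sketch below, `BuggyFill` and `FixedFill` are hypothetical helper names and `record_size` stands in for `FLAGS_record_size`. With the arguments swapped, `'X'` is converted to the count 88 and the record size is narrowed to the fill character, which would explain both the constant 88-byte writes and the `\371` (249) and `\0` (4096 truncated to one byte) fill bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Swapped arguments: 'X' becomes the count (88) and record_size is narrowed
// to a char that becomes the fill character.
std::string BuggyFill(int record_size) {
  std::string record;
  record.assign('X', record_size);
  return record;
}

// Intended call: record_size copies of 'X'.
std::string FixedFill(int record_size) {
  std::string record;
  record.assign(static_cast<std::size_t>(record_size), 'X');
  return record;
}

inline void DemoAssignOrder() {
  assert(BuggyFill(4096).size() == 88);
  assert(BuggyFill(4096)[0] == '\0');
  assert(FixedFill(4096).size() == 4096);
  assert(FixedFill(4096)[0] == 'X');
}
```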
4985f60 env_mirror: fix a few leaks (#1363) * env_mirror: fix leak from LockFile Signed-off-by: Sage Weil <sage@redhat.com> * env_mirror: instruct EnvMirror whether mirrored Envs should be destroyed The lifecycle rules for Env are frustrating and undocumented. Notably, Env::Default() should *not* be freed, but any Env instances we created should be. Explicitly instruct EnvMirror whether to clean up child Env instances. Default to false so that we do not affect existing callers. Signed-off-by: Sage Weil <sage@redhat.com> 06 October 2016, 17:43:05 UTC
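
The ownership rule described above can be sketched as a wrapper that deletes its children only when told to. `ToyEnv`, `CountingEnv`, and `ToyMirror` are hypothetical types for illustration and do not reproduce the EnvMirror interface. The flags default to false, matching the commit's stated goal of not affecting existing callers.

```cpp
#include <cassert>

// Hypothetical base type standing in for an environment object.
struct ToyEnv {
  virtual ~ToyEnv() = default;
};

// Counts destructions so the ownership behaviour is observable.
struct CountingEnv : ToyEnv {
  explicit CountingEnv(int* destroyed) : destroyed_(destroyed) {}
  ~CountingEnv() override { ++*destroyed_; }
  int* destroyed_;
};

// Hypothetical mirror wrapper: deletes a child only if its flag is set.
class ToyMirror {
 public:
  ToyMirror(ToyEnv* a, ToyEnv* b, bool free_a = false, bool free_b = false)
      : a_(a), b_(b), free_a_(free_a), free_b_(free_b) {}
  ToyMirror(const ToyMirror&) = delete;             // avoid double delete
  ToyMirror& operator=(const ToyMirror&) = delete;
  ~ToyMirror() {
    if (free_a_) delete a_;
    if (free_b_) delete b_;
  }

 private:
  ToyEnv* a_;
  ToyEnv* b_;
  bool free_a_;
  bool free_b_;
};

inline void DemoMirrorOwnership() {
  int destroyed = 0;
  {
    CountingEnv shared(&destroyed);  // plays the role of a shared default
    {
      ToyMirror mirror(new CountingEnv(&destroyed), &shared,
                       /*free_a=*/true, /*free_b=*/false);
    }
    assert(destroyed == 1);  // only the owned child was deleted
  }
  assert(destroyed == 2);  // the shared object ended with its own scope
}
```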
5aded67 update of c.h (#1371) Added rocksdb_options_set_memtable_prefix_bloom_size_ratio function implemented in c.cc but not exported via c.h 06 October 2016, 17:37:19 UTC
912aec1 "Recent Posts" -> "All Posts" Blog sidebar shows all the posts, not just the most recent ones. 05 October 2016, 17:29:11 UTC
7cbb298 Make sure that when contributing we call out creating appropriate directories .... if they do not exist 04 October 2016, 22:38:41 UTC
a06ad47 Add top level doc information to CONTRIBUTING.md 04 October 2016, 22:27:28 UTC
3fdd5b9 A little more generic CONTRIBUTING.md 04 October 2016, 22:22:28 UTC
ed4fc31 Add link to CONTRIBUTING.md to main docs README.md 04 October 2016, 22:21:43 UTC
e4922e1 Forgot to truncate one blog post 04 October 2016, 22:20:15 UTC
6d8cd7e Add CONTRIBUTING.md for rocksdb.org contribution guidance 04 October 2016, 22:19:00 UTC
bd55e5a Fix some formatting of compaction blog post 04 October 2016, 21:33:07 UTC
0f60358 CRLF -> LF mod (including removing trailing whitespace for those files) 04 October 2016, 21:31:36 UTC
b90e29c Truncate posts on the main /blog/ page 04 October 2016, 21:20:26 UTC
0d7acad Add author fields to blog posts Now the author associated with fbid will be shown at top of blog post 04 October 2016, 21:11:04 UTC
01be441 Add GitHub link to the landing page header 04 October 2016, 20:49:13 UTC
9d6c961 Fix Mac build 04 October 2016, 01:25:10 UTC
ab01da5 Support SST files with Global sequence numbers Summary: - Update SstFileWriter to include a property for a global sequence number in the SST file `rocksdb.external_sst_file.global_seqno` - Update TableProperties to be aware of the offset of each property in the file - Update BlockBasedTableReader and Block to be able to honor the sequence number in `rocksdb.external_sst_file.global_seqno` property and use it to overwrite all sequence numbers in the file Something worth mentioning is that we don't update the seqno in the index block when doing a binary search. The reason is that it's guaranteed that SST files with global seqno will have only one user_key and each key will have seqno=0 encoded in it. This means that this key is greater than any other key with seqno > 0, which means that we can keep the current logic for these blocks. Test Plan: unit tests Reviewers: andrewkr, yhchiang, yiwu, sdong Reviewed By: sdong Subscribers: hcz, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D62523 03 October 2016, 23:12:39 UTC
d346ba2 Minor fixes around Windows 64 Java Artifacts (#1366) 03 October 2016, 18:58:08 UTC
e91b4d0 Add factory method for creating persistent cache that is accessible from public Summary: Currently there is no mechanism to create persistent cache from headers. Adding a simple factory method to create a simple persistent cache with default or NVM optimized settings. note: Any idea to test this factory is appreciated. Test Plan: None Reviewers: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D64527 03 October 2016, 17:55:46 UTC
be1f109 Expose transaction id, lock state information and transaction wait information Summary: This diff does 3 things: Expose TransactionID so that we can identify transactions when we retrieve locking and lock wait information. This is exposed as `Transaction::GetID`. Expose lock state information by locking all stripes in all column families and copying their contents to a data structure. This is exposed as `TransactionDB::GetLockStatusData`. Adds support for tracking the transaction and the key being waited on, and exposes this as `Transaction::GetWaitingTxn`. Test Plan: unit tests Reviewers: horuff, sdong Reviewed By: sdong Subscribers: vasilep, hermanlee4, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D64413 30 September 2016, 18:41:21 UTC
6009c47 Store range tombstones in memtable Summary: - Store range tombstones in a separate MemTableRep instantiated with ColumnFamilyOptions::memtable_factory - MemTable::NewRangeTombstoneIterator() returns a MemTableIterator over the separate MemTableRep - Part of the read path is not implemented yet (i.e., MemTable::Get()) Test Plan: see unit tests Reviewers: wanning Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D62217 30 September 2016, 16:06:43 UTC
3c21c64 Use size hint for HashMap in multiGet. Similar to https://github.com/facebook/rocksdb/pull/1344 (#1367) 29 September 2016, 22:55:53 UTC
13f7a01 Fixing JNI release build for gcc (#975) 29 September 2016, 21:11:32 UTC