swh:1:snp:5115096b921df712aeb2a08114fede57fb3331fb

678d3a1 Updating readme to 2.1 06 August 2013, 21:43:16 UTC
68a4cdf Build fix with merge_test and ttl 06 August 2013, 18:42:21 UTC
e37eb21 minor change to fix build 06 August 2013, 18:02:19 UTC
c2d7826 [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. Summary: Here are the major changes to the Merge Interface. It has been expanded to handle cases where the MergeOperator is not associative. It does so by stacking up merge operations while scanning through the key history (i.e.: during Get() or Compaction), until a valid Put/Delete/end-of-history is encountered; it then applies all of the merge operations in the correct sequence starting with the base/sentinel value. I have also introduced an "AssociativeMerge" function which allows the user to take advantage of associative merge operations (such as in the case of counters). The implementation will always attempt to merge the operations/operands themselves together when they are encountered, and will resort to the "stacking" method if and only if the "associative-merge" fails. This implementation is conjectured to allow MergeOperator to handle the general case, while still providing the user with the ability to take advantage of certain efficiencies in their own merge-operator / data-structure. NOTE: This is a preliminary diff. This must still go through a lot of review, revision, and testing. Feedback welcome! Test Plan: -This is a preliminary diff. I have only just begun testing/debugging it. -I will be testing this with the existing MergeOperator use-cases and unit-tests (counters, string-append, and redis-lists) -I will be "desk-checking" and walking through the code with the help of gdb. -I will find a way of stress-testing the new interface / implementation using db_bench, db_test, merge_test, and/or db_stress. -I will ensure that my tests cover all cases: Get-Memtable, Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0, Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found, end-of-history, end-of-file, etc. -A lot of feedback from the reviewers.
Reviewers: haobo, dhruba, zshao, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11499 06 August 2013, 03:14:32 UTC
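The "stacking" strategy described in the entry above can be sketched as follows. This is a hypothetical illustration, not RocksDB's actual implementation: operands collected newest-first during the backward key-history scan are applied oldest-first once a base value is found. `ApplyStackedMerges` and its signature are invented for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Sketch of the stacking approach: merge operands accumulate while scanning
// key history backward; once a base Put (or end-of-history sentinel) is
// found, they are applied in forward (oldest-first) order.
std::string ApplyStackedMerges(
    const std::string& base_value,
    const std::vector<std::string>& operands_newest_first,
    const std::function<std::string(const std::string&, const std::string&)>&
        merge) {
  std::string result = base_value;
  // Operands were stacked newest-first, so walk them in reverse.
  for (auto it = operands_newest_first.rbegin();
       it != operands_newest_first.rend(); ++it) {
    result = merge(result, *it);
  }
  return result;
}
```

With a string-append merge, a base of "a" and stacked operands {"c", "b"} (newest first) yield "abc", matching the order the writes originally happened in.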
73f9518 Fix build Summary: remove reference Test Plan: make OPT=-g 06 August 2013, 02:22:12 UTC
8e792e5 Add soft_rate_limit stats Summary: This diff adds histogram stats for soft_rate_limit stalls. It also renames the old rate_limit stats to hard_rate_limit. Test Plan: make -j32 check Reviewers: dhruba, haobo, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12021 06 August 2013, 01:45:23 UTC
1d7b476 Expose base db object from ttl wrapper Summary: rocksdb replication will need this when writing value+TS from master to slave 'as is' Test Plan: make Reviewers: dhruba, vamsi, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11919 06 August 2013, 01:44:14 UTC
1036537 Add soft and hard rate limit support Summary: This diff adds support for both soft and hard rate limiting. The following changes are included: 1) Options.rate_limit is renamed to Options.hard_rate_limit. 2) Options.rate_limit_delay_milliseconds is renamed to Options.rate_limit_delay_max_milliseconds. 3) Options.soft_rate_limit is added. 4) If the maximum compaction score is > hard_rate_limit and rate_limit_delay_max_milliseconds == 0, then writes are delayed by 1 ms at a time until the max compaction score falls below hard_rate_limit. 5) If the max compaction score is > soft_rate_limit but <= hard_rate_limit, then writes are delayed by 0-1 ms depending on how close we are to hard_rate_limit. 6) Users can disable 4 by setting hard_rate_limit = 0. They can add a limit to the maximum amount of time waited by setting rate_limit_delay_max_milliseconds > 0. Thus, the old behavior can be preserved by setting soft_rate_limit = 0, which is the default. Test Plan: make -j32 check ./db_stress Reviewers: dhruba, haobo, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12003 05 August 2013, 22:43:49 UTC
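The soft/hard rate-limit policy described above can be sketched as a per-write delay function. This is an invented illustration (the commit only says the delay is "0-1 ms depending on how close we are to hard_rate_limit"; the linear ramp here is an assumption, and `WriteDelayMs` is not a real RocksDB function):

```cpp
#include <cassert>

// Sketch: above hard_rate_limit, writes wait 1 ms at a time; between the
// soft and hard limits, the per-write delay ramps from 0 toward 1 ms.
// A limit of 0 disables that limit, matching the described defaults.
double WriteDelayMs(double max_compaction_score, double soft_rate_limit,
                    double hard_rate_limit) {
  if (hard_rate_limit > 0.0 && max_compaction_score > hard_rate_limit) {
    return 1.0;  // delayed repeatedly, 1 ms at a time
  }
  if (soft_rate_limit > 0.0 && max_compaction_score > soft_rate_limit &&
      max_compaction_score <= hard_rate_limit) {
    // Fractional delay grows as the score approaches hard_rate_limit.
    return (max_compaction_score - soft_rate_limit) /
           (hard_rate_limit - soft_rate_limit);
  }
  return 0.0;
}
```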
cacd812 Support user's compaction filter in TTL logic Summary: TTL uses a compaction filter to purge key-values and previously required the user not to pass one. This diff makes it accommodate the user's compaction filter. Added test to ttl_test Test Plan: make; ./ttl_test Reviewers: dhruba, haobo, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11973 05 August 2013, 18:28:01 UTC
7c9093a Changing Makefile to have rocksdb instead of leveldb in binary-names Summary: did a find-replace Test Plan: make Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11979 05 August 2013, 18:14:01 UTC
c42485f Merge operator for ttl Summary: Implemented a TtlMergeOperator class which inherits from MergeOperator and is TTL aware. It strips out timestamp from existing_value and attaches timestamp to new_value, calling user-provided-Merge in between. Test Plan: make all check Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11775 01 August 2013, 16:27:17 UTC
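The strip/attach timestamp handling described in the TtlMergeOperator entry above can be sketched like this. The fixed-width trailing-timestamp layout is an assumption for illustration, and `StripTs`/`AttachTs` are invented names, not the RocksDB API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Sketch: TTL values carry a fixed-width timestamp suffix. The wrapper
// strips it from existing_value before calling the user-provided Merge,
// then attaches a fresh timestamp to the merged result.
constexpr std::size_t kTsLen = sizeof(int32_t);

std::string StripTs(const std::string& value_with_ts) {
  return value_with_ts.substr(0, value_with_ts.size() - kTsLen);
}

std::string AttachTs(const std::string& value, int32_t ts) {
  std::string out = value;
  out.append(reinterpret_cast<const char*>(&ts), kTsLen);
  return out;
}
```

Round-tripping a value through AttachTs then StripTs returns the original user value, which is the invariant the TTL wrapper relies on.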
59d0b02 Expand KeyMayExist to return the proper value if it can be found in memory and also check block_cache Summary: Removed KeyMayExistImpl because KeyMayExist now demanded Get-like semantics. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test Test Plan: make all check;db_stress for 1 hour Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11853 01 August 2013, 16:07:46 UTC
9700677 Slow down writes gradually rather than suddenly Summary: Currently, when a certain number of level0 files (level0_slowdown_writes_trigger) are present, RocksDB will slow down each write by 1ms. There is a second limit of level0 files at which RocksDB will stop writes altogether (level0_stop_writes_trigger). This patch enables the user to supply a third parameter specifying the number of files at which Rocks will start slowing down writes (level0_start_slowdown_writes). When this number is reached, Rocks will slow down writes as a quadratic function of level0_slowdown_writes_trigger - num_level0_files. For some workloads, this improves latency and throughput. I will post some stats momentarily in https://our.intern.facebook.com/intern/tasks/?t=2613384. Test Plan: make -j32 check ./db_stress ./db_bench Reviewers: dhruba, haobo, MarkCallaghan, xjin Reviewed By: xjin CC: leveldb, xjin, zshao Differential Revision: https://reviews.facebook.net/D11859 31 July 2013, 23:20:48 UTC
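The gradual slowdown described above can be sketched as a ramp between the start-slowdown and slowdown triggers. The exact scaling RocksDB uses is not spelled out beyond "quadratic function", so this is an assumed normalization; `SlowdownFraction` is an invented name:

```cpp
#include <cassert>

// Sketch: below start_trigger writes run at full speed; at or above
// slowdown_trigger each write waits the full 1 ms; in between, the delay
// fraction grows quadratically with the number of level0 files.
double SlowdownFraction(int num_level0_files, int start_trigger,
                        int slowdown_trigger) {
  if (num_level0_files <= start_trigger) return 0.0;
  if (num_level0_files >= slowdown_trigger) return 1.0;
  double x = static_cast<double>(num_level0_files - start_trigger) /
             (slowdown_trigger - start_trigger);
  return x * x;  // quadratic ramp from 0 to 1
}
```

The quadratic shape means the penalty is gentle just past the start trigger and steep near the hard slowdown trigger, which is what smooths latency for the workloads mentioned above.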
0f0a24e Make arena block size configurable Summary: Add an option for arena block size, default value 4096 bytes. Arena will allocate blocks with such size. I am not sure about passing parameter to skiplist in the new virtualized framework, though I talked to Jim a bit. So add Jim as reviewer. Test Plan: new unit test, I am running db_test. For passing the parameter from the configured option to Arena, I tried tests like: TEST(DBTest, Arena_Option) { std::string dbname = test::TmpDir() + "/db_arena_option_test"; DestroyDB(dbname, Options()); DB* db = nullptr; Options opts; opts.create_if_missing = true; opts.arena_block_size = 1000000; // tested 99, 999999 Status s = DB::Open(opts, dbname, &db); db->Put(WriteOptions(), "a", "123"); } and printed some debug info. The results look good. Any suggestion for such a unit-test? Reviewers: haobo, dhruba, emayanke, jpaton Reviewed By: dhruba CC: leveldb, zshao Differential Revision: https://reviews.facebook.net/D11799 31 July 2013, 19:42:23 UTC
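The arena behavior the entry above makes configurable can be sketched as follows: allocations are carved out of fixed-size blocks, and a new block is allocated whenever the current one cannot satisfy a request. `SimpleArena` is a toy stand-in, not RocksDB's Arena (which also has special handling for oversized requests, omitted here):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy arena: bump-pointer allocation inside blocks of block_size bytes
// (the arena_block_size option above controls this size in RocksDB).
class SimpleArena {
 public:
  explicit SimpleArena(std::size_t block_size) : block_size_(block_size) {}

  char* Allocate(std::size_t n) {
    if (n > remaining_) {  // current block exhausted; grab a fresh one
      blocks_.push_back(std::make_unique<char[]>(block_size_));
      ptr_ = blocks_.back().get();
      remaining_ = block_size_;
    }
    char* result = ptr_;
    ptr_ += n;
    remaining_ -= n;
    return result;
  }

  std::size_t NumBlocks() const { return blocks_.size(); }

 private:
  std::size_t block_size_;
  char* ptr_ = nullptr;
  std::size_t remaining_ = 0;
  std::vector<std::unique_ptr<char[]>> blocks_;
};
```

A larger block size means fewer, bigger allocations from the OS; a tiny one (like the 99 tested above) forces many blocks, which is the trade-off the option exposes.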
542cc10 Fix README contents. 30 July 2013, 15:30:13 UTC
6db52b5 Don't use redundant Env::NowMicros() calls Summary: After my patch for stall histograms, there are redundant calls to NowMicros() by both the stop watches and DBImpl::MakeRoomForWrites. So I removed the redundant calls such that the information is obtained from the stopwatch. Test Plan: make clean make -j32 check Reviewers: dhruba, haobo, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11883 29 July 2013, 22:46:36 UTC
abc90b0 Use specific DB name in merge_test Summary: Currently, merge_test uses /tmp/testdb for the test database. It should really use something more specific to merge_test. Most of the other tests use test::TmpDir() + "/<test name>db". This patch implements such behavior for merge_test; it makes merge_test use test::TmpDir() + "/merge_testdb" Test Plan: make clean make -j32 merge_test ./merge_test Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11877 29 July 2013, 20:26:38 UTC
18afff2 Add stall counts to statistics Summary: Previously, statistics are kept on how much time is spent on stalls of different types. This patch adds support for keeping number of stalls of each type. For example, instead of just reporting how many microseconds are spent waiting for memtables to be compacted, it will also report how many times a write stalled for that to occur. Test Plan: make -j32 check ./db_stress # Not really sure what else should be done... Reviewers: dhruba, MarkCallaghan, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11841 29 July 2013, 17:34:23 UTC
d7ba5bc Revert 6fbe4e981a3d74270a0160445bd993c464c23d76: If disable wal is set, then batch commits are avoided Summary: Revert "If disable wal is set, then batch commits are avoided" because keeping the mutex while inserting into the skiplist means that readers and writes are all serialized on the mutex. 24 July 2013, 17:01:13 UTC
52d7ecf Virtualize SkipList Interface Summary: This diff virtualizes the skiplist interface so that users can provide their own implementation of a backing store for MemTables. Eventually, the backing store will be responsible for its own synchronization, allowing users (and us) to experiment with different lockless implementations. Test Plan: make clean make -j32 check ./db_stress Reviewers: dhruba, emayanke, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11739 23 July 2013, 21:42:27 UTC
6fbe4e9 If disable wal is set, then batch commits are avoided. Summary: rocksdb uses batch commit to write to transaction log. But if disable wal is set, then writes to the transaction log are avoided anyway. In this case, there is not much value in batching; batching can cause unnecessary delays to Puts(). This patch avoids batching when disableWal is set. Test Plan: make check. I am running db_stress now. Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11763 23 July 2013, 21:22:57 UTC
f3baeec Adding filter_deletes to crash_tests run in jenkins Summary: filter_deletes options introduced in db_stress makes it drop Deletes on key if KeyMayExist(key) returns false on the key. code change was simple and tested so not wasting reviewer's time. Test Plan: make crash_test; python tools/db_crashtest[1|2].py CC: dhruba, vamsi Differential Revision: https://reviews.facebook.net/D11769 23 July 2013, 20:49:16 UTC
bf66c10 Use KeyMayExist for WriteBatch-Deletes Summary: Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete. Added code to skip getting Table from disk if not already present in table_cache. Some renaming of variables. Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch. Changed KeyMayExist to not be pure virtual and provided a default implementation. Expanded unit-tests in db_test to check appropriately. Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1. Test Plan: db_stress;make check Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D11745 23 July 2013, 20:36:50 UTC
d364eea [RocksDB] Fix FindMinimumEmptyLevelFitting Summary: as title Test Plan: make check; Reviewers: xjin CC: leveldb Differential Revision: https://reviews.facebook.net/D11751 22 July 2013, 19:31:43 UTC
9ee6887 [RocksDB] Enable manual compaction to move files back to an appropriate level. Summary: As title. This diff added an option reduce_level to CompactRange. When set to true, it will try to move the files back to the minimum level sufficient to hold the data set. Note that the default is set to true now, just to exercise it in all existing tests. Will set the default to false before check-in, for backward compatibility. Test Plan: make check; Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D11553 19 July 2013, 23:20:36 UTC
e9b675b Fix memory leak in KeyMayExist test part of db_test Summary: NewBloomFilterPolicy call requires Delete to be called later on Test Plan: make; valgrind ./db_test Reviewers: haobo, dhruba, vamsi Differential Revision: https://reviews.facebook.net/D11667 12 July 2013, 23:58:57 UTC
2a98691 Make rocksdb-deletes faster using bloom filter Summary: Wrote a new function in db_impl.cc, CheckKeyMayExist, that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving: 1. Put of delete type 2. Space in the db, and 3. Compaction time Test Plan: make all check; will run db_stress and db_bench and enhance unit-test once the basic design gets approved Reviewers: dhruba, haobo, vamsi Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11607 11 July 2013, 19:11:11 UTC
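The delete-skipping logic described above can be sketched like this. `FakeDb`, its `std::set` stand-in for the bloom filter, and the counter are all invented for illustration; the real code routes the check through Get with I/O restricted to bloom-filter lookups:

```cpp
#include <cassert>
#include <set>
#include <string>

// Sketch: a Delete is only written when the filter cannot rule out the
// key's presence. False from the check means "definitely absent", so the
// tombstone (and its later compaction cost) can be skipped safely.
struct FakeDb {
  std::set<std::string> maybe_present;  // stands in for bloom filter state
  int deletes_written = 0;

  bool CheckKeyMayExist(const std::string& key) const {
    return maybe_present.count(key) > 0;
  }

  void Delete(const std::string& key, bool filter_deletes) {
    if (filter_deletes && !CheckKeyMayExist(key)) {
      return;  // drop the delete entirely
    }
    ++deletes_written;
  }
};
```

Bloom filters can return false positives but never false negatives, which is why dropping the delete on a negative answer is safe.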
8a5341e Newbie code question Summary: This diff is more about my question when reading compaction codes, instead of a normal diff. I don't quite understand the logic here. Test Plan: I didn't do any test. If this is a bug, I will continue doing some test. Reviewers: haobo, dhruba, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11661 11 July 2013, 16:03:40 UTC
821889e Print complete statistics in db_stress Summary: db_stress should also print complete statistics like db_bench. Needed this when I wanted to measure number of delete-IOs dropped due to CheckKeyMayExist to be introduced to the rocksdb codebase later, to make deletes in rocksdb faster Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1 Reviewers: sheki, dhruba, vamsi, haobo Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D11655 11 July 2013, 01:07:13 UTC
a8d5f8d [RocksDB] Remove old readahead options Summary: As title. Test Plan: make check; db_bench Reviewers: dhruba, MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D11643 09 July 2013, 18:22:33 UTC
9ba8278 [RocksDB] Provide contiguous sequence number even in case of write failure Summary: Replication logic would be simplified if we can guarantee that write sequence number is always contiguous, even if write failure occurs. Dhruba and I looked at the sequence number generation part of the code. It seems fixable. Note that if WAL was successful and insert into memtable was not, we would be in an unfortunate state. The approach in this diff is : IO error is expected and error status will be returned to client, sequence number will not be advanced; In-mem error is not expected and we panic. Test Plan: make check; db_stress Reviewers: dhruba, sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D11439 08 July 2013, 22:31:09 UTC
92ca816 [RocksDB] Support internal key/value dump for ldb Summary: This diff added a command 'idump' to ldb tool, which dumps the internal key/value pairs. It could be useful for diagnosis and estimating the per user key 'overhead'. Also cleaned up the ldb code a bit where I touched. Test Plan: make check; ldb idump Reviewers: emayanke, sheki, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11517 03 July 2013, 17:41:31 UTC
d56523c Update rocksdb version Summary: rocksdb-2.0 released to third party Test Plan: visual inspection Reviewers: dhruba, haobo, sheki Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11559 01 July 2013, 21:30:04 UTC
71e0f69 [RocksDB] Expose count for WriteBatch Summary: As title. Exposed a Count function that returns the number of updates in a batch. Could be handy for replication sequence number check. Test Plan: make check; Reviewers: emayanke, sheki, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11523 26 June 2013, 22:13:21 UTC
34ef873 Added stringappend_test back into the unit tests. Summary: With the Makefile now updated to correctly update all .o files, this should fix the issues recompiling stringappend_test. This should also fix the "segmentation-fault" that we were getting earlier. Now, stringappend_test should be clean, and I have added it back to the unit-tests. Also made some minor updates to the tests themselves. Test Plan: 1. make clean; make stringappend_test -j 32 (will test it by itself) 2. make clean; make all check -j 32 (to run all unit tests) 3. make clean; make release (test in release mode) 4. valgrind ./stringappend_test (valgrind tests) Reviewers: haobo, jpaton, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11505 26 June 2013, 18:41:13 UTC
6894a50 Updated "make clean" to remove all .o files Summary: The old Makefile did not remove ALL .o and .d files, but rather only those that happened to be in the root folder and one-level deep. This was causing issues when recompiling files in deeper folders. This fix now causes make clean to find ALL .o and .d files via a unix "find" command, and then remove them. Test Plan: make clean; make all -j 32; Reviewers: haobo, jpaton, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11493 25 June 2013, 18:30:37 UTC
b858da7 Simplify bucketing logic in ldb-ttl Summary: [start_time, end_time) is what I'm following for the buckets and the whole time-range. Also cleaned up some code in db_ttl.* Not correcting the spacing/indenting convention for util/ldb_cmd.cc in this diff. Test Plan: python ldb_test.py, make ttl_test, Run mcrocksdb-backup tool, Run the ldb tool on 2 mcrocksdb production backups from sigmafio033.prn1 Reviewers: vamsi, haobo Reviewed By: vamsi Differential Revision: https://reviews.facebook.net/D11433 21 June 2013, 16:49:24 UTC
61f1baa Introducing timeranged scan, timeranged dump in ldb. Also the ability to count in time-batches during Dump Summary: Scan and Dump commands in ldb use iterator. We need to also print timestamp for ttl databases for debugging. For this I create a TtlIterator class pointer in these functions and assign it the value of Iterator pointer which actually points to a TtlIterator object, and access the new function ValueWithTS which can return TS also. Buckets feature for dump command: gives a count of different key-values in the specified time-range distributed across the time-range partitioned according to bucket-size. start_time and end_time are specified in unixtimestamp and bucket in seconds on the user-commandline Have commented out 3 lines from ldb_test.py so that the test does not break right now. It breaks because timestamp is also printed now and I have to look at wildcards in python to compare properly. Test Plan: python tools/ldb_test.py Reviewers: vamsi, dhruba, haobo, sheki Reviewed By: vamsi CC: leveldb Differential Revision: https://reviews.facebook.net/D11403 20 June 2013, 01:45:13 UTC
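The bucketed counting described in the entry above (keys in [start_time, end_time), partitioned into bucket_size-second buckets) can be sketched as follows. `BucketCounts` is an invented helper, not the ldb code itself:

```cpp
#include <cassert>
#include <vector>

// Sketch: count timestamps into buckets of bucket_size seconds covering
// the half-open range [start_time, end_time); timestamps outside the
// range are ignored, matching the half-open convention above.
std::vector<int> BucketCounts(const std::vector<long>& timestamps,
                              long start_time, long end_time,
                              long bucket_size) {
  std::size_t n =
      static_cast<std::size_t>((end_time - start_time + bucket_size - 1) /
                               bucket_size);  // ceil of range/bucket_size
  std::vector<int> buckets(n, 0);
  for (long ts : timestamps) {
    if (ts >= start_time && ts < end_time) {
      ++buckets[static_cast<std::size_t>((ts - start_time) / bucket_size)];
    }
  }
  return buckets;
}
```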
0f78fad [RocksDB] add back --mmap_read options to crashtest Summary: As title, now that db_stress supports --mmap_read properly Test Plan: make crash_test Reviewers: vamsi, emayanke, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11391 19 June 2013, 23:15:59 UTC
4deaa0d [RocksDB] Minor change to statistics.h Summary: as title, use initialize list so that lines fit in 80 chars. Test Plan: make check; Reviewers: sheki, dhruba Differential Revision: https://reviews.facebook.net/D11385 19 June 2013, 19:44:42 UTC
96be2c4 [RocksDB] Add mmap_read option for db_stress Summary: as title, also removed an incorrect assertion Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0 Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D11367 19 June 2013, 17:28:32 UTC
5ef6bb8 [rocksdb][refactor] statistic printing code to one place Summary: $title Test Plan: db_bench --statistics=1 Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11373 19 June 2013, 03:28:41 UTC
09de7a3 Fix Zlib_Compress and Zlib_Uncompress Summary: Zlib_{Compress,Uncompress} did not handle very small input buffers properly. In addition, they did not call inflate/deflate until Z_STREAM_END was returned; it was possible for them to exit when only Z_OK had returned. This diff also fixes a bunch of lint errors. Test Plan: Run make check Reviewers: dhruba, sheki, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11301 18 June 2013, 23:57:42 UTC
3cc1af2 [RocksDB] Option for incremental sync Summary: This diff added an option to control the incremenal sync frequency. db_bench has a new flag bytes_per_sync for easy tuning exercise. Test Plan: make check; db_bench Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11295 18 June 2013, 22:00:32 UTC
79f4fd2 [Rocksdb] Simplify Printing code in db_bench Summary: simplify the printing code in db_bench use TickersMap and HistogramsNameMap introduced in previous diffs. Test Plan: ./db_bench --statistics=1 and see if all the statistics are printed Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11355 18 June 2013, 21:58:00 UTC
6acbe0f Compact multiple memtables before flushing to storage. Summary: Merge multiple memtables in memory before writing them out to a file in L0. There is a new config parameter min_write_buffer_number_to_merge that specifies the number of write buffers that should be merged together to a single file in storage. The system will not flush write buffers to storage unless at least that many buffers have accumulated in memory. The default value of this new parameter is 1, which means that a write buffer will be immediately flushed to disk as soon as it is ready. Test Plan: make check Differential Revision: https://reviews.facebook.net/D11241 18 June 2013, 21:28:04 UTC
f561b3a [Rocksdb] Rename one stat key from leveldb to rocksdb 17 June 2013, 21:33:05 UTC
836534d Enhance dbstress to allow specifying compaction trigger for L0. Summary: Rocksdb allows specifying the number of files in L0 that triggers compactions. Expose this api as a command line parameter for running db_stress. Test Plan: Run test Reviewers: sheki, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D11343 17 June 2013, 21:15:09 UTC
0012468 [rocksdb] do not trim range for level0 in manual compaction Summary: https://code.google.com/p/leveldb/issues/detail?can=1&q=178&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary&id=178 Ported the solution as is to RocksDB. Test Plan: moved the unit test as manual_compaction_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11331 17 June 2013, 20:58:17 UTC
39ee47f [Rocksdb] Record WriteBlock Times into a histogram Summary: Add a histogram to track WriteBlock times Test Plan: db_bench and print Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11319 17 June 2013, 17:11:10 UTC
8926b72 Minor tweaks to StringAppend MergeOperator. Summary: I'm concerned about a random seg-fault that sometimes occurs when running stringappend_test. I will investigate further. First, I am removing stringappend_test from the regular release tests, and making some clean-ups to the code. Test Plan: 1. make stringappend_test 2. ./stringappend_test Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11313 14 June 2013, 23:44:39 UTC
bff718d [Rocksdb] Implement filluniquerandom Summary: Use a bit set to keep track of which random number is generated. Currently only supports single-threaded. All our perf tests are run with threads=1 Copied over bitset implementation from common/datastructures Test Plan: printed the generated keys, and verified all keys were present. Reviewers: MarkCallaghan, haobo, dhruba Reviewed By: MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D11247 14 June 2013, 23:17:56 UTC
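The bit-set bookkeeping described in the filluniquerandom entry above can be sketched like this. `SeenBitSet` is a stand-in for the bitset copied from common/datastructures, invented here for illustration; as noted above, the scheme is single-threaded:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch: one bit per candidate index records whether that random index
// has already been emitted, so each key is generated exactly once.
class SeenBitSet {
 public:
  explicit SeenBitSet(std::size_t n) : bits_((n + 63) / 64, 0) {}

  // Returns true the first time index i is marked, false on repeats.
  bool MarkIfUnseen(std::size_t i) {
    uint64_t mask = uint64_t{1} << (i % 64);
    if (bits_[i / 64] & mask) return false;
    bits_[i / 64] |= mask;
    return true;
  }

 private:
  std::vector<uint64_t> bits_;
};
```

The benchmark loop would keep drawing random indices and retrying on a false return, which guarantees the verification step above (all keys present exactly once).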
2a52e1d Fix db_bench for release build. Test Plan: make release Reviewers: haobo, dhruba, jpaton Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11307 14 June 2013, 23:00:47 UTC
1afdf28 [RocksDB] Compaction Filter Cleanup Summary: This hopefully gives the right semantics to compaction filter. Will write a small wiki to explain the ideas. Test Plan: make check; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11121 14 June 2013, 21:23:08 UTC
7a5f71d [Rocksdb] measure table open io in a histogram Summary: Table is setup for compaction using Table::SetupForCompaction. So read block calls can be differentiated b/w Gets/Compaction. Use this and measure times. Test Plan: db_bench --statistics=1 Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb, MarkCallaghan Differential Revision: https://reviews.facebook.net/D11217 14 June 2013, 00:25:09 UTC
0c2a2dd [RocksDB] Fix build. Removed deprecated option --mmap_read from db_crashtest Summary: As title Test Plan: db_crashtest Reviewers: vamsi, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D11271 13 June 2013, 20:48:35 UTC
778e179 [RocksDB] Sync file to disk incrementally Summary: During compaction, we sync the output files after they are fully written out. This causes unnecessary blocking of the compaction thread and burstiness of the write traffic. This diff simply asks the OS to sync data incrementally as they are written, in the background. The hope is that, at the final sync, most of the data are already on disk and we would block less on the sync call. Thus, each compaction runs faster and we could use fewer compaction threads to saturate IO. In addition, the write traffic will be smoothed out, hopefully reducing the IO P99 latency too. Some quick tests show 10~20% improvement in per thread compaction throughput. Combined with posix advice on compaction read, just 5 threads are enough to almost saturate the udb flash bandwidth for 800 bytes write only benchmark. What's more promising is that, with saturated IO, iostat shows average wait time is actually smoother and much smaller. For the 800-byte write-only test: Before the change: await oscillates between 10ms and 3ms After the change: await ranges 1-3ms Will test against read-modify-write workload too, see if high read latency P99 could be resolved. Will introduce a parameter to control the sync interval in a follow up diff after cleaning up EnvOptions. Test Plan: make check; db_bench; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11115 12 June 2013, 19:53:59 UTC
4985a9f [Rocksdb] [Multiget] Introduced multiget into db_bench Summary: Preliminary! Introduced the --use_multiget=1 and --keys_per_multiget=n flags for db_bench. Also updated and tested the ReadRandom() method to include an option to use multiget. By default, keys_per_multiget=100. Preliminary tests imply that multiget is at least 1.25x faster per key than regular get. Will continue adding Multiget for ReadMissing, ReadHot, RandomWithVerify, ReadRandomWriteRandom; soon. Will also think about ways to better verify benchmarks. Test Plan: 1. make db_bench 2. ./db_bench --benchmarks=fillrandom 3. ./db_bench --benchmarks=readrandom --use_existing_db=1 --use_multiget=1 --threads=4 --keys_per_multiget=100 4. ./db_bench --benchmarks=readrandom --use_existing_db=1 --threads=4 5. Verify ops/sec (and 1000000 of 1000000 keys found) Reviewers: haobo, MarkCallaghan, dhruba Reviewed By: MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D11127 12 June 2013, 19:42:21 UTC
bdf1085 [RocksDB] cleanup EnvOptions Summary: This diff simplifies EnvOptions by treating it as POD, similar to Options. - virtual functions are removed and member fields are accessed directly. - StorageOptions is removed. - Options.allow_readahead and Options.allow_readahead_compactions are deprecated. - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite Test Plan: make check; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11175 12 June 2013, 18:17:19 UTC
5679107 Completed the implementation and test cases for Redis API. Summary: Completed the implementation for the Redis API for Lists. The Redis API uses rocksdb as a backend to persistently store maps from key->list. It supports basic operations for appending, inserting, pushing, popping, and accessing a list, given its key. Test Plan: - Compile with: make redis_test - Test with: ./redis_test - Run all unit tests (for all rocksdb) with: make all check - To use an interactive REDIS client use: ./redis_test -m - To clean the database before use: ./redis_test -m -d Reviewers: haobo, dhruba, zshao Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D10833 11 June 2013, 18:19:49 UTC
e673d5d Do not submit multiple simultaneous seek-compaction requests. Summary: The code was such that if multi-threaded-compactions as well as seek compaction are enabled then it submits multiple compaction request for the same range of keys. This causes extraneous sst-files to accumulate at various levels. Test Plan: I am not able to write a very good unit test for this one but can easily reproduce this bug with 'dbstress' with the following options. batch=1;maxk=100000000;ops=100000000;ro=0;fm=2;bpl=10485760;of=500000; wbn=3; mbc=20; mb=2097152; wbs=4194304; dds=1; sync=0; t=32; bs=16384; cs=1048576; of=500000; ./db_stress --disable_seek_compaction=0 --mmap_read=0 --threads=$t --block_size=$bs --cache_size=$cs --open_files=$of --verify_checksum=1 --db=/data/mysql/leveldb/dbstress.dir --sync=$sync --disable_wal=1 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --target_file_size_multiplier=$fm --max_write_buffer_number=$wbn --max_background_compactions=$mbc --max_bytes_for_level_base=$bpl --reopen=$ro --ops_per_thread=$ops --max_key=$maxk --test_batches_snapshots=$batch Reviewers: leveldb, emayanke Reviewed By: emayanke Differential Revision: https://reviews.facebook.net/D11055 10 June 2013, 22:49:19 UTC
3c35eda Make Write API work for TTL databases Summary: Added logic to make another WriteBatch with Timestamps during the Write function execution in TTL class. Also expanded the ttl_test to test for it. Have done nothing for Merge for now. Test Plan: make ttl_test;./ttl_test Reviewers: haobo, vamsi, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D10827 10 June 2013, 22:23:44 UTC
1b69f1e Fix referring to freed memory in earlier commit. Summary: Fix referring to freed memory in earlier commit by https://reviews.facebook.net/D11181 Test Plan: make check Reviewers: haobo, sheki Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11193 10 June 2013, 22:08:13 UTC
4a8554d [Rocksdb] fix wrong assert Summary: the assert was wrong in D11145. Broke build Test Plan: make db_bench; run it Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11187 10 June 2013, 20:14:14 UTC
c5de1b9 Print name of user comparator in LOG. Summary: The current code prints the name of the InternalKeyComparator in the log file. We would also like to print the name of the user-specified comparator for easier debugging. Test Plan: make check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D11181 10 June 2013, 19:11:55 UTC
a4913c5 [rocksdb] names for all metrics provided in statistics.h Summary: Provide a map of histograms and ticker vs strings. Fb303 libraries can use this to provide the mapping. We will not have to duplicate the code during release. Test Plan: db_bench with statistics=1 Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11145 10 June 2013, 18:57:55 UTC
184343a Max_mem_compaction_level can have maximum value of num_levels-1 Summary: Without this, files could be written out to a level greater than the maximum level possible and was the source of the segfaults that wormhole was getting. The sequence of steps that was followed: 1. WriteLevel0Table was called when memtable was to be flushed for a file. 2. PickLevelForMemTableOutput was called to determine the level to which this file should be pushed. 3. PickLevelForMemTableOutput returned a wrong result because max_mem_compaction_level was equal to 2 even when num_levels was equal to 0. The fix to re-initialize max_mem_compaction_level based on num_levels passed seems correct. Test Plan: make all check; Also made a dummy file to mimic the wormhole-file behaviour which was causing the segfaults and found that the same segfault occurs without this change and not with this. Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11157 09 June 2013, 17:38:55 UTC
7a6bd8e Modifying options to db_stress when it is run with db_crashtest Summary: These extra options caught some bugs. Will be run via Jenkins now with the crash_test Test Plan: make crashtest Reviewers: dhruba, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11151 09 June 2013, 16:58:46 UTC
3bb9449 [Fix whitebox crash test failure] Summary: I think the check for "error" that I added had caused a false alarm. Fixed that. Test Plan: Revert Plan: OK Task ID: # Reviewers: emayanke, dhruba Reviewed By: emayanke Differential Revision: https://reviews.facebook.net/D11139 07 June 2013, 18:34:46 UTC
e982b5a [Rocksdb] measure table open io in a histogram Summary: as title Test Plan: db_bench --statistics=1 check for statistic. Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11109 07 June 2013, 17:02:28 UTC
8ef328e ctags and cscope support to Makefile Summary: Added a target to Makefile called 'tags' that runs ctags and cscope on all *.cc and *.h files Test Plan: Run 'make tags'. Then start vim and do :set tags=./tags :cs add cscope.out These commands should give you no error messages. You should then be able to access cscope db and ctags as normal in vim. Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D11103 07 June 2013, 16:13:40 UTC
5cf7a00 [Make most of the changes suggested by Aaron] Summary: $title Test Plan: Revert Plan: OK Task ID: # Reviewers: emayanke, akushner Reviewed By: akushner Differential Revision: https://reviews.facebook.net/D10923 07 June 2013, 00:31:45 UTC
db1f0cd Fixed valgrind errors 06 June 2013, 00:25:16 UTC
d8c7c45 Very basic Multiget and simple test cases. Summary: Implemented the MultiGet operator which takes in a list of keys and returns their associated values. Currently uses std::vector as its container data structure. Otherwise, it works identically to "Get". Test Plan: 1. make db_test ; compile it 2. ./db_test ; test it 3. make all check ; regress / run all tests 4. make release ; (optional) compile with release settings Reviewers: haobo, MarkCallaghan, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10875 05 June 2013, 18:22:38 UTC
d91b42e [Rocksdb] Measure all FSYNC/SYNC times Summary: Add stop watches around all sync calls. Test Plan: db_bench check if respective histograms are printed Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11073 05 June 2013, 18:06:21 UTC
ee522d0 [Rocksdb] Log on disable/enable file deletions Summary: as title Test Plan: compile Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11085 05 June 2013, 17:48:24 UTC
043573b [RocksDB] Include 64bit random number generator Summary: As title. Test Plan: make check; Reviewers: chip, MarkCallaghan CC: leveldb Differential Revision: https://reviews.facebook.net/D11061 04 June 2013, 20:52:27 UTC
d9f538e Improve output for GetProperty('leveldb.stats') Summary: Display separate values for read, write & total compaction IO. Display compaction amplification and write amplification. Add similar values for the period since the last call to GetProperty. Results since the server started are reported as "cumulative" stats. Results since the last call to GetProperty are reported as "interval" stats. Level Files Size(MB) Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count Ln-stall ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0 7 13 21 0 211 0 0 211 0.0 0.0 10.1 0 0 0 0 113 0.0 1 79 157 88 993 989 198 795 194 9.0 11.3 11.2 106 405 502 97 14 0.0 2 19 36 5 63 63 37 27 36 2.4 12.3 12.2 19 14 32 18 12 0.0 >>>>>>>>>>>>>>>>>>>>>>>>> text below is new and/or reformatted Uptime(secs): 122.2 total, 0.9 interval Compaction IO cumulative (GB): 0.21 new, 1.03 read, 1.23 write, 2.26 read+write Compaction IO cumulative (MB/sec): 1.7 new, 8.6 read, 10.3 write, 19.0 read+write Amplification cumulative: 6.0 write, 11.0 compaction Compaction IO interval (MB): 5.59 new, 0.00 read, 5.59 write, 5.59 read+write Compaction IO interval (MB/sec): 6.5 new, 0.0 read, 6.5 write, 6.5 read+write Amplification interval: 1.0 write, 1.0 compaction >>>>>>>>>>>>>>>>>>>>>>>> text above is new and/or reformatted Stalls(secs): 90.574 level0_slowdown, 0.000 level0_numfiles, 10.165 memtable_compaction, 0.000 leveln_slowdown Task ID: # Blame Rev: Test Plan: make check, run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11049 04 June 2013, 03:24:49 UTC
2b1fb5b [RocksDB] Add score column to leveldb.stats Summary: Added the 'score' column to the compaction stats output, which shows the level total size divided by level target size. Could be useful when monitoring compaction decisions... Test Plan: make check; db_bench Reviewers: dhruba CC: leveldb, MarkCallaghan Differential Revision: https://reviews.facebook.net/D11025 04 June 2013, 00:38:27 UTC
d897d33 [RocksDB] Introduce Fast Mutex option Summary: This diff adds an option to specify whether PTHREAD_MUTEX_ADAPTIVE_NP will be enabled for the rocksdb single big kernel lock. db_bench also has this option now. Quickly tested 8 thread cpu bound 100 byte random read. No fast mutex: ~750k/s ops With fast mutex: ~880k/s ops Test Plan: make check; db_bench; db_stress Reviewers: dhruba CC: MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D11031 02 June 2013, 06:11:34 UTC
ab8d2f6 [RocksDB] [Performance] Allow different posix advice to be applied to the same table file Summary: The current posix advice implementation ties the access pattern hint to the creation of a file. It is not possible to apply different advice for different access patterns (random get vs. compaction read) without keeping two open files for the same table. This patch extends the RandomAccessFile interface to accept a new access hint at any time. In particular, we are able to set a different access hint on the same table file based on when/how the file is used. Two options are added to set the access hint, after the file is first opened and after the file is being compacted. Test Plan: make check; db_stress; db_bench Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D10905 31 May 2013, 02:08:44 UTC
2df65c1 [RocksDB] Dump counters and histogram data periodically with compaction stats Summary: As title Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10995 29 May 2013, 19:00:18 UTC
a8d807e Record the number of open db iterators. Summary: Enhance the statistics to report the number of open db iterators. Test Plan: make check Reviewers: haobo, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D10983 29 May 2013, 15:47:08 UTC
fb684da [RocksDB] Fix CorruptionTest Summary: Overriding block_size_deviation to zero, so that CorruptionTest can pass. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D10977 28 May 2013, 19:36:42 UTC
4c47d8f add block deviation option to terminate a block before it exceeds block_size Summary: a new option block_size_deviation is added. Test Plan: run db_test and db_bench Reviewers: dhruba, haobo Reviewed By: haobo Differential Revision: https://reviews.facebook.net/D10821 24 May 2013, 23:21:52 UTC
4b29651 add block deviation option to terminate a block before it exceeds block_size Summary: a new option block_size_deviation is added. Test Plan: run db_test and db_bench Reviewers: dhruba, haobo Reviewed By: haobo Differential Revision: https://reviews.facebook.net/D10821 24 May 2013, 22:52:49 UTC
ef15b9d [RocksDB] Fix MaybeDumpStats Summary: MaybeDumpStats was causing a lock problem Test Plan: make check; db_stress Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D10935 24 May 2013, 22:43:16 UTC
0e879c9 [RocksDB] dump leveldb.stats periodically in LOG file. Summary: Added an option stats_dump_period_sec to dump leveldb.stats to LOG periodically for diagnosis. By default, it's set to a very big number, 3600 (1 hour). Test Plan: make check; Reviewers: dhruba Reviewed By: dhruba CC: leveldb, zshao Differential Revision: https://reviews.facebook.net/D10761 23 May 2013, 23:56:59 UTC
2654186 The max size of the write buffer can be 64 GB. Summary: There was an artificial limit on the size of the write buffer. Test Plan: make check Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D10911 23 May 2013, 22:00:27 UTC
898f793 Fix valgrind errors introduced by https://reviews.facebook.net/D10863 Summary: The valgrind errors were in the unit tests where we change the number of levels of a database using internal methods. Test Plan: valgrind ./reduce_levels_test valgrind ./db_test Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D10893 23 May 2013, 19:14:41 UTC
c2e2460 [RocksDB] Expose DBStatistics Summary: Make Statistics usable by client Test Plan: make check; db_bench Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D10899 23 May 2013, 18:49:38 UTC
760dd47 [Kill randomly at various points in source code for testing] Summary: This is initial version. A few ways in which this could be extended in the future are: (a) Killing from more places in source code (b) Hashing stack and using that hash in determining whether to crash. This is to avoid crashing more often at source lines that are executed more often. (c) Raising exceptions or returning errors instead of killing Test Plan: This whole thing is for testing. Here is part of output: python2.7 tools/db_crashtest2.py -d 600 Running db_stress db_stress retncode -15 output LevelDB version : 1.5 Number of threads : 32 Ops per thread : 10000000 Read percentage : 50 Write-buffer-size : 4194304 Delete percentage : 30 Max key : 1000 Ratio #ops/#keys : 320000 Num times DB reopens: 0 Batches/snapshots : 1 Purge redundant % : 50 Num keys per lock : 4 Compression : snappy ------------------------------------------------ No lock creation because test_batches_snapshots set 2013/04/26-17:55:17 Starting database operations Created bg thread 0x7fc1f07ff700 ... finished 60000 ops Running db_stress db_stress retncode -15 output LevelDB version : 1.5 Number of threads : 32 Ops per thread : 10000000 Read percentage : 50 Write-buffer-size : 4194304 Delete percentage : 30 Max key : 1000 Ratio #ops/#keys : 320000 Num times DB reopens: 0 Batches/snapshots : 1 Purge redundant % : 50 Num keys per lock : 4 Compression : snappy ------------------------------------------------ Created bg thread 0x7ff0137ff700 No lock creation because test_batches_snapshots set 2013/04/26-17:56:15 Starting database operations ... finished 90000 ops Revert Plan: OK Task ID: #2252691 Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D10581 22 May 2013, 01:21:49 UTC
87d0af1 [RocksDB] Introduce an option to skip log error on recovery Summary: Currently, with paranoid_check on, DB::Open will fail on any log read error on recovery. If client is ok with losing most recent updates, we could simply skip those errors. However, it's important to introduce an additional flag, so that paranoid_check can still guard against more serious problems. Test Plan: make check; db_stress Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb, emayanke Differential Revision: https://reviews.facebook.net/D10869 21 May 2013, 21:30:36 UTC
d1aaaf7 Ability to set different size fanout multipliers for every level. Summary: There is an existing field Options.max_bytes_for_level_multiplier that sets the multiplier for the size of each level in the database. This patch introduces the ability to set different multipliers for every level in the database. The size of a level is determined by using both max_bytes_for_level_multiplier as well as the per-level fanout. size of level[i] = size of level[i-1] * max_bytes_for_level_multiplier * fanout[i-1] The default value of fanout is 1, so that it is backward compatible. Test Plan: make check Reviewers: haobo, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D10863 21 May 2013, 20:50:20 UTC
c3c13db [RocksDB] [Performance Bug] MemTable::Get Slow Summary: The merge operator diff introduced a performance problem in MemTable::Get. An exit condition is missed when the current key does not match the user key. This could lead to full memtable scan if the user key is not found. Test Plan: make check; db_bench Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10851 21 May 2013, 20:40:38 UTC
3827403 Add check to db_stress to not allow disable_wal and reopens set together Summary: db can't reopen safely with disable_wal set! Test Plan: make db_stress; run db_stress with disable_wal and reopens set and see error Reviewers: dhruba, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10857 21 May 2013, 18:49:29 UTC
839f6db [RocksDB] Fix PosixLogger and AutoRollLogger thread safety Summary: PosixLogger and AutoRollLogger do not seem to be thread safe. For PosixLogger, log_size_ is not atomically updated. For AutoRollLogger, the underlying logger_ might be deleted by one thread while still being accessed by another. Test Plan: make check Reviewers: kailiu, dhruba, heyongqiang Reviewed By: kailiu CC: leveldb, zshao, sheki Differential Revision: https://reviews.facebook.net/D9699 21 May 2013, 18:39:44 UTC
15ccd10 A nit to db_stress to terminate generated value at proper length Summary: Will help while debugging if the generated value is truncated at proper length. Test Plan: make db_stress; ./db_stress --max_key=10000 --db=/tmp/mcr --threads=1 --ops_per_thread=10000 Reviewers: dhruba, vamsi Reviewed By: vamsi Differential Revision: https://reviews.facebook.net/D10845 21 May 2013, 01:13:32 UTC
8a59ed9 [RocksDB] fix build Summary: assert => ASSERT_TRUE Test Plan: make release; make check Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10839 17 May 2013, 23:15:44 UTC
e117430 [RocksDB] Simplify StopWatch implementation Summary: Make stop watch a simple implementation, instead of subclass of a virtual class Allocate stop watches off the stack instead of heap. Code is more terse now. Test Plan: make all check, db_bench with --statistics=1 Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D10809 17 May 2013, 17:55:34 UTC