https://github.com/facebook/rocksdb

Revision Author Date Message Commit Date
9e47084 Bump version to 5.8.6 20 November 2017, 21:51:43 UTC
36074ba Enable cacheline_aligned_alloc() to allocate from jemalloc if enabled. Summary: Reuse WITH_JEMALLOC option in preparation for module search unification. Move jemalloc overrides into a separate .cc. Remove obsolete JEMALLOC_NOINIT option. Closes https://github.com/facebook/rocksdb/pull/3078 Differential Revision: D6174826 Pulled By: yiwu-arbug fbshipit-source-id: 9970a0289b4490272d15853920d9d7531af91140 20 November 2017, 21:45:23 UTC
aa00523 Add -DPORTABLE=1 to MSVC CI build Summary: Add -DPORTABLE=1 port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running. Closes https://github.com/facebook/rocksdb/pull/2815 Differential Revision: D5751661 Pulled By: siying fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0 20 November 2017, 21:45:10 UTC
cf2b982 Bump version to 5.8.5 14 November 2017, 18:38:22 UTC
e8c9350 Blob DB: not using PinnableSlice move assignment Summary: The current implementation of PinnableSlice move assignment has an issue (#3163). We are moving away from it instead of trying to get the move assignment right, since it is too tricky. Closes https://github.com/facebook/rocksdb/pull/3164 Differential Revision: D6319201 Pulled By: yiwu-arbug fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48 14 November 2017, 18:37:50 UTC
4907d24 Bump version to 5.8.4 09 November 2017, 05:28:42 UTC
5d928c7 Blob DB: Fix race condition between flush and write Summary: A race condition will happen when: * a user thread writes a value, but it hits the write stop condition because there are too many un-flushed memtables, while holding blob_db_impl.write_mutex_. * Flush is triggered and call flush begin listener and try to acquire blob_db_impl.write_mutex_. Fixing it. Closes https://github.com/facebook/rocksdb/pull/3149 Differential Revision: D6279805 Pulled By: yiwu-arbug fbshipit-source-id: 0e3c58afb78795ebe3360a2c69e05651e3908c40 09 November 2017, 05:28:02 UTC
725bb9d Blob DB: Fix release build Summary: `compression` shadows the method name in `BlobFile`. Rename it. Closes https://github.com/facebook/rocksdb/pull/3148 Differential Revision: D6274498 Pulled By: yiwu-arbug fbshipit-source-id: 7d293596530998b23b6b8a8940f983f9b6343a98 09 November 2017, 05:27:53 UTC
b7367fe Bump version to 5.8.3 08 November 2017, 06:59:14 UTC
13b2a9b Blob DB: use compression in file header instead of global options Summary: To fix the issue of failing to decompress existing values after reopening the DB with different compression settings. Closes https://github.com/facebook/rocksdb/pull/3142 Differential Revision: D6267260 Pulled By: yiwu-arbug fbshipit-source-id: c7cf7f3e33b0cd25520abf4771cdf9180cc02a5f 08 November 2017, 06:57:51 UTC
5dc70a1 Fix PinnableSlice move assignment Summary: After move assignment, we need to re-initialize the moved PinnableSlice. Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move. Closes https://github.com/facebook/rocksdb/pull/3127 Differential Revision: D6238585 Pulled By: yiwu-arbug fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e 07 November 2017, 21:01:49 UTC
9019e91 dynamically change current memtable size Summary: Previously setting `write_buffer_size` with `SetOptions` would only apply to new memtables. An internal user wanted it to take effect immediately, instead of at an arbitrary future point, to prevent OOM. This PR makes the memtable's size mutable, and makes `SetOptions()` mutate it. There is one case when we preserve the old behavior, which is when memtable prefix bloom filter is enabled and the user is increasing the memtable's capacity. That's because the prefix bloom filter's size is fixed and wouldn't work as well on a larger memtable. Closes https://github.com/facebook/rocksdb/pull/3119 Differential Revision: D6228304 Pulled By: ajkr fbshipit-source-id: e44bd9d10a5f8c9d8c464bf7436070bb3eafdfc9 03 November 2017, 21:16:57 UTC
7f1815c Bump version to 5.8.2 03 November 2017, 19:25:30 UTC
2584a18 Blob DB: Fix BlobDBTest::SnapshotAndGarbageCollection asan failure Summary: Fix unreleased snapshot at the end of the test. Closes https://github.com/facebook/rocksdb/pull/3126 Differential Revision: D6232867 Pulled By: yiwu-arbug fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26 03 November 2017, 19:14:12 UTC
17f67b5 PinnableSlice move assignment Summary: Allow `std::move(pinnable_slice)`. Closes https://github.com/facebook/rocksdb/pull/2997 Differential Revision: D6036782 Pulled By: yiwu-arbug fbshipit-source-id: 583fb0419a97e437ff530f4305822341cd3381fa 03 November 2017, 19:14:07 UTC
6fb56c5 Blob DB: Add compaction filter to remove expired blob index entries Summary: After adding expiration to blob index in #3066, we are now able to add a compaction filter to cleanup expired blob index entries. Closes https://github.com/facebook/rocksdb/pull/3090 Differential Revision: D6183812 Pulled By: yiwu-arbug fbshipit-source-id: 9cb03267a9702975290e758c9c176a2c03530b83 03 November 2017, 06:42:39 UTC
f90ced9 Blob DB: fix snapshot handling Summary: Blob db will keep a blob file if data in the file is visible to an active snapshot. Before this patch it checked whether any active snapshot has a sequence number greater than the earliest sequence in the file. This is problematic since we take a snapshot on every read: if reads keep coming, old blob files will never be cleaned up. Change to check whether any active snapshot falls in the range [earliest_sequence, obsolete_sequence), where obsolete_sequence is 1. if data is relocated to another file by garbage collection, the latest sequence at the time garbage collection finishes; 2. otherwise, the latest sequence of the file. Closes https://github.com/facebook/rocksdb/pull/3087 Differential Revision: D6182519 Pulled By: yiwu-arbug fbshipit-source-id: cdf4c35281f782eb2a9ad6a87b6727bbdff27a45 03 November 2017, 06:40:01 UTC
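The range check described in f90ced9 can be sketched as below. This is a minimal illustration, not the Blob DB code: `IsReferencedBySnapshot` and its parameter names are hypothetical, and sequence numbers are modeled as plain `uint64_t` values.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if any active snapshot falls in the
// half-open range [earliest_sequence, obsolete_sequence).
bool IsReferencedBySnapshot(const std::vector<uint64_t>& snapshot_sequences,
                            uint64_t earliest_sequence,
                            uint64_t obsolete_sequence) {
  assert(earliest_sequence <= obsolete_sequence);
  for (uint64_t seq : snapshot_sequences) {
    if (seq >= earliest_sequence && seq < obsolete_sequence) {
      return true;  // data in the file may still be visible to this snapshot
    }
  }
  return false;  // no active snapshot can see data in this file
}
```

With a file covering [10, 30), a snapshot at sequence 100 no longer keeps the file alive, whereas a check of the form "snapshot sequence > earliest sequence" would.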
632f36d Blob DB: option to enable garbage collection Summary: Add an option to enable/disable auto garbage collection, where we keep counting how many keys have been evicted by either deletion or compaction and decide whether to garbage collect a blob file. Default disable auto garbage collection for now since the whole logic is not fully tested and we plan to make major change to it. Closes https://github.com/facebook/rocksdb/pull/3117 Differential Revision: D6224756 Pulled By: yiwu-arbug fbshipit-source-id: cdf53bdccec96a4580a2b3a342110ad9e8864dfe 03 November 2017, 06:39:50 UTC
11bacd5 Blob DB: Fix flaky BlobDBTest::GCExpiredKeyWhileOverwriting test Summary: The test intends to wait until the key is overwritten before proceeding with garbage collection. It failed to wait for `PutUntil` to finish. Fixing it. Closes https://github.com/facebook/rocksdb/pull/3116 Differential Revision: D6222833 Pulled By: yiwu-arbug fbshipit-source-id: fa9b57a772b92a66cf250b44e7975c43f62f45c5 03 November 2017, 06:39:36 UTC
f98efcb Blob DB: Evict oldest blob file when close to blob db size limit Summary: Evict the oldest blob file and put it in the obsolete_files list when close to the blob db size limit. The file will be deleted the next time the `DeleteObsoleteFiles` background job runs. For now I set the `kEvictOldestFileAtSize` constant, which controls when to evict the oldest file, at 90%. It could be tweaked or made into an option if really needed; I didn't want to expose it as an option prematurely as there are already too many :). Closes https://github.com/facebook/rocksdb/pull/3094 Differential Revision: D6187340 Pulled By: sagar0 fbshipit-source-id: 687f8262101b9301bf964b94025a2fe9d8573421 03 November 2017, 06:39:21 UTC
c1e99ed Blob DB: cleanup unused options Summary: * cleanup num_concurrent_simple_blobs. We don't do concurrent writes (by taking write_mutex_) so it doesn't make sense to have multiple non TTL files open. We can revisit later when we want to improve writes. * cleanup eviction callback. we don't have plan to use it now. * rename s/open_simple_blob_files_/open_non_ttl_file_/ and s/open_blob_files_/open_ttl_files_/ to avoid confusion. Closes https://github.com/facebook/rocksdb/pull/3088 Differential Revision: D6182598 Pulled By: yiwu-arbug fbshipit-source-id: 99e6f5e01fa66d31309cdb06ce48502464bac6ad 03 November 2017, 06:39:05 UTC
ffc3c62 Blob DB: Initialize all fields in Blob Header, Footer and Record structs Summary: Fixing uninitialized fields caught by valgrind. Closes https://github.com/facebook/rocksdb/pull/3103 Differential Revision: D6200195 Pulled By: sagar0 fbshipit-source-id: bf35a3fb03eb1d308e4c5ce30dee1e345d7b03b3 03 November 2017, 06:38:20 UTC
9e82540 Blob DB: update blob file format Summary: Changing blob file format and some code cleanup around the change. The changes to the blob log format are: * Remove timestamp field in blob file header, blob file footer and blob records. The field is not being used and is often confused with the expiration field. * Blob file header now comes with a column family id, which always equals the default column family id. It leaves room for future support of column families. * Compression field in blob file header is now a standalone byte (instead of being compactly encoded with the flags field) * Blob file footer now comes with its own crc. * Key length is now uint64_t instead of uint32_t * Blob CRC now checksums both key and value (instead of value only). * Some reordering of the fields. The list of cleanups: * Better inline comments in blob_log_format.h * rename ttlrange_t and snrange_t to ExpirationRange and SequenceRange respectively. * simplify blob_db::Reader * Move crc checking logic to inside blob_log_format.cc Closes https://github.com/facebook/rocksdb/pull/3081 Differential Revision: D6171304 Pulled By: yiwu-arbug fbshipit-source-id: e4373e0d39264441b7e2fbd0caba93ddd99ea2af 03 November 2017, 06:37:56 UTC
d66bb21 Blob DB: Inline small values in base DB Summary: Adding the `min_blob_size` option to allow storing small values in base db (in LSM tree) together with the key. The goal is to improve performance for small values, while taking advantage of blob db's low write amplification for large values. Also adding expiration timestamp to blob index. It will be useful to evict stale blob indexes in base db by adding a compaction filter. I'll work on the compaction filter in future patches. See blob_index.h for the new blob index format. There are 4 cases when writing a new key: * small value w/o TTL: put in base db as normal value (i.e. ValueType::kTypeValue) * small value w/ TTL: put (type, expiration, value) to base db. * large value w/o TTL: write value to blob log and put (type, file, offset, size, compression) to base db. * large value w/TTL: write value to blob log and put (type, expiration, file, offset, size, compression) to base db. Closes https://github.com/facebook/rocksdb/pull/3066 Differential Revision: D6142115 Pulled By: yiwu-arbug fbshipit-source-id: 9526e76e19f0839310a3f5f2a43772a4ad182cd0 03 November 2017, 06:37:16 UTC
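The four write cases listed in d66bb21 reduce to a two-way decision on value size and TTL. The sketch below only illustrates that decision: `WritePath` and `ChooseWritePath` are hypothetical names, and treating a value as small when its size is strictly below `min_blob_size` is an assumption, since the commit message does not state how the boundary is handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical labels for the four cases in the commit message.
enum class WritePath {
  kInlineValue,         // small, no TTL: normal value in base db
  kInlineValueWithTTL,  // small, TTL: (type, expiration, value) in base db
  kBlobIndex,           // large, no TTL: value in blob log, index in base db
  kBlobIndexWithTTL,    // large, TTL: as above, index also carries expiration
};

// Assumption: "small" means value_size < min_blob_size.
WritePath ChooseWritePath(uint64_t value_size, uint64_t min_blob_size,
                          bool has_ttl) {
  const bool is_small = value_size < min_blob_size;
  if (is_small) {
    return has_ttl ? WritePath::kInlineValueWithTTL : WritePath::kInlineValue;
  }
  return has_ttl ? WritePath::kBlobIndexWithTTL : WritePath::kBlobIndex;
}
```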
05d5c57 Return write error on reaching blob dir size limit Summary: I found that we continue accepting writes even when the blob db goes beyond the configured blob directory size limit. Now, we return an error for writes on reaching `blob_dir_size` limit and if `is_fifo` is set to false. (We cannot just drop any file when `is_fifo` is true.) Deleting the oldest file when `is_fifo` is true will be handled in a later PR. Closes https://github.com/facebook/rocksdb/pull/3060 Differential Revision: D6136156 Pulled By: sagar0 fbshipit-source-id: 2f11cb3f2eedfa94524fbfa2613dd64bfad7a23c 03 November 2017, 06:37:16 UTC
2b8893b Blob DB: Store blob index as kTypeBlobIndex in base db Summary: Blob db inserts the blob index into base db as kTypeBlobIndex type, to tell apart values written by plain rocksdb and by blob db. This makes it possible to migrate from existing rocksdb to blob db. Also, with this patch blob db garbage collection moves away from OptimisticTransaction. Instead it uses a custom write callback to achieve behavior similar to OptimisticTransaction. This is because we need to pass the is_blob_index flag to DBImpl::Get but OptimisticTransaction doesn't support it. Closes https://github.com/facebook/rocksdb/pull/3000 Differential Revision: D6050044 Pulled By: yiwu-arbug fbshipit-source-id: 61dc72ab9977625e75f78cd968e7d8a3976e3632 03 November 2017, 06:37:07 UTC
419b93c Blob DB: not writing sequence number as blob record footer Summary: Previously each time we wrote a blob we wrote blob_record_header + key + value + blob_record_footer to the blob log. The footer only contains a sequence number and a crc for the sequence number. The sequence number was used in garbage collection to verify the value is recent. After #2703 we moved to optimistic transactions and no longer use the sequence number from the footer. Remove the footer altogether. There's another usage of sequence numbers and we are keeping it: each blob log file keeps track of the sequence number range of the keys in it, and uses it to check whether the file is referenced by a snapshot before being deleted. Closes https://github.com/facebook/rocksdb/pull/3005 Differential Revision: D6057585 Pulled By: yiwu-arbug fbshipit-source-id: d6da53c457a316e9723f359a1b47facfc3ffe090 03 November 2017, 06:07:27 UTC
8afb003 fix lite build Summary: * make `checksum_type_string_map` available for lite * comment out `FilesPerLevel` in lite mode. * travis and legocastle lite build also build `all` target and run tests Closes https://github.com/facebook/rocksdb/pull/3015 Differential Revision: D6069822 Pulled By: yiwu-arbug fbshipit-source-id: 9fe92ac220e711e9e6ed4e921bd25ef4314796a0 03 November 2017, 06:07:03 UTC
dded348 Blob DB: Move BlobFile definition to a separate file Summary: simply move BlobFile definition from blob_db_impl.h to blob_file.h. Closes https://github.com/facebook/rocksdb/pull/3002 Differential Revision: D6050143 Pulled By: yiwu-arbug fbshipit-source-id: a8fb6e094fe39bdeace6279569834bc65aa64a34 03 November 2017, 06:04:14 UTC
3747361 add GetLiveFiles and GetLiveFilesMetaData for BlobDB Summary: Closes https://github.com/facebook/rocksdb/pull/2976 Differential Revision: D5994759 Pulled By: miasantreble fbshipit-source-id: 985c31dccb957cb970c302f813cd07a1e8cb6438 03 November 2017, 06:03:54 UTC
8cff6e9 Enable WAL for blob index Summary: Enabled WAL, during GC, for blob index which is stored on regular RocksDB. Closes https://github.com/facebook/rocksdb/pull/2975 Differential Revision: D5997384 Pulled By: sagar0 fbshipit-source-id: b76c1487d8b5be0e36c55e8d77ffe3d37d63d85b 03 November 2017, 06:03:17 UTC
c293472 Add ValueType::kTypeBlobIndex Summary: Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to 1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex. 2. Make rocksdb able to detect if the db contains values written by blob db, and if so return an error. 3. Make it possible to have blob db optionally store a value in an SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type). The root db (DBImpl) basically pretends kTypeBlobIndex values are normal values on write. On Get, if is_blob is provided, return whether the value read is of kTypeBlobIndex type, or return Status::NotSupported() if is_blob is not provided. On scan, the allow_blob flag is passed, and if the flag is true, return whether the value is of kTypeBlobIndex type via iter->IsBlob(). Changes on the blob db side will be in a separate patch. Closes https://github.com/facebook/rocksdb/pull/2886 Differential Revision: D5838431 Pulled By: yiwu-arbug fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca 03 November 2017, 06:02:50 UTC
eae53de Make it explicit blob db doesn't support CF Summary: Blob db doesn't currently support column families. Return NotSupported status explicitly. Closes https://github.com/facebook/rocksdb/pull/2825 Differential Revision: D5757438 Pulled By: yiwu-arbug fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa 03 November 2017, 05:29:42 UTC
65aec19 Fix memory leak on blob db open Summary: Fixes #2820 Closes https://github.com/facebook/rocksdb/pull/2826 Differential Revision: D5757527 Pulled By: yiwu-arbug fbshipit-source-id: f495b63700495aeaade30a1da5e3675848f3d72f 03 November 2017, 00:37:40 UTC
30b38c9 TableProperty::oldest_key_time defaults to 0 Summary: We don't propagate TableProperty::oldest_key_time on compaction and just write the default value to SST files. It is more natural to default the value to 0. Also revert db_sst_test back to before #2842. Closes https://github.com/facebook/rocksdb/pull/3079 Differential Revision: D6165702 Pulled By: yiwu-arbug fbshipit-source-id: ca3ce5928d96ae79a5beb12bb7d8c640a71478a0 27 October 2017, 22:29:11 UTC
2879f4b Bump version to 5.8.1 24 October 2017, 05:12:51 UTC
88595c8 Add DB::Properties::kEstimateOldestKeyTime Summary: With FIFO compaction we would like to get the oldest data time for monitoring. The problem is we don't have timestamp for each key in the DB. As an approximation, we expose the earliest of sst file "creation_time" property. My plan is to override the property with a more accurate value with blob db, where we actually have timestamp. Closes https://github.com/facebook/rocksdb/pull/2842 Differential Revision: D5770600 Pulled By: yiwu-arbug fbshipit-source-id: 03833c8f10bbfbee62f8ea5c0d03c0cafb5d853a 24 October 2017, 05:12:30 UTC
266ac24 Bumping version to 5.8 Summary: Closes https://github.com/facebook/rocksdb/pull/2738 Differential Revision: D5736261 Pulled By: maysamyabandeh fbshipit-source-id: 49d27e9ccd786c4056a3d586a060fe460ea883ac 30 August 2017, 21:26:12 UTC
64185c2 update HISTORY.md for DeleteRange bug fix Summary: fixed in #2799 Closes https://github.com/facebook/rocksdb/pull/2805 Differential Revision: D5734324 Pulled By: ajkr fbshipit-source-id: a285d4e84bf1018dc2257fd6c3e7c075a7243263 30 August 2017, 05:26:47 UTC
e83d6a0 Not using aligned_alloc with gcc4 + asan Summary: GCC < 5 + ASAN does not instrument aligned_alloc, which can make ASAN report false-positive with "free on address which was not malloc" error. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61693 Also suppress leak warning with LRUCache::DisownData(). Closes https://github.com/facebook/rocksdb/pull/2783 Differential Revision: D5696465 Pulled By: yiwu-arbug fbshipit-source-id: 87c607c002511fa089b18cc35e24909bee0e74b4 30 August 2017, 04:56:02 UTC
0980dc6 Fix wrong smallest key of delete range tombstones Summary: Since tombstones are not stored in order, we may get a wrong smallest key if we only consider the first added tombstone. Check https://github.com/facebook/rocksdb/issues/2752 for more details. Closes https://github.com/facebook/rocksdb/pull/2799 Differential Revision: D5728217 Pulled By: ajkr fbshipit-source-id: 4a53edb0ca80d2a9fcf10749e52d47d57d6417d3 30 August 2017, 01:41:35 UTC
b767972 avoid use-after-move error Summary: * db/range_del_aggregator.cc (AddTombstone): Avoid a potential use-after-move bug. The original code would both use and move `tombstone` in a context where the order of those operations is not specified. The fix is to perform the use on a new, preceding statement. Author: meyering Closes https://github.com/facebook/rocksdb/pull/2796 Differential Revision: D5721163 Pulled By: ajkr fbshipit-source-id: a1d328d6a77a17c6425e8069860a202e615e2f48 29 August 2017, 19:11:56 UTC
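The fix pattern named in b767972, performing the use on a new, preceding statement, can be sketched as follows. `Tombstone` and `AddTombstone` here are simplified, hypothetical stand-ins and not the code in db/range_del_aggregator.cc.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Simplified stand-in for a range tombstone.
struct Tombstone {
  std::string start_key;
  std::string end_key;
};

void AddTombstone(std::map<std::string, Tombstone>& tombstones,
                  Tombstone tombstone) {
  // Use: copy the key in its own statement, before anything is moved.
  std::string start_key = tombstone.start_key;
  // Move: `tombstone` is not read again after this point.
  tombstones.emplace(std::move(start_key), std::move(tombstone));
  assert(!tombstones.empty());
}
```

Because the key is copied into a separate object first, the result no longer depends on the order in which the callee consumes its two arguments.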
c417442 CMake: Fix formatting Summary: This is a followup to #2195. Closes https://github.com/facebook/rocksdb/pull/2772 Differential Revision: D5722495 Pulled By: sagar0 fbshipit-source-id: 169d0cef53b03056ea7b9454954a35c707a67d52 29 August 2017, 01:41:29 UTC
c21ea8f CMake: Add support for CMake packages Summary: Adds support for CMake packages: https://cmake.org/cmake/help/v3.9/manual/cmake-packages.7.html#creating-packages. This allows other CMake projects to use RocksDB this way:
```
cmake_minimum_required(VERSION 3.5)
project(rdbt)
find_package(RocksDB CONFIG)
add_executable(rdbt test.cpp)
target_link_libraries(rdbt PRIVATE RocksDB::rocksdb)
```
Closes https://github.com/facebook/rocksdb/pull/2773 Differential Revision: D5722587 Pulled By: sagar0 fbshipit-source-id: 0d90dc4a77b42a617cdbe1348a370e719c282b87 29 August 2017, 00:14:37 UTC
5444345 add Erlang to the list of language bindings Summary: small edit of the language binding file to add the Erlang binding. Closes https://github.com/facebook/rocksdb/pull/2797 Differential Revision: D5722235 Pulled By: sagar0 fbshipit-source-id: 8ecd74996dad4cac19666783256cfa4d9ce09160 28 August 2017, 23:43:16 UTC
2972a70 Minor updates to FlushWAL blog Summary: Closes https://github.com/facebook/rocksdb/pull/2792 Differential Revision: D5715365 Pulled By: maysamyabandeh fbshipit-source-id: 0837b93ea1d4b0a08dfb3cd0d1feb6e098ef26a4 27 August 2017, 14:41:02 UTC
fbfa3e7 WriteAtPrepare: Efficient read from snapshot list Summary: Divide the old snapshots to two lists: a few that fit into a cached array and the rest in a vector, which is expected to be empty in normal cases. The former is to optimize concurrent reads from snapshots without requiring locks. It is done by an array of std::atomic, from which std::memory_order_acquire reads are compiled to simple read instructions in most of the x86_64 architectures. Closes https://github.com/facebook/rocksdb/pull/2758 Differential Revision: D5660504 Pulled By: maysamyabandeh fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c 26 August 2017, 08:00:38 UTC
b01f426 Blog post for FlushWAL Summary: Closes https://github.com/facebook/rocksdb/pull/2790 Differential Revision: D5711609 Pulled By: maysamyabandeh fbshipit-source-id: ea103dac013c0a6a031834541ad67e7d95a80fe8 25 August 2017, 23:11:57 UTC
503db68 make blob file close synchronous Summary: Fixing flaky blob_db_test. To close a blob file, blob db used to add a CloseSeqWrite job to the background thread to close it. Changing file close to be synchronous in order to simplify logic, and fix flaky blob_db_test. Closes https://github.com/facebook/rocksdb/pull/2787 Differential Revision: D5699387 Pulled By: yiwu-arbug fbshipit-source-id: dd07a945cd435cd3808fce7ee4ea57817409474a 25 August 2017, 17:41:49 UTC
3c840d1 Allow DB reopen with reduced options.num_levels Summary: Allow the user to reduce the number of levels in the LSM by issuing a full CompactRange() that puts the result in a lower level, and then reopening the DB with reduced options.num_levels. Previously this would fail on reopen, when recovery replayed the previous MANIFEST and found a historical file on a higher level than the new options.num_levels. The workaround was, after CompactRange(), to reopen the DB with the old num_levels, which creates a new MANIFEST, and then reopen the DB again with the new num_levels. This patch relaxes the level check during recovery. It allows the DB to open if there was a historical file on level > options.num_levels that was also deleted. Closes https://github.com/facebook/rocksdb/pull/2740 Differential Revision: D5629354 Pulled By: yiwu-arbug fbshipit-source-id: 545903f6b36b6083e8cbaf777176aef2f488021d 24 August 2017, 23:10:54 UTC
92bfd6c Fix DropColumnFamily data race Summary: It should hold db mutex while accessing max_total_in_memory_state_. Closes https://github.com/facebook/rocksdb/pull/2784 Differential Revision: D5696536 Pulled By: yiwu-arbug fbshipit-source-id: 45430634d7fe11909b38e42e5f169f618681c4ee 24 August 2017, 21:56:04 UTC
7fdf735 Pinnableslice examples and blog post Summary: Closes https://github.com/facebook/rocksdb/pull/2788 Differential Revision: D5700189 Pulled By: maysamyabandeh fbshipit-source-id: 6f043e652093ff904e52f6d35190855781b87673 24 August 2017, 19:26:07 UTC
7fbb9ec support disabling checksum in block-based table Summary: store a zero as the checksum when disabled since it's easier to keep block trailer a fixed length. Closes https://github.com/facebook/rocksdb/pull/2781 Differential Revision: D5694702 Pulled By: ajkr fbshipit-source-id: 69cea9da415778ba2b600dfd9d0dfc8cb5188ecd 24 August 2017, 02:40:47 UTC
19cc66d fix clang bug in block-based table reader Summary: This is the warning that clang considers a bug and has been causing it to fail: ``` table/block_based_table_reader.cc:240:27: warning: Potential leak of memory pointed to by 'block.value' for (; biter.Valid(); biter.Next()) { ^~~~~ ``` Actually clang just doesn't have enough knowledge to statically determine it's safe. We can teach it using an assert. Closes https://github.com/facebook/rocksdb/pull/2779 Differential Revision: D5691225 Pulled By: ajkr fbshipit-source-id: 3f0d545bf44636953b30ee5243c63239e8f16d8e 23 August 2017, 22:12:05 UTC
7eba54e test compaction input-level split range tombstone assumption Summary: One of the core assumptions of DeleteRange is that files containing portions of the same range tombstone are treated as a single unit from the perspective of compaction picker. Need better tests for this. This PR adds the tests for manual compaction. Closes https://github.com/facebook/rocksdb/pull/2769 Differential Revision: D5676677 Pulled By: ajkr fbshipit-source-id: 1b4b3382b300ff7048b872911405fdf900e4fbec 23 August 2017, 21:11:32 UTC
cd26af3 Add unit test for WritePrepared skeleton Summary: Closes https://github.com/facebook/rocksdb/pull/2756 Differential Revision: D5660516 Pulled By: maysamyabandeh fbshipit-source-id: f3f3d3b5f544007a7fbdd78e49e4738b4437c7ee 23 August 2017, 20:56:03 UTC
a124798 Improved transactions support in C API Summary: Solves #2632 Added OptimisticTransactionDB to the C API. Added missing merge operations to Transaction. Added missing get_for_update operation to transaction If required I will create tests for this another day. Closes https://github.com/facebook/rocksdb/pull/2633 Differential Revision: D5600906 Pulled By: yiwu-arbug fbshipit-source-id: da23e4484433d8f59d471f778ff2ae210e3fe4eb 23 August 2017, 19:40:28 UTC
c10b391 LANGUAGE-BINDINGS.md: add another rust binding Summary: I made another rust binding. 👻 * Use C++ API (instead of C API) * Try to follow [Rust Guidelines](https://aturon.github.io/README.html) * Working in progress (the APIs are not stable yet) Closes https://github.com/facebook/rocksdb/pull/2438 Differential Revision: D5690612 Pulled By: siying fbshipit-source-id: 11d3956c33b5e5366555afbf3786b782be3046e7 23 August 2017, 19:12:21 UTC
9017743 Remove leftover references to phutil_module_cache Reviewed By: mzlee Differential Revision: D5688624 fbshipit-source-id: c726b4e56bd823b994a7b713488fef93c6f796d0 23 August 2017, 19:12:21 UTC
234f33a allow nullptr Slice only as sentinel Summary: Allow `Slice` holding nullptr as a sentinel value but not in comparisons. This new restriction eliminates the need for the manual checks in 39ef900551a4d88c8546ca086baaba76730e6162, while still conforming to glibc's `memcmp` API. Thanks siying for the idea. Users may need to migrate, so mentioned it in HISTORY.md. Closes https://github.com/facebook/rocksdb/pull/2777 Differential Revision: D5686016 Pulled By: ajkr fbshipit-source-id: 03a2ca3fd9a0ebade9d0d5686c81d59a9534f563 23 August 2017, 17:56:06 UTC
ccf7f83 Use PinnableSlice in Transactions Summary: The ::Get from DB is now augmented with an overload method that takes a PinnableSlice instead of a string. Transactions however are not yet upgraded to use the new API. As a result, transaction users such as MyRocks cannot benefit from it. This patch updates the transactional API with a PinnableSlice overload. Closes https://github.com/facebook/rocksdb/pull/2736 Differential Revision: D5645770 Pulled By: maysamyabandeh fbshipit-source-id: f6af520df902f842de1bcf99bed3e8dfc43ad96d 23 August 2017, 17:11:45 UTC
1dfcdb1 Extend pin_l0 to filter partitions Summary: This is the continuation of https://github.com/facebook/rocksdb/pull/2661 for filter partitions. When pin_l0 is set (along with cache_xxx), then on table open the filter partitions are loaded into the cache and pinned there. Closes https://github.com/facebook/rocksdb/pull/2766 Differential Revision: D5671098 Pulled By: maysamyabandeh fbshipit-source-id: 174f24018f1d7f1129621e7380287b65b67d2115 23 August 2017, 14:56:08 UTC
39ef900 stop calling memcmp with nullptrs Summary: it doesn't take nullptr according to its declaration in glibc, and calling it in this way causes our sanitizers (ubsan, clang analyze) to fail. Closes https://github.com/facebook/rocksdb/pull/2776 Differential Revision: D5683260 Pulled By: ajkr fbshipit-source-id: 114b137ee188172f96eedc43139255cae7bee80a 22 August 2017, 23:55:44 UTC
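One way to avoid handing a null pointer to `memcmp`, as 39ef900 requires, is to skip the call when there is nothing to compare. The sketch below assumes a simple (pointer, length) view; `ByteView` and `Compare` are hypothetical and are not `rocksdb::Slice` or its comparison code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical (pointer, length) view; data may be nullptr when size == 0.
struct ByteView {
  const char* data;
  size_t size;
};

// Three-way comparison that never calls memcmp with a null pointer.
int Compare(const ByteView& a, const ByteView& b) {
  const size_t min_len = a.size < b.size ? a.size : b.size;
  int r = 0;
  if (min_len > 0) {
    // Both views are non-empty here, so both pointers must be valid.
    assert(a.data != nullptr && b.data != nullptr);
    r = std::memcmp(a.data, b.data, min_len);
  }
  if (r == 0) {
    if (a.size < b.size) {
      r = -1;
    } else if (a.size > b.size) {
      r = 1;
    }
  }
  return r;
}
```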
78cb6b6 Provide byte[] version of SstFileWriter.merge to reduce GC Stall Summary: In the Java API, `SstFileWriter.put/merge/delete` takes a `Slice` type of key and value, which is a Java wrapper object around the C++ Slice object. The Slice object inherited the [ `finalize`](https://github.com/facebook/rocksdb/blob/3c327ac2d0fd50bbd82fe1f1af5de909dad769e6/java/src/main/java/org/rocksdb/AbstractNativeReference.java#L69) method, which [added huge overhead](https://softwareengineering.stackexchange.com/questions/288715/is-overriding-object-finalize-really-bad/288753#288753) to the JVM while creating a new SstFile. To address this issue, this PR overloads the merge method to take a Java byte array instead of the Slice object, and adds a unit test for it. We also benchmarked the two merge functions and saw GC stall reduced from 50% to 1% and throughput increased from 50MB to 200MB. Closes https://github.com/facebook/rocksdb/pull/2746 Reviewed By: sagar0 Differential Revision: D5653145 Pulled By: scv119 fbshipit-source-id: b55ea58554b573d0b1c6f6170f8d9223811bc4f5 22 August 2017, 19:55:24 UTC
867fe92 Scale histogram bucket size by constant factor Summary: The goal is to reduce the number of histogram buckets, particularly now that we print these histograms for each column family. I chose 1.5 as the factor. We can adjust it later to either make buckets more granular or make fewer buckets. Closes https://github.com/facebook/rocksdb/pull/2139 Differential Revision: D4872076 Pulled By: ajkr fbshipit-source-id: 87790d782a605506c3d24190a028cecbd7aa564a 22 August 2017, 00:10:40 UTC
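Scaling bucket limits by a constant factor, as in 867fe92, gives bucket counts that grow logarithmically with the value range. The sketch below is an illustration under assumptions, not the RocksDB histogram code: `MakeBucketLimits` is a hypothetical name, the first limit is assumed to be 1, and the product is rounded down using integer arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Builds increasing bucket limits, each about 1.5x the previous one, until a
// limit reaches max_value. Not overflow-safe for max_value near UINT64_MAX.
std::vector<uint64_t> MakeBucketLimits(uint64_t max_value) {
  assert(max_value >= 1);
  std::vector<uint64_t> limits{1};
  while (limits.back() < max_value) {
    const uint64_t prev = limits.back();
    const uint64_t scaled = prev + prev / 2;  // prev * 1.5, rounded down
    // Guarantee progress for small values where rounding gives scaled == prev.
    limits.push_back(scaled > prev ? scaled : prev + 1);
  }
  return limits;
}
```

For `max_value = 20` this yields the limits 1, 2, 3, 4, 6, 9, 13, 19, 28. A larger factor would give fewer, coarser buckets.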
f004307 CMake improvements Summary: - Allow setting custom installation prefix. - Add option to disable building tests. Closes https://github.com/facebook/rocksdb/pull/2195 Differential Revision: D5054239 Pulled By: sagar0 fbshipit-source-id: 2de6bef8b7eafed60a830e1796b262f9e6f79da0 21 August 2017, 21:56:50 UTC
09ac620 Circumvent ASAN false positive Summary: Changes: * checks if ASAN mode is on, and uses malloc and free in the constructor and destructor Closes https://github.com/facebook/rocksdb/pull/2767 Differential Revision: D5671243 Pulled By: armishra fbshipit-source-id: 8e4ad0f7f163400c4effa8617d3b30134119d802 21 August 2017, 19:10:43 UTC
5b68b11 Blob db create a snapshot before every read Summary: If GC kicks in between * A Get() reads index entry from base db. * The Get() read from a blob file The GC can delete the corresponding blob file, making the key not found. Fortunately we have existing logic to avoid deleting a blob file if it is referenced by a snapshot. So the fix is to explicitly create a snapshot before reading index entry from base db. Closes https://github.com/facebook/rocksdb/pull/2754 Differential Revision: D5655956 Pulled By: yiwu-arbug fbshipit-source-id: e4ccbc51331362542e7343175bbcbdea5830f544 21 August 2017, 01:26:19 UTC
4624ae5 GC the oldest file when out of space Summary: When out of space, blob db should GC the oldest file. The current implementation GCs the newest one instead. Fixing it. Closes https://github.com/facebook/rocksdb/pull/2757 Differential Revision: D5657611 Pulled By: yiwu-arbug fbshipit-source-id: 56c30a4c52e6ab04551dda8c5c46006d4070b28d 21 August 2017, 00:11:06 UTC
8ace1f7 add counter for deletion dropping optimization Summary: add this counter stat to track usage of deletion-dropping optimization. if usage is low, we can delete it to prevent bugs like #2726. Closes https://github.com/facebook/rocksdb/pull/2761 Differential Revision: D5665421 Pulled By: ajkr fbshipit-source-id: 881befa2d199838dac88709e7b376a43d304e3d4 19 August 2017, 21:10:08 UTC
0d8e992 Revert the mistake in version update Summary: https://github.com/facebook/rocksdb/pull/2661 mistakenly updates the version. This patch reverts it. Closes https://github.com/facebook/rocksdb/pull/2760 Differential Revision: D5662089 Pulled By: maysamyabandeh fbshipit-source-id: f4735e37921c0ced6081a89080c78ac3728aa8bd 18 August 2017, 21:29:39 UTC
5358a80 add VerifyChecksum to HISTORY.md Summary: it's a new feature that'll be released in 5.8, introduced by PR #2498. Closes https://github.com/facebook/rocksdb/pull/2759 Differential Revision: D5661923 Pulled By: ajkr fbshipit-source-id: 9ba9f0d146c453715358ef2dd298aa7765649d7c 18 August 2017, 21:29:39 UTC
ed0a4c9 perf_context measure user bytes read Summary: With this PR, we can measure read-amp for queries where perf_context is enabled as follows: ``` SetPerfLevel(kEnableCount); Get(1, "foo"); double read_amp = static_cast<double>(get_perf_context()->block_read_byte) / get_perf_context()->get_read_bytes; SetPerfLevel(kDisable); ``` Our internal infra enables perf_context for a sampling of queries. So we'll be able to compute the read-amp for the sample set, which can give us a good estimate of read-amp. Closes https://github.com/facebook/rocksdb/pull/2749 Differential Revision: D5647240 Pulled By: ajkr fbshipit-source-id: ad73550b06990cf040cc4528fa885360f308ec12 18 August 2017, 18:43:33 UTC
1efc600 Preload l0 index partitions Summary: This fixes the existing logic for pinning l0 index partitions. The patch preloads the partitions into block cache and pins them if they belong to level 0 and pin_l0 is set. The drawback is that it does many small IOs when preloading all the partitions into the cache if direct io is enabled. Working on a solution for that. Closes https://github.com/facebook/rocksdb/pull/2661 Differential Revision: D5554010 Pulled By: maysamyabandeh fbshipit-source-id: 1e6f32a3524d71355c77d4138516dcfb601ca7b2 18 August 2017, 17:56:20 UTC
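The read-amp ratio described in ed0a4c9 can be sketched in isolation. This is a minimal illustration, not RocksDB code: `ReadCounters` and `ReadAmp` are hypothetical names standing in for the two perf_context counters mentioned in the message. It shows converting to floating point before dividing and guarding against a zero denominator.

```cpp
#include <cstdint>

// Hypothetical stand-in for the two perf_context counters named in the commit.
struct ReadCounters {
  uint64_t block_read_byte;  // bytes read from blocks
  uint64_t get_read_bytes;   // bytes returned to the user by Get()
};

// Returns block bytes read per user byte returned, or 0.0 when no user bytes
// were returned (the ratio is undefined in that case).
double ReadAmp(const ReadCounters& c) {
  if (c.get_read_bytes == 0) return 0.0;
  // Convert before dividing so the fractional part is not truncated.
  return static_cast<double>(c.block_read_byte) /
         static_cast<double>(c.get_read_bytes);
}
```

Dividing the two integer counters first and casting afterwards would truncate a ratio such as 3/2 to 1.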
bddd5d3 Added mechanism to track deadlock chain Summary: Changes: * extended the wait_txn_map to track additional information * designed circular buffer to store n latest deadlocks' information * added test coverage to verify the additional information tracked is accurately stored in the buffer Closes https://github.com/facebook/rocksdb/pull/2630 Differential Revision: D5478025 Pulled By: armishra fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7 18 August 2017, 01:56:21 UTC
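The "circular buffer to store n latest deadlocks" idea in bddd5d3 can be illustrated with a small ring buffer. This is only a sketch under assumptions: `DeadlockBuffer`, `Add`, and `Latest` are hypothetical names, and a deadlock is reduced to a plain string rather than the richer information the commit tracks.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-capacity ring buffer that keeps the newest `capacity` entries.
class DeadlockBuffer {
 public:
  explicit DeadlockBuffer(std::size_t capacity) : slots_(capacity) {}

  // Overwrites the oldest slot once the buffer is full.
  void Add(const std::string& path) {
    if (slots_.empty()) return;
    slots_[next_] = path;
    next_ = (next_ + 1) % slots_.size();
    if (size_ < slots_.size()) ++size_;
  }

  // Returns the stored entries, newest first.
  std::vector<std::string> Latest() const {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < size_; ++i) {
      // Walk backwards from the most recently written slot.
      std::size_t idx = (next_ + slots_.size() - 1 - i) % slots_.size();
      out.push_back(slots_[idx]);
    }
    return out;
  }

 private:
  std::vector<std::string> slots_;
  std::size_t next_ = 0;  // slot the next Add() will write
  std::size_t size_ = 0;  // number of valid entries
};
```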
c1384a7 fix db_stress uint64_t to int32 cast Summary: Clang complains about a cast from uint64_t to int32 in db_stress. Fixing it. Closes https://github.com/facebook/rocksdb/pull/2755 Differential Revision: D5655947 Pulled By: yiwu-arbug fbshipit-source-id: cfac10e796e0adfef4727090b50975b0d6e2c9be 18 August 2017, 00:56:55 UTC
29877ec Fix blob db crash during calculating write amp Summary: On initial call to BlobDBImpl::WaStats() `all_periods_write_` would be empty, so it will crash when we call pop_front() at line 1627. Apparently it is meant to pop only when `all_periods_write_.size() > kWriteAmplificationStatsPeriods`. The whole write amp calculation doesn't seem to be correct and it is not being exposed. Will work on it later. Test Plan Change kWriteAmplificationStatsPeriodMillisecs to 1000 (1 second) and run db_bench --use_blob_db for 5 minutes. Closes https://github.com/facebook/rocksdb/pull/2751 Differential Revision: D5648269 Pulled By: yiwu-arbug fbshipit-source-id: b843d9a09bb5f9e1b713d101ec7b87e54b5115a4 17 August 2017, 22:01:09 UTC
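The guard described in 29877ec (pop only once the window exceeds its size limit) can be sketched as follows. This is not the BlobDBImpl code: `kMaxPeriods` and `RecordPeriod` are hypothetical, and the window size of 3 is chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical window size; the commit refers to kWriteAmplificationStatsPeriods.
constexpr std::size_t kMaxPeriods = 3;

// Appends one period's byte count and evicts the oldest entries only once the
// window holds more than kMaxPeriods of them.
void RecordPeriod(std::deque<uint64_t>& periods, uint64_t bytes_written) {
  periods.push_back(bytes_written);
  while (periods.size() > kMaxPeriods) {
    periods.pop_front();  // safe: size() > kMaxPeriods implies non-empty
  }
}
```

Calling `pop_front()` on an empty `std::deque` is undefined behavior, which is why the size check has to come first.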
8f2598a Enable Cassandra merge operator to be called with a single merge operand Summary: Updating Cassandra merge operator to make use of a single merge operand when needed. Single merge operand support has been introduced in #2721. Closes https://github.com/facebook/rocksdb/pull/2753 Differential Revision: D5652867 Pulled By: sagar0 fbshipit-source-id: b9fbd3196d3ebd0b752626dbf9bec9aa53e3e26a 17 August 2017, 22:01:09 UTC
9a44b4c Allow merge operator to be called even with a single operand Summary: Added a function `MergeOperator::DoesAllowSingleMergeOperand()` to allow invoking a merge operator even with a single merge operand, if overridden. This is needed for Cassandra-on-RocksDB work. All Cassandra writes are through merges and this will allow a single merge-value to be updated in the merge-operator invoked via a compaction, if needed, due to an expired TTL. Closes https://github.com/facebook/rocksdb/pull/2721 Differential Revision: D5608706 Pulled By: sagar0 fbshipit-source-id: f299f9f91c4d1ac26e48bd5906e122c1c5e5f3fc 17 August 2017, 06:42:00 UTC
ac8fb77 fix some misspellings Summary: PTAL ajkr Closes https://github.com/facebook/rocksdb/pull/2750 Differential Revision: D5648052 Pulled By: ajkr fbshipit-source-id: 7cd1ddd61364d5a55a10fdd293fa74b2bf89dd98 17 August 2017, 04:57:20 UTC
2359317 minor improvements to db_stress Summary: fix some things that made this command hard to use from CLI: - use default values for `target_file_size_base` and `max_bytes_for_level_base`. previously we were using small values for these but default value of `write_buffer_size`, which led to enormous number of L1 files. - failure message for `value_size_mult` too big. previously there was just an assert, so in non-debug mode it'd overrun the value buffer and crash mysteriously. - only print verification success if there's no failure. before it'd print both in the failure case. - support `memtable_prefix_bloom_size_ratio` - support `num_bottom_pri_threads` (universal compaction) Closes https://github.com/facebook/rocksdb/pull/2741 Differential Revision: D5629495 Pulled By: ajkr fbshipit-source-id: ddad97d6d4ba0884e7c0f933b0a359712514fc1d 17 August 2017, 02:13:01 UTC
af012c0 fix deleterange with memtable prefix bloom Summary: the range delete tombstones in memtable should be added to the aggregator even when the memtable's prefix bloom filter tells us the lookup key's not there. This bug could cause data to temporarily reappear until the memtable containing range deletions is flushed. Reported in #2743. Closes https://github.com/facebook/rocksdb/pull/2745 Differential Revision: D5639007 Pulled By: ajkr fbshipit-source-id: 04fc6facb6f978340a3f639536f4ca7c0d73dfc9 17 August 2017, 02:13:01 UTC
1c8dbe2 update scores after picking universal compaction Summary: We forgot to recompute compaction scores after picking a universal compaction like we do in level compaction (https://github.com/facebook/rocksdb/blob/a34b2e388ee51173e44f6aa290f1301c33af9e67/db/compaction_picker.cc#L691-L695). This leads to a fairness issue where we waste compactions on CFs/DB instances that don't need it while others can starve. Previously, ccecf3f4fb8e6eeaa06504b9d477b6db4137831a fixed the issue for the read-amp-based compaction case; this PR avoids the issue earlier and also for size-ratio-based compactions. Closes https://github.com/facebook/rocksdb/pull/2688 Differential Revision: D5566191 Pulled By: ajkr fbshipit-source-id: 010bccb2a107f6a76f3d3022b90aadce5cc48feb 17 August 2017, 01:42:33 UTC
eb64253 Update WritePrepared with the pseudo code Summary: Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides an IsInSnapshot method that could later be called from the read path to decide if a version is in the read snapshot or should otherwise be skipped. This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is to have the API specified so that we can work on related tasks in parallel. Closes https://github.com/facebook/rocksdb/pull/2713 Differential Revision: D5640021 Pulled By: maysamyabandeh fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c 16 August 2017, 23:57:47 UTC
132306f Remove PartialMerge implementation from Cassandra merge operator Summary: `PartialMergeMulti` implementation is enough for Cassandra, and `PartialMerge` is not required. Implementing both will just duplicate the code. As per https://github.com/facebook/rocksdb/blob/master/include/rocksdb/merge_operator.h#L130-L135 : ``` // The default implementation of PartialMergeMulti will use this function // as a helper, for backward compatibility. Any successor class of // MergeOperator should either implement PartialMerge or PartialMergeMulti, // although implementing PartialMergeMulti is suggested as it is in general // more effective to merge multiple operands at a time instead of two // operands at a time. ``` Closes https://github.com/facebook/rocksdb/pull/2737 Reviewed By: scv119 Differential Revision: D5633073 Pulled By: sagar0 fbshipit-source-id: ef4fa102c22fec6a0175ed12f5c44c15afe3c8ca 15 August 2017, 21:59:34 UTC
71598cd Fix false removal of tombstone issue in FIFO and kCompactionStyleNone Summary: Similar to the bug fixed by https://github.com/facebook/rocksdb/pull/2726, FIFO with compaction and kCompactionStyleNone can suffer from the same problem during a user-customized CompactFiles() call whose output level is 0. Fix it by leveraging the bottommost_level_ flag. Closes https://github.com/facebook/rocksdb/pull/2735 Differential Revision: D5626906 Pulled By: siying fbshipit-source-id: 2b148d0461c61dbd986d74655e384419ae442158 15 August 2017, 20:02:19 UTC
3204a4f Fix missing stdlib include required for abort() Summary: If ROCKSDB_LITE is defined, a call to abort() is introduced. This call requires stdlib.h. Build log of unpatched 5.7.1: http://beefy9.nyi.freebsd.org/data/110amd64-default/447974/logs/rocksdb-lite-5.7.1.log Closes https://github.com/facebook/rocksdb/pull/2744 Reviewed By: yiwu-arbug Differential Revision: D5632372 Pulled By: lxcode fbshipit-source-id: b2a8e692bf14ccf1f875f3a00463e87bba310a2b 15 August 2017, 19:32:11 UTC
7aa96db db_stress rolling active window Summary: Support a window of `active_width` keys that rolls through `[0, max_key)` over the duration of the test. Operations only affect keys inside the window. This gives us the ability to detect L0->L0 deletion bug (#2722). Closes https://github.com/facebook/rocksdb/pull/2739 Differential Revision: D5628555 Pulled By: ajkr fbshipit-source-id: 9cb2d8f4ab1a7c73f7797b8e19f7094970ea8749 15 August 2017, 19:02:16 UTC
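One way to picture the rolling window from 7aa96db is a window start that advances linearly with test progress. The sketch below is an assumption about how such a window could be computed, not the db_stress implementation: `WindowStart`, `KeyInWindow`, and all parameters other than `active_width` and `max_key` are hypothetical.

```cpp
#include <cstdint>

// Maps an operation index to the first key of the active window.
// Assumes 0 < active_width <= max_key and total_ops > 0.
uint64_t WindowStart(uint64_t op_index, uint64_t total_ops,
                     uint64_t max_key, uint64_t active_width) {
  const uint64_t span = max_key - active_width;  // largest valid start
  if (op_index >= total_ops) return span;
  // Linear interpolation; the multiplication may overflow for very large inputs.
  return span * op_index / total_ops;
}

// Picks a key inside [start, start + active_width) from a raw random value.
uint64_t KeyInWindow(uint64_t start, uint64_t active_width, uint64_t rnd) {
  return start + rnd % active_width;
}
```

With `max_key = 1000` and `active_width = 100`, the window covers keys `[0, 100)` at the start of the run and `[900, 1000)` at the end, so every operation stays inside `[0, max_key)`.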
dfa6c23 Update RocksDBCommonHelper to use escapeshellarg Summary: Most of the data used here in shell commands is not generated directly from user input, but some data (i.e., from environment variables) may have been externally influenced. It is a good practice to escape this data before using it in a shell command. Originally D4800264 but we never quite got it merged. Reviewed By: yiwu-arbug Differential Revision: D5595052 fbshipit-source-id: c09d8b47fe35fc6a47afb4933ccad9d56ca8d7be 15 August 2017, 13:56:31 UTC
e367774 Overload new[] to properly align LRUCacheShard Summary: Also verify it fixes gcc7 compile failure #2672 (see also #2699) Closes https://github.com/facebook/rocksdb/pull/2732 Differential Revision: D5620348 Pulled By: yiwu-arbug fbshipit-source-id: 87db657ab734f23b1bfaaa9db9b9956d10eaef59 14 August 2017, 21:41:56 UTC
ad42d2f Remove residual arcanist_util directory 14 August 2017, 17:51:48 UTC
279296f properly set C[XX]FLAGS during CMake configure-time checks Summary: Some compilers require `-std=c++11` for the `cstdint` header to be available. We already have logic to add `-std=c++11` to `CXXFLAGS` when the compiler is not MSVC; simply reorder CMakeLists.txt so that logic happens before the calls to `CHECK_CXX_SOURCE_COMPILES`. Additionally add a missing `set(CMAKE_REQUIRED_FLAGS, ...)` before a call to `CHECK_C_SOURCE_COMPILES`. Closes https://github.com/facebook/rocksdb/pull/2535 Differential Revision: D5384244 Pulled By: yiwu-arbug fbshipit-source-id: 2dbae4297c5d8ab4636e08b1457ffb2d3e37aef4 14 August 2017, 04:47:45 UTC
c5f0c6c compile with correct flags to determine SSE4.2 support Summary: With some compilers, `-std=c++11` is necessary for <cstdint> to be available. Pass this flag via $PLATFORM_CXXFLAGS. Fixes #2488. Closes https://github.com/facebook/rocksdb/pull/2545 Differential Revision: D5620610 Pulled By: yiwu-arbug fbshipit-source-id: 2f975b8c1ad52e283e677d9a33543abd064f13ce 14 August 2017, 04:47:45 UTC
185ade4 cmake: support more compression type Summary: This pr enables linking all the supported compression libraries via cmake. Closes https://github.com/facebook/rocksdb/pull/2552 Differential Revision: D5620607 Pulled By: yiwu-arbug fbshipit-source-id: b6949181f305bfdf04a98f898c92fd0caba0c45a 14 August 2017, 04:47:45 UTC
5449c09 rocksdb: make buildable on aarch64 Summary: - Remove default arch-specified flags. - Move non-default arch-specific flags to arch-specific param. Reviewed By: yiwu-arbug Differential Revision: D5597499 fbshipit-source-id: c53108ac39c73ac36893d3fd9aaf3b5e3080f1ae 14 August 2017, 00:13:54 UTC
a144a97 Fix for CMakeLists.txt on Windows for RocksJava Summary: Closes https://github.com/facebook/rocksdb/pull/2730 Differential Revision: D5619256 Pulled By: ajkr fbshipit-source-id: c80d697eeceab91964259132e58f5cd2219efb93 12 August 2017, 23:44:12 UTC
acf935e fix deletion dropping in intra-L0 Summary: `KeyNotExistsBeyondOutputLevel` didn't consider L0 files' key-ranges. So if a key only was covered by older L0 files' key-ranges, we would incorrectly drop deletions of that key. This PR just skips the deletion-dropping optimization when output level is L0. Closes https://github.com/facebook/rocksdb/pull/2726 Differential Revision: D5617286 Pulled By: ajkr fbshipit-source-id: 4bff1396b06d49a828ba4542f249191052915bce 12 August 2017, 01:12:38 UTC
8254e9b make sst_dump compression size command consistent Summary: - like other subcommands, reporting compression sizes should be specified with the `--command` CLI arg. - also added `--compression_types` arg as it's useful to restrict the types of compression used, at least in my dictionary compression experiments. Closes https://github.com/facebook/rocksdb/pull/2706 Differential Revision: D5589520 Pulled By: ajkr fbshipit-source-id: 305bb4ebcc95eecc8a85523cd3b1050619c9ddc5 11 August 2017, 23:03:44 UTC
74f18c1 db_bench support for non-uniform column family ops Summary: Previously we could only select the CF on which to operate uniformly at random. This is a limitation, e.g., when testing universal compaction as all CFs would need to run full compaction at roughly the same time, which isn't realistic. This PR allows the user to specify the probability distribution for selecting CFs via the `--column_family_distribution` argument. Closes https://github.com/facebook/rocksdb/pull/2677 Differential Revision: D5544436 Pulled By: ajkr fbshipit-source-id: 478d56260995236ae90895ce5bd51f38882e185a 11 August 2017, 20:57:17 UTC
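Non-uniform column family selection as described in 74f18c1 can be sketched with cumulative weights. This assumes the distribution is given as integer percentages that sum to 100; `PickColumnFamily` and the `roll` parameter are hypothetical and do not reflect the db_bench source.

```cpp
#include <cstddef>
#include <vector>

// Returns the index whose cumulative percentage range contains `roll`.
// Assumes `roll` is in [0, 100) and the percentages sum to 100.
std::size_t PickColumnFamily(const std::vector<int>& percents, int roll) {
  int cumulative = 0;
  for (std::size_t i = 0; i < percents.size(); ++i) {
    cumulative += percents[i];
    if (roll < cumulative) return i;
  }
  // Fallback for inputs that violate the assumptions above.
  return percents.empty() ? 0 : percents.size() - 1;
}
```

With percentages `{70, 20, 10}` and a uniformly random `roll`, the first column family receives roughly 70% of operations, so the column families reach compaction triggers at different times.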
5de98f2 approximate histogram stats to save cpu Summary: sounds like we're willing to trade off minor inaccuracy in stats for speed. start with histogram stats. ticker stats will be harder (and, IMO, we shouldn't change them in this manner) as many test cases rely on them being exactly correct. Closes https://github.com/facebook/rocksdb/pull/2720 Differential Revision: D5607884 Pulled By: ajkr fbshipit-source-id: 1b754cda35ea6b252d1fdd5aa3cfb58866506372 11 August 2017, 20:13:12 UTC