https://github.com/facebook/rocksdb

Revision Author Date Message Commit Date
623774b Update version and HISTORY for 6.21.3 19 July 2021, 15:30:36 UTC
d25f018 Don't hold DB mutex for block cache entry stat scans (#8538) Summary: I previously didn't notice the DB mutex was being held during block cache entry stat scans, probably because I primarily checked for read performance regressions, because they require the block cache and are traditionally latency-sensitive. This change does some refactoring to avoid holding DB mutex and to avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats"). Some tests have to be updated because now the stats collector is populated in the Cache aggressively on DB startup rather than lazily. (I hope to clean up some of this added complexity in the future.) This change also ensures proper treatment of need_out_of_mutex for non-int DB properties. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538 Test Plan: Added unit test logic that uses sync points to fail if the DB mutex is held during a scan, covering the various ways that a scan might be triggered. Performance test - the known impact to holding the DB mutex is on TransactionDB, and the easiest way to see the impact is to hack the scan code to almost always miss and take an artificially long time scanning. Here I've injected an unconditional 5s sleep at the call to ApplyToAllEntries. 
Before (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0) rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856 $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0) rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323 Notice the 5s P100 write time. 
After (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0) rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407 $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918 P100 write is now ~0.6s. 
Not good, but it's the same even if I completely bypass all the scanning code: $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767 $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0) rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832 No substantial difference. Reviewed By: siying Differential Revision: D29738847 Pulled By: pdillinger fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b 19 July 2021, 15:26:18 UTC
61b95f9 Fix double-dumping CF stats to log (#8380) Summary: DBImpl::DumpStats is supposed to do this: Dump DB stats to LOG For each CF, dump CFStatsNoFileHistogram to LOG For each CF, dump CFFileHistogram to LOG Instead, due to a longstanding bug from 2017 (https://github.com/facebook/rocksdb/issues/2126), it would dump CFStats, which includes both CFStatsNoFileHistogram and CFFileHistogram, in both loops, resulting in near-duplicate output. This fixes the bug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8380 Test Plan: Manual inspection of LOG after db_bench Reviewed By: jay-zhuang Differential Revision: D29017535 Pulled By: pdillinger fbshipit-source-id: 3010604c4a629a80347f129cd746ce9b0d0cbda6 19 July 2021, 15:09:13 UTC
c58a32b Standardize on GCC for TSAN conditional compilation (#8543) Summary: In https://github.com/facebook/rocksdb/issues/8539 I accidentally only checked for GCC TSAN, which is what I tested locally, while CircleCI and FB CI use clang TSAN. Related: other existing code like in stack_trace.cc only check for clang TSAN. I've now standardized these to the GCC convention in port/lang.h, so now #ifdef __SANITIZE_THREAD__ can check for any TSAN (assuming lang.h include) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8543 Test Plan: Put an assert(false) in slice_test and look for the NOTE about "signal-unsafe call", both GCC and clang. Eventually, CircleCI TSAN in https://github.com/facebook/rocksdb/issues/8538 Reviewed By: zhichao-cao Differential Revision: D29728483 Pulled By: pdillinger fbshipit-source-id: 8a3b8015c2ed48078214c3ee17146a2c3f11c9f7 19 July 2021, 15:05:40 UTC
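The standardization described in c58a32b can be sketched as follows. GCC defines `__SANITIZE_THREAD__` when building with `-fsanitize=thread`, while clang reports TSAN through `__has_feature(thread_sanitizer)`. This is an illustrative sketch of that kind of normalization, not a copy of port/lang.h; the constant `kBuiltWithTsan` is a hypothetical name added for the example.

```cpp
// Sketch only: map clang's TSAN detection onto the GCC-style macro.
#if defined(__clang__) && defined(__has_feature)
#if __has_feature(thread_sanitizer) && !defined(__SANITIZE_THREAD__)
#define __SANITIZE_THREAD__ 1
#endif
#endif

// After the normalization, a single check covers both compilers.
// kBuiltWithTsan is a hypothetical name used only in this example.
constexpr bool kBuiltWithTsan =
#ifdef __SANITIZE_THREAD__
    true;
#else
    false;
#endif
```

Code that must behave differently under TSAN can then test `__SANITIZE_THREAD__` alone, provided the normalizing header is included first.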
7c70cee Work around falsely reported data race on LRUHandle::flags (#8539) Summary: Some bits are mutated and read while holding a lock, other immutable bits (esp. secondary cache compatibility) can be read by arbitrary threads without holding a lock. AFAIK, this doesn't cause an issue on any architecture we care about, because you will get some legitimate version of the value that includes the initialization, as long as synchronization guarantees the initialization happens before the read. I've only seen this in https://github.com/facebook/rocksdb/issues/8538 so far, but it should be fixed regardless. Otherwise, we'll surely get these false reports again some time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8539 Test Plan: some local TSAN test runs and in CircleCI Reviewed By: zhichao-cao Differential Revision: D29720262 Pulled By: pdillinger fbshipit-source-id: 365fd7e565577c648815161f71b339bcb5ce12d5 19 July 2021, 15:05:27 UTC
392d727 Update version and HISTORY for 6.21.2 14 June 2021, 22:17:14 UTC
95168f3 Pin CacheEntryStatsCollector to fix performance bug (#8385) Summary: If the block Cache is full with strict_capacity_limit=false, then our CacheEntryStatsCollector could be immediately evicted on release, so iterating through column families with shared block cache could trigger re-scan for each CF. This change fixes that problem by pinning the CacheEntryStatsCollector from InternalStats so that it's not evicted. I had originally thought that this object could participate in LRU like everything else, but even though a re-load+re-scan only touches memory, it can be orders of magnitude more expensive than other cache misses. One service in Facebook has scans that take ~20s over 100GB block cache that is mostly 4KB entries. (The up-side of this bug and https://github.com/facebook/rocksdb/issues/8369 is that we had a natural experiment on the effect on some service metrics even with block cache scans running continuously in the background--a kind of worst case scenario. Metrics like latency were not affected enough to trigger warnings.) Other smaller fixes: 20s is already a sizable portion of 600s stats dump period, or 180s default max age to force re-scan, so added logic to ensure that (for each block cache) we don't spend more than 0.2% of our background thread time scanning it. Nevertheless, "foreground" requests for cache entry stats (calls to `db->GetMapProperty(DB::Properties::kBlockCacheEntryStats)`) are permitted to consume more CPU. Renamed field to cache_entry_stats_ to match code style. This change is intended for patching in 6.21 release. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8385 Test Plan: unit test expanded to cover new logic (detect regression), some manual testing with db_bench Reviewed By: ajkr Differential Revision: D29042759 Pulled By: pdillinger fbshipit-source-id: 236faa902397f50038c618f50fbc8cf3f277308c 14 June 2021, 22:06:50 UTC
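The 0.2% budget mentioned in 95168f3 implies a simple piece of arithmetic: if a scan took d seconds, the next background scan should start no sooner than d / 0.002 = 500 × d seconds after the previous one. The sketch below illustrates that reasoning only. The function names and the factor constant are assumptions for this example, not RocksDB's actual implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not RocksDB code): minimum spacing between background
// scan starts so that scanning uses at most 0.2% of wall-clock time.
inline uint64_t MinScanIntervalMicros(uint64_t last_scan_duration_us) {
  constexpr uint64_t kIntervalFactor = 500;  // 1 / 0.002
  return last_scan_duration_us * kIntervalFactor;
}

// Returns true if enough time has passed since the last scan started.
inline bool MayStartBackgroundScan(uint64_t now_us,
                                   uint64_t last_scan_start_us,
                                   uint64_t last_scan_duration_us) {
  assert(now_us >= last_scan_start_us);
  return now_us - last_scan_start_us >=
         MinScanIntervalMicros(last_scan_duration_us);
}
```

Under these assumptions, the ~20s scan from the commit message would push the next background scan out to 10,000s, well past both the 600s stats dump period and the 180s default max age.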
b47bbbf Fix runtime linkage with libasan in Facebook platform009 (#8402) Summary: Was seeing ./cache_test: error while loading shared libraries: libasan.so.5: cannot open shared object file: No such file or directory etc. using COMPILE_WITH_ASAN=1 without USE_CLANG=1 Now including compiler libs in runtime ld path. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8402 Test Plan: reproduced with local builds Reviewed By: akankshamahajan15 Differential Revision: D29107729 Pulled By: pdillinger fbshipit-source-id: 13805b87b846b39522c9dd6a231ca245c58f1c71 14 June 2021, 22:06:32 UTC
ba98c17 Fix^2 use of binutils in Facebook platform009 (#8399) (#8401) Summary: Internal builds still failing, this time with ld Pull Request resolved: https://github.com/facebook/rocksdb/pull/8401 Test Plan: Like https://github.com/facebook/rocksdb/issues/8399 but letting build run to completion Reviewed By: bjlemaire Differential Revision: D29103512 Pulled By: pdillinger fbshipit-source-id: 0fcad2c63518cf2b721e749881da40b90f5d3133 14 June 2021, 22:06:14 UTC
8a33ede Fix use of binutils in Facebook platform009 (#8399) Summary: Internal builds failing Pull Request resolved: https://github.com/facebook/rocksdb/pull/8399 Test Plan: I can reproduce a failure by putting a bad version of `as` in my PATH. This indicates that before this change, the custom compiler is falsely relying on host `as`. This change fixes that, ignoring the bad `as` on PATH. Reviewed By: akankshamahajan15 Differential Revision: D29094159 Pulled By: pdillinger fbshipit-source-id: c432e90404ea4d39d885a685eebbb08be9eda1c8 14 June 2021, 22:05:59 UTC
5023dc5 Make platform009 default for FB developers (#8389) Summary: platform007 being phased out and sometimes broken Pull Request resolved: https://github.com/facebook/rocksdb/pull/8389 Test Plan: `make V=1` to see which compiler is being used Reviewed By: jay-zhuang Differential Revision: D29067183 Pulled By: pdillinger fbshipit-source-id: d1b07267cbc55baa9395f2f4fe3967cc6dad52f7 11 June 2021, 18:37:48 UTC
bfb9496 Modify script which generates TARGETS (#8366) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8366 Test Plan: Run it, `TARGETS` now unchanged. Reviewed By: jay-zhuang Differential Revision: D28914138 Pulled By: stepancheg fbshipit-source-id: 04d24cdf1439edf4204a3ba1f646e9e75a00d92b 08 June 2021, 17:04:33 UTC
d216050 Enable Starlark for fbcode//i* Summary: #forcetdhashing Reviewed By: ndmitchell Differential Revision: D28873060 fbshipit-source-id: 7d3be3e7d38619ec5b0b117f462ca1b9f427aa94 08 June 2021, 17:04:22 UTC
8c2f72a Update version and HISTORY for 6.21.1 08 June 2021, 15:56:21 UTC
229640f Fix a major performance bug in 6.21 for cache entry stats (#8369) Summary: In final polishing of https://github.com/facebook/rocksdb/issues/8297 (after most manual testing), I broke my own caching layer by sanitizing an input parameter with std::min(0, x) instead of std::max(0, x). I resisted unit testing the timing part of the result caching because historically, these test are either flaky or difficult to write, and this was not a correctness issue. This bug is essentially unnoticeable with a small number of column families but can explode background work with a large number of column families. This change fixes the logical error, removes some unnecessary related optimization, and adds mock time/sleeps to the unit test to ensure we can cache hit within the age limit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8369 Test Plan: added time testing logic to existing unit test Reviewed By: ajkr Differential Revision: D28950892 Pulled By: pdillinger fbshipit-source-id: e79cd4ff3eec68fd0119d994f1ed468c38026c3b 08 June 2021, 15:53:02 UTC
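The logical error described in 229640f can be illustrated with a minimal sketch. Only the `std::min(0, x)` versus `std::max(0, x)` mix-up comes from the commit message; the function names and the idea of a "maximum age in seconds" parameter are hypothetical stand-ins, not RocksDB's actual code.

```cpp
#include <algorithm>
#include <cassert>

// Buggy sanitizer (hypothetical name): std::min(0, x) yields 0 for every
// non-negative x, so the age limit collapses to zero.
inline int SanitizeMaxAgeBuggy(int max_age_s) {
  return std::min(0, max_age_s);
}

// Fixed sanitizer (hypothetical name): std::max(0, x) only raises negative
// inputs to 0 and leaves valid limits untouched.
inline int SanitizeMaxAgeFixed(int max_age_s) {
  return std::max(0, max_age_s);
}

// A cached result may be reused while its age is below the sanitized limit.
inline bool CanReuseCached(int age_s, int sanitized_max_age_s) {
  assert(age_s >= 0);
  return age_s < sanitized_max_age_s;
}
```

With the buggy version, no cached result is ever young enough to reuse, so every request triggers a fresh computation. That matches the commit's description of background work growing with the number of column families.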
c7f8ae9 SequenceIterWrapper should use internal comparator (#8328) Summary: https://github.com/facebook/rocksdb/pull/8288 introduced a bug: when seeking to a key, SequenceIterWrapper should do Next() using the internal key comparator rather than the user comparator. This change fixes it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8328 Test Plan: Pass all existing tests Reviewed By: ltamasi Differential Revision: D28647263 fbshipit-source-id: 4081d684fd8a86d248c485ef8a1563c7af136447 24 May 2021, 20:15:22 UTC
5d44932 fix lru caching test and fix reference binding to null pointer (#8326) Summary: Fix for https://github.com/facebook/rocksdb/issues/8315. In the LRU caching test, 5100 is not enough to hold the meta block and the first block in some random cases; increase it to 6100. Fix the reference binding to a null pointer by using a template. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8326 Test Plan: make check Reviewed By: pdillinger Differential Revision: D28625666 Pulled By: zhichao-cao fbshipit-source-id: 97b85306ae3d09bfb74addc7c65e57fe55a976a5 24 May 2021, 18:04:54 UTC
436e4f1 Update version and HISTORY.md 22 May 2021, 05:37:18 UTC
55853de Fix clang-analyze: use uninitiated variable (#8325) Summary: Error: ``` db/db_compaction_test.cc:5211:47: warning: The left operand of '*' is a garbage value uint64_t total = (l1_avg_size + l2_avg_size * 10) * 10; ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8325 Test Plan: `$ make analyze` Reviewed By: pdillinger Differential Revision: D28620916 Pulled By: jay-zhuang fbshipit-source-id: f6d58ab84eefbcc905cda45afb9522b0c6d230f8 22 May 2021, 02:06:47 UTC
7303d02 Use new Insert and Lookup APIs in table reader to support secondary cache (#8315) Summary: Secondary cache is implemented to provide a secondary cache tier for the block cache. New Insert and Lookup APIs are introduced in https://github.com/facebook/rocksdb/issues/8271 . To support and use the secondary cache in the block-based table reader, this PR introduces the corresponding callback functions that will be used in the secondary cache, and updates the Insert and Lookup APIs accordingly. benchmarking: ./db_bench --benchmarks="fillrandom" -num=1000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/tmp/rocks_t/db -partition_index_and_filters=true ./db_bench -db=/tmp/rocks_t/db -use_existing_db=true -benchmarks=readrandom -num=1000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=5 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -stats_dump_period_sec=30 -reads=50000000 master benchmarking results: readrandom : 3.923 micros/op 254881 ops/sec; 33.4 MB/s (23849796 of 50000000 found) rocksdb.db.get.micros P50 : 2.820992 P95 : 5.636716 P99 : 16.450553 P100 : 8396.000000 COUNT : 50000000 SUM : 179947064 Current PR benchmarking results: readrandom : 4.083 micros/op 244925 ops/sec; 32.1 MB/s (23849796 of 50000000 found) rocksdb.db.get.micros P50 : 2.967687 P95 : 5.754916 P99 : 15.665912 P100 : 8213.000000 COUNT : 50000000 SUM : 187250053 About 3.8% throughput reduction. P50: 5.2% increase; P95: 2.09% increase; P99: 4.77% improvement. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8315 Test Plan: added the testing case Reviewed By: anand1976 Differential Revision: D28599774 Pulled By: zhichao-cao fbshipit-source-id: 098c4df0d7327d3a546df7604b2f1602f13044ed 22 May 2021, 01:29:12 UTC
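The percentages quoted in 7303d02 can be recomputed from the reported figures with a small helper. The function name below is hypothetical. Applied to the ops/sec values (254881 and 244925) it gives a reduction of roughly 3.9%, close to the "about 3.8%" stated in the message; the latency values reproduce the quoted P50, P95, and P99 changes.

```cpp
#include <cassert>

// Relative change from a baseline measurement to a candidate measurement,
// in percent. Negative values mean the candidate is lower than the baseline.
inline double PercentChange(double baseline, double candidate) {
  assert(baseline != 0.0);
  return (candidate - baseline) / baseline * 100.0;
}
```

For throughput a negative result is a regression, whereas for latency percentiles a negative result is an improvement, which is why the P99 figure is reported as an improvement.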
6c7c3e8 Use large macos instance (#8320) Summary: The macOS build is taking more than 1 hour, so bump the instance type from the default medium to large (the large macOS instance was not available before). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8320 Test Plan: watch CI pass Reviewed By: ajkr Differential Revision: D28589456 Pulled By: jay-zhuang fbshipit-source-id: cff78dae5aaf9de90ade3468469290176de5ff32 22 May 2021, 01:17:03 UTC
3469d60 Add table properties for number of entries added to filters (#8323) Summary: With Ribbon filter work and possible variance in actual bits per key (or prefix; general term "entry") to achieve certain FP rates, I've received a request to be able to track actual bits per key in generated filters. This change adds a num_filter_entries table property, which can be combined with filter_size to get bits per key (entry). This can vary from num_entries in at least these ways: * Different versions of same key are only counted once in filters. * With prefix filters, several user keys map to the same filter entry. * A single filter can include both prefixes and user keys. Note that FilterBlockBuilder::NumAdded() didn't do anything useful except distinguish empty from non-empty. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8323 Test Plan: basic unit test included, others updated Reviewed By: jay-zhuang Differential Revision: D28596210 Pulled By: pdillinger fbshipit-source-id: 529a111f3c84501e5a470bc84705e436ee68c376 22 May 2021, 00:11:32 UTC
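The bits-per-entry calculation mentioned in 3469d60 is a single division. The sketch below assumes `filter_size` is reported in bytes; the helper name is hypothetical and not part of RocksDB.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: average bits per filter entry, computed from the
// filter_size (assumed to be in bytes) and num_filter_entries table
// properties.
inline double BitsPerFilterEntry(uint64_t filter_size_bytes,
                                 uint64_t num_filter_entries) {
  assert(num_filter_entries > 0);
  return static_cast<double>(filter_size_bytes) * 8.0 /
         static_cast<double>(num_filter_entries);
}
```

For example, a 1250-byte filter holding 1000 entries works out to 10 bits per entry. Using `num_entries` as the divisor instead could understate the figure, for the reasons listed in the commit message.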
6c86543 Fix manual compaction `max_compaction_bytes` under-calculated issue (#8269) Summary: Fix a bug where, for manual compaction, `max_compaction_bytes` only limits the SST files from the input level, but not the overlapping files on the output level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8269 Test Plan: `make check` Reviewed By: ajkr Differential Revision: D28231044 Pulled By: jay-zhuang fbshipit-source-id: 9d7d03004f30cc4b1b9819830141436907554b7c 21 May 2021, 21:03:44 UTC
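The accounting issue in 6c86543 can be illustrated with a small sketch. All names here are hypothetical, and the real compaction picker is considerably more involved; the point is only that the size checked against `max_compaction_bytes` should include the overlapping output-level files as well as the input-level files.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical sketch: total bytes a compaction would process, counting both
// the input-level files and the output-level files they overlap.
inline uint64_t TotalCompactionBytes(
    const std::vector<uint64_t>& input_level_file_sizes,
    const std::vector<uint64_t>& overlapping_output_file_sizes) {
  const uint64_t input_bytes =
      std::accumulate(input_level_file_sizes.begin(),
                      input_level_file_sizes.end(), uint64_t{0});
  const uint64_t output_bytes =
      std::accumulate(overlapping_output_file_sizes.begin(),
                      overlapping_output_file_sizes.end(), uint64_t{0});
  return input_bytes + output_bytes;
}

// Returns true if the compaction fits within the configured byte limit.
inline bool WithinMaxCompactionBytes(
    const std::vector<uint64_t>& input_level_file_sizes,
    const std::vector<uint64_t>& overlapping_output_file_sizes,
    uint64_t max_compaction_bytes) {
  assert(max_compaction_bytes > 0);
  return TotalCompactionBytes(input_level_file_sizes,
                              overlapping_output_file_sizes) <=
         max_compaction_bytes;
}
```

With two 40-byte input files, one 30-byte overlapping output file, and a limit of 100, counting only the input level (80 bytes) would pass the check, while the full total (110 bytes) exceeds it.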
bd3d080 Try to build with liburing by default. (#8322) Summary: By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat as 1, which means RocksDB will try to build with liburing. For cmake, add WITH_LIBURING to control it, with default on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322 Test Plan: Build using cmake and make. Reviewed By: anand1976 Differential Revision: D28586498 fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5 21 May 2021, 17:21:53 UTC
2f1984d Compare memtable insert and flush count (#8288) Summary: When a memtable is flushed, it validates the number of entries it reads and compares that number with how many entries were inserted into the memtable. This serves as one sanity check against memory corruption. This change will also allow more counters to be added in the future for better validation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8288 Test Plan: Pass all existing tests Reviewed By: ajkr Differential Revision: D28369194 fbshipit-source-id: 7ff870380c41eab7f99eee508550dcdce32838ad 20 May 2021, 23:07:28 UTC
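The insert-versus-flush count check from 2f1984d can be sketched with a toy container. `ToyMemtable` and its methods are hypothetical and bear no relation to RocksDB's real MemTable class; the sketch only shows the idea of recording a count at insert time and comparing it with the number of entries read back at flush time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a memtable (hypothetical; not RocksDB's MemTable).
class ToyMemtable {
 public:
  void Insert(std::string key, std::string value) {
    entries_.emplace_back(std::move(key), std::move(value));
    ++num_inserted_;
  }

  // Test-only hook that simulates corruption by silently losing an entry.
  void DropLastEntryForTest() {
    assert(!entries_.empty());
    entries_.pop_back();
  }

  // Simulates a flush: reads every entry, counts them, and reports whether
  // the count read matches the count recorded at insert time.
  bool FlushCountMatches() const {
    uint64_t num_read = 0;
    for (const auto& entry : entries_) {
      (void)entry;  // a real flush would write the entry to an SST file
      ++num_read;
    }
    return num_read == num_inserted_;
  }

 private:
  std::vector<std::pair<std::string, std::string>> entries_;
  uint64_t num_inserted_ = 0;
};
```

A mismatch does not say what went wrong, only that the entries read during the flush differ in number from those inserted, which is why the commit describes it as a sanity check.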
94b4faa Deflake ExternalSSTFileTest.PickedLevelBug (#8307) Summary: The test wants to make sure there's no compaction during `AddFile` (between `DBImpl::AddFile:MutexLock` and `DBImpl::AddFile:MutexUnlock`) but the mutex could be unlocked by `EnterUnbatched()`. Move the lock start point to after bumping the ingest file number. Also fix the deadlock when an ASSERT fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8307 Reviewed By: ajkr Differential Revision: D28479849 Pulled By: jay-zhuang fbshipit-source-id: b3c50f66aa5d5f59c5c27f815bfea189c4cd06cb 20 May 2021, 16:29:57 UTC
f76326e Bump nokogiri from 1.11.1 to 1.11.4 in /docs (#8318) Summary: Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.11.1 to 1.11.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p> <blockquote> <h2>1.11.4 / 2021-05-14</h2> <h3>Security</h3> <p>[CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:</p> <ul> <li><a href="https://security.archlinux.org/CVE-2019-20388">CVE-2019-20388</a></li> <li><a href="https://security.archlinux.org/CVE-2020-24977">CVE-2020-24977</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3517">CVE-2021-3517</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3518">CVE-2021-3518</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3537">CVE-2021-3537</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3541">CVE-2021-3541</a></li> </ul> <p>Note that two additional CVEs were addressed upstream but are not relevant to this release. <a href="https://security.archlinux.org/CVE-2021-3516">CVE-2021-3516</a> via <code>xmllint</code> is not present in Nokogiri, and <a href="https://security.archlinux.org/CVE-2020-7595">CVE-2020-7595</a> has been patched in Nokogiri since v1.10.8 (see <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1992">https://github.com/facebook/rocksdb/issues/1992</a>).</p> <p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-7rrm-v45f-jp64">nokogiri/GHSA-7rrm-v45f-jp64 </a> or <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2233">https://github.com/facebook/rocksdb/issues/2233</a> for a more complete analysis of these CVEs and patches.</p> <h3>Dependencies</h3> <ul> <li>[CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. 
(Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)</li> </ul> <h2>1.11.3 / 2021-04-07</h2> <h3>Fixed</h3> <ul> <li>[CRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this likely segfaulted. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1900">https://github.com/facebook/rocksdb/issues/1900</a>]</li> <li>[JRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this raised a <code>TypeError</code> exception.</li> <li>[CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience)</li> </ul> <h2>1.11.2 / 2021-03-11</h2> <h3>Fixed</h3> <ul> <li>[CRuby] <code>NodeSet</code> may now safely contain <code>Node</code> objects from multiple documents. Previously the GC lifecycle of the parent <code>Document</code> objects could lead to nodes being GCed while still in scope. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1952#issuecomment-770856928">https://github.com/facebook/rocksdb/issues/1952</a>]</li> <li>[CRuby] Patch libxml2 to avoid &quot;huge input lookup&quot; errors on large CDATA elements. (See upstream <a href="https://gitlab.gnome.org/GNOME/libxml2/-/issues/200">GNOME/libxml2#200</a> and <a href="https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/100">GNOME/libxml2!100</a>.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2132">https://github.com/facebook/rocksdb/issues/2132</a>].</li> <li>[CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against <code>nokogiri.so</code> by including <code>LDFLAGS</code> in <code>Nokogiri::VERSION_INFO</code>. 
[<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2167">https://github.com/facebook/rocksdb/issues/2167</a>]</li> <li>[CRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was invoked twice on each object.</li> <li>[JRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was not called, which was a problem for subclassing such as done by <code>Loofah</code>.</li> </ul> <h3>Improved</h3> <ul> <li>Reduce the number of object allocations needed when parsing an HTML::DocumentFragment. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2087">https://github.com/facebook/rocksdb/issues/2087</a>] (Thanks, <a href="https://github.com/ashmaroli"><code>@​ashmaroli</code></a>!)</li> <li>[JRuby] Update the algorithm used to calculate <code>Node#line</code> to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1223">https://github.com/facebook/rocksdb/issues/1223</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2177">https://github.com/facebook/rocksdb/issues/2177</a>]</li> <li>Introduce <code>--enable-system-libraries</code> and <code>--disable-system-libraries</code> flags to <code>extconf.rb</code>. These flags provide the same functionality as <code>--use-system-libraries</code> and the <code>NOKOGIRI_USE_SYSTEM_LIBRARIES</code> environment variable, but are more idiomatic. 
[<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li> <li>[TruffleRuby] <code>--disable-static</code> is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2191#issuecomment-780724627">https://github.com/facebook/rocksdb/issues/2191</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li> </ul> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md">nokogiri's changelog</a>.</em></p> <blockquote> <h2>1.11.4 / 2021-05-14</h2> <h3>Security</h3> <p>[CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:</p> <ul> <li><a href="https://security.archlinux.org/CVE-2019-20388">CVE-2019-20388</a></li> <li><a href="https://security.archlinux.org/CVE-2020-24977">CVE-2020-24977</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3517">CVE-2021-3517</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3518">CVE-2021-3518</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3537">CVE-2021-3537</a></li> <li><a href="https://security.archlinux.org/CVE-2021-3541">CVE-2021-3541</a></li> </ul> <p>Note that two additional CVEs were addressed upstream but are not relevant to this release. 
<a href="https://security.archlinux.org/CVE-2021-3516">CVE-2021-3516</a> via <code>xmllint</code> is not present in Nokogiri, and <a href="https://security.archlinux.org/CVE-2020-7595">CVE-2020-7595</a> has been patched in Nokogiri since v1.10.8 (see <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1992">https://github.com/facebook/rocksdb/issues/1992</a>).</p> <p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-7rrm-v45f-jp64">nokogiri/GHSA-7rrm-v45f-jp64 </a> or <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2233">https://github.com/facebook/rocksdb/issues/2233</a> for a more complete analysis of these CVEs and patches.</p> <h3>Dependencies</h3> <ul> <li>[CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. (Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)</li> </ul> <h2>1.11.3 / 2021-04-07</h2> <h3>Fixed</h3> <ul> <li>[CRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this likely segfaulted. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1900">https://github.com/facebook/rocksdb/issues/1900</a>]</li> <li>[JRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this raised a <code>TypeError</code> exception.</li> <li>[CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience)</li> </ul> <h2>1.11.2 / 2021-03-11</h2> <h3>Fixed</h3> <ul> <li>[CRuby] <code>NodeSet</code> may now safely contain <code>Node</code> objects from multiple documents. Previously the GC lifecycle of the parent <code>Document</code> objects could lead to nodes being GCed while still in scope. 
[<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1952#issuecomment-770856928">https://github.com/facebook/rocksdb/issues/1952</a>]</li> <li>[CRuby] Patch libxml2 to avoid &quot;huge input lookup&quot; errors on large CDATA elements. (See upstream <a href="https://gitlab.gnome.org/GNOME/libxml2/-/issues/200">GNOME/libxml2#200</a> and <a href="https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/100">GNOME/libxml2!100</a>.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2132">https://github.com/facebook/rocksdb/issues/2132</a>].</li> <li>[CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against <code>nokogiri.so</code> by including <code>LDFLAGS</code> in <code>Nokogiri::VERSION_INFO</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2167">https://github.com/facebook/rocksdb/issues/2167</a>]</li> <li>[CRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was invoked twice on each object.</li> <li>[JRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was not called, which was a problem for subclassing such as done by <code>Loofah</code>.</li> </ul> <h3>Improved</h3> <ul> <li>Reduce the number of object allocations needed when parsing an <code>HTML::DocumentFragment</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2087">https://github.com/facebook/rocksdb/issues/2087</a>] (Thanks, <a href="https://github.com/ashmaroli"><code>@​ashmaroli</code></a>!)</li> <li>[JRuby] Update the algorithm used to calculate <code>Node#line</code> to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. 
[<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1223">https://github.com/facebook/rocksdb/issues/1223</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2177">https://github.com/facebook/rocksdb/issues/2177</a>]</li> <li>Introduce <code>--enable-system-libraries</code> and <code>--disable-system-libraries</code> flags to <code>extconf.rb</code>. These flags provide the same functionality as <code>--use-system-libraries</code> and the <code>NOKOGIRI_USE_SYSTEM_LIBRARIES</code> environment variable, but are more idiomatic. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li> <li>[TruffleRuby] <code>--disable-static</code> is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2191#issuecomment-780724627">https://github.com/facebook/rocksdb/issues/2191</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li> </ul> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/sparklemotion/nokogiri/commit/9d69b44ed3357b8069856083d39ee418cd10109b"><code>9d69b44</code></a> version bump to v1.11.4</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/058e87fdfda2cc2f309df098d18fe8856e785fcc"><code>058e87f</code></a> update CHANGELOG with complete CVE information</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/92852514a0d4621961deb6ce249441ff5140358f"><code>9285251</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2234">https://github.com/facebook/rocksdb/issues/2234</a> from sparklemotion/2233-upgrade-to-libxml-2-9-12</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/5436f6120f883e9f185d48b992f39118a4897760"><code>5436f61</code></a> update CHANGELOG</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/761d320af2872c61b91f7b147cf57481566e3c67"><code>761d320</code></a> patch: renumber libxml2 patches</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/889ee2a9cb1e190bfa664cbf3552585f4d0a09a7"><code>889ee2a</code></a> test: update behavior of namespaces in HTML</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/9751d852c005606447dac7bb17f1a56593014583"><code>9751d85</code></a> test: remove low-value HTML::SAX::PushParser encoding test</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/9fcb7d25eabfab5e701d882e72ecab3b2ea6b13c"><code>9fcb7d2</code></a> test: adjust xpath gc test to libxml2's max recursion depth</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/1c99019f5f1bee23e4bff6cf72871f470097f7b2"><code>1c99019</code></a> patch: backport libxslt configure.ac change for libxml2 config</li> <li><a href="https://github.com/sparklemotion/nokogiri/commit/82a253fe7c5bdfab5fbe4c1b0c536b5ce4c72ac3"><code>82a253f</code></a> patch: fix isnan/isinf patch to apply cleanly 
to libxml 2.9.12</li> <li>Additional commits viewable in <a href="https://github.com/sparklemotion/nokogiri/compare/v1.11.1...v1.11.4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nokogiri&package-manager=bundler&previous-version=1.11.1&new-version=1.11.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8318 Reviewed By: pdillinger Differential Revision: D28541823 Pulled By: jay-zhuang fbshipit-source-id: e431517d1dcd4a19b358b3a98b1578539158e1fe 20 May 2021, 15:39:28 UTC
3786181 Add remote compaction public API (#8300) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8300 Reviewed By: ajkr Differential Revision: D28464726 Pulled By: jay-zhuang fbshipit-source-id: 49e9f4fb791808a6cbf39a7b1a331373f645fc5e 20 May 2021, 04:41:31 UTC
311a544 Use deleters to label cache entries and collect stats (#8297) Summary: This change gathers and publishes statistics about the kinds of items in block cache. This is especially important for profiling relative usage of cache by index vs. filter vs. data blocks. It works by iterating over the cache during periodic stats dump (InternalStats, stats_dump_period_sec) or on demand when DB::Get(Map)Property(kBlockCacheEntryStats), except that for efficiency and sharing among column families, saved data from the last scan is used when the data is not considered too old. The new information can be seen in info LOG, for example: Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0 Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%) And also through DB::GetProperty and GetMapProperty (here using ldb just for demonstration): $ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats rocksdb.block-cache-entry-stats.bytes.data-block: 0 rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0 rocksdb.block-cache-entry-stats.bytes.filter-block: 0 rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0 rocksdb.block-cache-entry-stats.bytes.index-block: 178992 rocksdb.block-cache-entry-stats.bytes.misc: 0 rocksdb.block-cache-entry-stats.bytes.other-block: 0 rocksdb.block-cache-entry-stats.bytes.write-buffer: 0 rocksdb.block-cache-entry-stats.capacity: 8388608 rocksdb.block-cache-entry-stats.count.data-block: 0 rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0 rocksdb.block-cache-entry-stats.count.filter-block: 0 rocksdb.block-cache-entry-stats.count.filter-meta-block: 0 rocksdb.block-cache-entry-stats.count.index-block: 215 rocksdb.block-cache-entry-stats.count.misc: 1 
rocksdb.block-cache-entry-stats.count.other-block: 0 rocksdb.block-cache-entry-stats.count.write-buffer: 0 rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290 rocksdb.block-cache-entry-stats.percent.data-block: 0.000000 rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000 rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000 rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000 rocksdb.block-cache-entry-stats.percent.index-block: 2.133751 rocksdb.block-cache-entry-stats.percent.misc: 0.000000 rocksdb.block-cache-entry-stats.percent.other-block: 0.000000 rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000 rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052 rocksdb.block-cache-entry-stats.secs_since_last_collection: 0 Solution detail - We need some way to flag what kind of blocks each entry belongs to, preferably without changing the Cache API. One of the complications is that Cache is a general interface that could have other users that don't adhere to whichever convention we decide on for keys and values. Or we would pay for an extra field in the Handle that would only be used for this purpose. This change uses a back-door approach, the deleter, to indicate the "role" of a Cache entry (in addition to the value type, implicitly). This has the added benefit of ensuring proper code origin whenever we recognize a particular role for a cache entry; if the entry came from some other part of the code, it will use an unrecognized deleter, which we simply attribute to the "Misc" role. An internal API makes for simple instantiation and automatic registration of Cache deleters for a given value type and "role". Another internal API, CacheEntryStatsCollector, solves the problem of caching the results of a scan and sharing them, to ensure scans are neither excessive nor redundant so as not to harm Cache performance. 
Because code is added to BlocklikeTraits, it is pulled out of block_based_table_reader.cc into its own file. This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option (could still be added), and with actual stat gathering. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297 Test Plan: manual testing with db_bench, and a couple of basic unit tests Reviewed By: ltamasi Differential Revision: D28488721 Pulled By: pdillinger fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb 19 May 2021, 23:51:13 UTC
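The deleter-as-role idea described in this commit can be illustrated with a small standalone sketch. This is not the RocksDB code: `Role`, `Entry`, `GetDeleter`, `RoleOf` and `ChargeByRole` are hypothetical names, and the registry here is a plain `std::map` with no locking. It only shows how one deleter function per (value type, role) pair lets a scan attribute each entry to a role, with unknown deleters falling back to "Misc".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the RocksDB types.
enum class Role { kDataBlock, kIndexBlock, kMisc };
using Deleter = void (*)(void* value);

struct Entry {
  void* value;
  std::size_t charge;
  Deleter deleter;
};

// One function instantiation per (value type, role) pair. The sketch relies
// on distinct instantiations having distinct addresses, which the language
// guarantees but aggressive linker code folding could break.
template <typename T, Role R>
void DeleteAs(void* value) {
  delete static_cast<T*>(value);
}

// Maps a deleter address to the role it was registered for.
inline std::map<Deleter, Role>& Registry() {
  static std::map<Deleter, Role> registry;
  return registry;
}

// Returns the deleter for (T, R) and registers it on the way out.
template <typename T, Role R>
Deleter GetDeleter() {
  Deleter d = &DeleteAs<T, R>;
  Registry()[d] = R;
  return d;
}

// Entries created elsewhere carry an unregistered deleter and count as Misc.
inline Role RoleOf(const Entry& e) {
  auto it = Registry().find(e.deleter);
  return it == Registry().end() ? Role::kMisc : it->second;
}

// A scan that sums the charge per role, as a stats collector might.
inline std::map<Role, std::size_t> ChargeByRole(
    const std::vector<Entry>& entries) {
  std::map<Role, std::size_t> totals;
  for (const Entry& e : entries) totals[RoleOf(e)] += e.charge;
  return totals;
}
```

Calling `ChargeByRole` on a mix of entries inserted with `GetDeleter<std::string, Role::kDataBlock>()`, `GetDeleter<std::string, Role::kIndexBlock>()` and an ad hoc deleter yields one total per role, with the ad hoc entry counted under `Role::kMisc`.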
748e3ac Add StartThread type checking wrapper (#8303) Summary: - Add class `FunctorWrapper` to invoke the function with given parameters - Implement `StartThreadTyped` which wraps `StartThread` with type checking cover - Demonstrate `StartThreadTyped` in test `util/thread_local_test.cc` https://github.com/facebook/rocksdb/issues/8285 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8303 Reviewed By: ajkr Differential Revision: D28539318 Pulled By: pdillinger fbshipit-source-id: 624789c236bde31163deda95c1e1471aee68933e 19 May 2021, 23:51:13 UTC
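A rough sketch of what a type-checking wrapper over an untyped `void (*)(void*)` entry point can look like is given below. It is an illustration under assumptions, not the code from this commit: `StartUntyped` is a hypothetical stand-in that runs the function inline instead of on a new thread, and this `FunctorWrapper` is a simplified version built on `std::function`.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical untyped primitive: takes a function and an opaque argument.
// A real launcher would hand both to a new thread; this stand-in runs the
// function inline so the sketch has no threading dependency.
inline void StartUntyped(void (*fn)(void*), void* arg) { fn(arg); }

// Simplified wrapper that owns a callable with its arguments already bound.
class FunctorWrapper {
 public:
  explicit FunctorWrapper(std::function<void()> f) : f_(std::move(f)) {}
  void Invoke() { f_(); }

 private:
  std::function<void()> f_;
};

// Trampoline with the untyped signature: recovers the wrapper, runs it,
// and frees it.
inline void InvokeAndDelete(void* arg) {
  FunctorWrapper* wrapper = static_cast<FunctorWrapper*>(arg);
  wrapper->Invoke();
  delete wrapper;
}

// Typed front end: the call fn(args...) inside the lambda is checked by the
// compiler, so passing arguments that do not match fn's parameters fails to
// build instead of misbehaving at run time.
template <typename... Params, typename... Args>
void StartThreadTyped(void (*fn)(Params...), Args... args) {
  FunctorWrapper* wrapper =
      new FunctorWrapper([fn, args...]() { fn(args...); });
  StartUntyped(&InvokeAndDelete, wrapper);
}
```

With this shape, `StartThreadTyped(&Work, 2, 3)` compiles only when `Work` accepts two arguments convertible from `int`, whereas the untyped entry point would accept any pointer.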
13232e1 Allow cache_bench/db_bench to use a custom secondary cache (#8312) Summary: This PR adds a ```-secondary_cache_uri``` option to the cache_bench and db_bench tools to allow the user to specify a custom secondary cache URI. The object registry is used to create an instance of the ```SecondaryCache``` object of the type specified in the URI. The main cache_bench code is packaged into a separate library, similar to db_bench. An example invocation of db_bench with a secondary cache URI - ```db_bench --env_uri=ws://ws.flash_sandbox.vll1_2/ -db=anand/nvm_cache_2 -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=67108864 -cache_index_and_filter_blocks=true -secondary_cache_uri='cachelibwrapper://filename=/home/anand76/nvm_cache/cache_file;size=2147483648;regionSize=16777216;admPolicy=random;admProbability=1.0;volatileSize=8388608;bktPower=20;lockPower=12' -partition_index_and_filters=true -duration=1800``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8312 Reviewed By: zhichao-cao Differential Revision: D28544325 Pulled By: anand1976 fbshipit-source-id: 8f209b9af900c459dc42daa7a610d5f00176eeed 19 May 2021, 22:26:18 UTC
871a2cb Fix test issue in new env_test tests (#8319) Summary: The two new tests added to env_test don't clear sync points, so if the tests are run in continuous mode rather than parallel mode, the next test triggers the previous test's sync points and fails. This change clears them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8319 Test Plan: Run the tests in continuous mode, which used to fail, and see them pass. Reviewed By: pdillinger Differential Revision: D28542562 fbshipit-source-id: 4052d487635188fe68a2a9df4b03d97b23f96720 19 May 2021, 17:59:02 UTC
ce0fc71 Minor improvements in env_test (#8317) Summary: Fix typo in comments in env_test and add PermitUncheckedError() to two statuses. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8317 Reviewed By: jay-zhuang Differential Revision: D28525093 fbshipit-source-id: 7a1ed3e45b6f500b8d2ae19fa339c9368111e922 19 May 2021, 17:28:08 UTC
9d61a08 Sync ingested files only if reopen is supported by the FS (#8296) Summary: Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported(). Tests: Add a new unit test in external_sst_file_basic_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/8296 Reviewed By: jay-zhuang Differential Revision: D28420865 Pulled By: anand1976 fbshipit-source-id: 380e7f5ff95324997f7a59864a9ac96ebbd0100c 19 May 2021, 02:33:55 UTC
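The control flow described in this commit, syncing only when the file system can reopen a file for writing, can be sketched as follows. Everything here is a toy model: `Status`, `ToyFileSystem` and `SyncIngestedFile` are hypothetical names, and only the "treat NotSupported as skip" decision is taken from the commit message.

```cpp
#include <cassert>
#include <string>

// Minimal toy status type; not the RocksDB Status class.
enum class Code { kOk, kNotSupported, kIOError };

struct Status {
  Code code;
  bool ok() const { return code == Code::kOk; }
  bool IsNotSupported() const { return code == Code::kNotSupported; }
};

// Toy file system that either supports reopening for write or does not.
class ToyFileSystem {
 public:
  explicit ToyFileSystem(bool supports_reopen)
      : supports_reopen_(supports_reopen) {}

  Status ReopenWritableFile(const std::string& /*path*/) {
    return Status{supports_reopen_ ? Code::kOk : Code::kNotSupported};
  }

  Status Sync(const std::string& /*path*/) {
    ++syncs_;
    return Status{Code::kOk};
  }

  int syncs() const { return syncs_; }

 private:
  bool supports_reopen_;
  int syncs_ = 0;
};

// Sync the ingested file only when reopening is supported. NotSupported is
// treated as "nothing to do", while any other failure is still reported.
inline Status SyncIngestedFile(ToyFileSystem& fs, const std::string& path) {
  Status s = fs.ReopenWritableFile(path);
  if (s.IsNotSupported()) return Status{Code::kOk};
  if (!s.ok()) return s;
  return fs.Sync(path);
}
```

On a `ToyFileSystem(true)` the call performs one sync. On a `ToyFileSystem(false)` it returns OK without syncing, which mirrors the behavior the commit describes for distributed file systems.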
60e5af8 Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe() (#8311) Summary: Right now the return codes of io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled, which is not good practice. Although these two functions are not supposed to return non-0 values in normal execution, people suspect that they might return a non-0 value when an interruption happens, and the code might then hang. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311 Test Plan: Make sure at least normal test cases still pass. Reviewed By: anand1976 Differential Revision: D28500828 fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211 18 May 2021, 23:09:14 UTC
6b0a22a Fix MultiGet with PinnableSlices and Merge for WBWI (#8299) Summary: The MultiGetFromBatchAndDB would fail if the PinnableSlice value being returned was pinned. This could happen if the value was retrieved from the DB (not memtable) or potentially if the values were reused (and a previous iteration returned a slice that was pinned). This change resets the pinnable value to clear it prior to attempting to use it, thereby eliminating the problem with the value already being pinned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8299 Reviewed By: jay-zhuang Differential Revision: D28455426 Pulled By: mrambacher fbshipit-source-id: a34d7d983ec9b6bb4c8a2b4892f72858d43e6972 18 May 2021, 21:35:47 UTC
83d1a66 Expose CompressionOptions::parallel_threads through C API (#8302) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8302 Reviewed By: jay-zhuang Differential Revision: D28499262 Pulled By: ajkr fbshipit-source-id: 7b17b79af871d874dfca76db9bca0d640a6cd854 18 May 2021, 05:53:04 UTC
d83542c Make it possible to apply only a subrange of table property collectors (#8298) Summary: This patch does two things: 1) Introduces some aliases in order to eliminate/prevent long-winded type names w/r/t the internal table property collectors (see e.g. `std::vector<std::unique_ptr<IntTblPropCollectorFactory>>`). 2) Makes it possible to apply only a subrange of table property collectors during table building by turning `TableBuilderOptions::int_tbl_prop_collector_factories` from a pointer to a `vector` into a range (i.e. a pair of iterators). Rationale: I plan to introduce a BlobDB related table property collector, which should only be applied during table creation if blob storage is enabled at the moment (which can be changed dynamically). This change will make it possible to include/ exclude the BlobDB related collector as needed without having to introduce a second `vector` of collectors in `ColumnFamilyData` with pretty much the same contents. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8298 Test Plan: `make check` Reviewed By: jay-zhuang Differential Revision: D28430910 Pulled By: ltamasi fbshipit-source-id: a81d28f2c59495865300f43deb2257d2e6977c8e 18 May 2021, 01:28:39 UTC
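To illustrate why an iterator range is handier than a pointer to the whole `vector`, here is a minimal sketch. The types are hypothetical (`CollectorFactory`, `FactoryRange`, `SelectFactories`), and it assumes, purely for illustration, that the optional blob-related factory is stored last so it can be dropped by shortening the range.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an internal table property collector factory.
struct CollectorFactory {
  std::string name;
};

// Aliases that keep the long type names out of signatures.
using Factories = std::vector<std::unique_ptr<CollectorFactory>>;
using FactoryIter = Factories::const_iterator;

// A subrange of the factories: a pair of iterators into the one shared
// vector, so no second vector with nearly identical contents is needed.
struct FactoryRange {
  FactoryIter first;
  FactoryIter last;
  FactoryIter begin() const { return first; }
  FactoryIter end() const { return last; }
};

// Assumption for this sketch only: the blob-related factory is the last
// element, so excluding it means ending the range one element early.
inline FactoryRange SelectFactories(const Factories& all, bool blob_enabled) {
  FactoryIter last = all.end();
  if (!blob_enabled && !all.empty()) --last;
  return FactoryRange{all.begin(), last};
}

// Example consumer: walks whatever range it is given.
inline std::vector<std::string> Names(const FactoryRange& range) {
  std::vector<std::string> names;
  for (const auto& factory : range) names.push_back(factory->name);
  return names;
}
```

The table builder side only sees a range, so switching blob storage on or off dynamically changes which iterators are passed rather than which container exists.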
0ed8cb6 Write file temperature information to manifest (#8284) Summary: As a part of tiered storage, writing temperature information to the manifest is needed so that after DB recovery, RocksDB still has the tiering information, which further functionality will build on. Also fix some issues in the simulated hybrid FS. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8284 Test Plan: Add a new unit test to validate that the information is indeed written and read back. Reviewed By: zhichao-cao Differential Revision: D28335801 fbshipit-source-id: 56aeb2e6ea090be0200181dd968c8a7278037def 17 May 2021, 22:15:23 UTC
feb06e8 Initial support for secondary cache in LRUCache (#8271) Summary: Defined the abstract interface for a secondary cache in include/rocksdb/secondary_cache.h, and updated LRUCacheOptions to take a std::shared_ptr<SecondaryCache>. An item is initially inserted into the LRU (primary) cache. When it ages out and is evicted from memory, it is inserted into the secondary cache. On an LRU cache miss and a successful lookup in the secondary cache, the item is promoted to the LRU cache. Only synchronous lookup is supported currently. The secondary cache would be used to implement a persistent (flash cache) or compressed cache. Tests: Results from cache_bench and db_bench don't show any regression due to these changes. cache_bench results before and after this change - Command ```./cache_bench -ops_per_thread=10000000 -threads=1``` Before ```Complete in 40.688 s; QPS = 245774``` ```Complete in 40.486 s; QPS = 246996``` ```Complete in 42.019 s; QPS = 237989``` After ```Complete in 40.672 s; QPS = 245869``` ```Complete in 44.622 s; QPS = 224107``` ```Complete in 42.445 s; QPS = 235599``` db_bench results before this change, and with this change + https://github.com/facebook/rocksdb/issues/8213 and https://github.com/facebook/rocksdb/issues/8191 - Commands ```./db_bench --benchmarks="fillseq,compact" -num=30000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/home/anand76/nvm_cache/db -partition_index_and_filters=true``` ```./db_bench -db=/home/anand76/nvm_cache/db -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=6 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -threads=16 -duration=300``` Before ``` DB path: [/home/anand76/nvm_cache/db] readrandom : 80.702 micros/op 198104 ops/sec; 54.4 MB/s (3708999 of 3708999 found) ``` ``` DB path: [/home/anand76/nvm_cache/db] readrandom : 87.124
micros/op 183625 ops/sec; 50.4 MB/s (3439999 of 3439999 found) ``` After ``` DB path: [/home/anand76/nvm_cache/db] readrandom : 77.653 micros/op 206025 ops/sec; 56.6 MB/s (3866999 of 3866999 found) ``` ``` DB path: [/home/anand76/nvm_cache/db] readrandom : 84.962 micros/op 188299 ops/sec; 51.7 MB/s (3535999 of 3535999 found) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8271 Reviewed By: zhichao-cao Differential Revision: D28357511 Pulled By: anand1976 fbshipit-source-id: d1cfa236f00e649a18c53328be10a8062a4b6da2 14 May 2021, 05:58:40 UTC
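The insert, evict, and promote flow described in this commit can be modeled with a toy two-tier cache. This is a sketch under assumptions, not the `LRUCache` or `SecondaryCache` interface: `TieredCache` is a hypothetical class, the secondary tier is an unbounded in-memory map, and a promoted item is left in the secondary tier as well, which is a simplification chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Toy two-tier cache: a small LRU primary tier backed by a secondary tier.
class TieredCache {
 public:
  explicit TieredCache(std::size_t capacity) : capacity_(capacity) {}

  // New items always enter the primary tier. Items that age out of the
  // primary tier are demoted to the secondary tier.
  void Insert(const std::string& key, const std::string& value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      lru_.erase(it->second);
      index_.erase(it);
    }
    lru_.emplace_front(key, value);
    index_[key] = lru_.begin();
    while (lru_.size() > capacity_) {
      const std::pair<std::string, std::string>& victim = lru_.back();
      secondary_[victim.first] = victim.second;
      index_.erase(victim.first);
      lru_.pop_back();
    }
  }

  // Synchronous lookup: try the primary tier first, then the secondary.
  // A secondary hit promotes the item back into the primary tier.
  bool Lookup(const std::string& key, std::string* value) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second);  // mark as most recent
      *value = it->second->second;
      return true;
    }
    auto sit = secondary_.find(key);
    if (sit == secondary_.end()) return false;
    *value = sit->second;
    Insert(key, *value);  // promotion; may demote another item
    return true;
  }

  bool InPrimary(const std::string& key) const {
    return index_.count(key) > 0;
  }
  bool InSecondary(const std::string& key) const {
    return secondary_.count(key) > 0;
  }

 private:
  using Item = std::pair<std::string, std::string>;
  std::size_t capacity_;
  std::list<Item> lru_;  // front is most recently used
  std::unordered_map<std::string, std::list<Item>::iterator> index_;
  std::unordered_map<std::string, std::string> secondary_;
};
```

With a capacity of 2, inserting `a`, `b`, `c` demotes `a`. A later `Lookup("a", &v)` finds it in the secondary tier, promotes it, and in turn demotes `b`.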
d15fbae Refactor Option obj address from char* to void* (#8295) Summary: And replace `reinterpret_cast` with `static_cast` or no cast. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8295 Test Plan: `make check` Reviewed By: mrambacher Differential Revision: D28420303 Pulled By: jay-zhuang fbshipit-source-id: 645be123a0df624dc2bea37cd54a35403fc494fa 13 May 2021, 21:29:42 UTC
d76c46e Deflake TransactionStressTest.ExpiredTransactionDataRace1 (#8258) Summary: We saw the `Commit()` fail with "Operation expired" so apparently the expiration time is too short. Increased the magnitude of the times in this test to make flakiness less likely. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8258 Reviewed By: jay-zhuang Differential Revision: D28177033 Pulled By: ajkr fbshipit-source-id: 0357acee6cc14c104b6ccd39231a683a606ab130 12 May 2021, 22:49:05 UTC
a79b46c Add De/Serialization for CompactionInput/Result (#8247) Summary: The functions will be used for remote compaction parameter input and result. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8247 Test Plan: `make check` Reviewed By: ajkr Differential Revision: D28104680 Pulled By: jay-zhuang fbshipit-source-id: c0a5178e6277125118384278efea2acbf90aa6cb 12 May 2021, 19:36:43 UTC
e9a0bc1 Fix cmake failed to build db_bench (#8289) Summary: And change the cmake build on macos with GFLAGS on to cover more cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8289 Reviewed By: zhichao-cao Differential Revision: D28372467 Pulled By: jay-zhuang fbshipit-source-id: ad7fbe523c3fb135ef5281adbaf2070ca5d0873d 12 May 2021, 18:39:01 UTC
a6e425d Fix a minor clang release build failure (#8290) Summary: Error message: ``` cache/clock_cache.cc:434:14: error: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32] *state = end_idx; ~ ^~~~~~~ ``` Make CircleCI cover this case by installing tbb. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8290 Test Plan: `USE_CLANG=1 make -j1 release` Reviewed By: akankshamahajan15 Differential Revision: D28374672 Pulled By: jay-zhuang fbshipit-source-id: e8c3ee46f2a008e8a599413292e5a4b5151365df 12 May 2021, 17:45:29 UTC
78a309b New Cache API for gathering statistics (#8225) Summary: Adds a new Cache::ApplyToAllEntries API that we expect to use (in follow-up PRs) for efficiently gathering block cache statistics. Notable features vs. old ApplyToAllCacheEntries: * Includes key and deleter (in addition to value and charge). We could have passed in a Handle but then more virtual function calls would be needed to get the "fields" of each entry. We expect to use the 'deleter' to identify the origin of entries, perhaps even more. * Heavily tuned to minimize latency impact on operating cache. It does this by iterating over small sections of each cache shard while cycling through the shards. * Supports tuning roughly how many entries to operate on for each lock acquire and release, to control the impact on the latency of other operations without excessive lock acquire & release. The right balance can depend on the cost of the callback. Good default seems to be around 256. * There should be no need to disable thread safety. (I would expect uncontended locks to be sufficiently fast.) I have enhanced cache_bench to validate this approach: * Reports a histogram of ns per operation, so we can look at the distribution of times, not just throughput (average). * Can add a thread for simulated "gather stats" which calls ApplyToAllEntries at a specified interval. We also generate a histogram of time to run ApplyToAllEntries. To make the iteration over some entries of each shard work as cleanly as possible, even with resize between next set of entries, I have re-arranged which hash bits are used for sharding and which for indexing within a shard. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8225 Test Plan: A couple of unit tests are added, but primary validation is manual, as the primary risk is to performance.
The primary validation is using cache_bench to ensure that neither the minor hashing changes nor the simulated stats gathering significantly impact QPS or latency distribution. Note that adding op latency histogram seriously impacts the benchmark QPS, so for a fair baseline, we need the cache_bench changes (except remove simulated stat gathering to make it compile). In short, we don't see any reproducible difference in ops/sec or op latency unless we are gathering stats nearly continuously. Test uses 10GB block cache with 8KB values to be somewhat realistic in the number of items to iterate over. Baseline typical output: ``` Complete in 92.017 s; Rough parallel ops/sec = 869401 Thread ops/sec = 54662 Operation latency (ns): Count: 80000000 Average: 11223.9494 StdDev: 29.61 Min: 0 Median: 7759.3973 Max: 9620500 Percentiles: P50: 7759.40 P75: 14190.73 P99: 46922.75 P99.9: 77509.84 P99.99: 217030.58 ------------------------------------------------------ [ 0, 1 ] 68 0.000% 0.000% ( 2900, 4400 ] 89 0.000% 0.000% ( 4400, 6600 ] 33630240 42.038% 42.038% ######## ( 6600, 9900 ] 18129842 22.662% 64.700% ##### ( 9900, 14000 ] 7877533 9.847% 74.547% ## ( 14000, 22000 ] 15193238 18.992% 93.539% #### ( 22000, 33000 ] 3037061 3.796% 97.335% # ( 33000, 50000 ] 1626316 2.033% 99.368% ( 50000, 75000 ] 421532 0.527% 99.895% ( 75000, 110000 ] 56910 0.071% 99.966% ( 110000, 170000 ] 16134 0.020% 99.986% ( 170000, 250000 ] 5166 0.006% 99.993% ( 250000, 380000 ] 3017 0.004% 99.996% ( 380000, 570000 ] 1337 0.002% 99.998% ( 570000, 860000 ] 805 0.001% 99.999% ( 860000, 1200000 ] 319 0.000% 100.000% ( 1200000, 1900000 ] 231 0.000% 100.000% ( 1900000, 2900000 ] 100 0.000% 100.000% ( 2900000, 4300000 ] 39 0.000% 100.000% ( 4300000, 6500000 ] 16 0.000% 100.000% ( 6500000, 9800000 ] 7 0.000% 100.000% ``` New, gather_stats=false. 
Median thread ops/sec of 5 runs: ``` Complete in 92.030 s; Rough parallel ops/sec = 869285 Thread ops/sec = 54458 Operation latency (ns): Count: 80000000 Average: 11298.1027 StdDev: 42.18 Min: 0 Median: 7722.0822 Max: 6398720 Percentiles: P50: 7722.08 P75: 14294.68 P99: 47522.95 P99.9: 85292.16 P99.99: 228077.78 ------------------------------------------------------ [ 0, 1 ] 109 0.000% 0.000% ( 2900, 4400 ] 793 0.001% 0.001% ( 4400, 6600 ] 34054563 42.568% 42.569% ######### ( 6600, 9900 ] 17482646 21.853% 64.423% #### ( 9900, 14000 ] 7908180 9.885% 74.308% ## ( 14000, 22000 ] 15032072 18.790% 93.098% #### ( 22000, 33000 ] 3237834 4.047% 97.145% # ( 33000, 50000 ] 1736882 2.171% 99.316% ( 50000, 75000 ] 446851 0.559% 99.875% ( 75000, 110000 ] 68251 0.085% 99.960% ( 110000, 170000 ] 18592 0.023% 99.983% ( 170000, 250000 ] 7200 0.009% 99.992% ( 250000, 380000 ] 3334 0.004% 99.997% ( 380000, 570000 ] 1393 0.002% 99.998% ( 570000, 860000 ] 700 0.001% 99.999% ( 860000, 1200000 ] 293 0.000% 100.000% ( 1200000, 1900000 ] 196 0.000% 100.000% ( 1900000, 2900000 ] 69 0.000% 100.000% ( 2900000, 4300000 ] 32 0.000% 100.000% ( 4300000, 6500000 ] 10 0.000% 100.000% ``` New, gather_stats=true, 1 second delay between scans. Scans take about 1 second here so it's spending about 50% time scanning. Still the effect on ops/sec and latency seems to be in the noise. 
Median thread ops/sec of 5 runs: ``` Complete in 91.890 s; Rough parallel ops/sec = 870608 Thread ops/sec = 54551 Operation latency (ns): Count: 80000000 Average: 11311.2629 StdDev: 45.28 Min: 0 Median: 7686.5458 Max: 10018340 Percentiles: P50: 7686.55 P75: 14481.95 P99: 47232.60 P99.9: 79230.18 P99.99: 232998.86 ------------------------------------------------------ [ 0, 1 ] 71 0.000% 0.000% ( 2900, 4400 ] 291 0.000% 0.000% ( 4400, 6600 ] 34492060 43.115% 43.116% ######### ( 6600, 9900 ] 16727328 20.909% 64.025% #### ( 9900, 14000 ] 7845828 9.807% 73.832% ## ( 14000, 22000 ] 15510654 19.388% 93.220% #### ( 22000, 33000 ] 3216533 4.021% 97.241% # ( 33000, 50000 ] 1680859 2.101% 99.342% ( 50000, 75000 ] 439059 0.549% 99.891% ( 75000, 110000 ] 60540 0.076% 99.967% ( 110000, 170000 ] 14649 0.018% 99.985% ( 170000, 250000 ] 5242 0.007% 99.991% ( 250000, 380000 ] 3260 0.004% 99.995% ( 380000, 570000 ] 1599 0.002% 99.997% ( 570000, 860000 ] 1043 0.001% 99.999% ( 860000, 1200000 ] 471 0.001% 99.999% ( 1200000, 1900000 ] 275 0.000% 100.000% ( 1900000, 2900000 ] 143 0.000% 100.000% ( 2900000, 4300000 ] 60 0.000% 100.000% ( 4300000, 6500000 ] 27 0.000% 100.000% ( 6500000, 9800000 ] 7 0.000% 100.000% ( 9800000, 14000000 ] 1 0.000% 100.000% Gather stats latency (us): Count: 46 Average: 980387.5870 StdDev: 60911.18 Min: 879155 Median: 1033777.7778 Max: 1261431 Percentiles: P50: 1033777.78 P75: 1120666.67 P99: 1261431.00 P99.9: 1261431.00 P99.99: 1261431.00 ------------------------------------------------------ ( 860000, 1200000 ] 45 97.826% 97.826% #################### ( 1200000, 1900000 ] 1 2.174% 100.000% Most recent cache entry stats: Number of entries: 1295133 Total charge: 9.88 GB Average key size: 23.4982 Average charge: 8.00 KB Unique deleters: 3 ``` Reviewed By: mrambacher Differential Revision: D28295742 Pulled By: pdillinger fbshipit-source-id: bbc4a552f91ba0fe10e5cc025c42cef5a81f2b95 11 May 2021, 23:17:10 UTC
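The scan strategy this commit describes, visiting a small section of each shard per lock acquisition while cycling through the shards, can be sketched roughly as below. This is a simplified model and not the `Cache::ApplyToAllEntries` implementation: `Shard` and `ApplyToAllEntriesSketch` are hypothetical, each shard is a plain vector rather than a hash table, and concurrent resizing is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shard: a lock plus (key, charge) entries.
struct Shard {
  std::mutex mu;
  std::vector<std::pair<std::string, std::size_t>> entries;
};

// Visits every entry, holding each shard's lock for at most
// `entries_per_lock` callbacks at a time and rotating through the shards,
// so no single shard stays locked for the whole scan.
inline void ApplyToAllEntriesSketch(
    std::vector<Shard>& shards, std::size_t entries_per_lock,
    const std::function<void(const std::string&, std::size_t)>& callback) {
  assert(entries_per_lock > 0);
  std::vector<std::size_t> next(shards.size(), 0);  // resume point per shard
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (std::size_t s = 0; s < shards.size(); ++s) {
      std::lock_guard<std::mutex> lock(shards[s].mu);
      const auto& entries = shards[s].entries;
      const std::size_t end =
          std::min(entries.size(), next[s] + entries_per_lock);
      for (std::size_t i = next[s]; i < end; ++i) {
        callback(entries[i].first, entries[i].second);
        progressed = true;
      }
      next[s] = end;
    }  // the lock is released here, before moving to the next shard
  }
}
```

The `entries_per_lock` parameter plays the role of the tuning knob mentioned in the commit message, which suggests a default of around 256. A smaller value shortens each lock hold at the cost of more acquire and release cycles.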
78e8241 Added static methods for simple types to OptionTypeInfo (#8249) Summary: Added ParseType, SerializeType, and TypesAreEqual methods to OptionTypeInfo. These methods can be used for serialization and deserialization of basic types. Change the MutableCF/DB Options to use this format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8249 Reviewed By: jay-zhuang Differential Revision: D28351190 Pulled By: mrambacher fbshipit-source-id: 72a78643b804f2f0bf59c32ffefa63346672ad16 11 May 2021, 23:15:47 UTC
9f2d255 Add ObjectRegistry to ConfigOptions (#8166) Summary: This change enables a couple of things: - Different ConfigOptions can have different registry/factory associated with it, thereby allowing things like a "Test" ConfigOptions versus a "Production" - The ObjectRegistry is created fewer times and can be re-used The ConfigOptions can also be initialized/constructed from a DBOptions, in which case it will grab some of its settings (Env, Logger) from the DBOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8166 Reviewed By: zhichao-cao Differential Revision: D27657952 Pulled By: mrambacher fbshipit-source-id: ae1d6200bb7ab127405cdeefaba43c7fe694dfdd 11 May 2021, 13:47:22 UTC
ff46374 Add Merge Operator support to WriteBatchWithIndex (#8135) Summary: The WBWI has two differing modes of operation dependent on the value of the constructor parameter `overwrite_key`. Currently, regardless of the parameter, neither mode performs as expected when using Merge. This PR remedies this by correctly invoking the appropriate Merge Operator before returning results from the WBWI. Examples of issues that exist which are solved by this PR: ## Example 1 with `overwrite_key=false` Currently, from an empty database, the following sequence: ``` Put('k1', 'v1') Merge('k1', 'v2') Get('k1') ``` Incorrectly yields `v2`, that is to say that the Merge behaves like a Put. ## Example 2 with `overwrite_key=true` Currently, from an empty database, the following sequence: ``` Put('k1', 'v1') Merge('k1', 'v2') Get('k1') ``` Incorrectly yields `ERROR: kMergeInProgress`. ## Example 3 with `overwrite_key=false` Currently, with a database containing `('k1' -> 'v1')`, the following sequence: ``` Merge('k1', 'v2') GetFromBatchAndDB('k1') ``` Incorrectly yields `v1,v2` ## Example 4 with `overwrite_key=true` Currently, with a database containing `('k1' -> 'v1')`, the following sequence: ``` Merge('k1', 'v1') GetFromBatchAndDB('k1') ``` Incorrectly yields `ERROR: kMergeInProgress`. ## Example 5 with `overwrite_key=false` Currently, from an empty database, the following sequence: ``` Put('k1', 'v1') Merge('k1', 'v2') GetFromBatchAndDB('k1') ``` Incorrectly yields `v1,v2` ## Example 6 with `overwrite_key=true` Currently, from an empty database, `('k1' -> 'v1')`, the following sequence: ``` Put('k1', 'v1') Merge('k1', 'v2') GetFromBatchAndDB('k1') ``` Incorrectly yields `ERROR: kMergeInProgress`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8135 Reviewed By: pdillinger Differential Revision: D27657938 Pulled By: mrambacher fbshipit-source-id: 0fbda6bbc66bedeba96a84786d90141d776297df 10 May 2021, 19:50:25 UTC
f89a536 Change date format in HISTORY.md (#8278) Summary: Per previous discussion, change date format in HISTORY.md to follow ISO 8601. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8278 Reviewed By: jay-zhuang Differential Revision: D28294022 fbshipit-source-id: 563f29c56143519b4a871df82a17dd0a168a578c 07 May 2021, 23:16:30 UTC
a639c02 Allow applying `CompactionFilter` outside of compaction (#8243) Summary: From HISTORY.md release note: - Allow `CompactionFilter`s to apply in more table file creation scenarios such as flush and recovery. For compatibility, `CompactionFilter`s by default apply during compaction. Users can customize this behavior by overriding `CompactionFilterFactory::ShouldFilterTableFileCreation()`. - Removed unused structure `CompactionFilterContext` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8243 Test Plan: added unit tests Reviewed By: pdillinger Differential Revision: D28088089 Pulled By: ajkr fbshipit-source-id: 0799be7908e3b39fea09fc3f1ab00e13ad817fae 07 May 2021, 23:01:40 UTC
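The opt-in behavior described in this release note can be mimicked with a self-contained sketch. The classes below are stand-ins, not the RocksDB headers: the `TableFileCreationReason` values and the shape of `ShouldFilterTableFileCreation` are assumptions modeled on the names in the note, and `FlushAndCompactionFactory` is a hypothetical subclass.

```cpp
#include <cassert>

// Assumed set of reasons a table file may be created (stand-in enum).
enum class TableFileCreationReason { kFlush, kCompaction, kRecovery, kMisc };

// Stand-in for a compaction filter factory base class. By default, filters
// apply only during compaction, which preserves the older behavior.
class FilterFactorySketch {
 public:
  virtual ~FilterFactorySketch() = default;
  virtual bool ShouldFilterTableFileCreation(
      TableFileCreationReason reason) const {
    return reason == TableFileCreationReason::kCompaction;
  }
};

// Hypothetical factory that also opts in to filtering during flush.
class FlushAndCompactionFactory : public FilterFactorySketch {
 public:
  bool ShouldFilterTableFileCreation(
      TableFileCreationReason reason) const override {
    return reason == TableFileCreationReason::kFlush ||
           reason == TableFileCreationReason::kCompaction;
  }
};
```

A caller that builds table files would consult this method once per file creation and create a filter only when it returns true, so existing factories that do not override it keep filtering during compaction only.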
242ac6c Bump rexml from 3.2.4 to 3.2.5 in /docs (#8251) Summary: Bumps [rexml](https://github.com/ruby/rexml) from 3.2.4 to 3.2.5. Changelog, sourced from [rexml's changelog](https://github.com/ruby/rexml/blob/master/NEWS.md): ## 3.2.5 - 2021-04-05 ### Improvements - Add more validations to XPath parser. - `require "rexml/document"` by default. [GitHub#36][Patch by Koichi ITO] - Don't add `#dclone` method to core classes globally. [GitHub#37][Patch by Akira Matsuda] - Add more documentations. [Patch by Burdette Lamar] - Added `REXML::Elements#parent`. [GitHub#52][Patch by Burdette Lamar] ### Fixes - Fixed a bug that `REXML::DocType#clone` doesn't copy external ID information. - Fixed round-trip vulnerability bugs. See also: https://www.ruby-lang.org/en/news/2021/04/05/xml-round-trip-vulnerability-in-rexml-cve-2021-28965/ [HackerOne#1104077][CVE-2021-28965][Reported by Juho Nurminen] ### Thanks - Koichi ITO - Akira Matsuda - Burdette Lamar - Juho Nurminen ## Commits - `a622645` Add 3.2.5 entry - `3c137eb` Fix a parser bug that some data may be ignored before DOCTYPE - `9b311e5` Fix a bug that invalid document declaration may be accepted - `f9d88e4` Fix a bug that invalid document declaration may be generated - `f7bab89` Fix a bug that invalid element end may be accepted - `6a250d2` Fix a bug that invalid element start may be accepted - `2fe62e2` Fix a bug that invalid notation declaration may be accepted - `a659c63` Fix a bug that invalid notation declaration may be generated - `790dd11` Use ruby/setup-ruby (ruby/rexml#66) - `eda1b20` Clean up and enhance high-level RDoc (ruby/rexml#65) - Additional commits viewable in [compare view](https://github.com/ruby/rexml/compare/v3.2.4...v3.2.5) Pull Request resolved: https://github.com/facebook/rocksdb/pull/8251 Reviewed By: jay-zhuang Differential Revision: D28163644 Pulled By: ajkr fbshipit-source-id: 7c0e8bf30c70f53db691076b396c0b748fa9380d 07 May 2021, 23:00:06 UTC
c26b75b Deprecate obsolete "backupable db" from public APIs (#8274) Summary: An early design of BackupEngine used stackable DB, so I guess a DB had to opt-in to being backupable. Unfortunately the naming of that obsolete design still infects our public API and implementation. This change fixes the public API, with a deprecated backward-compatibility header. `BackupableDBOptions` is renamed to `BackupEngineOptions` (copy-replace in the public header) and backup_engine.h replaces backupable_db.h (present for backward compatibility). The only other change in backupable_db.h -> backup_engine.h is cleaning up headers. Later changes will fix the internal implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8274 Test Plan: The internal implementation of BackupEngine uses the name BackupEngineOptions, while the unit tests use the old name BackupableDBOptions. This gives me confidence that both still work. Reviewed By: mrambacher Differential Revision: D28259471 Pulled By: pdillinger fbshipit-source-id: a25dbe327b9772143488e7bb0ec7139ee42d0613 07 May 2021, 20:53:15 UTC
a4919d6 Cap automatic arena block size to 1 MB (#7907) Summary: A larger arena block size reduces allocation overhead, but it can cause other problems. For example, the allocator is more likely to leave large blocks unbacked by physical memory, so touching them triggers page faults. Weighing the risk, we cap the automatic arena block size at 1 MB. Users can still set a larger value explicitly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7907 Test Plan: Run all existing tests Reviewed By: pdillinger Differential Revision: D26135269 fbshipit-source-id: b7f55afd03e6ee1d8715f90fa11b6c33944e9ea8 07 May 2021, 20:15:34 UTC
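A minimal sketch of the capping rule follows. It rests on two assumptions that the commit message does not spell out: that a configured value of 0 means "choose automatically", and that the automatic choice is one eighth of the write buffer size. `EffectiveArenaBlockSize` is a hypothetical helper name, not a RocksDB function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// 1 MB cap on the automatically chosen size, as described in the commit.
constexpr std::size_t kMaxAutoArenaBlockSize = 1024 * 1024;

// Hypothetical helper. Assumptions: configured == 0 requests an automatic
// size, and the automatic size is write_buffer_size / 8.
std::size_t EffectiveArenaBlockSize(std::size_t configured,
                                    std::size_t write_buffer_size) {
  if (configured != 0) return configured;  // explicit values are not capped
  return std::min(kMaxAutoArenaBlockSize, write_buffer_size / 8);
}
```

Under these assumptions a 64 MB write buffer would get a 1 MB arena block instead of 8 MB, while an explicitly configured 8 MB block size is left alone.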
ecd63b9 Revert accidental enabling broken ClockCache in stress test (#8277) Summary: From https://github.com/facebook/rocksdb/issues/8261 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8277 Test Plan: briefly make blackbox_crash_test Reviewed By: zhichao-cao Differential Revision: D28270648 Pulled By: pdillinger fbshipit-source-id: 9bfd46c5a1a449165f6597bddb17af910331773f 06 May 2021, 23:31:51 UTC
b71b459 Permit stdout "fail"/"error" in whitebox crash test (#8272) Summary: In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings "fail" and "error" (case-insensitive). The whitebox crash test failed upon seeing either of those strings. I checked that all other occurrences of "fail" and "error" (case-insensitive) that `db_stress` produces are printed to `stderr`. So this PR separates the handling of `db_stress`'s stdout and stderr, and only fails when one of those bad strings is found in stderr. The downside of this PR is that `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272 Test Plan: run it; see it succeeds for several runs until encountering a real error ``` $ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33 ... db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle*, bool, rocksdb::{anonymous}::CleanupContext*): Assertion `CountRefs(flags) > 0' failed. TEST FAILED. Output has 'fail'!!! ``` Reviewed By: zhichao-cao Differential Revision: D28239233 Pulled By: ajkr fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e 06 May 2021, 00:54:13 UTC
7f3a0f5 db_stress: wait for compaction to finish after open with failure injection (#8270) Summary: When injecting failures during DB open, errors can happen in background threads: DB open succeeds, but the DB is soon made read-only and subsequent writes fail, which is not expected. To prevent this, wait for compaction to finish before serving traffic. If there is a failure, reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8270 Test Plan: Run the test. Reviewed By: ajkr Differential Revision: D28230537 fbshipit-source-id: e2e97888904f9b9bb50c35ccf95b88c2319ef5c3 05 May 2021, 23:41:45 UTC
e19908c Refactor kill point (#8241) Summary: Refactor kill point to one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to pass a pointer to the fault ingestion fs to the killing class, but it turned out to be harder than I thought. Perhaps we'll need to do this another way. But I think the refactoring itself is good, so I'm sending it out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241 Test Plan: make release and run crash test for a while. Reviewed By: anand1976 Differential Revision: D28078486 fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f 05 May 2021, 22:50:29 UTC
8948dc8 Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) Summary: The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions. This change cleans that up by introducing an ImmutableOptions struct. Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form). Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR. All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes. Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262 Reviewed By: pdillinger Differential Revision: D28226540 Pulled By: mrambacher fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf 05 May 2021, 21:00:17 UTC
0f42e50 Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268) Summary: See release note in HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268 Test Plan: unit test repro Reviewed By: siying Differential Revision: D28227901 Pulled By: ajkr fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339 05 May 2021, 19:54:46 UTC
3b981ea Fix use-after-free threading bug in ClockCache (#8261) Summary: In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with -use_clock_cache, as well as db_bench -use_clock_cache, but not single-threaded. Smaller cache size hits failure much faster. ASAN reported the failure as calling malloc_usable_size on the `key` pointer of a ClockCache handle after it was reportedly freed. On detailed inspection I found this bad sequence of operations for a cache entry: state=InCache=1,refs=1 [thread 1] Start ClockCacheShard::Unref (from Release, no mutex) [thread 1] Decrement ref count state=InCache=1,refs=0 [thread 1] Suspend before CalcTotalCharge (no mutex) [thread 2] Start UnsetInCache (from Insert, mutex held) [thread 2] clear InCache bit state=InCache=0,refs=0 [thread 2] Calls RecycleHandle (based on pre-updated state) [thread 2] Returns to Insert which calls Cleanup which deletes `key` [thread 1] Resume ClockCacheShard::Unref [thread 1] Read `key` in CalcTotalCharge To fix this, I've added a field to the handle to store the metadata charge so that we can efficiently remember everything we need from the handle in Unref. We must not read from the handle again if we decrement the count to zero with InCache=1, which means we don't own the entry and someone else could eject/overwrite it immediately. Note before this change, on amd64 sizeof(Handle) == 56 even though there are only 48 bytes of data. Grouping together the uint32_t fields would cut it down to 48, but I've added another uint32_t, which takes it back up to 56. Not a big deal. Also fixed DisownData to cooperate with ASAN as in LRUCache. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261 Test Plan: Manual + adding use_clock_cache to db_crashtest.py Base performance ./cache_bench -use_clock_cache Complete in 17.060 s; QPS = 2458513 New performance ./cache_bench -use_clock_cache Complete in 17.052 s; QPS = 2459695 Any difference is easily buried in small noise. Crash test shows still more bug(s) in ClockCache, so I'm expecting to disable ClockCache from production code in a follow-up PR (if we can't find and fix the bug(s)) Reviewed By: mrambacher Differential Revision: D28207358 Pulled By: pdillinger fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294 05 May 2021, 05:18:00 UTC
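The rule stated in the fix, copy what is needed from the handle before the decrement and never read the handle again afterwards, can be sketched as follows. `ToyHandle` and this `Unref` are hypothetical and far simpler than the ClockCache code; in particular the InCache bit and the shard mutex are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, heavily simplified handle; NOT the ClockCache layout.
struct ToyHandle {
  std::atomic<uint32_t> refs{1};
  std::size_t charge = 0;
  uint32_t meta_charge = 0;  // stored so Unref never has to inspect `key`
};

// Drops one reference. Returns the total charge if this call released the
// last reference, otherwise 0.
std::size_t Unref(ToyHandle* h) {
  // Copy everything needed from the handle BEFORE the decrement ...
  const std::size_t total = h->charge + h->meta_charge;
  const uint32_t prev = h->refs.fetch_sub(1, std::memory_order_acq_rel);
  // ... because once the count reaches zero another thread may recycle *h.
  // From here on only the local copies are used.
  return prev == 1 ? total : 0;
}
```

The design point is that the decrement is the last access to the handle; any later read would recreate the window described in the thread interleaving above.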
c70bae1 Fix ConcurrentTaskLimiter token release for shutdown (#8253) Summary: Previously the shutdown process did not properly wait for all `compaction_thread_limiter` tokens to be released before proceeding to delete the DB's C++ objects. When this happened, we saw tests like "DBCompactionTest.CompactionLimiter" flake with the following error: ``` virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed. ``` There is a case where a token can still be alive even after the shutdown process has waited for BG work to complete. In particular, this happens because the shutdown process only waits for flush/compaction scheduled/unscheduled counters to all reach zero. These counters are decremented in `BackgroundCallCompaction()` functions. However, tokens are released in `BGWork*Compaction()` functions, which actually wrap the `BackgroundCallCompaction()` function. A simple sleep could repro the race condition: ``` $ diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc index 806bc548a..ba59efa89 100644 --- a/db/db_impl/db_impl_compaction_flush.cc +++ b/db/db_impl/db_impl_compaction_flush.cc @@ -2442,6 +2442,7 @@ void DBImpl::BGWorkCompaction(void* arg) { static_cast<PrepickedCompaction*>(ca.prepicked_compaction); static_cast_with_check<DBImpl>(ca.db)->BackgroundCallCompaction( prepicked_compaction, Env::Priority::LOW); + sleep(1); delete prepicked_compaction; } $ ./db_compaction_test --gtest_filter=DBCompactionTest.CompactionLimiter db_compaction_test: util/concurrent_task_limiter_impl.cc:24: virtual rocksdb::ConcurrentTaskLimiterImpl::~ConcurrentTaskLimiterImpl(): Assertion `outstanding_tasks_ == 0' failed. Received signal 6 (Aborted) #0 /usr/local/fbcode/platform007/lib/libc.so.6(gsignal+0xcf) [0x7f02673c30ff] ?? ??:0 #1 /usr/local/fbcode/platform007/lib/libc.so.6(abort+0x134) [0x7f02673ac934] ?? ??:0 ... 
``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8253 Test Plan: sleeps to expose race conditions Reviewed By: akankshamahajan15 Differential Revision: D28168064 Pulled By: ajkr fbshipit-source-id: 9e5167c74398d323e7975980c5cc00f450631160 05 May 2021, 00:27:24 UTC
c2a3424 Deflake DBTest.L0L1L2AndUpHitCounter (#8259) Summary: Previously we saw flakes on platforms like arm on CircleCI, such as the following: ``` Note: Google Test filter = DBTest.L0L1L2AndUpHitCounter [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBTest [ RUN ] DBTest.L0L1L2AndUpHitCounter db/db_test.cc:5345: Failure Expected: (TestGetTickerCount(options, GET_HIT_L0)) > (100), actual: 30 vs 100 [ FAILED ] DBTest.L0L1L2AndUpHitCounter (150 ms) [----------] 1 test from DBTest (150 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (150 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] DBTest.L0L1L2AndUpHitCounter ``` The test was totally non-deterministic, e.g., flush/compaction timing would affect how many files on each level. Furthermore, it depended heavily on platform-specific details, e.g., by having a 32KB memtable, it could become full with a very different number of entries depending on the platform. This PR rewrites the test to build a deterministic LSM with one file per level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8259 Reviewed By: mrambacher Differential Revision: D28178100 Pulled By: ajkr fbshipit-source-id: 0a03b26e8d23c29d8297c1bccb1b115dce33bdcd 04 May 2021, 18:02:59 UTC
8a92564 Update CircleCI MacOS Xcode version to 11.3.0 (#8256) Summary: To fix CircleCI pyenv installation failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8256 Reviewed By: ajkr Differential Revision: D28191772 Pulled By: jay-zhuang fbshipit-source-id: 2bbb1d5ded473e510c11c8ed27884c4ad073973f 04 May 2021, 17:34:31 UTC
c3ff14e Hint temperature of bottommost level files to FileSystem (#8222) Summary: As the first part of the effort to place different files on different storage types, this change introduces several things: (1) An experimental interface in FileSystem that specifies the temperature of a newly created file. (2) A test FileSystemWrapper, SimulatedHybridFileSystem, that simulates HDD for a file of "warm" temperature. (3) A simple experimental feature ColumnFamilyOptions.bottommost_temperature. RocksDB would pass this value to FileSystem when creating any bottommost file. (4) A db_bench parameter that applies (2) and (3) to db_bench. The motivation of the change is to introduce minimal changes that allow us to evolve tiered storage development. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8222 Test Plan: ./db_bench --benchmarks=fillrandom --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -level_compaction_dynamic_level_bytes --reads=100 -compaction_readahead_size=20000000 --reads=100000 -num=10000000 followed by ./db_bench --benchmarks=readrandom,stats --write_buffer_size=2000000 -max_bytes_for_level_base=20000000 -simulate_hybrid_fs_file=/tmp/warm_file_list -level_compaction_dynamic_level_bytes -compaction_readahead_size=20000000 --reads=500 --threads=16 -use_existing_db --num=10000000 and see results as expected. Reviewed By: ajkr Differential Revision: D28003028 fbshipit-source-id: 4724896d5205730227ba2f17c3fecb11261744ce 03 May 2021, 20:34:04 UTC
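Point (3) above amounts to a small decision at file creation time, sketched here with stand-in types. `ToyCFOptions` and `TemperatureForNewFile` are hypothetical names, and the `Temperature` enumerators are an assumption about what such a hint might look like; only the option name `bottommost_temperature` comes from the commit message.

```cpp
#include <cassert>

// Assumed set of temperature hints; a simplified stand-in.
enum class Temperature { kUnknown, kHot, kWarm, kCold };

// Hypothetical, minimal options struct carrying only the new field.
struct ToyCFOptions {
  Temperature bottommost_temperature = Temperature::kUnknown;
};

// Picks the hint that would be handed to the file system for a new table
// file: the configured value for bottommost files, no hint otherwise.
Temperature TemperatureForNewFile(const ToyCFOptions& options,
                                  bool is_bottommost) {
  return is_bottommost ? options.bottommost_temperature
                       : Temperature::kUnknown;
}
```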
d2ca04e Add more LSM info to FilterBuildingContext (#8246) Summary: Add `num_levels`, `is_bottommost`, and table file creation `reason` to `FilterBuildingContext`, in anticipation of more powerful Bloom-like filter support. To support this, added `is_bottommost` and `reason` to `TableBuilderOptions`, which allowed removing `reason` parameter from `rocksdb::BuildTable`. I attempted to remove `skip_filters` from `TableBuilderOptions`, because filter construction decisions should arise from options, not one-off parameters. I could not completely remove it because the public API for SstFileWriter takes a `skip_filters` parameter, and translating this into an option change would mean awkwardly replacing the table_factory if it is BlockBasedTableFactory with new filter_policy=nullptr option. I marked this public skip_filters option as deprecated because of this oddity. (skip_filters on the read side probably makes sense.) At least `skip_filters` is now largely hidden for users of `TableBuilderOptions` and is no longer used for implementing the optimize_filters_for_hits option. Bringing the logic for that option closer to handling of FilterBuildingContext makes it more obvious that these two are using the same notion of "bottommost." (Planned: configuration options for Bloom-like filters that generalize `optimize_filters_for_hits`) Recommended follow-up: Try to get away from "bottommost level" naming of things, which is inaccurate (see VersionStorageInfo::RangeMightExistAfterSortedRun), and move to "bottommost run" or just "bottommost." Pull Request resolved: https://github.com/facebook/rocksdb/pull/8246 Test Plan: extended an existing unit test to exercise and check various filter building contexts. 
Also, existing tests for optimize_filters_for_hits validate some of the "bottommost" handling, which is now closely connected to FilterBuildingContext::is_bottommost through TableBuilderOptions::is_bottommost Reviewed By: mrambacher Differential Revision: D28099346 Pulled By: pdillinger fbshipit-source-id: 2c1072e29c24d4ac404c761a7b7663292372600a 30 April 2021, 20:50:13 UTC
85becd9 Refactor: use TableBuilderOptions to reduce parameter lists (#8240) Summary: Greatly reduced the not-quite-copy-paste giant parameter lists of rocksdb::NewTableBuilder, rocksdb::BuildTable, BlockBasedTableBuilder::Rep ctor, and BlockBasedTableBuilder ctor. Moved weird separate parameter `uint32_t column_family_id` of TableFactory::NewTableBuilder into TableBuilderOptions. Re-ordered parameters to TableBuilderOptions ctor, so that `uint64_t target_file_size` is not randomly placed between uint64_t timestamps (was easy to mix up). Replaced a couple of fields of BlockBasedTableBuilder::Rep with a FilterBuildingContext. The motivation for this change is making it easier to pass along more data into new fields in FilterBuildingContext (follow-up PR). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8240 Test Plan: ASAN make check Reviewed By: mrambacher Differential Revision: D28075891 Pulled By: pdillinger fbshipit-source-id: fddb3dbb8260a0e8bdcbb51b877ebabf9a690d4f 29 April 2021, 14:00:50 UTC
a0e0fec Improve BlockPrefetcher to prefetch only for sequential scans (#7394) Summary: BlockPrefetcher is used by iterators to prefetch data when they anticipate that more data will be read soon, which is valid for forward sequential scans. But BlockPrefetcher tracks only num_file_reads_ and not whether reads are sequential. This is a problem for MultiGet with a large number of keys, when it reseeks the index iterator and data block. FilePrefetchBuffer can end up doing large readahead for reseeks, because the readahead size increases exponentially once readahead is enabled. BlockBasedTableIterator has the same issue. Also track the previously read offset and length in BlockPrefetcher (which creates FilePrefetchBuffer) and FilePrefetchBuffer (which does the prefetching of data) to determine whether reads are sequential, and prefetch only then. Update the last block read after a cache hit so that reads served from cache are also taken into account. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7394 Test Plan: Add new unit test case Reviewed By: anand1976 Differential Revision: D23737617 Pulled By: akankshamahajan15 fbshipit-source-id: 8e6917c25ed87b285ee495d1b68dc623d71205a3 28 April 2021, 19:53:46 UTC
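The sequential-read check described above can be sketched as a small tracker that remembers the previous offset and length. This is a hypothetical model, not the BlockPrefetcher or FilePrefetchBuffer code: the class name, the initial and maximum readahead constants, and the choice to reset the readahead size on a reseek are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative constants (assumptions, not RocksDB defaults).
constexpr std::size_t kInitialReadahead = 8 * 1024;
constexpr std::size_t kMaxReadahead = 256 * 1024;

// Hypothetical tracker that allows readahead only for sequential reads.
class SequentialReadTracker {
 public:
  // Records a read and returns the readahead size to issue (0 means none).
  std::size_t OnRead(uint64_t offset, std::size_t len) {
    // A read is sequential if it starts where the previous one ended.
    const bool sequential = has_prev_ && offset == prev_offset_ + prev_len_;
    prev_offset_ = offset;
    prev_len_ = len;
    has_prev_ = true;
    if (!sequential) {
      readahead_ = kInitialReadahead;  // a reseek restarts the growth
      return 0;
    }
    const std::size_t issued = readahead_;
    readahead_ = std::min(readahead_ * 2, kMaxReadahead);  // exponential
    return issued;
  }

 private:
  bool has_prev_ = false;
  uint64_t prev_offset_ = 0;
  std::size_t prev_len_ = 0;
  std::size_t readahead_ = kInitialReadahead;
};
```

In this model a MultiGet-style pattern of scattered offsets never triggers readahead, while a forward scan sees the readahead size double on each consecutive read up to the cap.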
0db4cde Fix a memory leak in c_test (#8237) Summary: Don't call ```rocksdb_cache_disown_data()``` as it causes the memory allocated for ```shards_``` to be leaked. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8237 Reviewed By: jay-zhuang Differential Revision: D28039061 Pulled By: anand1976 fbshipit-source-id: c3464efe2c006b93b4be87030116a12a124598c4 28 April 2021, 19:29:33 UTC
8fe33a0 Change CircleCI Windows to previous known good image (#8220) Summary: This is to try to resolve the VS2015 install failure in CircleCI Windows builds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8220 Reviewed By: jay-zhuang Differential Revision: D28061834 Pulled By: anand1976 fbshipit-source-id: b2663eb60babee603669a2c2cb55f182df1cc7b1 28 April 2021, 18:30:30 UTC
cde69a7 db_stress to add --open_metadata_write_fault_one_in (#8235) Summary: Add --open_metadata_write_fault_one_in to db_stress, which randomly fails some file metadata modification operations during DB open, including file creation, close, renaming and directory sync. Some operations can fail before and after they take place. If DB open fails, db_stress retries without the failure ingestion, and the DB is expected to open successfully. This option is enabled in the crash test half of the time. Some follow-up changes would allow write failures at open time, and ingesting those failures in non-DB-open cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235 Test Plan: Run stress tests for a while and see failures get triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing the parent directory. Reviewed By: anand1976 Differential Revision: D28010944 fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558 28 April 2021, 17:58:05 UTC
3949731 Add WAL flush API to C client (#8226) Summary: The C client is missing the `manual_wal_flush` option and the `flush_wal` API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8226 Reviewed By: ajkr Differential Revision: D28000869 Pulled By: jay-zhuang fbshipit-source-id: ed44937e7e7e75bc0dfa870a14147fbeef0c38f8 27 April 2021, 21:56:23 UTC
65abb0c Add 6.18, 6.19 and 6.20 to check_format_compatible.sh (#8236) Summary: Add 6.18, 6.19 and 6.20 to check_format_compatible.sh Pull Request resolved: https://github.com/facebook/rocksdb/pull/8236 Test Plan: ./tools/check_format_compatible.sh (tested without 2.7.fb as it was failing as mentioned in the script) Reviewed By: mrambacher Differential Revision: D28019160 Pulled By: akankshamahajan15 fbshipit-source-id: b59a7c5c14cb4c115926e9ae7c74ea586b22c9ed 27 April 2021, 17:24:27 UTC
13c655a New C API to expose NewCompactOnDeletionCollectorFactory (#8233) Summary: New C API rocksdb_options_add_compact_on_deletion_collector_factory to expose NewCompactOnDeletionCollectorFactory Pull Request resolved: https://github.com/facebook/rocksdb/pull/8233 Reviewed By: mrambacher Differential Revision: D28018381 Pulled By: anand1976 fbshipit-source-id: 674c9ed902c91ff0d9f09e7a60c5f37b907604c6 27 April 2021, 17:14:04 UTC
0ca6d62 Rename variables in ImmutableCFOptions to avoid conflicts with ImmutableDBOptions (#8227) Summary: Renaming ImmutableCFOptions::info_log and statistics to logger and stats. This is stage 2 in creating an ImmutableOptions class. It is necessary because the names match those in ImmutableOptions and have different types. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8227 Reviewed By: jay-zhuang Differential Revision: D28000967 Pulled By: mrambacher fbshipit-source-id: 3bf2aa04e8f1e8724d825b7deacf41080c14420b 26 April 2021, 19:43:45 UTC
c2c7d5e Fix cast-function-type warning (#8230) Summary: Fixing the cast-function-type warning which appears during the following build: ```bash cmake .. -DFAIL_ON_WARNINGS=ON -DCMAKE_C_COMPILER=x86_64-w64-mingw32-gcc -DCMAKE_CXX_COMPILER=x86_64-w64-mingw32-g++ -DCMAKE_SYSTEM_NAME=Windows make rocksdb ``` Here is the log: ``` /home/leshiy/Work/rocksdb/port/win/env_win.cc: In constructor ‘rocksdb::port::WinClock::WinClock()’: /home/leshiy/Work/rocksdb/port/win/env_win.cc:92:9: error: cast between incompatible function types from ‘FARPROC’ {aka ‘long long int (*)()’} to ‘rocksdb::port::WinClock::FnGetSystemTimePreciseAsFileTime’ {aka ‘void (*)(_FILETIME*)’} [-Werror=cast-function-type] 92 | (FnGetSystemTimePreciseAsFileTime)GetProcAddress( | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 93 | module, "GetSystemTimePreciseAsFileTime"); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1plus: all warnings being treated as errors make[2]: *** [CMakeFiles/rocksdb.dir/build.make:4337: CMakeFiles/rocksdb.dir/port/win/env_win.cc.obj] Error 1 make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/rocksdb.dir/all] Error 2 make: *** [Makefile:91: all] Error 2 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8230 Reviewed By: jay-zhuang Differential Revision: D28000215 Pulled By: mrambacher fbshipit-source-id: 874782cf48f70470e3fbd9097585bf42e810ca61 26 April 2021, 17:13:55 UTC
2760c2a WBWI Internal Move implementation from .h into .cpp (#8229) Summary: Moves some of the structural refactoring from https://github.com/facebook/rocksdb/pull/8135 into this PR. This just cleans up the code by moving implementation out of the .h file and into the .cc file. Should be considered for merge before both https://github.com/facebook/rocksdb/pull/7214 and https://github.com/facebook/rocksdb/pull/8135 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8229 Reviewed By: jay-zhuang Differential Revision: D27999669 Pulled By: mrambacher fbshipit-source-id: 6eccecbf1f11bb9f5a173e86d1e7bc448bc96071 26 April 2021, 16:48:22 UTC
69c9868 Fix javadoc for keyMayExist (#8232) Summary: Closes https://github.com/facebook/rocksdb/issues/6985 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8232 Reviewed By: jay-zhuang Differential Revision: D27999779 Pulled By: mrambacher fbshipit-source-id: a37c88d93bde2692b8be9e46e673dda7bea701b2 26 April 2021, 15:34:10 UTC
6bab3a3 Move RegisterOptions into the Configurable API (#8223) Summary: As previously coded, a Configurable extension would need access to code not in the public API. This change moves RegisterOptions into the Configurable class and therefore available to public extensions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8223 Reviewed By: anand1976 Differential Revision: D27960188 Pulled By: mrambacher fbshipit-source-id: ac88b19397183df633902def5b5701b9b65fbf40 26 April 2021, 10:13:24 UTC
cc1c3ee Eliminate double-buffering of keys in block_based_table_builder (#8219) Summary: The block_based_table_builder buffers some blocks in memory to construct a good compression dictionary. Before this commit, the keys from each block were buffered separately for convenience. However, the buffered block data implicitly contains all keys. This commit eliminates the redundant key buffers and reduces memory usage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8219 Reviewed By: ajkr Differential Revision: D27945851 Pulled By: saketh-are fbshipit-source-id: caf3cac1217201e080a1e24b542bedf20973afee 23 April 2021, 19:45:02 UTC
d65d7d6 Expose JemallocNodumpAllocator to C API (#8178) Summary: Add new C APIs to create the JemallocNodumpAllocator and set it on a Cache object. `make test` passes with and without `DISABLE_JEMALLOC=1`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8178 Reviewed By: jay-zhuang Differential Revision: D27944631 Pulled By: ajkr fbshipit-source-id: 2531729aa285a8985c58f22f093c4d53029c4a7b 23 April 2021, 05:22:34 UTC
01e460d Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176) Summary: This PR is a first step at attempting to clean up some of the Mutable/Immutable Options code. With this change, a DBOption and a ColumnFamilyOption can be reconstructed from their Mutable and Immutable equivalents, respectively. readrandom tests do not show any performance degradation versus master (though both are slightly slower than the current 6.19 release). There are still fields in the ImmutableCFOptions that are not CF options but DB options. Eventually, I would like to move those into an ImmutableOptions (= ImmutableDBOptions+ImmutableCFOptions). But that will be part of a future PR to minimize changes and disruptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8176 Reviewed By: pdillinger Differential Revision: D27954339 Pulled By: mrambacher fbshipit-source-id: ec6b805ba9afe6e094bffdbd76246c2d99aa9fad 23 April 2021, 03:43:54 UTC
f0fca2b Add internal compaction API for Secondary instance (#8171) Summary: Add compaction API for secondary instance, which compacts the files to a secondary DB path without installing them to the LSM tree. The API will be used for remote compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8171 Test Plan: `make check` Reviewed By: ajkr Differential Revision: D27694545 Pulled By: jay-zhuang fbshipit-source-id: 8ff3ec1bffdb2e1becee994918850c8902caf731 22 April 2021, 20:02:28 UTC
e85d8a6 Add ZenFS to plugin list (#8218) Summary: Add ZenFS, a file system for zoned block devices, to PLUGINS.md Pull Request resolved: https://github.com/facebook/rocksdb/pull/8218 Reviewed By: jay-zhuang Differential Revision: D27944376 Pulled By: ajkr fbshipit-source-id: c9ea2e9814001ccd7c56d7ef4d38e20dfeb48d1e 22 April 2021, 18:12:40 UTC
09a9ec3 Fix the false positive alert of CF consistency check in WAL recovery (#8207) Summary: In current RocksDB, when recovering information from the WAL, we do the consistency check for each column family when one WAL file is corrupted and PointInTimeRecovery is set. However, it reports a false positive alert of "SST file is ahead of WALs" when a CF's current log number is greater than the corrupted WAL number (the CF contains data beyond the corrupted WAL) due to a new column family being created during flush. In this case, a new, empty WAL is created during the flush. Also, for some reason (e.g., a storage issue, or a crash before SyncCloseLog is called), the old WAL is corrupted. The new CF has no data and therefore has no consistency issue. Fix: when checking cfd->GetLogNumber() > corrupted_wal_number, also check cfd->GetLiveSstFilesSize() > 0, so that CFs with no SST file data skip the check. Note a potential inconsistency ignored by this fix: an empty CF can also result from write+delete. In this case, no SST files are generated after flush, but the CF still has log entries in the WAL. When the WAL is corrupted, the DB might be inconsistent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207 Test Plan: added unit test, make crash_test Reviewed By: riversand963 Differential Revision: D27898839 Pulled By: zhichao-cao fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e 22 April 2021, 17:28:37 UTC
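The condition described in the fix above can be sketched as a standalone predicate. This is a minimal illustration, not the RocksDB code: `CfState` and `ReportsSstAheadOfWal` are hypothetical names, and the two fields only model the values returned by `cfd->GetLogNumber()` and `cfd->GetLiveSstFilesSize()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-column-family state consulted during
// WAL recovery; the real code reads these values from the column family data.
struct CfState {
  uint64_t log_number;           // models cfd->GetLogNumber()
  uint64_t live_sst_files_size;  // models cfd->GetLiveSstFilesSize()
};

// Returns true when the column family would be reported as
// "SST file is ahead of WALs" under the fixed check: the CF's log number
// must be past the corrupted WAL AND the CF must own some SST data.
inline bool ReportsSstAheadOfWal(const CfState& cf,
                                 uint64_t corrupted_wal_number) {
  return cf.log_number > corrupted_wal_number && cf.live_sst_files_size > 0;
}
```

Under this model, a freshly created CF with log number 7 and no SST data is not reported when WAL 6 is corrupted, while a CF with the same log number and live SST data still is.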
47b424f Add check to cmake to see if we need to link against -latomic (#8183) Summary: For some compilers/environments (e.g. Clang, riscv64), we need to link against -latomic. Check if this is a requirement and add the library to the third-party libs if it is. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8183 Reviewed By: pdillinger Differential Revision: D27773564 Pulled By: mrambacher fbshipit-source-id: 68e15d823144f83fb02221c7bf5b1e43323419bf 22 April 2021, 15:29:08 UTC
3143527 Ignore comparator name mismatch in ldb manifest dump (#8216) Summary: RocksDB allows user-specified custom comparators which may not be known to `ldb`, a built-in tool for checking/mutating the database. Therefore, column family comparator names mismatch encountered during manifest dump should not prevent the dumping from proceeding. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8216 Test Plan: ``` make check ``` Also manually do the following ``` KEEP_DB=1 ./db_with_timestamp_basic_test ./ldb --db=<db> manifest_dump --verbose ``` The ldb should succeed and print something like: ``` ... --------------- Column family "default" (ID 0) -------------- log number: 6 comparator: <TestComparator>, but the comparator object is not available. ... ``` Reviewed By: ltamasi Differential Revision: D27927581 Pulled By: riversand963 fbshipit-source-id: f610b2c842187d17f575362070209ee6b74ec6d4 22 April 2021, 03:43:10 UTC
4985cea Add comment to DisableManualCompaction() (#8186) Summary: Add comment to DisableManualCompaction() which was missing. Also explicitly return from DBImpl::CompactRange() to avoid memtable flush when manual compaction is disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8186 Test Plan: Run existing unit tests. Reviewed By: jay-zhuang Differential Revision: D27744517 fbshipit-source-id: 449548a48905903b888dc9612bd17480f6596a71 21 April 2021, 22:23:46 UTC
596e900 Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) Summary: When WriteBufferManager is shared across DBs and column families to keep memory usage under a limit, OOMs have been observed when flush cannot finish but writes continuously insert into memtables. To avoid OOMs, when memory usage goes beyond buffer_size_ and a DB tries to write, this change stalls incoming writers until flush is completed and memory_usage drops. Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_ (memory_usage() >= buffer_size_), WriteBufferManager::ShouldStall() returns true. DBImpl first blocks incoming/future writers by calling write_thread_.BeginWriteStall() (which adds a dummy stall object to the writer's queue). Then the DB is blocked in state State::Blocked (the current write doesn't go through). The WBMStallInterface object maintained by every DB instance is added to the queue of WriteBufferManager. If multiple DBs try to write during this stall, they are also blocked when their check of WriteBufferManager::ShouldStall() returns true. End Stall condition: When flush is finished and memory usage goes down, the stall ends only if memory waiting to be flushed is less than buffer_size/2. This lower limit gives time for flush to complete and avoids continuous stalling if memory usage remains close to buffer_size. WriteBufferManager::EndWriteStall() is called, which removes all instances from its queue and signals them to continue. Their state is changed to State::Running and they are unblocked. DBImpl then signals all incoming writers of that DB to continue by calling write_thread_.EndWriteStall() (which removes the dummy stall object from the queue). Each DB instance creates a WBMStallInterface, which is an interface to block and signal DBs during a stall. When a DB needs to be blocked or signalled by WriteBufferManager, its state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED). 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898 Test Plan: Added a new test db/db_write_buffer_manager_test.cc Reviewed By: anand1976 Differential Revision: D26093227 Pulled By: akankshamahajan15 fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae 21 April 2021, 20:54:02 UTC
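The begin/end stall thresholds described in the summary above can be sketched with a small single-threaded model. This is only an illustration under stated assumptions: `StallModel` and its methods other than `ShouldStall` are hypothetical, there is no locking or writer queue, and "memory waiting to be flushed" is approximated by the total tracked usage.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, single-threaded model of the stall decision. It is not the
// RocksDB WriteBufferManager class and omits all locking and queueing.
class StallModel {
 public:
  explicit StallModel(std::size_t buffer_size) : buffer_size_(buffer_size) {
    assert(buffer_size > 0);
  }

  // Stall condition from the summary: memory_usage() >= buffer_size_.
  bool ShouldStall() const { return memory_usage_ >= buffer_size_; }

  // Models a write reserving memtable memory. Returns false when the writer
  // would have been blocked instead of proceeding.
  bool TryWrite(std::size_t bytes) {
    if (stalled_ || ShouldStall()) {
      stalled_ = true;
      return false;
    }
    memory_usage_ += bytes;
    return true;
  }

  // Models a finished flush freeing memory. The stall ends only once usage
  // has dropped below half of buffer_size_, which avoids flapping when usage
  // hovers near the limit.
  void OnFlushFinished(std::size_t freed) {
    memory_usage_ = freed > memory_usage_ ? 0 : memory_usage_ - freed;
    if (stalled_ && memory_usage_ < buffer_size_ / 2) {
      stalled_ = false;
    }
  }

  bool stalled() const { return stalled_; }
  std::size_t memory_usage() const { return memory_usage_; }

 private:
  std::size_t buffer_size_;
  std::size_t memory_usage_ = 0;
  bool stalled_ = false;
};
```

With a limit of 100, writes that push usage to 110 cause the next writer to stall; a flush that brings usage down to 60 does not release it, while a further flush down to 40 (below 50) does.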
95f6add Revert Ribbon starting level support from #8198 (#8212) Summary: This partially reverts commit 10196d7edc2fc5c03553c76acaf1337b5c7c1718. The problem with this change lies in two important filter use cases: FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so would only use Ribbon filters if specifically including level 0 for the Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate unknown level, and this would be treated the same as level 0 unless fixed. We are keeping the part about committing to permanent schema, which consists only of changes to API comments and HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212 Test Plan: CI Reviewed By: jay-zhuang Differential Revision: D27896468 Pulled By: pdillinger fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e 21 April 2021, 02:46:40 UTC
2e5de5a Cleanup include (#8208) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8208 Make include of "file_system.h" use the same include path as everywhere else. Reviewed By: riversand963, akankshamahajan15 Differential Revision: D27881606 fbshipit-source-id: fc1e076229fde21041a813c655ce017b5070c8b3 20 April 2021, 21:57:27 UTC
905dd17 Fix seqno in ingested file boundary key metadata (#8209) Summary: Fixes https://github.com/facebook/rocksdb/issues/6245. Adapted from https://github.com/facebook/rocksdb/issues/8201 and https://github.com/facebook/rocksdb/issues/8205. Previously we were writing the ingested file's smallest/largest internal keys with sequence number zero, or `kMaxSequenceNumber` in case of range tombstone. The former (sequence number zero) is incorrect and can lead to files being incorrectly ordered. The fix in this PR is to overwrite boundary keys that have sequence number zero with the ingested file's assigned sequence number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8209 Test Plan: repro unit test Reviewed By: riversand963 Differential Revision: D27885678 Pulled By: ajkr fbshipit-source-id: 4a9f2c6efdfff81c3a9923e915ea88b250ee7b6a 20 April 2021, 21:00:21 UTC
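The boundary-key rewrite described above can be sketched as follows. This is a simplified model, not the ingestion code: `BoundaryKey`, `AssignSeqnoToBoundary`, and `kMaxSequenceNumberModel` are hypothetical names, and the constant's value (the largest 56-bit number) is an assumption about how `kMaxSequenceNumber` is defined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

using SequenceNumber = uint64_t;

// Assumed to mirror kMaxSequenceNumber: the largest value that fits in the
// 56 bits an internal key reserves for the sequence number.
constexpr SequenceNumber kMaxSequenceNumberModel = (uint64_t{1} << 56) - 1;

// Hypothetical, simplified view of a file's smallest/largest internal key.
struct BoundaryKey {
  std::string user_key;
  SequenceNumber seqno;
};

// Overwrites a zero sequence number with the one assigned to the ingested
// file. Boundaries that carry the maximum sequence number (the range
// tombstone case) or any other nonzero value are left untouched.
inline void AssignSeqnoToBoundary(BoundaryKey* key, SequenceNumber assigned) {
  assert(key != nullptr);
  if (key->seqno == 0) {
    key->seqno = assigned;
  }
}
```

Applied to both boundary keys of an ingested file, this keeps file ordering consistent with the sequence number the file was actually assigned.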
1b99947 Mention PR 8206 in HISTORY.md (#8210) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8210 Reviewed By: akankshamahajan15 Differential Revision: D27887612 Pulled By: ltamasi fbshipit-source-id: 0db8d0b6047334dc47fe30a98804449043454386 20 April 2021, 19:07:40 UTC
a89740f Fix unittest no space issue (#8204) Summary: Unittest reports no space from time to time, which can be reproduced on a small memory machine with SHM. It's caused by large WAL files generated during the test, which are preallocated but not truncated during close(). This adds the missing APIs to set preallocation. It also adds the ARM test as a nightly build, as the test runs for more than 1 hour. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8204 Test Plan: test on small memory arm machine Reviewed By: mrambacher Differential Revision: D27873145 Pulled By: jay-zhuang fbshipit-source-id: f797c429d6bc13cbcc673bc03fcc72adda55f506 20 April 2021, 15:42:28 UTC
a345b4d Move arm build from travis to circleci (#8203) Summary: Moving ARM build from travis to CircleCI. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8203 Test Plan: CI Reviewed By: ajkr Differential Revision: D27861753 Pulled By: jay-zhuang fbshipit-source-id: 5e36a67f6fbb921c2ed80b284ba2de485411937b 20 April 2021, 03:07:02 UTC
a376c22 Handle rename() failure in non-local FS (#8192) Summary: In a distributed environment, a file `rename()` operation can succeed on the server (remote) side, but the client can somehow return non-ok status to RocksDB. Possible reasons include network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a new MANIFEST. We currently always delete the new MANIFEST if an error occurs. This is problematic in a distributed world. If the server side successfully updates the CURRENT file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail. As a fix, we can track the execution result of IO operations on the new MANIFEST. - If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original MANIFEST. Therefore, it is safe to remove the new MANIFEST. - If IO operations on the new MANIFEST all succeed, but somehow we end up in the cleanup code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the new MANIFEST.) Therefore, we keep the new MANIFEST. - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT. - If the process reopens the db immediately after the failure, then the CURRENT file can point to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can succeed and ignore the other. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192 Test Plan: make check Reviewed By: zhichao-cao Differential Revision: D27804648 Pulled By: riversand963 fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4 20 April 2021, 01:11:13 UTC
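The cleanup rule described above reduces to one decision on the error path. A minimal sketch, assuming a single flag that records whether every IO operation on the new MANIFEST succeeded; `NewManifestAction` and `OnManifestSwitchError` are hypothetical names, not RocksDB identifiers.

```cpp
#include <cassert>

// Hypothetical result type for the error path taken when switching to a new
// MANIFEST fails somewhere before or during the CURRENT update.
enum class NewManifestAction { kDelete, kKeep };

// manifest_io_ok: whether every IO operation on the new MANIFEST succeeded
// before the cleanup code was reached.
inline NewManifestAction OnManifestSwitchError(bool manifest_io_ok) {
  if (!manifest_io_ok) {
    // CURRENT must still point to the original MANIFEST, so the new file
    // cannot be referenced and is safe to remove.
    return NewManifestAction::kDelete;
  }
  // The rename may have succeeded remotely even though an error was
  // reported, so CURRENT may point to either file. Keep the new MANIFEST.
  return NewManifestAction::kKeep;
}
```

Keeping the file in the ambiguous case costs only disk space until a later `LogAndApply()` switches to yet another MANIFEST, whereas deleting it could leave CURRENT pointing at a missing file.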
0c6e467 Fix a data race related to DB properties (#8206) Summary: Historically, the DB properties `rocksdb.cur-size-active-mem-table`, `rocksdb.cur-size-all-mem-tables`, and `rocksdb.size-all-mem-tables` called the method `MemTable::ApproximateMemoryUsage` for mutable memtables, which is not safe without synchronization. This resulted in data races with memtable inserts. The patch changes the code handling these properties to use `MemTable::ApproximateMemoryUsageFast` instead, which returns a cached value backed by an atomic variable. Two test cases had to be updated for this change. `MemoryTest.MemTableAndTableReadersTotal` was fixed by increasing the value size used so each value ends up in its own memtable, which was the original intention (note: the test has been broken in the sense that the test code didn't consider that memtable sizes below 64 KB get increased to 64 KB by `SanitizeOptions`, and has been passing only by accident). `DBTest.MemoryUsageWithMaxWriteBufferSizeToMaintain` relies on completely up-to-date values and thus was changed to use `ApproximateMemoryUsage` directly instead of going through the DB properties. Note: this should be safe in this case since there's only a single thread involved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8206 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D27866811 Pulled By: ltamasi fbshipit-source-id: 7bd754d0565e0a65f1f7f0e78ffc093beef79394 19 April 2021, 23:38:02 UTC
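The difference between the two accessors named above can be sketched with a toy class. This is an illustration only: `MemUsageModel` and `AddBytes` are hypothetical, and publishing the cached value on every insert is a simplification chosen for the sketch, not a claim about when `MemTable` refreshes its cached value.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model of a memtable's memory accounting with one writer.
class MemUsageModel {
 public:
  // Called by the writer after an insert: updates the exact counter, then
  // publishes a copy through the atomic so other threads can read it safely.
  void AddBytes(std::size_t n) {
    exact_ += n;
    cached_.store(exact_, std::memory_order_relaxed);
  }

  // Reads the plain counter. Calling this from another thread while the
  // writer is inserting is a data race; it is only safe with external
  // synchronization or when a single thread is involved.
  std::size_t ApproximateMemoryUsage() const { return exact_; }

  // Reads the atomic copy. Safe from any thread; in a real system the value
  // may lag slightly behind the exact counter.
  std::size_t ApproximateMemoryUsageFast() const {
    return cached_.load(std::memory_order_relaxed);
  }

 private:
  std::size_t exact_ = 0;
  std::atomic<std::size_t> cached_{0};
};
```

A property handler running on a different thread from the writer would call only `ApproximateMemoryUsageFast()`, which is the substitution the patch describes.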
b0e2019 Handle blob files when options.best_efforts_recovery is true (#8180) Summary: If `options.best_efforts_recovery == true`, RocksDB currently tolerates missing table files and recovers to the latest version without missing table files (not considering WAL). It is necessary to handle blob files as well to make the feature more complete. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8180 Test Plan: make check Reviewed By: ltamasi Differential Revision: D27840556 Pulled By: riversand963 fbshipit-source-id: 041685d0dc2e7779ac4f0374c07a8a327704aa5e 19 April 2021, 18:56:14 UTC
c377c2b Fix flaky test BackupableDBTest.FileSizeForIncremental (#8197) Summary: The test was flaky because, for kUseDbSessionId naming, blob files use the naming scheme kLegacyCrc32cAndFileSize, so the expected number of files can vary due to collisions. This change disables blobdb for this test case. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8197 Reviewed By: pdillinger Differential Revision: D27836997 Pulled By: akankshamahajan15 fbshipit-source-id: 5eb21a5f4acae3d6b730a9e1b207264fbc18cb80 18 April 2021, 23:18:35 UTC
531a5f8 Update release version to 6.20 (#8199) Summary: Update release version to 6.20 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8199 Test Plan: No code change Reviewed By: ajkr Differential Revision: D27838750 Pulled By: akankshamahajan15 fbshipit-source-id: f02f722fc6bdd37d626d47a0e932bbecea3507a8 17 April 2021, 03:15:36 UTC