https://github.com/facebook/rocksdb
- HEAD
- refs/heads/2.2.fb.branch
- refs/heads/2.3.fb.branch
- refs/heads/2.4.fb.branch
- refs/heads/2.5.fb.branch
- refs/heads/2.6.fb.branch
- refs/heads/2.7
- refs/heads/2.7.fb.branch
- refs/heads/2.8.1.fb
- refs/heads/2.8.fb
- refs/heads/2.8.fb.trunk
- refs/heads/3.0.fb
- refs/heads/3.0.fb.branch
- refs/heads/3.1.fb
- refs/heads/3.10.fb
- refs/heads/3.11.fb
- refs/heads/3.12.fb
- refs/heads/3.13.fb
- refs/heads/3.2.fb
- refs/heads/3.3.fb
- refs/heads/3.4.fb
- refs/heads/3.5.fb
- refs/heads/3.6.fb
- refs/heads/3.7.fb
- refs/heads/3.8.fb
- refs/heads/3.9.fb
- refs/heads/4.0.fb
- refs/heads/4.1.fb
- refs/heads/4.10.fb
- refs/heads/4.11.fb
- refs/heads/4.12.fb
- refs/heads/4.13.fb
- refs/heads/4.2.fb
- refs/heads/4.3.fb
- refs/heads/4.4.fb
- refs/heads/4.5.fb
- refs/heads/4.6.fb
- refs/heads/4.7.fb
- refs/heads/4.8.fb
- refs/heads/4.9.fb
- refs/heads/5.0.fb
- refs/heads/5.1.fb
- refs/heads/5.10.fb
- refs/heads/5.11.fb
- refs/heads/5.12.fb
- refs/heads/5.13.fb
- refs/heads/5.13.fb.myrocks
- refs/heads/5.14.fb
- refs/heads/5.14.fb.myrocks
- refs/heads/5.15.fb
- refs/heads/5.16.fb
- refs/heads/5.17.fb
- refs/heads/5.17.fb.myrocks
- refs/heads/5.18.fb
- refs/heads/5.2.fb
- refs/heads/5.3.fb
- refs/heads/5.4.fb
- refs/heads/5.5.fb
- refs/heads/5.6.fb
- refs/heads/5.7.fb
- refs/heads/5.7.fb.myrocks
- refs/heads/5.8.3
- refs/heads/5.8.fb
- refs/heads/5.9.fb
- refs/heads/5.9.fb.myrocks
- refs/heads/6.0.fb
- refs/heads/6.0.fb.myrocks
- refs/heads/6.1.fb
- refs/heads/6.1.fb.myrocks
- refs/heads/6.1.fb.prod201905
- refs/heads/6.10.fb
- refs/heads/6.11.fb
- refs/heads/6.12.fb
- refs/heads/6.13.fb
- refs/heads/6.13.fb.laser
- refs/heads/6.14.fb
- refs/heads/6.14.fb.laser
- refs/heads/6.15.fb
- refs/heads/6.16.fb
- refs/heads/6.17.fb
- refs/heads/6.17.fb.laser
- refs/heads/6.18.fb
- refs/heads/6.19.fb
- refs/heads/6.2.fb
- refs/heads/6.20.fb
- refs/heads/6.21.fb
- refs/heads/6.22-history.md-fixup
- refs/heads/6.22.fb
- refs/heads/6.23.fb
- refs/heads/6.24.fb
- refs/heads/6.25.fb
- refs/heads/6.26.fb
- refs/heads/6.27.fb
- refs/heads/6.28.fb
- refs/heads/6.29.fb
- refs/heads/6.3.fb
- refs/heads/6.3.fb.myrocks
- refs/heads/6.3.fb.myrocks2
- refs/heads/6.3fb
- refs/heads/6.4.fb
- refs/heads/6.5.fb
- refs/heads/6.6.fb
- refs/heads/6.7.fb
- refs/heads/6.8.fb
- refs/heads/6.9.fb
- refs/heads/7.0.fb
- refs/heads/7.1.fb
- refs/heads/7.10.fb
- refs/heads/7.2.fb
- refs/heads/7.3.fb
- refs/heads/7.4.fb
- refs/heads/7.5.fb
- refs/heads/7.6.fb
- refs/heads/7.7.fb
- refs/heads/7.8.fb
- refs/heads/7.9.fb
- refs/heads/8.0.fb
- refs/heads/8.1.fb
- refs/heads/8.10.fb
- refs/heads/8.11.2_zippydb
- refs/heads/8.11.fb
- refs/heads/8.11.fb_zippydb
- refs/heads/8.2.fb
- refs/heads/8.3.fb
- refs/heads/8.4.fb
- refs/heads/8.5.fb
- refs/heads/8.6.fb
- refs/heads/8.7.fb
- refs/heads/8.8.fb
- refs/heads/8.9.fb
- refs/heads/9.0.fb
- refs/heads/9.1.fb
- refs/heads/9.1.fb.myrocks
- refs/heads/9.2.fb
- refs/heads/9.3.fb
- refs/heads/adaptive
- refs/heads/ajkr-patch-1
- refs/heads/ajkr-patch-2
- refs/heads/blob_shadow
- refs/heads/bottom-pri-level
- refs/heads/bugfix-build-detect
- refs/heads/checksum_readahead_mmap_fix
- refs/heads/draft-myrocks-and-fbcode-8.0.fb
- refs/heads/feature/debug-rocksdbjavastatic
- refs/heads/feature/travis-arm64
- refs/heads/fix-release-notes
- refs/heads/fix-win2022-build
- refs/heads/fix-write-batch-comment
- refs/heads/format_compatible_4
- refs/heads/getmergeops
- refs/heads/gh-pages-old
- refs/heads/history-update
- refs/heads/hotfix/lambda-capture
- refs/heads/improve-support
- refs/heads/jijiew-patch-1
- refs/heads/katherinez-patch-1
- refs/heads/katherinez-patch-2
- refs/heads/main
- refs/heads/master
- refs/heads/mdcallag_benchmark_oct22
- refs/heads/nvm_cache_proto
- refs/heads/pr-sanity-check-as-GHAction
- refs/heads/pr/11267
- refs/heads/pr/6062
- refs/heads/ramvadiv-patch-1
- refs/heads/release_fix
- refs/heads/revert-10606-7.6.1
- refs/heads/ribbon_bloom_hybrid
- refs/heads/scaffold
- refs/heads/siying-patch-1
- refs/heads/siying-patch-10
- refs/heads/siying-patch-2
- refs/heads/siying-patch-3
- refs/heads/siying-patch-4
- refs/heads/siying-patch-5
- refs/heads/siying-patch-6
- refs/heads/siying-patch-7
- refs/heads/siying-patch-8
- refs/heads/skip_memtable_flush
- refs/heads/testing_ppc_build
- refs/heads/tests
- refs/heads/unschedule_issue_test_base
- refs/heads/unused-var
- refs/heads/v6.6.4
- refs/heads/xxhash_merge_base
- refs/heads/yiwu_stackable
- refs/heads/yuslepukhin
- refs/remotes/origin/5.13.fb
- refs/tags/2.5.fb
- refs/tags/2.6.fb
- refs/tags/3.0.fb
- refs/tags/do-not-use-me2
- refs/tags/rocksdb-3.1
- refs/tags/rocksdb-3.10.2
- refs/tags/rocksdb-3.11
- refs/tags/rocksdb-3.11.1
- refs/tags/rocksdb-3.11.2
- refs/tags/rocksdb-3.2
- refs/tags/rocksdb-3.3
- refs/tags/rocksdb-3.4
- refs/tags/rocksdb-3.5
- refs/tags/rocksdb-3.5.1
- refs/tags/rocksdb-3.6.1
- refs/tags/rocksdb-3.6.2
- refs/tags/rocksdb-3.7
- refs/tags/rocksdb-3.8
- refs/tags/rocksdb-3.9
- refs/tags/rocksdb-3.9.1
- refs/tags/rocksdb-4.1
- refs/tags/rocksdb-5.10.2
- refs/tags/rocksdb-5.10.3
- refs/tags/rocksdb-5.10.4
- refs/tags/rocksdb-5.11.2
- refs/tags/rocksdb-5.11.3
- refs/tags/rocksdb-5.14.3
- refs/tags/rocksdb-5.2.1
- refs/tags/rocksdb-5.3.3
- refs/tags/rocksdb-5.3.4
- refs/tags/rocksdb-5.3.5
- refs/tags/rocksdb-5.3.6
- refs/tags/rocksdb-5.4.10
- refs/tags/rocksdb-5.4.5
- refs/tags/rocksdb-5.4.6
- refs/tags/rocksdb-5.5.2
- refs/tags/rocksdb-5.5.3
- refs/tags/rocksdb-5.5.4
- refs/tags/rocksdb-5.5.5
- refs/tags/rocksdb-5.5.6
- refs/tags/rocksdb-5.6.1
- refs/tags/rocksdb-5.6.2
- refs/tags/rocksdb-5.7.1
- refs/tags/rocksdb-5.7.2
- refs/tags/rocksdb-5.7.3
- refs/tags/rocksdb-5.7.5
- refs/tags/rocksdb-5.8.6
- refs/tags/rocksdb-5.8.7
- refs/tags/rocksdb-5.8.8
- refs/tags/rocksdb-5.9.2
- refs/tags/v4.0
- refs/tags/v4.1
- refs/tags/v5.10.2
- refs/tags/v5.10.3
- refs/tags/v5.10.4
- refs/tags/v5.11.2
- refs/tags/v5.11.3
- refs/tags/v5.13.3
- refs/tags/v5.14.3
- refs/tags/v5.15.10
- refs/tags/v5.18.3
- refs/tags/v5.2.1
- refs/tags/v5.3.3
- refs/tags/v5.3.4
- refs/tags/v5.3.5
- refs/tags/v5.3.6
- refs/tags/v5.4.10
- refs/tags/v5.4.5
- refs/tags/v5.4.6
- refs/tags/v5.5.2
- refs/tags/v5.5.3
- refs/tags/v5.5.4
- refs/tags/v5.5.5
- refs/tags/v5.5.6
- refs/tags/v5.6.1
- refs/tags/v5.6.2
- refs/tags/v5.7.1
- refs/tags/v5.7.2
- refs/tags/v5.7.3
- refs/tags/v5.7.5
- refs/tags/v5.8.6
- refs/tags/v5.8.7
- refs/tags/v5.8.8
- refs/tags/v5.9.2
- refs/tags/v6.0.1
- refs/tags/v6.0.2
- refs/tags/v6.1.1
- refs/tags/v6.1.2
- refs/tags/v6.10.1
- refs/tags/v6.10.2
- refs/tags/v6.11.4
- refs/tags/v6.11.6
- refs/tags/v6.12.6
- refs/tags/v6.12.7
- refs/tags/v6.13.2
- refs/tags/v6.13.3
- refs/tags/v6.14.5
- refs/tags/v6.14.6
- refs/tags/v6.15.4
- refs/tags/v6.15.5
- refs/tags/v6.16.3
- refs/tags/v6.16.4
- refs/tags/v6.17.3
- refs/tags/v6.2.2
- refs/tags/v6.2.4
- refs/tags/v6.20.3
- refs/tags/v6.22.1
- refs/tags/v6.25.3
- refs/tags/v6.26.1
- refs/tags/v6.28.2
- refs/tags/v6.29.3
- refs/tags/v6.29.4
- refs/tags/v6.29.5
- refs/tags/v6.3.6
- refs/tags/v6.4.6
- refs/tags/v6.5.2
- refs/tags/v6.5.3
- refs/tags/v6.6.3
- refs/tags/v6.6.4
- refs/tags/v6.7.3
- refs/tags/v6.8.1
- refs/tags/v7.0.1
- refs/tags/v7.0.2
- refs/tags/v7.0.4
- refs/tags/v7.2.0
- refs/tags/v7.2.2
- refs/tags/v7.5.3
- refs/tags/v7.7.2
- refs/tags/v7.9.2
- refs/tags/v8.0.0
- refs/tags/v8.11.4
- refs/tags/v8.3.2
- refs/tags/v8.3.3
- refs/tags/v8.4.4
- refs/tags/v8.5.3
- refs/tags/v8.6.7
- refs/tags/v8.7.3
- refs/tags/v9.0.1
- refs/tags/v9.1.1
- refs/tags/v9.2.1
- v9.1.0
- v9.0.0
- v8.9.1
- v8.8.1
- v8.5.4
- v8.11.3
- v8.10.2
- v8.10.0
- v8.1.1
- v7.8.3
- v7.7.8
- v7.7.3
- v7.6.0
- v7.4.5
- v7.4.4
- v7.4.3
- v7.3.1
- v7.10.2
- v7.1.2
- v7.1.1
- v7.0.3
- v6.27.3
- v6.26.0
- v6.25.1
- v6.24.2
- v6.23.3
- v6.23.2
- v6.19.3
- v6.15.2
- v5.8
- v5.5.1
- v5.4.7
- v5.18.4
- v5.17.2
- v5.16.6
- v5.14.2
- v5.13.4
- v5.13.2
- v5.13.1
- v5.12.5
- v5.12.4
- v5.12.3
- v5.12.2
- v5.1.4
- v5.1.3
- v5.1.2
- v5.0.2
- v5.0.1
- v4.9
- v4.8
- v4.6.1
- v4.5.1
- v4.4.1
- v4.4
- v4.3.1
- v4.3
- v4.2
- v4.13.5
- v4.13
- v4.11.2
- v3.9
- v3.8
- v3.7
- v3.6.1
- v3.5
- v3.4
- v3.3
- v3.2
- v3.13.1
- v3.13
- v3.12.1
- v3.12
- v3.11
- v3.10
- v3.1
- v3.0
- v2.8
- v2.7
- v2.6
- v2.5
- v2.4
- v2.3
- v2.2
- v2.1
- v2.0
- v1.5.9.1
- v1.5.8.2
- v1.5.8.1
- v1.5.8
- v1.5.7
- rocksdb-5.8
- rocksdb-5.4.7
- rocksdb-5.1.4
- rocksdb-5.1.3
- rocksdb-5.1.2
- rocksdb-5.0.2
- rocksdb-5.0.1
- rocksdb-4.9
- rocksdb-4.8
- rocksdb-4.6.1
- rocksdb-4.5.1
- rocksdb-4.4.1
- rocksdb-4.4
- rocksdb-4.3.1
- rocksdb-4.3
- rocksdb-4.2
- rocksdb-4.13.5
- rocksdb-4.13
- rocksdb-4.11.2
- rocksdb-3.10.1
- blob_st_lvl-pre
- 2.8.fb
- 2.7.fb
- 2.4.fb
- 2.3.fb
- 2.2.fb
- 2.1.fb
- 2.0.fb
- 1.5.9.fb
- 1.5.9.2.fb
- 1.5.9.1.fb
- 1.5.8.fb
- 1.5.8.2.fb
- 1.5.8.1.fb
- 1.5.7.fb
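The branch, tag, and release listing above is essentially the remote ref enumeration that a Software Heritage snapshot captures, and it can be reproduced with `git ls-remote`. A minimal sketch, using a throwaway local repository so it runs offline (the tag and branch names below are illustrative stand-ins for the RocksDB refs shown above):

```shell
# Against the real origin (network required) the listing above would be:
#   git ls-remote --heads --tags https://github.com/facebook/rocksdb
# Demonstrated here against a throwaway local repository:
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$repo" tag v9.2.1      # a release tag, like refs/tags/v9.2.1
git -C "$repo" branch 9.3.fb   # a release branch, like refs/heads/9.3.fb

# Enumerate every ref, exactly as a snapshot does:
git ls-remote "$repo"
```

The output lists one `<commit-id> <refname>` pair per line, which is the same shape as the `refs/heads/...` and `refs/tags/...` entries in the snapshot.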
Revision | Author | Date | Message | Commit Date |
---|---|---|---|---|
36540d4 | Andrew Kryczka | 17 April 2024, 20:46:25 UTC | update version.h and HISTORY.md for 9.0.1 | 17 April 2024, 20:46:25 UTC |
fd68cc0 | Andrew Kryczka | 11 April 2024, 19:46:15 UTC | Skip io_uring feature test when building with fbcode (#12525) Summary: Previously when building with fbcode and having a system install of liburing, it would link liburing from fbcode statically as well as the system library dynamically. That led to the following error: ``` ./db_stress: error while loading shared libraries: liburing.so.1: cannot open shared object file: No such file or directory ``` The fix is to skip the feature test for system liburing when `FBCODE_BUILD=true`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12525 Test Plan: - `make clean && make ROCKSDB_NO_FBCODE=1 V=1 -j56 db_stress && ./db_stress` - `make clean && make V=1 -j56 db_stress && ./db_stress` Reviewed By: anand1976 Differential Revision: D55997335 Pulled By: ajkr fbshipit-source-id: 17d8561100f41c6c9ae382a80c6cddc14f050bdc | 16 April 2024, 23:35:48 UTC |
fc05af3 | Jeff Palm | 21 February 2024, 20:03:40 UTC | switch to using centos8-native (#12367) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12367 switch to using centos8-native for rocks-db Reviewed By: jowlyzhang Differential Revision: D53971368 fbshipit-source-id: 635885dfb9e0ec6daa7623627a50e6b2897725ba | 16 April 2024, 23:26:40 UTC |
814ba4b | Andrew Kryczka | 09 April 2024, 19:39:22 UTC | 12404+12371 history entry | 09 April 2024, 23:57:22 UTC |
7575389 | Andrew Kryczka | 09 April 2024, 19:38:56 UTC | 12474 history entry | 09 April 2024, 23:54:12 UTC |
63e7f12 | Radek Hubner | 05 April 2024, 20:55:18 UTC | Fix exception on RocksDB.getColumnFamilyMetaData() (#12474) Summary: https://github.com/facebook/rocksdb/issues/12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by https://github.com/facebook/rocksdb/issues/11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated. This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12474 Reviewed By: jowlyzhang Differential Revision: D55811808 Pulled By: ajkr fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e | 09 April 2024, 23:53:40 UTC |
f444196 | anand76 | 11 March 2024, 18:26:24 UTC | Add a FS flag to detect and correct corruption (#12408) Summary: Add a flag in `IOOptions` to request the file system to make best efforts to detect data corruption and reconstruct the data if possible. This will be used by RocksDB to retry a read if the previous read returns corrupt data (checksum mismatch). Add a new op to `FSSupportedOps` that, if supported, will trigger this behavior in RocksDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12408 Reviewed By: akankshamahajan15 Differential Revision: D54564218 Pulled By: anand1976 fbshipit-source-id: bc401dcd22a7d320bf63b5183c41714acdce39f5 | 18 March 2024, 22:15:28 UTC |
9ded0f7 | Radek Hubner | 06 March 2024, 18:33:17 UTC | Fix regression for Javadoc jar build (#12404) Summary: https://github.com/facebook/rocksdb/issues/12371 Introduced regression not defining dependency between `create_javadoc` and `rocksdb_javadocs_jar` build targets. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12404 Reviewed By: pdillinger Differential Revision: D54516862 Pulled By: ajkr fbshipit-source-id: 785a99b2caf979395ae0de60e40e7d1b93059adb | 15 March 2024, 23:09:48 UTC |
b64bc23 | Peter Dillinger | 14 March 2024, 19:04:41 UTC | Fix windows build and CI (#12426) Summary: Issue https://github.com/facebook/rocksdb/issues/12421 describes a regression in the migration from CircleCI to GitHub Actions in which failing build steps no longer fail Windows CI jobs. In GHA with pwsh (new preferred powershell command), only the last non-builtin command (or something like that) affects the overall success/failure result, and failures in external commands do not exit the script, even with `$ErrorActionPreference = 'Stop'` and `$PSNativeCommandErrorActionPreference = $true`. Switching to `powershell` causes some obscure failure (not seen in CircleCI) about the `-Lo` option to `curl`. Here we work around this using the only reasonable-but-ugly way known: explicitly check the result after every non-trivial build step. This leaves us highly susceptible to future regressions with unchecked build steps in the future, but a clean solution is not known. This change also fixes the build errors that were allowed to creep in because of the CI regression. Also decreased the unnecessarily long running time of DBWriteTest.WriteThreadWaitNanosCounter. For background, this problem explicitly contradicts GitHub's documentation, and GitHub has known about the problem for more than a year, with no evidence of caring or intending to fix. https://github.com/actions/runner-images/issues/6668 Somehow CircleCI doesn't have this problem. And even though cmd.exe and powershell have been perpetuating DOS-isms for decades, they still seem to be a somewhat active "hot mess" when it comes to sensible, consistent, and documented behavior. Fixes https://github.com/facebook/rocksdb/issues/12421 A history of some things I tried in development is here: https://github.com/facebook/rocksdb/compare/main...pdillinger:rocksdb:debug_windows_ci_orig Pull Request resolved: https://github.com/facebook/rocksdb/pull/12426 Test Plan: CI, including https://github.com/facebook/rocksdb/issues/12434 where I have temporarily enabled other Windows builds on PR with this change Reviewed By: cbi42 Differential Revision: D54903698 Pulled By: pdillinger fbshipit-source-id: 116bcbebbbf98f347c7cf7dfdeebeaaed7f76827 | 15 March 2024, 23:09:48 UTC |
3c9956e | Adam Retter | 27 February 2024, 23:46:12 UTC | Correct CMake Javadoc and source jar builds (#12371) Summary: Fix some issues introduced in https://github.com/facebook/rocksdb/pull/12199 (CC rhubner) 1. Previous `jar -v -c -f` was not valid command syntax. 2. Javadoc and source Jar files were prefixed `rocksdb-`, now corrected to `rocksdbjni-` pdillinger This needs to be merged to `main` and also `8.11.fb` (to fix the Windows build for the RocksJava release of 8.11.2) please. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12371 Reviewed By: pdillinger, jowlyzhang Differential Revision: D54136834 Pulled By: hx235 fbshipit-source-id: f356f2401042af359ada607e5f0be627418ccd6c | 15 March 2024, 20:25:58 UTC |
659fc7c | Yu Zhang | 17 February 2024, 00:29:43 UTC | Update HISTORY.md for 9.0.0 | 17 February 2024, 00:29:43 UTC |
f2732d0 | Alex Wied | 16 February 2024, 18:28:41 UTC | Export GetSequenceNumber functionality for Snapshots (#12354) Summary: This PR adds `Snapshot->GetSequenceNumber()` functionality to the C API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12354 Reviewed By: akankshamahajan15 Differential Revision: D53836085 Pulled By: cbi42 fbshipit-source-id: 4a14daeba9210a69bcb74e4c1c0666deff1b4837 | 16 February 2024, 18:28:41 UTC |
055b21a | Adam Retter | 16 February 2024, 18:26:32 UTC | Update ZLib to 1.3.1 (#12358) Summary: pdillinger This fixes the RocksJava build, is also needed in the 8.10.fb and 8.11.fb branches please? Pull Request resolved: https://github.com/facebook/rocksdb/pull/12358 Reviewed By: jaykorean Differential Revision: D53859743 Pulled By: pdillinger fbshipit-source-id: b8417fccfee931591805f9aecdfae7c086fee708 | 16 February 2024, 18:26:32 UTC |
d227276 | anand76 | 16 February 2024, 17:21:06 UTC | Deprecate some variants of Get and MultiGet (#12327) Summary: A lot of variants of Get and MultiGet have been added to `include/rocksdb/db.h` over the years. Try to consolidate them by marking variants that don't return timestamps as deprecated. The underlying DB implementation will check and return Status::NotSupported() if it doesn't support returning timestamps and the caller asks for it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12327 Reviewed By: pdillinger Differential Revision: D53828151 Pulled By: anand1976 fbshipit-source-id: e0b5ca42d32daa2739d5f439a729815a2d4ff050 | 16 February 2024, 17:21:06 UTC |
956f1df | Akanksha Mahajan | 16 February 2024, 17:14:55 UTC | Change ReadAsync callback API to remove const from FSReadRequest (#11649) Summary: Modify ReadAsync callback API to remove const from FSReadRequest as const doesn't let to fs_scratch to move the ownership. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11649 Test Plan: CircleCI jobs Reviewed By: anand1976 Differential Revision: D53585309 Pulled By: akankshamahajan15 fbshipit-source-id: 3bff9035db0e6fbbe34721a5963443355807420d | 16 February 2024, 17:14:55 UTC |
28c1c15 | anand76 | 16 February 2024, 01:22:03 UTC | Sync tickers and histograms across C++ and Java (#12355) Summary: The RocksDB ticker and histogram statistics were out of sync between the C++ and Java code, with a number of newer stats missing in TickerType.java and HistogramType.java. Also, there were gaps in numbering in portal.h, which could soon become an issue due to the number of tickers and the fact that we're limited to 1 byte in Java. This PR adds the missing stats, and re-numbers all of them. It also moves some stats around to try to group related stats together. Since this will go into a major release, compatibility shouldn't be an issue. This should be automated at some point, since the current process is somewhat error prone. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12355 Reviewed By: jaykorean Differential Revision: D53825324 Pulled By: anand1976 fbshipit-source-id: 298c180872f4b9f1ee54b8bb22f4e280458e7e09 | 16 February 2024, 01:22:03 UTC |
1201813 | Peter Dillinger | 15 February 2024, 23:39:55 UTC | KeySegmentsExtractor and prototype higher-dimensional filtering (#12075) Summary: This change contains a prototype new API for "higher dimensional" filtering of read queries. Existing filters treat keys as one-dimensional, either as distinct points (whole key) or as contiguous ranges in comparator order (prefix filters). The proposed KeySegmentsExtractor allows treating keys as multi-dimensional for filtering purposes even though they still have a single total order across dimensions. For example, consider these keys in different LSM levels: L0: abc_0123 abc_0150 def_0114 ghi_0134 L1: abc_0045 bcd_0091 def_0077 xyz_0080 If we get a range query for [def_0100, def_0200), a prefix filter (up to the underscore) will tell us that both levels are potentially relevant. However, if each SST file stores a simple range of the values for the second segment of the key, we would see that L1 only has [0045, 0091] which (under certain required assumptions) we are sure does not overlap with the given range query. Thus, we can filter out processing or reading any index or data blocks from L1 for the query. This kind of case shows up with time-ordered data but is more general than filtering based on user timestamp. See https://github.com/facebook/rocksdb/issues/11332 . Here the "time" segments of the keys are meaningfully ordered with respect to each other even when the previous segment is different, so summarizing data along an alternate dimension of the key like this can work well for filtering. This prototype implementation simply leverages existing APIs for user table properties and table filtering, which is not very CPU efficient. Eventually, we expect to create a native implementation. However, I have put some significant thought and engineering into the new APIs overall, which I expect to be close to refined enough for production. For details, see new public APIs in experimental.h. For a detailed example, see the new unit test in db_bloom_filter_test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12075 Test Plan: Unit test included Reviewed By: jowlyzhang Differential Revision: D53619406 Pulled By: pdillinger fbshipit-source-id: 9e6e7b82b4db8d815db76a6ab340e90db2c191f2 | 15 February 2024, 23:39:55 UTC |
bfd00bb | Peter Dillinger | 15 February 2024, 19:23:48 UTC | Use format_version=6 by default (#12352) Summary: It's in production for a large storage service, and it was initially released 6 months ago (8.6.0). IMHO that's enough room for "easy downgrade" to most any user's previously integrated version, even if they only update a few times a year. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12352 Test Plan: tests updated, including format capatibility test table_test: ApproximateOffsetOfCompressed is affected because adding index block to metaindex adds about 13 bytes to SST files in format_version 6. This test has historically been problematic and one reason is that, apparently, not only could it pass/fail depending on snappy compression version, but also how long your host name is, because of db_host_id. I've cleared that out for the test, which takes care of format_version=6 and hopefully improves long-term reliability. Suggested follow-up: FinishImpl in table_test.cc takes a table_options that is ignored in some cases and might not match the ioptions.table_factory configuration unless the caller is very careful. This should be cleaned up somehow. Reviewed By: anand1976 Differential Revision: D53786884 Pulled By: pdillinger fbshipit-source-id: 1964cbd40d3ab0a821fdc01c458031df716fcf51 | 15 February 2024, 19:23:48 UTC |
6e57135 | Changyu Bi | 15 February 2024, 17:53:20 UTC | Add a changelog entry for PR 12322 (#12353) Summary: .. for public api change related to sst_dump. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12353 Reviewed By: jaykorean Differential Revision: D53791123 Pulled By: cbi42 fbshipit-source-id: 3fbe9c7a3eb0a30dc1a00d39bc8a46028baa3779 | 15 February 2024, 17:53:20 UTC |
d201e59 | Gilbert Liu | 14 February 2024, 22:40:05 UTC | Update llvm-fb to 15 (#12342) Summary: Update llvm-fb to 15 and some other dependency versions. ## Test Copied over the two script files to tp2 librocksdb source and ran tp2_build, it succeeded. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12342 Reviewed By: ltamasi Differential Revision: D53690631 Pulled By: bunnypak fbshipit-source-id: 68f884b2a565f98bc3510290b411a901ef781adb | 14 February 2024, 22:40:05 UTC |
f405e55 | Yu Zhang | 14 February 2024, 04:30:07 UTC | Add support in SstFileWriter to not persist user defined timestamps (#12348) Summary: This PR adds support in `SstFileWriter` to create SST files without persisting timestamps when the column family has enabled UDTs in Memtable only feature. The sst files created from flush and compaction do not contain timestamps, we want to make the sst files created by `SstFileWriter` to follow the same pattern and not persist timestamps. This is to prepare for ingesting external SST files for this type of column family. There are timestamp-aware APIs and non timestamp-aware APIs in `SstFileWriter`. The former are exclusively used for when the column family's comparator is timestamp-aware, a.k.a `Comparator::timestamp_size() > 0`, while the latter are exclusively used for the column family's comparator is non timestamp-aware, a.k.a `Comparator::timestamp_size() == 0`. There are sanity checks to make sure these APIs are correctly used. In this PR, the APIs usage continue with above enforcement, where even though timestamps are not eventually persisted, users are still asked to use only the timestamp-aware APIs. But because data points will logically all have minimum timestamps, we don't allow multiple versions of the same user key (without timestamp) to be added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12348 Test Plan: Added unit tests Manual inspection of generated sst files with `sst_dump` Reviewed By: ltamasi Differential Revision: D53732667 Pulled By: jowlyzhang fbshipit-source-id: e43beba0d3a1736b94ee5c617163a6280efd65b7 | 14 February 2024, 04:30:07 UTC |
4bea83a | Yu Zhang | 14 February 2024, 02:36:25 UTC | Remove the force mode for EnableFileDeletions API (#12337) Summary: There is no strong reason for user to need this mode while on the other hand, its behavior is destructive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12337 Reviewed By: hx235 Differential Revision: D53630393 Pulled By: jowlyzhang fbshipit-source-id: ce94b537258102cd98f89aa4090025663664dd78 | 14 February 2024, 02:36:25 UTC |
8c7c0a3 | Jay Huh | 13 February 2024, 22:15:52 UTC | Minor refactor with printing stdout in blackbox tests (#12350) Summary: As title. Adding a missing stdout printing in `blackbox_crash_main()` # Test **Blackbox** ``` $> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 ``` ``` ... stdout: Choosing random keys with no overwrite DB path: [/tmp/jewoongh/rocksdb_crashtest_blackbox34jwn9of] (Re-)verified 0 unique IDs 2024/02/13-12:27:33 Initializing worker threads Crash-recovery verification passed :) 2024/02/13-12:27:36 Starting database operations ... jewoongh stdout test jewoongh stdout test ... jewoongh stdout test stderr: jewoongh injected error ``` **Whitebox** ``` $> python3 tools/db_crashtest.py whitebox --simple --max_key=25000000 --write_buffer_size=4194304 ``` ``` ... stdout: Choosing random keys with no overwrite Creating 24415 locks ... 2024/02/13-12:31:51 Initializing worker threads Crash-recovery verification passed :) 2024/02/13-12:31:54 Starting database operations jewoongh stdout test jewoongh stdout test jewoongh stdout test ... stderr: jewoongh injected error ... ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12350 Reviewed By: akankshamahajan15, cbi42 Differential Revision: D53728910 Pulled By: jaykorean fbshipit-source-id: ec90ed3b5e6a1102d1fb55d357d0371e5072a173 | 13 February 2024, 22:15:52 UTC |
10d0245 | Yu Zhang | 13 February 2024, 19:15:28 UTC | Add support to bulk load external files with user-defined timestamps (#12343) Summary: This PR adds initial support to bulk loading external sst files with user-defined timestamps. To ensure this invariant is met while ingesting external files: assume there are two internal keys: <K, ts1, seq1> and <K, ts2, seq2>, the following should hold: ts1 < ts2 iff. seq1 < seq2 These extra requirements are added for ingesting external files with user-defined timestamps: 1) A file with overlapping user key (without timestamp) range with the db cannot be ingested. This is because we cannot ensure above invariant is met without checking each overlapped key's timestamp and compare it with the timestamp from the db. This is an expensive step. This bulk loading feature will be used by MyRocks and currently their usage can guarantee ingested file's key range doesn't overlap with db. https://github.com/facebook/mysql-5.6/blob/4f3a57a13fec9fa2cb6d8bef6d38adba209e1981/storage/rocksdb/ha_rocksdb.cc#L3312 We can consider loose this requirement by doing this check in the future, this initial support just disallow this. 2) Files with overlapping user key (without timestamp) range are not allowed to be ingested. For similar reasons, it's hard to ensure above invariant is met. For example, if we have two files where user keys are interleaved like this: file1: [c10, c8, f10, f5] file2: [b5, c11, f4] Either file1 gets a bigger global seqno than file2, or the other way around, above invariant cannot be met. So we disallow this. 2) When a column family enables user-defined timestamps, it doesn't support ingestion behind mode. Ingestion behind currently simply puts the file at the bottommost level, and assign a global seqno 0 to the file. We need to do similar search though the LSM tree for key range overlap checks to make sure aformentioned invariant is met. So this initial support disallow this mode. We can consider adding it in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12343 Test Plan: Add unit tests Reviewed By: cbi42 Differential Revision: D53686182 Pulled By: jowlyzhang fbshipit-source-id: f05e3fb27967f7974ed40179d78634c40ecfb136 | 13 February 2024, 19:15:28 UTC |
45668a0 | 马越 | 13 February 2024, 18:48:31 UTC | add unit test for compactRangeWithNullBoundaries java api (#12333) Summary: The purpose of this PR is to supplement a set of unit tests for https://github.com/facebook/rocksdb/pull/12328 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12333 Reviewed By: ltamasi Differential Revision: D53553830 Pulled By: cbi42 fbshipit-source-id: d21490f7ce7b30f42807ee37eda455ca6abdd072 | 13 February 2024, 18:48:31 UTC |
de1e3ff | Levi Tamasi | 12 February 2024, 21:26:09 UTC | Fix a data race in DBImpl::RenameTempFileToOptionsFile (#12347) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12347 `DBImpl::disable_delete_obsolete_files_` should only be accessed while holding the DB mutex to prevent data races. There's a piece of logic in `DBImpl::RenameTempFileToOptionsFile` where this synchronization was previously missing. The patch fixes this issue similarly to how it's handled in `DisableFileDeletions` and `EnableFileDeletions`, that is, by saving the counter value while holding the mutex and then performing the actual file deletion outside the critical section. Note: this PR only fixes the race itself; as a followup, we can also look into cleaning up and optimizing the file deletion logic (which is currently inefficient on multiple different levels). Reviewed By: jowlyzhang Differential Revision: D53675153 fbshipit-source-id: 5358e894ee6829d3edfadac50a93d97f8819e481 | 12 February 2024, 21:26:09 UTC |
395d24f | Yaroslav Stepanchuk | 12 February 2024, 19:24:56 UTC | Fix build on alpine 3.19 (#12345) Summary: Add missing include of the cstdint header. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12345 Reviewed By: ltamasi Differential Revision: D53672261 Pulled By: cbi42 fbshipit-source-id: 758944c0b51b9701a129e7b88f692103bbce11d3 | 12 February 2024, 19:24:56 UTC |
daf06f1 | chuhao zeng | 10 February 2024, 00:46:38 UTC | Continue format script when changes detected by clang-format-diff.py (#12329) Summary: The original [clang-format-diff.py script](https://raw.githubusercontent.com/llvm/llvm-project/main/clang/tools/clang-format/clang-format-diff.py), referenced in format.sh, exits with a status of 1 at the end after writing diffs to stderr. Consequently, the format.sh script terminates after initializing the 'diffs' variable. Implemented additional logic in format-diff.sh to ensure continuous execution, even when changes are detected and further formatting is required. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12329 Reviewed By: pdillinger Differential Revision: D53483185 Pulled By: cbi42 fbshipit-source-id: b7adff26f129220941258fd6ee83d053fa12b077 | 10 February 2024, 00:46:38 UTC |
b46f570 | Changyu Bi | 07 February 2024, 23:44:17 UTC | Fix unexpected keyword argument 'print_as_stderr' in crash test (#12339) Summary: Fix crash test failure like https://github.com/facebook/rocksdb/actions/runs/7821514511/job/21338625372#step:5:530 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12339 Test Plan: CI Reviewed By: jaykorean Differential Revision: D53545053 Pulled By: cbi42 fbshipit-source-id: b466a8dc9c0ded0377e8677937199c6f959f96ef | 07 February 2024, 23:44:17 UTC |
58d55b7 | Peter Dillinger | 07 February 2024, 23:06:04 UTC | Mark wal_compression feature as production-ready (#12336) Summary: (as title) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12336 Test Plan: in use at Meta for a large service; in crash test Reviewed By: hx235 Differential Revision: D53537628 Pulled By: pdillinger fbshipit-source-id: 69e7ac9ab7b59b928d1144105667a7fde8a55a5a | 07 February 2024, 23:06:04 UTC |
42a8e58 | Changyu Bi | 07 February 2024, 22:17:51 UTC | Print zstd warning to stdout in stress test (#12338) Summary: so the stress test does not fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12338 Reviewed By: jaykorean Differential Revision: D53542941 Pulled By: cbi42 fbshipit-source-id: 83b2eb3cb5cc4c5a268da386c22c4aadeb039a74 | 07 February 2024, 22:17:51 UTC |
9a2d748 | Akanksha Mahajan | 07 February 2024, 20:34:40 UTC | Print stderr in crash test script and exit on stderr (#12335) Summary: Some errors, like data races and heap-use-after-free, are surfaced as failures by the crash test relying on output to stderr. So reverting back to the original form unless we come up with a more reliable solution for erroring out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12335 Reviewed By: cbi42 Differential Revision: D53534781 Pulled By: akankshamahajan15 fbshipit-source-id: b19aa560d1560ac2281f7bc04e13961ed751f178 | 07 February 2024, 20:34:40 UTC |
54cb9c7 | Peter Dillinger | 07 February 2024, 18:44:11 UTC | Prefer static_cast in place of most reinterpret_cast (#12308) Summary: The following are risks associated with pointer-to-pointer reinterpret_cast: * Can produce the "wrong result" (crash or memory corruption). IIRC, in theory this can happen for any up-cast or down-cast for a non-standard-layout type, though in practice would only happen for multiple inheritance cases (where the base class pointer might be "inside" the derived object). We don't use multiple inheritance a lot, but we do. * Can mask useful compiler errors upon code change, including converting between unrelated pointer types that you are expecting to be related, and converting between pointer and scalar types unintentionally. I can only think of some obscure cases where static_cast could be troublesome when it compiles as a replacement: * Going through `void*` could plausibly cause unnecessary or broken pointer arithmetic. Suppose we have `struct Derived: public Base1, public Base2`. If we have `Derived*` -> `void*` -> `Base2*` -> `Derived*` through reinterpret casts, this could plausibly work (though technical UB) assuming the `Base2*` is not dereferenced. Changing to static cast could introduce breaking pointer arithmetic. * Unnecessary (but safe) pointer arithmetic could arise in a case like `Derived*` -> `Base2*` -> `Derived*` where before the Base2 pointer might not have been dereferenced. This could potentially affect performance. With some light scripting, I tried replacing pointer-to-pointer reinterpret_casts with static_cast and kept the cases that still compile. Most occurrences of reinterpret_cast have successfully been changed (except for java/ and third-party/). 294 changed, 257 remain. A couple of related interventions included here: * Previously Cache::Handle was not actually derived from in the implementations and just used as a `void*` stand-in with reinterpret_cast. Now there is a relationship to allow static_cast. 
In theory, this could introduce pointer arithmetic (as described above) but is unlikely without multiple inheritance AND non-empty Cache::Handle. * Remove some unnecessary casts to void* as this is allowed to be implicit (for better or worse). Most of the remaining reinterpret_casts are for converting to/from raw bytes of objects. We could consider better idioms for these patterns in follow-up work. I wish there were a way to implement a template variant of static_cast that would only compile if no pointer arithmetic is generated, but best I can tell, this is not possible. AFAIK the best you could do is a dynamic check that the void* conversion after the static cast is unchanged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12308 Test Plan: existing tests, CI Reviewed By: ltamasi Differential Revision: D53204947 Pulled By: pdillinger fbshipit-source-id: 9de23e618263b0d5b9820f4e15966876888a16e2 | 07 February 2024, 18:44:11 UTC |
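The multiple-inheritance hazard described in this commit can be demonstrated with a small sketch. `Base1`, `Base2`, and `Derived` are made-up types, not RocksDB classes; the point is that `static_cast` performs the pointer adjustment to the correct subobject while `reinterpret_cast` simply reuses the original address:

```cpp
#include <cassert>

// Two non-empty polymorphic bases, so the Base2 subobject cannot sit at the
// same address as the Derived object on mainstream ABIs.
struct Base1 {
  virtual ~Base1() = default;
  int a = 1;
};
struct Base2 {
  virtual ~Base2() = default;
  int b = 2;
};
struct Derived : public Base1, public Base2 {
  int c = 3;
};

// Returns true when the two casts yield different addresses, i.e. when a
// reinterpret_cast here would point at the wrong subobject (the common case
// on mainstream implementations).
bool StaticCastAdjusts() {
  Derived d;
  Base2* via_static = static_cast<Base2*>(&d);            // adjusted pointer
  Base2* via_reinterpret = reinterpret_cast<Base2*>(&d);  // unadjusted; UB to use
  return static_cast<void*>(via_static) !=
         static_cast<void*>(via_reinterpret);
}
```

This is exactly why the commit's find-and-replace was only kept where `static_cast` still compiles: the compiler then knows the type relationship and can emit the required adjustment.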
e3e8fbb | Yu Zhang | 07 February 2024, 02:35:36 UTC | Add separate range classes for internal usage (#12071) Summary: Introduce new range classes `UserKeyRange` and `UserKeyRangePtr` to be used by the internal implementation. The `Range` class is used in both public APIs like `DB::GetApproximateSizes`, `DB::GetApproximateMemTableStats`, `DB::GetPropertiesOfTablesInRange` etc. and internal implementations like `ColumnFamilyData::RangesOverlapWithMemtables`, `VersionSet::GetPropertiesOfTablesInRange`. These APIs have different expectations of what keys this range class contains. Public API users are supposed to populate the range with user keys without timestamp, in the same way that point lookup and range scan APIs' key input only expects the user key without timestamp. The internal API implementations expect a user key whose format is compatible with the user comparator, a.k.a. a user key with the timestamp. This PR contains: 1) introducing the counterpart range classes `UserKeyRange` and `UserKeyRangePtr` for the internal implementation while leaving the existing `Range` and `RangePtr` classes only for public APIs. Internal implementations are updated to use the new classes instead. 2) adding user-defined timestamp support for the `DB::GetPropertiesOfTablesInRange` and `DeleteFilesInRanges` APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12071 Test Plan: existing tests. Added tests for the `DB::GetPropertiesOfTablesInRange` and `DeleteFilesInRanges` APIs for when user-defined timestamps are enabled. The change in external_file_ingestion_job doesn't have user-defined timestamp test coverage; will add one in a follow-up PR that adds file ingestion support for UDT. Reviewed By: ltamasi Differential Revision: D53292608 Pulled By: jowlyzhang fbshipit-source-id: 9a9279e23c640a6d8f8232636501a95aef7638b8 | 07 February 2024, 02:35:36 UTC |
0088f77 | Jay Huh | 06 February 2024, 04:11:35 UTC | Multiget LDB Followup (#12332) Summary: # Summary Following up jowlyzhang 's comment in https://github.com/facebook/rocksdb/issues/12283 . - Remove `ARG_TTL` from help which is not relevant to `multi_get` command - Treat NotFound status as non-error case for both `Get` and `MultiGet` and updated the unit test, `ldb_test.py` - Print key along with value in `multi_get` command Pull Request resolved: https://github.com/facebook/rocksdb/pull/12332 Test Plan: **Unit Test** ``` $>python3 tools/ldb_test.py ... Ran 25 tests in 17.447s OK ``` **Manual Run** ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000000D8 0x0000000000000009000000000000002678787878BEEF 0x0000000000000009000000000000012B00000000000000D8 ==> 0x47000000434241404F4E4D4C4B4A494857565554535251505F5E5D5C5B5A595867666564636261606F6E6D6C6B6A696877767574737271707F7E7D7C7B7A797807060504030201000F0E0D0C0B0A090817161514131211101F1E1D1C1B1A1918 Key not found: 0x0000000000000009000000000000002678787878BEEF ``` ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex get 0x00000000000000090000000000 Key not found ``` Reviewed By: jowlyzhang Differential Revision: D53450164 Pulled By: jaykorean fbshipit-source-id: 9ccec78ad3695e65b1ed0c147c7cbac502a1bd48 | 06 February 2024, 04:11:35 UTC |
1a885fe | Hui Xiao | 05 February 2024, 21:35:19 UTC | Remove deprecated Options::access_hint_on_compaction_start (#11654) Summary: **Context:** `Options::access_hint_on_compaction_start ` is marked deprecated and now ready to be removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11654 Test Plan: Multiple db_stress runs with pre-PR and post-PR binary randomly to ensure forward/backward compatibility on options https://github.com/ajkr/rocksdb/commit/36a5686ec012f35a4371e409aa85c404ca1c210d?fbclid=IwAR2IcdAUdTvw9O9V5GkHEYJRGMVR9p7Ei-LMa-9qiXlj3z80DxjkxlGnP1E `python3 tools/db_crashtest.py --simple blackbox --interval=30` Reviewed By: cbi42 Differential Revision: D47892459 Pulled By: hx235 fbshipit-source-id: a62f46a0377fe143be7638e218978d5431c15c56 | 05 February 2024, 21:35:19 UTC |
3a28779 | 马越 | 05 February 2024, 19:05:57 UTC | Fix the problem that a wrong key may be passed when using the CompactRange JAVA API (#12328) Summary: When using the RocksDB Java API, calling `db.compactRange(columnFamilyHandle, start, null)` from Java code means we hope to perform range compaction on keys bigger than **start**. We expect a call to the corresponding C++ code: `db->compactRange(columnFamilyHandle, &start, nullptr)`. But in reality, what is being called is `db->compactRange(columnFamilyHandle, start, "")`. The problem here is that `null` in Java is not converted to `nullptr`, but rather to `""`, which may result in some unexpected results. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12328 Reviewed By: jowlyzhang Differential Revision: D53432749 Pulled By: cbi42 fbshipit-source-id: eeadd19d05667230568668946d2ef1d5b2568268 | 05 February 2024, 19:05:57 UTC |
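The pitfall this commit fixes can be sketched on the C++ side of the bridge. `ToCompactRangeBound` is a hypothetical helper, not the actual RocksJava JNI glue; the point is that a null array coming from Java must map to a null pointer so that the range end stays unbounded instead of silently becoming the empty key `""`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical JNI-side conversion: a Java null byte[] must become nullptr
// (compact to the end of the keyspace), while a real byte[] becomes a
// concrete key bound backed by caller-provided storage.
const std::string* ToCompactRangeBound(const char* jni_bytes, size_t len,
                                       std::string& storage) {
  if (jni_bytes == nullptr) {
    return nullptr;  // Java null -> nullptr: range end is unbounded
  }
  storage.assign(jni_bytes, len);  // Java byte[] -> concrete key bound
  return &storage;
}
```

Collapsing the null case into `""` is exactly the bug: `""` sorts before every non-empty key, so the "open-ended" compaction would cover almost nothing.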
6e88126 | Peter Dillinger | 05 February 2024, 18:26:41 UTC | Don't log an error when an auxiliary dir is missing (#12326) Summary: info_log gets an error logged when wal_dir or a db_path/cf_path is missing. Under this condition, the directory is created later (in DBImpl::Recover -> Directories::SetDirectories) with no error status returned. To avoid error spam in logs, change these to a descriptive "header" log entry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12326 Test Plan: manual with DBBasicTest.DBCloseAllDirectoryFDs which exercises this code Reviewed By: jowlyzhang Differential Revision: D53374743 Pulled By: pdillinger fbshipit-source-id: 32d1ce18809da13a25bdd6183d661f66a3b6a111 | 05 February 2024, 18:26:41 UTC |
4eaa771 | Yu Zhang | 03 February 2024, 02:07:57 UTC | Refactor external sst file ingestion job (#12305) Summary: Updates some documentations and invariant assertions after https://github.com/facebook/rocksdb/issues/12257 and https://github.com/facebook/rocksdb/issues/12284. Also refactored some duplicate code and improved some error message and preconditions for errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12305 Test Plan: Existing unit tests Reviewed By: hx235 Differential Revision: D53371325 Pulled By: jowlyzhang fbshipit-source-id: fb0edcb3a3602cdf0a292ef437cfdfe897fc6c99 | 03 February 2024, 02:07:57 UTC |
5620efc | Changyu Bi | 03 February 2024, 01:09:42 UTC | Remove deprecated option `ignore_max_compaction_bytes_for_input` (#12323) Summary: The option is introduced in https://github.com/facebook/rocksdb/issues/10835 to allow disabling the new compaction behavior if it's not safe. The option is enabled by default and there has not been a need to disable it. So it should be safe to remove now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12323 Reviewed By: ajkr Differential Revision: D53330336 Pulled By: cbi42 fbshipit-source-id: 36eef4664ac96b3a7ed627c48bd6610b0a7eafc5 | 03 February 2024, 01:09:42 UTC |
ace1721 | Changyu Bi | 02 February 2024, 23:37:40 UTC | Remove deprecated option `level_compaction_dynamic_file_size` (#12325) Summary: The option is introduced in https://github.com/facebook/rocksdb/issues/10655 to allow reverting to old behavior. The option is enabled by default and there has not been a need to disable it. Remove it for 9.0 release. Also fixed and improved a few unit tests that depended on setting this option to false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12325 Test Plan: existing tests. Reviewed By: hx235 Differential Revision: D53369430 Pulled By: cbi42 fbshipit-source-id: 0ec2440ca8d88db7f7211c581542c7581bd4d3de | 02 February 2024, 23:37:40 UTC |
1d6dbfb | Peter Dillinger | 02 February 2024, 22:14:43 UTC | Rename IntTblPropCollector -> InternalTblPropColl (#12320) Summary: I've always found this name difficult to read, because it sounds like it's for collecting int(eger) table properties. I'm fixing this now to set up for a change that I have stubbed out in the public API (table_properties.h): a new adapter function `TablePropertiesCollector::AsInternal()` that allows RocksDB-provided TablePropertiesCollectors (such as CompactOnDeletionCollector) to implement the easier-to-upgrade internal interface while still (superficially) implementing the public interface. In addition to added flexibility, this should be a performance improvement as the adapter class UserKeyTablePropertiesCollector can be avoided for such cases where a RocksDB-provided collector is used (AsInternal() returns non-nullptr). table_properties.h is the only file with changes that aren't simple find-replace renaming. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12320 Test Plan: existing tests, CI Reviewed By: ajkr Differential Revision: D53336945 Pulled By: pdillinger fbshipit-source-id: 02535bcb30bbfb00e29e8478af62e5dad50a63b8 | 02 February 2024, 22:14:43 UTC |
95b41ee | anand76 | 02 February 2024, 19:48:35 UTC | Fix potential incorrect result for duplicate key in MultiGet (#12295) Summary: The RocksDB correctness testing has recently discovered a possible, but very unlikely, correctness issue with MultiGet. The issue happens when all of the below conditions are met - 1. Duplicate keys in a MultiGet batch 2. Key matches the last key in a non-zero, non-bottommost level file 3. Final value is not in the file (merge operand, not snapshot visible etc.) 4. Multiple entries exist for the key in the file spanning more than 1 data block. This can happen due to snapshots, which would force multiple versions of the key in the file, and they may spill over to another data block 5. Lookup attempt in the SST for the first of the duplicates fails with IO error on a data block (NOT the first data block, but the second or subsequent uncached block), but no errors for the other duplicates 6. Value or merge operand for the key is present in the very next level The problem is, in FilePickerMultiGet, when looking up keys in a level we use FileIndexer and the overlapping file in the current level to determine the search bounds for that key in the file list in the next level. If the next level is empty, the search bounds are reset and we do a full binary search in the next non-empty level's LevelFilesBrief. However, under conditions 1 and 2 listed above, only the first of the duplicates has its next-level search bounds updated, and the remaining duplicates are skipped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12295 Test Plan: Add unit tests that fail an assertion or return a wrong result without the fix Reviewed By: hx235 Differential Revision: D53187634 Pulled By: anand1976 fbshipit-source-id: a5eadf4fede9bbdec784cd993b15e3341436d1ea | 02 February 2024, 19:48:35 UTC |
046ac91 | Sanket Sanjeev Karnik | 02 February 2024, 03:09:25 UTC | Mark destructors as overridden (#12324) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12324 We are trying to use rocksdb inside Hedwig. This is causing some builds to fail D53033764. Hence fixing -Wsuggest-destructor-override warning. Reviewed By: jowlyzhang Differential Revision: D53328538 fbshipit-source-id: d5b9865442de049b18f9ed086df5fa58fb8880d5 | 02 February 2024, 03:09:25 UTC |
c6b1f6d | Changyu Bi | 01 February 2024, 22:35:03 UTC | Augment sst_dump tool to verify num_entries in table property (#12322) Summary: sst_dump --command=check can now compare the number of keys in a file with num_entries in the table property and reports corruption if there is a mismatch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12322 Test Plan: - new unit test for API `SstFileDumper::ReadSequential` - ran sst_dump on a good and a bad file: ``` sst_dump --file=./32316112.sst options.env is 0x7f68bfcb5000 Process ./32316112.sst Sst file format: block-based from [] to [] sst_dump --file=./32316115.sst options.env is 0x7f6d0d2b5000 Process ./32316115.sst Sst file format: block-based from [] to [] ./32316115.sst: Corruption: Table property has num_entries = 6050408 but scanning the table returns 6050406 records. ``` Reviewed By: jowlyzhang Differential Revision: D53320481 Pulled By: cbi42 fbshipit-source-id: d84c996346a9575a5a2ea5f5fb09a9d3ee672cd6 | 01 February 2024, 22:35:03 UTC |
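The kind of validation this commit adds can be sketched in a few lines. This is not the real `SstFileDumper::ReadSequential`; `TableProps` and `CheckNumEntries` are hypothetical stand-ins for scanning a table and comparing the record count against the stored property:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the table property being verified.
struct TableProps {
  uint64_t num_entries;
};

// Scan every record and compare the count against num_entries, producing a
// message shaped like the sst_dump output shown in the test plan above.
std::string CheckNumEntries(const std::vector<std::string>& records,
                            const TableProps& props) {
  uint64_t scanned = records.size();
  if (scanned != props.num_entries) {
    return "Corruption: Table property has num_entries = " +
           std::to_string(props.num_entries) +
           " but scanning the table returns " + std::to_string(scanned) +
           " records.";
  }
  return "OK";
}
```

A mismatch here means either the scan dropped records or the property was written incorrectly, which is why it is reported as corruption rather than a warning.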
f9d4535 | Andrew Kryczka | 01 February 2024, 00:30:26 UTC | Removed `check_flush_compaction_key_order` (#12311) Summary: `check_flush_compaction_key_order` option was introduced for the key order checking online validation. It gave users the ability to disable the validation without downgrade in case the validation caused inefficiencies or false positives. Over time this validation has shown to be cheap and correct, so the option to disable it can now be removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12311 Reviewed By: cbi42 Differential Revision: D53233379 Pulled By: ajkr fbshipit-source-id: 1384361104021d6e3e580dce2ec123f9f99ce637 | 01 February 2024, 00:30:26 UTC |
76c834e | Peter Dillinger | 31 January 2024, 21:14:42 UTC | Remove 'virtual' when implied by 'override' (#12319) Summary: ... to follow modern C++ style / idioms. Used this hack: ``` for FILE in `cat my_list_of_files`; do perl -pi -e 'BEGIN{undef $/;} s/ virtual( [^;{]* override)/$1/smg' $FILE; done ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12319 Test Plan: existing tests, CI Reviewed By: jaykorean Differential Revision: D53275303 Pulled By: pdillinger fbshipit-source-id: bc0881af270aa8ef4d0ae4f44c5a6614b6407377 | 31 January 2024, 21:14:42 UTC |
95d582e | Akanksha Mahajan | 31 January 2024, 20:37:42 UTC | Enable io_uring in stress test (#12313) Summary: Enable io_uring in stress test Pull Request resolved: https://github.com/facebook/rocksdb/pull/12313 Test Plan: Crash test Reviewed By: anand1976 Differential Revision: D53238319 Pulled By: akankshamahajan15 fbshipit-source-id: c0c8e6a6479f6977210370606e9d551c1299ba62 | 31 January 2024, 20:37:42 UTC |
d11584e | Yu Zhang | 31 January 2024, 19:12:52 UTC | Be consistent in key range overlap check (#12315) Summary: We should be consistent in how we check key range overlap in memtables and in sst files. While all the sst file key range overlap checks compare the user key without timestamp, for example: https://github.com/facebook/rocksdb/blob/377eee77f8da3f5d232cf014db0c4ca232352883/db/version_set.cc#L129-L130 the key range overlap check for memtables compares the whole user key. Currently it happens to achieve the same effect because this function is only called by `ExternalSstFileIngestionJob` and `DBImpl::CompactRange`, which take a user key without timestamp as the range end and pad a max or min timestamp to it depending on whether the end is exclusive. So using `Comparator::Compare` here works too, but we should update it to `Comparator::CompareWithoutTimestamp` to be consistent with all the other file key range overlap check functions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12315 Test Plan: existing tests Reviewed By: ltamasi Differential Revision: D53273456 Pulled By: jowlyzhang fbshipit-source-id: c094ae1f0c195d52542124c4fb03fdca14241e85 | 31 January 2024, 19:12:52 UTC |
acf77e1 | Peter Dillinger | 31 January 2024, 00:16:04 UTC | Fix possible crash test segfault in FileExpectedStateManager::Restore() (#12314) Summary: `replayer` could be `nullptr` if `!s.ok()` from an earlier failure. Also consider status returned from `record->Accept()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12314 Test Plan: blackbox_crash_test run Reviewed By: hx235 Differential Revision: D53241506 Pulled By: pdillinger fbshipit-source-id: fd330417c23391ca819c3ee0f69e4156d81934dc | 31 January 2024, 00:16:04 UTC |
377eee7 | Yu Zhang | 30 January 2024, 20:55:41 UTC | Fix race condition for accessing file size in TestFSWritableFile (#12312) Summary: Fix a race condition reported by thread sanitizer for accessing an underlying file's size from `TestFSWritableFile`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12312 Test Plan: COMPILE_WITH_TSAN=1 make -j10 transaction_test ./transaction_test --gtest_filter="DBAsBaseDB/TransactionTest.UnlockWALStallCleared/4" --gtest_repeat=100 Reviewed By: pdillinger Differential Revision: D53235231 Pulled By: jowlyzhang fbshipit-source-id: 35133cd97f8cbb48746ca3b42baeedecb36beb7b | 30 January 2024, 20:55:41 UTC |
2b42455 | Peter Dillinger | 30 January 2024, 19:58:31 UTC | Don't warn on (recursive) disable file deletion (#12310) Summary: To stop spamming our warning logs with normal behavior. Also fix comment on `DisableFileDeletions()`. In response to https://github.com/facebook/rocksdb/issues/12001 I've indicated my objection to granting legitimacy to force=true, but I'm not addressing that here and now. In short, the user shouldn't be asked to think about whether they want to use the *wrong* behavior. ;) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12310 Test Plan: existing tests Reviewed By: jowlyzhang Differential Revision: D53233117 Pulled By: pdillinger fbshipit-source-id: 5d2aedb76b02b30f8a5fa5b436fc57fde5d40d6e | 30 January 2024, 19:58:31 UTC |
b10c171 | Yu Zhang | 30 January 2024, 17:49:32 UTC | Remove WritableFile(FSWritableFile)::GetFileSize default implementation (#12303) Summary: As titled. This changes public API behavior, and subclasses of `WritableFile` and `FSWritableFile` need to explicitly provide an implementation for the `GetFileSize` method after this change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12303 Reviewed By: ajkr Differential Revision: D53205769 Pulled By: jowlyzhang fbshipit-source-id: 2e613ca3650302913821b33159b742bdf1d24bc7 | 30 January 2024, 17:49:32 UTC |
aacf60d | Andrew Kryczka | 30 January 2024, 01:29:04 UTC | Speedup based on number of files marked for compaction (#12306) Summary: RocksDB self throttles per-DB compaction parallelism until it detects compaction pressure. This PR adds pressure detection based on the number of files marked for compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12306 Reviewed By: cbi42 Differential Revision: D53200559 Pulled By: ajkr fbshipit-source-id: 63402ee336881a4539204d255960f04338ab7a0e | 30 January 2024, 01:29:04 UTC |
61ed0de | Peter Dillinger | 30 January 2024, 00:31:09 UTC | Add more detail to some statuses (#12307) Summary: and also fix comment/label on some MacOS CI jobs. Motivated by a crash test failure missing a definitive indicator of the genesis of the status: ``` file ingestion error: Operation failed. Try again.: ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12307 Test Plan: just cosmetic changes. These statuses should not arise frequently enough to be a performance issue (copying messages). Reviewed By: jaykorean Differential Revision: D53199529 Pulled By: pdillinger fbshipit-source-id: ad83daaa5d80f75c9f81158e90fb6d9ecca33fe3 | 30 January 2024, 00:31:09 UTC |
1d8c54a | Radek Hubner | 30 January 2024, 00:19:59 UTC | Fix build on OpenBSD i386 (#12142) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12142 Reviewed By: pdillinger Differential Revision: D53150218 Pulled By: ajkr fbshipit-source-id: a4c4d9d22d99e8a82d93d1a7ef37ec5326855cb5 | 30 January 2024, 00:19:59 UTC |
17042a3 | Yu Zhang | 29 January 2024, 23:28:37 UTC | Remove misspelled tickers used in error handler (#12302) Summary: As titled, the replacement tickers have been introduced in https://github.com/facebook/rocksdb/issues/11509 and in use since release 8.4. This PR completely removes the misspelled ones. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12302 Test Plan: CI tests Reviewed By: jaykorean Differential Revision: D53196935 Pulled By: jowlyzhang fbshipit-source-id: 9c9d0d321247690db5edfdc52b4fecb2f1218979 | 29 January 2024, 23:28:37 UTC |
b9cb7b9 | akankshamahajan | 29 January 2024, 23:08:20 UTC | Provide support for FSBuffer for point lookups (#12266) Summary: Provide support for FSBuffer for point lookups. It also adds support for compaction and scan reads that go through BlockFetcher when readahead/prefetching is not enabled. Some of the compaction/scan reads go through FilePrefetchBuffer and some through BlockFetcher. This PR adds support to use the underlying file system scratch buffer for reads that go through BlockFetcher; for FilePrefetch reads, the design is too complicated to support this feature. Design - In order to use the underlying FileSystem-provided scratch for reads, MultiRead with 1 request is used instead of the Read API, which would have required an API change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12266 Test Plan: Stress test using underlying file system scratch buffer internally. Reviewed By: anand1976 Differential Revision: D53019089 Pulled By: akankshamahajan15 fbshipit-source-id: 4fe3d090d77363320e4b67186fd4d51c005c0961 | 29 January 2024, 23:08:20 UTC |
0d68aff | Jay Huh | 29 January 2024, 20:52:59 UTC | StressTest - Move some stderr messages to stdout (#12304) Summary: Moving some of the messages that we print out in `stderr` to `stdout` to make `stderr` more strictly related to errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12304 Test Plan: ``` $> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 ``` **Before** ``` ... stderr: WARNING: prefix_size is non-zero but memtablerep != prefix_hash Error : jewoongh injected test error This is not a real failure. Verification failed :( ``` **After** No longer seeing the `WARNING: prefix_size is non-zero but memtablerep != prefix_hash` message in stderr, but still appears in stdout. ``` WARNING: prefix_size is non-zero but memtablerep != prefix_hash Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0 Integrated BlobDB: blob cache disabled DB path: [/tmp/jewoongh/rocksdb_crashtest_blackboxzztp281q] (Re-)verified 0 unique IDs 2024/01/29-11:57:51 Initializing worker threads Crash-recovery verification passed :) 2024/01/29-11:57:58 Starting database operations 2024/01/29-11:57:58 Starting verification Stress Test : 245.167 micros/op 10221 ops/sec : Wrote 0.00 MB (0.10 MB/sec) (16% of 6 ops) : Wrote 1 times : Deleted 0 times : Single deleted 0 times : 4 read and 0 found the key : Prefix scanned 0 times : Iterator size sum is 0 : Iterated 3 times : Deleted 0 key-ranges : Range deletions covered 0 keys : Got errors 0 times : 0 CompactFiles() succeed : 0 CompactFiles() did not succeed stderr: Error : jewoongh injected test error This is not a real failure. Error : jewoongh injected test error This is not a real failure. Error : jewoongh injected test error This is not a real failure. 
Verification failed :( ``` Reviewed By: akankshamahajan15 Differential Revision: D53193587 Pulled By: jaykorean fbshipit-source-id: 40d59f4c993c5ce043c571a207ccc9b74a0180c6 | 29 January 2024, 20:52:59 UTC |
fc48af3 | Chdy | 29 January 2024, 20:31:11 UTC | fix some perf statistics in write (#12285) Summary: perf context lacks statistics in some write steps ``` rocksdb::get_perf_context()->write_wal_time); rocksdb::get_perf_context()->write_memtable_time); rocksdb::get_perf_context()->write_pre_and_post_process_time); ``` #### case 1: when the unordered_write is true, the `write_memtable_time` is 0 ``` write_wal_time : 13.7012 write_memtable_time : 0 write_pre_and_post_process_time : 142.037 ``` Reason: the `DBImpl::UnorderedWriteMemtable` function does not record `write_memtable_time` during the memtable insert, ```c++ Status DBImpl::UnorderedWriteMemtable(const WriteOptions& write_options, WriteBatch* my_batch, WriteCallback* callback, uint64_t log_ref, SequenceNumber seq, const size_t sub_batch_cnt) { ... if (w.CheckCallback(this) && w.ShouldWriteToMemtable()) { // need calculate write_memtable_time ColumnFamilyMemTablesImpl column_family_memtables( versions_->GetColumnFamilySet()); w.status = WriteBatchInternal::InsertInto( &w, w.sequence, &column_family_memtables, &flush_scheduler_, &trim_history_scheduler_, write_options.ignore_missing_column_families, 0 /*log_number*/, this, true /*concurrent_memtable_writes*/, seq_per_batch_, sub_batch_cnt, true /*batch_per_txn*/, write_options.memtable_insert_hint_per_batch); if (write_options.disableWAL) { has_unpersisted_data_.store(true, std::memory_order_relaxed); } } ...
} ``` Fix: add perf function ``` write_wal_time : 14.3991 write_memtable_time : 19.3367 write_pre_and_post_process_time : 130.441 ``` #### case 2: when the enable_pipelined_write is true, the `write_memtable_time` is small ``` write_wal_time : 11.2986 write_memtable_time : 1.0205 write_pre_and_post_process_time : 140.131 ``` Reason: the `DBImpl::PipelinedWriteImpl` function does not record `write_memtable_time` when `w.state == WriteThread::STATE_PARALLEL_MEMTABLE_WRITER` ```c++ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options, WriteBatch* my_batch, WriteCallback* callback, uint64_t* log_used, uint64_t log_ref, bool disable_memtable, uint64_t* seq_used) { ... if (w.state == WriteThread::STATE_PARALLEL_MEMTABLE_WRITER) { // need calculate write_memtable_time assert(w.ShouldWriteToMemtable()); ColumnFamilyMemTablesImpl column_family_memtables( versions_->GetColumnFamilySet()); w.status = WriteBatchInternal::InsertInto( &w, w.sequence, &column_family_memtables, &flush_scheduler_, &trim_history_scheduler_, write_options.ignore_missing_column_families, 0 /*log_number*/, this, true /*concurrent_memtable_writes*/, false /*seq_per_batch*/, 0 /*batch_cnt*/, true /*batch_per_txn*/, write_options.memtable_insert_hint_per_batch); if (write_thread_.CompleteParallelMemTableWriter(&w)) { MemTableInsertStatusCheck(w.status); versions_->SetLastSequence(w.write_group->last_sequence); write_thread_.ExitAsMemTableWriter(&w, *w.write_group); } } if (seq_used != nullptr) { *seq_used = w.sequence; } assert(w.state == WriteThread::STATE_COMPLETED); return w.FinalStatus(); } ``` Fix: add perf function ``` write_wal_time : 10.5201 write_memtable_time : 17.1048 write_pre_and_post_process_time : 114.313 ``` #### case 3: the `DBImpl::WriteImplWALOnly` function does not record `write_delay_time` ```c++ Status DBImpl::WriteImplWALOnly( WriteThread* write_thread, const WriteOptions& write_options, WriteBatch* my_batch, WriteCallback* callback, uint64_t* log_used, const uint64_t
log_ref, uint64_t* seq_used, const size_t sub_batch_cnt, PreReleaseCallback* pre_release_callback, const AssignOrder assign_order, const PublishLastSeq publish_last_seq, const bool disable_memtable) { ... if (publish_last_seq == kDoPublishLastSeq) { } else { // need calculate write_delay_time InstrumentedMutexLock lock(&mutex_); Status status = DelayWrite(/*num_bytes=*/0ull, *write_thread, write_options); if (!status.ok()) { WriteThread::WriteGroup write_group; write_thread->EnterAsBatchGroupLeader(&w, &write_group); write_thread->ExitAsBatchGroupLeader(write_group, status); return status; } } } ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12285 Reviewed By: ajkr Differential Revision: D53191765 Pulled By: cbi42 fbshipit-source-id: f78d5b280bea6a777f077c89c3e0b8fe98d3c860 | 29 January 2024, 20:31:11 UTC |
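The counters discussed in this commit are typically accumulated with a scoped timer around the instrumented step. The sketch below is an illustrative stand-in for that pattern (not RocksDB's actual `PerfStepTimer`; `PerfCounter` and `ScopedPerfTimer` are made-up names):

```cpp
#include <cassert>
#include <chrono>

// Hypothetical perf-context counter, accumulated in microseconds.
struct PerfCounter {
  double micros = 0;
};

// RAII timer: construction records the start time, destruction adds the
// elapsed time to the counter. Wrapping the memtable insert in such a scope
// is what makes write_memtable_time non-zero in the cases above.
class ScopedPerfTimer {
 public:
  explicit ScopedPerfTimer(PerfCounter* counter)
      : counter_(counter), start_(std::chrono::steady_clock::now()) {}
  ~ScopedPerfTimer() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    counter_->micros +=
        std::chrono::duration<double, std::micro>(elapsed).count();
  }

 private:
  PerfCounter* counter_;
  std::chrono::steady_clock::time_point start_;
};
```

Because the timer is scoped, code paths that return early or branch around the insert (as the pipelined and unordered write paths did) silently skip the measurement unless the timer is placed on every path, which is what this fix does.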
071a146 | Yu Zhang | 29 January 2024, 19:37:34 UTC | Add support for range deletion when user timestamps are not persisted (#12254) Summary: For the user defined timestamps in memtable only feature, some special handling for range deletion blocks are needed since both the key (start_key) and the value (end_key) of a range tombstone can contain user-defined timestamps. Handling for the key is taken care of in the same way as the other data blocks in the block based table. This PR adds the special handling needed for the value (end_key) part. This includes: 1) On the write path, when L0 SST files are first created from flush, user-defined timestamps are removed from an end key of a range tombstone. There are places where it's logically removed (replaced with a min timestamp) because there is still logic with the running comparator that expects a user key that contains timestamp. And in the block based builder, it is eventually physically removed before persisted in a block. 2) On the read path, when range deletion block is being read, we artificially pad a min timestamp to the end key of a range tombstone in `BlockBasedTableReader`. 3) For file boundary `FileMetaData.largest`, we artificially pad a max timestamp to it if it contains a range deletion sentinel. Anytime when range deletion end_key is used to update file boundaries, it's using max timestamp instead of the range tombstone's actual timestamp to mark it as an exclusive end. https://github.com/facebook/rocksdb/blob/d69628e6ced20ff859381d1eda55675f7f93a0eb/db/dbformat.h#L923-L935 This max timestamp is removed when in memory `FileMetaData.largest` is persisted into Manifest, we pad it back when it's read from Manifest while handling related `VersionEdit` in `VersionEditHandler`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12254 Test Plan: Added unit test and enabled this feature combination's stress test. 
Reviewed By: cbi42 Differential Revision: D52965527 Pulled By: jowlyzhang fbshipit-source-id: e8315f8a2c5268e2ae0f7aec8012c266b86df985 | 29 January 2024, 19:37:34 UTC |
8829ba9 | Jay Huh | 29 January 2024, 19:09:47 UTC | print stderr separately per option (#12301) Summary: While working on Meta's internal test triaging process, I found that `db_crashtest.py` was printing out `stdout` and `stderr` altogether. Adding an option to print `stderr` separately so that it's easy to extract only `stderr` from the test run. `print_stderr_separately` is introduced as an optional parameter with default value `False` to keep the existing behavior as is (except a few minor changes). Minor changes to the existing behavior - We no longer print `stderr has error message:` and `***` prefix to each line. We simply print `stderr:` before printing `stderr` if stderr is printed in stdout and print `stderr` as is. - We no longer print `times error occurred in output is ...` which doesn't appear to have any values Pull Request resolved: https://github.com/facebook/rocksdb/pull/12301 Test Plan: **Default Behavior (blackbox)** Run printed everything as is ``` $> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 2> /tmp/error.log Running blackbox-crash-test with interval_between_crash=120 total-duration=6000 ... 
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0 Integrated BlobDB: blob cache disabled DB path: [/tmp/jewoongh/rocksdb_crashtest_blackboxwh7yxpec] (Re-)verified 0 unique IDs 2024/01/29-09:16:30 Initializing worker threads Crash-recovery verification passed :) 2024/01/29-09:16:35 Starting database operations 2024/01/29-09:16:35 Starting verification Stress Test : 543.600 micros/op 8802 ops/sec : Wrote 0.00 MB (0.27 MB/sec) (50% of 10 ops) : Wrote 5 times : Deleted 1 times : Single deleted 0 times : 4 read and 0 found the key : Prefix scanned 0 times : Iterator size sum is 0 : Iterated 0 times : Deleted 0 key-ranges : Range deletions covered 0 keys : Got errors 0 times : 0 CompactFiles() succeed : 0 CompactFiles() did not succeed stderr: WARNING: prefix_size is non-zero but memtablerep != prefix_hash Error : jewoongh injected test error This is not a real failure. Verification failed :( ``` Nothing in stderr ``` $> cat /tmp/error.log ``` **Default Behavior (whitebox)** Run printed everything as is ``` $> python3 tools/db_crashtest.py whitebox --simple --max_key=25000000 --write_buffer_size=4194304 2> /tmp/error.log Running whitebox-crash-test with total-duration=10000 ... 
(Re-)verified 571 unique IDs 2024/01/29-09:33:53 Initializing worker threads Crash-recovery verification passed :) 2024/01/29-09:35:16 Starting database operations 2024/01/29-09:35:16 Starting verification Stress Test : 97248.125 micros/op 10 ops/sec : Wrote 0.00 MB (0.00 MB/sec) (12% of 8 ops) : Wrote 1 times : Deleted 0 times : Single deleted 0 times : 4 read and 1 found the key : Prefix scanned 1 times : Iterator size sum is 120868 : Iterated 4 times : Deleted 0 key-ranges : Range deletions covered 0 keys : Got errors 0 times : 0 CompactFiles() succeed : 0 CompactFiles() did not succeed stderr: WARNING: prefix_size is non-zero but memtablerep != prefix_hash Error : jewoongh injected test error This is not a real failure. New cache capacity = 4865393 Verification failed :( TEST FAILED. See kill option and exit code above!!! ``` Nothing in stderr ``` $> cat /tmp/error.log ``` **New option added (blackbox)** ``` $> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --print_stderr_separately 2> /tmp/error.log Running blackbox-crash-test with interval_between_crash=120 total-duration=6000 ... 
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0 Integrated BlobDB: blob cache disabled DB path: [/tmp/jewoongh/rocksdb_crashtest_blackbox7ybna32z] (Re-)verified 0 unique IDs Compaction filter factory: DbStressCompactionFilterFactory 2024/01/29-09:05:39 Initializing worker threads Crash-recovery verification passed :) 2024/01/29-09:05:46 Starting database operations 2024/01/29-09:05:46 Starting verification Stress Test : 235.917 micros/op 16000 ops/sec : Wrote 0.00 MB (0.16 MB/sec) (16% of 12 ops) : Wrote 2 times : Deleted 1 times : Single deleted 0 times : 9 read and 0 found the key : Prefix scanned 0 times : Iterator size sum is 0 : Iterated 0 times : Deleted 0 key-ranges : Range deletions covered 0 keys : Got errors 0 times : 0 CompactFiles() succeed : 0 CompactFiles() did not succeed ``` stderr printed separately ``` $> cat /tmp/error.log WARNING: prefix_size is non-zero but memtablerep != prefix_hash Error : jewoongh injected test error This is not a real failure. New cache capacity = 19461571 Verification failed :( ``` **New option added (whitebox)** ``` $> python3 tools/db_crashtest.py whitebox --simple --max_key=25000000 --write_buffer_size=4194304 --print_stderr_separately 2> /tmp/error.log Running whitebox-crash-test with total-duration=10000 ... 
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0 Integrated BlobDB: blob cache disabled DB path: [/tmp/jewoongh/rocksdb_crashtest_whiteboxtwj0ihn6] (Re-)verified 157 unique IDs 2024/01/29-09:39:59 Initializing worker threads Crash-recovery verification passed :) 2024/01/29-09:40:16 Starting database operations 2024/01/29-09:40:16 Starting verification Stress Test : 742.474 micros/op 11801 ops/sec : Wrote 0.00 MB (0.27 MB/sec) (36% of 19 ops) : Wrote 7 times : Deleted 1 times : Single deleted 0 times : 8 read and 0 found the key : Prefix scanned 0 times : Iterator size sum is 0 : Iterated 4 times : Deleted 0 key-ranges : Range deletions covered 0 keys : Got errors 0 times : 0 CompactFiles() succeed : 0 CompactFiles() did not succeed TEST FAILED. See kill option and exit code above!!! ``` stderr printed separately ``` $> cat /tmp/error.log WARNING: prefix_size is non-zero but memtablerep != prefix_hash Error : jewoongh injected test error This is not a real failure. Error : jewoongh injected test error This is not a real failure. Error : jewoongh injected test error This is not a real failure. New cache capacity = 4865393 Verification failed :( ``` Reviewed By: akankshamahajan15 Differential Revision: D53187491 Pulled By: jaykorean fbshipit-source-id: 76f9100d08b96d014e41b7b88b206d69f0ae932b | 29 January 2024, 19:09:47 UTC |
4e60663 | Peter Dillinger | 29 January 2024, 18:38:08 UTC | Remove unnecessary, confusing 'extern' (#12300) Summary: In C++, `extern` is redundant in a number of cases: * "Global" function declarations and definitions * "Global" variable definitions when already declared `extern` For consistency and simplicity, I've removed these in code that *we own*. In a couple of cases, I removed obsolete declarations, and for MagicNumber constants, I have consolidated the declarations into a header file (format.h) as standard best practice would prescribe. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12300 Test Plan: no functional changes, CI Reviewed By: ajkr Differential Revision: D53148629 Pulled By: pdillinger fbshipit-source-id: fb8d927959892e03af09b0c0d542b0a3b38fd886 | 29 January 2024, 18:38:08 UTC |
36704e9 | akankshamahajan | 26 January 2024, 19:39:47 UTC | Improve crash test script to not rely on std::errors for failures. (#12265) Summary: Right now crash_test relies on std::errors, along with verification, to check for errors/failures. However, that's not a reliable solution: many internal services log benign errors/warnings, in which case our test script fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12265 Test Plan: Keep std::errors but print them out instead of failing, and monitor crash tests internally to see if there is any scenario which solely relies on std::error, in which case stress tests can be improved. Reviewed By: ajkr, cbi42 Differential Revision: D52967000 Pulled By: akankshamahajan15 fbshipit-source-id: 5328c8b69480c7946fe6a9c72f9ffeede70ac2ad | 26 January 2024, 19:39:47 UTC |
f2ddb92 | Radek Hubner | 26 January 2024, 17:13:03 UTC | Fix database open with column family. (#12167) Summary: When RocksDB is opened with Column Family descriptors, the default column family must be set properly; otherwise the flush operation will fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12167 Reviewed By: ajkr Differential Revision: D53104007 Pulled By: cbi42 fbshipit-source-id: dffa8e34a4b2a438553ee4ea308f3fa2e22e46f7 | 26 January 2024, 17:13:03 UTC |
2233a2f | Changyu Bi | 26 January 2024, 17:12:07 UTC | Enhance corruption status message for record mismatch in compaction (#12297) Summary: ... to include the actual numbers of processed and expected records, and the file number for input files. The purpose is to be able to find the offending files even when the relevant LOG file is gone. Another change is to check the record count even when `compaction_verify_record_count` is false, and log a warning message without setting corruption status if there is a mismatch. This is consistent with how we check the record count for flush. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12297 Test Plan: print the status message in `DBCompactionTest.VerifyRecordCount` ``` before Corruption: Compaction number of input keys does not match number of keys processed. after Compaction number of input keys does not match number of keys processed. Expected 20 but processed 10. Compaction summary: Base version 4 Base level 0, inputs: [11(2156B) 9(2156B)] ``` Reviewed By: ajkr Differential Revision: D53110130 Pulled By: cbi42 fbshipit-source-id: 6325cbfb8f71f25ce37f23f8277ebe9264863c3b | 26 January 2024, 17:12:07 UTC |
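The shape of the enhanced corruption message above can be sketched with a small formatting helper. This is a hypothetical illustration, not the code from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper showing the enhanced corruption message described
// above: it carries both the expected and processed record counts plus a
// compaction summary, so the offending input files can be identified even
// after the relevant LOG file is gone.
std::string RecordCountMismatchMessage(uint64_t expected, uint64_t processed,
                                       const std::string& summary) {
  std::ostringstream oss;
  oss << "Compaction number of input keys does not match number of keys "
         "processed. Expected "
      << expected << " but processed " << processed
      << ". Compaction summary: " << summary;
  return oss.str();
}
```

Embedding the file numbers in the status string is the key design choice: a corruption status can surface far from the compaction that produced it, so the message itself must be self-describing.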
a31fded | Hui Xiao | 26 January 2024, 02:20:31 UTC | Pass rate_limiter_priority from SequentialFileReader to FS (#12296) Summary: **Context/Summary:** The rate_limiter_priority passed to SequentialFileReader is now passed down to underlying file system. This allows the priority associated with backup/restore SST reads to be exposed to FS. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12296 Test Plan: - Modified existing UT Reviewed By: pdillinger Differential Revision: D53100368 Pulled By: hx235 fbshipit-source-id: b4a28917efbb1b0d16f9d1c2b38769bffcff0f34 | 26 January 2024, 02:20:31 UTC |
f046a8f | Peter Dillinger | 25 January 2024, 22:40:18 UTC | Deflake ColumnFamilyTest.WriteStallSingleColumnFamily (#12294) Summary: https://github.com/facebook/rocksdb/issues/12267 apparently introduced a data race in test code where a background read of estimated_compaction_needed_bytes while holding the DB mutex could race with a foreground write for testing purposes. This change adds the DB mutex to those writes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12294 Test Plan: 1000 TSAN runs of test (massively fails before change, passes after) Reviewed By: ajkr Differential Revision: D53095483 Pulled By: pdillinger fbshipit-source-id: 13fcb383ebad313dabe39eb8f9085c34d370b54a | 25 January 2024, 22:40:18 UTC |
c3bff1c | zaidoon | 25 January 2024, 20:36:40 UTC | Allow setting Stderr Logger via C API (#12262) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12262 Reviewed By: pdillinger Differential Revision: D53027616 Pulled By: ajkr fbshipit-source-id: 2e88e53e0c02447c613439f5528161ea1340b323 | 25 January 2024, 20:36:40 UTC |
0bf9079 | Radek Hubner | 25 January 2024, 20:36:30 UTC | Change Java native methods to static (#11882) Summary: This should give us some performance benefit calling native C++ code. Closes https://github.com/facebook/rocksdb/issues/4786 See https://github.com/evolvedbinary/jni-benchmarks/ for more info. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11882 Reviewed By: pdillinger Differential Revision: D53066207 Pulled By: ajkr fbshipit-source-id: daedef185215d0d8e791cd85bef598900bcb5bf2 | 25 January 2024, 20:36:30 UTC |
46e8c44 | Radek Hubner | 25 January 2024, 20:36:03 UTC | Generate the same output for cmake rocksdbjava as for make. (#12093) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12093 Reviewed By: pdillinger Differential Revision: D53066122 Pulled By: ajkr fbshipit-source-id: 9fcb037dbbae35091a6e47b695bfa4d643ed7d65 | 25 January 2024, 20:36:03 UTC |
054c00e | Radek Hubner | 25 January 2024, 20:35:27 UTC | Fix typo in CMakeList. (#12247) Summary: Fix https://github.com/facebook/rocksdb/issues/12237 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12247 Reviewed By: pdillinger Differential Revision: D53066154 Pulled By: ajkr fbshipit-source-id: 0a86c2f3e6cc28f3f52af33d4414ae06b03e3bf1 | 25 January 2024, 20:35:27 UTC |
96fb7de | Hui Xiao | 25 January 2024, 20:00:15 UTC | Rate-limit un-ratelimited flush/compaction code paths (#12290) Summary: **Context/Summary:** We recently found out some code paths in flush and compaction aren't rate-limited when they should be. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12290 Test Plan: existing UT Reviewed By: anand1976 Differential Revision: D53066103 Pulled By: hx235 fbshipit-source-id: 9dc4cab5f841230d18e5504dc480ac523e9d3950 | 25 January 2024, 20:00:15 UTC |
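The idea behind routing those I/O paths through the rate limiter can be illustrated with a minimal token bucket. This is a sketch of the concept only, not RocksDB's `GenericRateLimiter` (which additionally handles priorities, fairness, and timed refills):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative token bucket: callers request bytes before doing I/O and are
// granted at most what the bucket holds. A flush/compaction path that skips
// the Request() call is effectively un-ratelimited, which is the bug class
// the commit above fixes.
class TokenBucket {
 public:
  explicit TokenBucket(uint64_t bytes_per_refill)
      : capacity_(bytes_per_refill), tokens_(bytes_per_refill) {}

  // Returns the number of bytes granted now (partial grants are possible).
  uint64_t Request(uint64_t bytes) {
    uint64_t granted = std::min(bytes, tokens_);
    tokens_ -= granted;
    return granted;
  }

  // In a real limiter this happens on a timer; here it is manual.
  void Refill() { tokens_ = capacity_; }

 private:
  uint64_t capacity_;
  uint64_t tokens_;
};
```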
d895eb0 | Peter Dillinger | 25 January 2024, 19:27:15 UTC | Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) Summary: After https://github.com/facebook/rocksdb/issues/12253 this function has crashed in the crash test, in its call to `std::copy`. I haven't reproduced the crash directly, but `std::copy` probably has undefined behavior if the starting iterator is after the ending iterator, which was possible. I've fixed the logic to deal with that case and to add an assertion to check that precondition of `std::copy` (which can apparently go unchecked by `std::copy` itself even with UBSAN+ASAN). Also added some unit tests etc. that were unfinished for https://github.com/facebook/rocksdb/issues/12253, and slightly tweaked SeqnoToTimeMapping::EnforceMaxTimeSpan handling of the zero time span case. This is intended for patching 8.11. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12293 Test Plan: tests added. Will trigger ~20 runs of the crash test job that saw the crash. https://fburl.com/ci/5iiizvfa Reviewed By: jowlyzhang Differential Revision: D53090422 Pulled By: pdillinger fbshipit-source-id: 69d60b1847d9c7e4ae62b153011c2040405db461 | 25 January 2024, 19:27:15 UTC |
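`std::copy` requires a valid range, i.e. `first` must not be past `last`; an inverted range is undefined behavior that sanitizers may not catch. A defensive range copy along the lines of the fix can make the precondition explicit (this is an illustrative sketch, not the RocksDB code):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Guarded range copy: assert the std::copy precondition, and in release
// builds degrade an empty (or inverted) range to "copy nothing" instead of
// invoking undefined behavior. Requires random-access iterators for the
// ordering comparison.
template <typename It, typename Out>
Out GuardedCopy(It first, It last, Out out) {
  assert(first <= last);          // make the std::copy precondition explicit
  if (first >= last) return out;  // nothing to copy
  return std::copy(first, last, out);
}
```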
11d4ac8 | Changyu Bi | 25 January 2024, 18:44:00 UTC | Add some missing status checks in SstFileWriter (#12281) Summary: Add some missing status checks and documentation for SstFileWriter. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12281 Test Plan: existing unit tests. Reviewed By: jaykorean, ajkr Differential Revision: D53064036 Pulled By: cbi42 fbshipit-source-id: 686d90e24c18c8a4ee81668663a7780a69a45d4c | 25 January 2024, 18:44:00 UTC |
3812a77 | Changyu Bi | 25 January 2024, 18:37:11 UTC | Deflake `DBCompactionTest.BottomPriCompactionCountsTowardConcurrencyLimit` (#12289) Summary: The test has been failing with ``` [ RUN ] DBCompactionTest.BottomPriCompactionCountsTowardConcurrencyLimit db/db_compaction_test.cc:9661: Failure Expected equality of these values: 0u Which is: 0 env_->GetThreadPoolQueueLen(Env::Priority::LOW) Which is: 1 ``` This can happen when thread pool queue len is checked before `test::SleepingBackgroundTask::DoSleepTask` is scheduled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12289 Reviewed By: ajkr Differential Revision: D53064300 Pulled By: cbi42 fbshipit-source-id: 9ed1b714243880f82bd1cc1584b402ac9cf57507 | 25 January 2024, 18:37:11 UTC |
438fc3d | Hui Xiao | 25 January 2024, 17:58:25 UTC | No consistency check when compaction filter is enabled in stress test (#12291) Summary: **Context/Summary:** [Consistency check between Multiget and Get](https://github.com/facebook/rocksdb/blob/d82d179a5edc57e7de395e5db6f224d53e87c0cd/db_stress_tool/no_batched_ops_stress.cc#L585-L591) requires the snapshot to be repeatable, that is, no key protected by this snapshot is deleted. However, a compaction filter can remove keys under snapshot, e.g. [DBStressCompactionFilter](https://github.com/facebook/rocksdb/blob/d82d179a5edc57e7de395e5db6f224d53e87c0cd/db_stress_tool/db_stress_compaction_filter.h#L59), which makes the consistency check fail meaninglessly. This is noted in https://github.com/facebook/rocksdb/wiki/Compaction-Filter - "Since release 6.0, with compaction filter enabled, RocksDB always invoke filtering for any key, even if it knows it will make a snapshot not repeatable." This PR makes the consistency check happen only when the compaction filter is not enabled in the stress test Pull Request resolved: https://github.com/facebook/rocksdb/pull/12291 Test Plan: - Make the stress test command that fails on consistency check pass ``` ./db_stress --preserve_unverified_changes=1 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=0 --block_size=16384 --bloom_before_level=2147483647 --bloom_bits=30.729729833325962 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=0
--checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=4 --compaction_readahead_size=0 --compaction_ttl=0 --compressed_secondary_cache_size=8388608 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=0 --delpercent=0 --delrangepercent=50 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=1 --disable_auto_compactions=1 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=$expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=xxh64 --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=100 --initial_auto_readahead_size=0 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=1000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=2097152 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 
--open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=0 --snapshot_hold_ops=0 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=0 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=167772 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=3 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=0 --verify_checksum_one_in=0 --verify_db_one_in=0 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=0 --verify_sst_unique_id_in_manifest=0 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=335544 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=5 ``` Reviewed By: jaykorean, ajkr Differential Revision: D53075223 Pulled By: hx235 fbshipit-source-id: 61aa4a79de5d123a55eb5ac08b449a8362cc91ae | 25 January 2024, 17:58:25 UTC |
928aca8 | Yu Zhang | 25 January 2024, 07:30:08 UTC | Skip searching through lsm tree for a target level when files overlap (#12284) Summary: While ingesting multiple external files with key range overlap, the current flow goes through the LSM tree to search for a target level and later discards that result by defaulting back to L0. This PR improves this by skipping the search altogether. The other change is to remove defaulting to L0 for the combination of universal compaction + force global sequence number, which was initially added to meet a pre https://github.com/facebook/rocksdb/issues/7421 invariant. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12284 Test Plan: Added unit test: ./external_sst_file_test --gtest_filter="*IngestFileWithGlobalSeqnoAssignedUniversal*" Reviewed By: ajkr Differential Revision: D53072238 Pulled By: jowlyzhang fbshipit-source-id: 30943e2e284a7f23b495c0ea4c80cb166a34a8ac | 25 January 2024, 07:30:08 UTC |
d82d179 | chuhao zeng | 25 January 2024, 00:16:18 UTC | Enhance ldb_cmd_tool to enable user pass in customized cfds (#12261) Summary: The current implementation of the ldb_cmd tool involves commenting out the user-passed column_family_descriptors, resulting in the tool consistently constructing its column_family_descriptors from the pre-existing OPTIONS file. The proposed fix prioritizes user-passed column family descriptors, ensuring they take precedence over those specified in the OPTIONS file. This modification enhances the tool's adaptability and responsiveness to user configurations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12261 Reviewed By: cbi42 Differential Revision: D52965877 Pulled By: ajkr fbshipit-source-id: 334a83a8e1004c271b19e7ca09381a0e7cf87b03 | 25 January 2024, 00:16:18 UTC |
59f4cbe | Jay Huh | 24 January 2024, 19:35:12 UTC | MultiGet support in ldb (#12283) Summary: While investigating test failures due to the inconsistency between `Get()` and `MultiGet()`, I realized that LDB currently doesn't support `MultiGet()`. This PR introduces the `MultiGet()` support in LDB. Tested the command manually. Unit test will follow in a separate PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12283 Test Plan: When key not found ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound: ``` Compare the same key with get ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex get 0x0000000000000009000000000000012B00000000000002AB Failed: Get failed: NotFound: ``` Multiple keys not found ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB 0x0000000000000009000000000000012B00000000000002AC Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound: Status for key 0x0000000000000009000000000000012B00000000000002AC: NotFound: ``` One of the keys found ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB 0x00000000000000090000000000000026787878787878 Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound: 0x22000000262724252A2B28292E2F2C2D32333031363734353A3B38393E3F3C3D02030001060704050A0B08090E0F0C0D12131011161714151A1B18191E1F1C1D ``` All of the keys found ``` $> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000000D8 0x00000000000000090000000000000026787878787878 15:57:03 
0x47000000434241404F4E4D4C4B4A494857565554535251505F5E5D5C5B5A595867666564636261606F6E6D6C6B6A696877767574737271707F7E7D7C7B7A797807060504030201000F0E0D0C0B0A090817161514131211101F1E1D1C1B1A1918 0x22000000262724252A2B28292E2F2C2D32333031363734353A3B38393E3F3C3D02030001060704050A0B08090E0F0C0D12131011161714151A1B18191E1F1C1D ``` Reviewed By: hx235 Differential Revision: D53048519 Pulled By: jaykorean fbshipit-source-id: a6217905464c5f460a222e2b883bdff47b9dd9c7 | 24 January 2024, 19:35:12 UTC |
1b2b16b | Hui Xiao | 24 January 2024, 19:21:05 UTC | Fix bug of newer ingested data assigned with an older seqno (#12257) Summary: **Context:** We found an edge case where newer ingested data is assigned with an older seqno. This causes older data of that key to be returned for read. Consider the following lsm shape: ![image](https://github.com/facebook/rocksdb/assets/83968999/973fd160-5065-49cd-8b7b-b6ab4badae23) Then ingest a file to L5 containing new data of key_overlap. Because of [this](https://l.facebook.com/l.php?u=https%3A%2F%2Fgithub.com%2Ffacebook%2Frocksdb%2Fblob%2F5a26f392ca640818da0b8590be6119699e852b07%2Fdb%2Fexternal_sst_file_ingestion_job.cc%3Ffbclid%3DIwAR10clXxpUSrt6sYg12sUMeHfShS7XigFrsJHvZoUDroQpbj_Sb3dG_JZFc%23L951-L956&h=AT0m56P7O0ZML7jk1sdjgnZZyGPMXg9HkKvBEb8mE9ZM3fpJjPrArAMsaHWZQPt9Ki-Pn7lv7x-RT9NEd_202Y6D2juIVHOIt3EjCZptDKBLRBMG49F8iBUSM9ypiKe8XCfM-FNW2Hl4KbVq2e3nZRbMvUM), the file is assigned with seqno 2, older than the old data's seqno 4. After just another compaction, we will drop the new_v for key_overlap because of the seqno and cause older data to be returned. ![image](https://github.com/facebook/rocksdb/assets/83968999/a3ef95e4-e7ae-4c30-8d03-955cd4b5ed42) **Summary:** This PR removes the incorrect seqno assignment Pull Request resolved: https://github.com/facebook/rocksdb/pull/12257 Test Plan: - New unit test failed before the fix but passes after - python3 tools/db_crashtest.py --compaction_style=1 --ingest_external_file_one_in=10 --preclude_last_level_data_seconds=36000 --compact_files_one_in=10 --enable_blob_files=0 blackbox` - Rehearsal stress test Reviewed By: cbi42 Differential Revision: D52926092 Pulled By: hx235 fbshipit-source-id: 9e4dade0f6cc44e548db8fca27ccbc81a621cd6f | 24 January 2024, 19:21:05 UTC |
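The invariant this fix restores can be stated simply: data ingested on top of existing versions of a key must receive a sequence number greater than any seqno already present for that key, or a later compaction will keep the older value. A toy sketch of the assignment rule (illustrative only, not the actual ingestion code):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative seqno assignment for an ingested file that overlaps existing
// keys. Since every existing seqno is <= the DB's last sequence number,
// taking one past the DB's last seqno guarantees the ingested data is newer
// than all overlapping versions; assigning anything smaller reproduces the
// bug described above, where the newer value is dropped at compaction.
uint64_t AssignIngestSeqno(uint64_t last_seqno_of_overlapping_keys,
                           uint64_t db_last_seqno) {
  uint64_t assigned = db_last_seqno + 1;
  assert(assigned > last_seqno_of_overlapping_keys);
  return assigned;
}
```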
9243f1b | Yu Zhang | 24 January 2024, 19:04:40 UTC | Ensures PendingExpectedValue either Commit or Rollback (#12244) Summary: This PR adds automatic checks in the `PendingExpectedValue` class to make sure it's either committed or rolled back before being destructed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12244 Reviewed By: hx235 Differential Revision: D52853794 Pulled By: jowlyzhang fbshipit-source-id: 1dcd7695f2c52b79695be0abe11e861047637dc4 | 24 January 2024, 19:04:40 UTC |
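The destructor check this PR adds follows a common RAII pattern: treat destruction of an unresolved pending value as a programming error and fail fast. A generic sketch (class and method names are illustrative; the real class is `PendingExpectedValue` in the stress test code):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Sketch of the commit-or-rollback destructor check described above.
// A PendingValue must be resolved exactly once before it goes out of
// scope; otherwise the destructor aborts, surfacing the bug immediately.
class PendingValue {
 public:
  explicit PendingValue(int v) : value_(v) {}
  ~PendingValue() {
    if (!closed_) {
      std::fprintf(stderr, "PendingValue destroyed without Commit/Rollback\n");
      std::abort();  // fail fast, like an assertion in the stress test
    }
  }
  void Commit() { committed_ = true; closed_ = true; }
  void Rollback() { closed_ = true; }
  bool committed() const { return committed_; }
  int value() const { return value_; }

 private:
  int value_;
  bool closed_ = false;
  bool committed_ = false;
};
```

Making the check automatic in the destructor means every code path, including early returns and exceptions, is covered without auditing each call site.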
b31f324 | Peter Dillinger | 24 January 2024, 18:14:22 UTC | Fix flaky test shutdown race in seqno_time_test (#12282) Summary: Seen in build-macos-cmake: ``` Received signal 11 (Segmentation fault: 11) https://github.com/facebook/rocksdb/issues/1 rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0::operator()(void*) const (in seqno_time_test) (mock_time_env.cc:29) https://github.com/facebook/rocksdb/issues/2 decltype(std::declval<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&>()(std::declval<void*>())) std::__1::__invoke[abi:v15006]<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&, void*>(rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&, void*&&) (in seqno_time_test) (invoke.h:394) ... ``` This is presumably because the std::function from the lambda only saves a copy of the SeqnoTimeTest* this pointer, which doesn't prevent it from being reclaimed on parallel shutdown. If we instead save a copy of the `std::shared_ptr<MockSystemClock>` in the std::function, this should prevent the crash. (Note that `SyncPoint::Data::Process()` copies the std::function before releasing the mutex for calling the callback.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12282 Test Plan: watch CI Reviewed By: cbi42 Differential Revision: D53027136 Pulled By: pdillinger fbshipit-source-id: 26cd9c0352541d806d42bb061dd349d3b47171a5 | 24 January 2024, 18:14:22 UTC |
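The lifetime fix above can be sketched generically: a callback that captures a raw pointer can outlive the object it points at, while capturing a `std::shared_ptr` copy keeps the object alive for as long as the `std::function` exists. The names below are illustrative, not the test's actual types:

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Stand-in for an object whose lifetime the callback must not outrun.
struct MockClock {
  int ticks = 0;
  void Advance() { ++ticks; }
};

// The lambda stores its own shared_ptr copy, so the clock cannot be
// reclaimed while any copy of the returned std::function is alive. A
// [this]-style raw-pointer capture would instead risk a use-after-free
// on parallel shutdown, as in the crash described above.
std::function<void()> MakeSafeCallback(const std::shared_ptr<MockClock>& clock) {
  return [clock]() { clock->Advance(); };
}
```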
fc25ac0 | Richard Barnes | 24 January 2024, 16:20:50 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/util/ribbon_impl.h (#12269) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12269 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969093 fbshipit-source-id: 0520085819fa785679c859b63b877931d3f71f2c | 24 January 2024, 16:20:50 UTC |
1463314 | Richard Barnes | 24 January 2024, 15:39:59 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/util/murmurhash.cc (#12270) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12270 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52965944 fbshipit-source-id: 625d47662e984db9ce06e72ff39025b8a24aa246 | 24 January 2024, 15:39:59 UTC |
28ba896 | Richard Barnes | 24 January 2024, 15:38:17 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/slice_transform.h (#12275) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12275 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969065 fbshipit-source-id: cf2fcdc006d3b45fb54fb700a8ebefb14b42de0d | 24 January 2024, 15:38:17 UTC |
f099032 | Richard Barnes | 24 January 2024, 15:37:31 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/utilities/env_mirror.cc (#12271) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12271 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969070 fbshipit-source-id: 22e0958ad6ced5c021ef7dafbe16a17c282935d8 | 24 January 2024, 15:37:31 UTC |
502a175 | Richard Barnes | 24 January 2024, 15:25:27 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/utilities/transactions/lock/range/range_tree/lib/locktree/manager.cc (#12276) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12276 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969073 fbshipit-source-id: 1b2495548d939c32e7a89a6424767497fab9550e | 24 January 2024, 15:25:27 UTC |
0797616 | Richard Barnes | 24 January 2024, 15:24:06 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/utilities/transactions/write_unprepared_txn.h (#12273) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12273 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969166 fbshipit-source-id: 129715bfe69735b83b077c7d6cbf1786c1dfc410 | 24 January 2024, 15:24:06 UTC |
1f3e3ea | Richard Barnes | 24 January 2024, 15:22:49 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/env/env_encryption.cc (#12274) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12274 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969133 fbshipit-source-id: f5a8452af25a5a51d5c7e4045baef12575022da9 | 24 January 2024, 15:22:49 UTC |
532c940 | Richard Barnes | 24 January 2024, 15:22:45 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/utilities/transactions/transaction_base.h (#12272) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12272 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969170 fbshipit-source-id: 581304039be789cbce6760740e9557a925e02722 | 24 January 2024, 15:22:45 UTC |
8c01cb7 | Richard Barnes | 24 January 2024, 15:22:31 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/table.h (#12277) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12277 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969088 fbshipit-source-id: cd83cb3cd98b1389ddfe3e5e316f088eb5975b9f | 24 January 2024, 15:22:31 UTC |
3079a7e | Richard Barnes | 24 January 2024, 15:22:10 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/db/internal_stats.h (#12278) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12278 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969116 fbshipit-source-id: 8cb28dafdbede54e8cb59c2b8d461b1eddb3de68 | 24 January 2024, 15:22:10 UTC |
5eebfaa | Richard Barnes | 24 January 2024, 15:16:00 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/utilities/fault_injection_fs.h (#12279) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12279 `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: jaykorean Differential Revision: D52969150 fbshipit-source-id: a66326e2f8285625c4260f4d23df678a25bcfe29 | 24 January 2024, 15:16:00 UTC |
3ef9092 | Changyu Bi | 23 January 2024, 18:07:06 UTC | Print additional information when flaky test DBTestWithParam.ThreadStatusSingleCompaction fails (#12268) Summary: The test is [flaky](https://github.com/facebook/rocksdb/actions/runs/7616272304/job/20742657041?pr=12257&fbclid=IwAR1vNI1rSRVKnOsXs0WCPklqTkBXxlwS1GMJgWWe7D8dtAvh6e6wxk067FY) but I could not reproduce the test failure. Add some debug print to make the next failure more helpful Pull Request resolved: https://github.com/facebook/rocksdb/pull/12268 Test Plan: ``` check print works when test fails: [ RUN ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0 thread id: 6134067200, thread status: thread id: 6133493760, thread status: Compaction db/db_test.cc:4680: Failure Expected equality of these values: op_count Which is: 1 expected_count Which is: 0 ``` Reviewed By: hx235 Differential Revision: D52987503 Pulled By: cbi42 fbshipit-source-id: 33b369796f9b97155578b45167e722ddcde93594 | 23 January 2024, 18:07:06 UTC |
dee4686 | Richard Barnes | 23 January 2024, 17:41:29 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/port/lang.h Summary: `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: dmm-fb Differential Revision: D52968990 fbshipit-source-id: 58d344b719734c736cd80d47eeb6965557ce344b | 23 January 2024, 17:41:29 UTC |
7fe9316 | Andrew Kryczka | 23 January 2024, 17:14:59 UTC | Log pending compaction bytes in a couple places (#12267) Summary: This PR adds estimated pending compaction bytes in two places: - The "Level summary", which is printed to the info LOG after every flush or compaction - The "rocksdb.cfstats" property, which is printed to the info LOG periodically according to `stats_dump_period_sec` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12267 Test Plan: Ran `./db_bench -benchmarks=filluniquerandom -stats_dump_period_sec=1 -statistics=true -write_buffer_size=524288` and looked at the LOG. ``` ** Compaction Stats [default] ** ... Estimated pending compaction bytes: 12117691 ... 2024/01/22-13:15:12.283563 1572872 (Original Log Time 2024/01/22-13:15:12.283540) [/db_impl/db_impl_compaction_flush.cc:371] [default] Level summary: files[10 1 0 0 0 0 0] max score 0.50, estimated pending compaction bytes 12359137 ``` Reviewed By: cbi42 Differential Revision: D52973337 Pulled By: ajkr fbshipit-source-id: c4e546bd9bdac387eebeeba303d04125212037b8 | 23 January 2024, 17:14:59 UTC |
84711e2 | Richard Barnes | 23 January 2024, 16:42:15 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/monitoring/histogram.h Summary: `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: dmm-fb Differential Revision: D52968964 fbshipit-source-id: 2cb8c683f958742e2f151db8ef6824ab622528e6 | 23 January 2024, 16:42:15 UTC |
c057c2e | Richard Barnes | 23 January 2024, 16:40:00 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/file_system.h Summary: `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: dmm-fb Differential Revision: D52969123 fbshipit-source-id: d9e22dff70644dad0173ee8f6f9b64021f4b2551 | 23 January 2024, 16:40:00 UTC |
1863441 | Richard Barnes | 23 January 2024, 16:37:47 UTC | Remove extra semi colon from internal_repo_rocksdb/repo/monitoring/histogram.cc Summary: `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: dmm-fb Differential Revision: D52969001 fbshipit-source-id: d628fa6c5e5d01657fcb7aff7b05dea704ed2025 | 23 January 2024, 16:37:47 UTC |