Revision Message Commit Date
c847a31 Print compaction score for every compaction run. Summary: A compaction is picked based on its score. It is useful to print the compaction score in the LOG because it aids in debugging. If one looks at the logs, one can find out why a compaction was preferred over another. Test Plan: make clean check Differential Revision: https://reviews.facebook.net/D7137 04 December 2012, 18:03:47 UTC
6eb5ed9 rocksdb README file. Summary: rocksdb README file. Test Plan: Reviewers: CC: Task ID: # Blame Rev: 30 November 2012, 06:39:08 UTC
d4627e6 Move WAL files to archive directory, instead of deleting. Summary: Create a directory "archive" in the DB directory. During DeleteObsoleteFiles, move the WAL files (*.log) to the archive directory instead of deleting them. Test Plan: Created a DB using DB_Bench. Reopened it. Checked that the files moved. Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6975 29 November 2012, 01:28:08 UTC
d29f181 Fix all the lint errors. Summary: Removed all trailing spaces and converted all tabs to spaces via a script. Also fixed other lint errors. All lint errors from this point on should be taken seriously. Test Plan: make all check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D7059 29 November 2012, 01:18:41 UTC
9b83853 Release 1.5.6.fb Summary: Test Plan: Reviewers: CC: Task ID: # Blame Rev: 29 November 2012, 00:09:41 UTC
9a35784 Delete non-visible keys during a compaction even in the presence of snapshots. Summary: LevelDB should delete almost-new keys when a long-open snapshot exists. The previous behavior is to keep all versions that were created after the oldest open snapshot. This can lead to database size bloat for high-update workloads when there are long-open snapshots, and long-open snapshots will be used for logical backup. By "almost new" I mean that the key was updated more than once after the oldest snapshot. If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if we find two instances of the same key k1 that lie entirely within s1 and s2 (i.e. s1 < k1 < s2), then the earlier version of k1 can be safely deleted because that version is not visible in any snapshot. Test Plan: unit test attached; make clean check Differential Revision: https://reviews.facebook.net/D6999 28 November 2012, 23:47:40 UTC
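The visibility rule described above can be sketched as follows; this is an illustrative C++ helper (not the actual LevelDB compaction code), assuming a snapshot with sequence number s sees the newest version whose sequence number is <= s:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical helper: given two versions of the same key with sequence
    // numbers older_seq < newer_seq, the older version can be dropped iff no
    // snapshot falls in [older_seq, newer_seq) -- i.e. nothing observes it.
    // 'snapshots' holds the sequence numbers of open snapshots, sorted ascending.
    bool CanDropOlderVersion(uint64_t older_seq, uint64_t newer_seq,
                             const std::vector<uint64_t>& snapshots) {
      std::vector<uint64_t>::const_iterator it =
          std::lower_bound(snapshots.begin(), snapshots.end(), older_seq);
      return it == snapshots.end() || *it >= newer_seq;
    }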
34487af Moved FBCode linters to LevelDB. Summary: Added FBCODE-like linting support to our codebase. Test Plan: arc lint lints the code. Reviewers: dhruba Reviewed By: dhruba CC: emayanke, leveldb Differential Revision: https://reviews.facebook.net/D7041 28 November 2012, 17:49:01 UTC
3366eda Print out status at the end of a compaction run. Summary: Print out status at the end of a compaction run. This helps in debugging. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki Differential Revision: https://reviews.facebook.net/D7035 28 November 2012, 06:17:38 UTC
43f5a07 Remove unused variables that cause compiler warnings. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: emayanke Differential Revision: https://reviews.facebook.net/D6993 27 November 2012, 04:55:24 UTC
2a39699 Assertion failure while running unit tests with OPT=-g Summary: When we expand the range of keys for a level 0 compaction, we need to invoke ParentFilesInCompaction() only once for the entire range of keys that is being compacted. We were invoking it for each file that was being compacted, but this triggers an assertion because the files' ranges were contiguous but non-overlapping. I renamed ParentFilesInCompaction to ParentRangeInCompaction to adequately represent that it is the range-of-keys and not individual files that we compact in a single compaction run. Here is the assertion that is fixed by this patch. db_test: db/version_set.cc:585: void leveldb::Version::ExtendOverlappingInputs(int, const leveldb::Slice&, const leveldb::Slice&, std::vector<leveldb::FileMetaData*, std::allocator<leveldb::FileMetaData*> >*, int): Assertion `user_cmp->Compare(flimit, user_begin) >= 0' failed. Test Plan: make clean check OPT=-g Reviewers: sheki Reviewed By: sheki CC: MarkCallaghan, emayanke, leveldb Differential Revision: https://reviews.facebook.net/D6963 26 November 2012, 22:00:39 UTC
7c6f527 Merge branch 'performance' 26 November 2012, 20:01:55 UTC
e0cd6bf The c_test was sometimes failing with an assertion. Summary: On fast filesystems (e.g. /dev/shm and ext4), the flushing of memstore to disk was quick, and the background compaction thread was not getting scheduled fast enough to delete obsolete files before the db was closed. This caused the repair method to pick up files that were not part of the db, and the unit test was failing. The fix is to enhance the unit test to run a compaction before closing the database so that all files that are not part of the database are truly deleted from the filesystem. Test Plan: make c_test; ./c_test Reviewers: chip, emayanke, sheki Reviewed By: chip CC: leveldb Differential Revision: https://reviews.facebook.net/D6915 26 November 2012, 19:59:51 UTC
6caf3b8 Fix broken test; some ldb commands can run without a db_ Summary: It would appear our unit tests make use of code from ldb_cmd, and don't always require a valid database handle. D6855 was not aware db_ could sometimes be NULL for such commands, and so it broke reduce_levels_test. This moves the check elsewhere to (at least) fix the 'ldb dump' case of segfaulting when it couldn't open a database. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D6903 26 November 2012, 19:11:30 UTC
879e45e Fix ldb segfault and use static libsnappy for all builds Summary: Link statically against snappy, using the gvfs one for facebook environments, and the bundled one otherwise. In addition, fix a few minor segfaults in ldb when it couldn't open the database, and update .gitignore to include a few other build artifacts. Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D6855 21 November 2012, 19:07:19 UTC
7632fdb Support taking a configurable number of files from the same level to compact in a single compaction run. Summary: The compaction process takes some files from LevelK and merges them into LevelK+1. The number of files it picks from LevelK was capped in such a way that the total amount of data picked does not exceed the maxfilesize of that level. This essentially meant that only one file from LevelK is picked for a single compaction. For bulkloads, we would like to take many files from LevelK and compact them using a single compaction run. This patch introduces an option called 'source_compaction_factor' (similar to expanded_compaction_factor). It is a multiplier that is multiplied by the maxfilesize of that level to arrive at the limit that is used to throttle the number of source files from LevelK. For bulk loads, set source_compaction_factor to a very high number so that multiple files from the same level are picked for compaction in a single compaction. The default value of source_compaction_factor is 1, so that we can keep backward compatibility with existing compaction semantics. Test Plan: make clean check Reviewers: emayanke, sheki Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6867 21 November 2012, 16:37:03 UTC
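The throttling arithmetic described above amounts to a simple byte limit; a minimal sketch (field names are illustrative, not the exact option struct):

    #include <cstdint>

    // Limit on total bytes of LevelK source files one compaction may pick.
    // With factor == 1 the old behaviour (roughly one file) is preserved;
    // a very large factor lets a bulk load compact many LevelK files at once.
    uint64_t SourceBytesLimit(uint64_t max_file_size_for_level,
                              int source_compaction_factor) {
      return max_file_size_for_level *
             static_cast<uint64_t>(source_compaction_factor);
    }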
fbb73a4 Support to disable background compactions on a database. Summary: This option is needed for fast bulk uploads. The goal is to load all the data into files in L0 without any interference from background compactions. Test Plan: make clean check Reviewers: sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6849 21 November 2012, 05:12:06 UTC
3754f2f A major bug: the compaction score of the n-1 level was not being considered. Summary: The method Finalize() recomputes the compaction score of each level and then sorts these scores from largest to smallest. The idea is that the level with the largest compaction score will be a better candidate for compaction. There are usually very few levels, so a bubble sort was used to sort these compaction scores. There was a bug in the sorting code that skipped looking at the score for the n-1 level. This meant that even if the compaction score of the n-1 level was large, it would not be picked for compaction. This patch fixes the bug and also introduces "asserts" in the code to detect any possible inconsistencies caused by future bugs. This bug existed in the very first code change that introduced multi-threaded compaction to the leveldb code. That version of code was committed on Oct 19th via https://github.com/facebook/leveldb/commit/1ca0584345af85d2dccc434f451218119626d36e Test Plan: make clean check OPT=-g Reviewers: emayanke, sheki, MarkCallaghan Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6837 20 November 2012, 23:44:21 UTC
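The fix boils down to making the sort visit every adjacent pair, including the pair ending at level n-1; a correct descending bubble sort over (score, level) pairs looks roughly like this (illustrative, not the patched Finalize()):

    #include <cassert>
    #include <utility>
    #include <vector>

    // Sort compaction scores from largest to smallest so the neediest level
    // is considered first. The inner loop bound covers every adjacent pair,
    // which is what the buggy version missed for the next-to-last entry.
    void SortScoresDescending(std::vector<std::pair<double, int> >* scores) {
      const size_t n = scores->size();
      for (size_t i = 0; i + 1 < n; i++) {
        for (size_t j = 0; j + 1 < n - i; j++) {
          if ((*scores)[j].first < (*scores)[j + 1].first) {
            std::swap((*scores)[j], (*scores)[j + 1]);
          }
        }
      }
      for (size_t i = 0; i + 1 < n; i++) {
        assert((*scores)[i].first >= (*scores)[i + 1].first);
      }
    }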
dde7089 Fix asserts Summary: make check OPT=-g fails with the following assert. ==== Test DBTest.ApproximateSizes db_test: db/version_set.cc:765: void leveldb::VersionSet::Builder::CheckConsistencyForDeletes(leveldb::VersionEdit*, int, int): Assertion `found' failed. The assertion claimed that file #7, which was being deleted, did not pre-exist, but it actually did, as the manifest dump below shows. The bug was that we did not check for file existence at the same level. *************************Edit[0] = VersionEdit { Comparator: leveldb.BytewiseComparator } *************************Edit[1] = VersionEdit { LogNumber: 8 PrevLogNumber: 0 NextFile: 9 LastSeq: 80 AddFile: 0 7 8005319 'key000000' @ 1 : 1 .. 'key000079' @ 80 : 1 } *************************Edit[2] = VersionEdit { LogNumber: 8 PrevLogNumber: 0 NextFile: 13 LastSeq: 80 CompactPointer: 0 'key000079' @ 80 : 1 DeleteFile: 0 7 AddFile: 1 9 2101425 'key000000' @ 1 : 1 .. 'key000020' @ 21 : 1 AddFile: 1 10 2101425 'key000021' @ 22 : 1 .. 'key000041' @ 42 : 1 AddFile: 1 11 2101425 'key000042' @ 43 : 1 .. 'key000062' @ 63 : 1 AddFile: 1 12 1701165 'key000063' @ 64 : 1 .. 'key000079' @ 80 : 1 } Test Plan: Reviewers: CC: Task ID: # Blame Rev: 19 November 2012, 22:51:22 UTC
a4b79b6 Merge branch 'master' into performance 19 November 2012, 21:20:25 UTC
74054fa Fix compilation error while compiling unit tests with OPT=-g Summary: Fix compilation error while compiling with OPT=-g Test Plan: make clean check OPT=-g Reviewers: CC: Task ID: # Blame Rev: 19 November 2012, 21:16:46 UTC
48dafb2 Fix compilation error introduced by previous commit 7889e094554dc5bba678a0bfa7fb5eca422c34de Summary: Fix compilation error introduced by previous commit 7889e094554dc5bba678a0bfa7fb5eca422c34de Test Plan: make clean check 19 November 2012, 20:16:45 UTC
7889e09 Enhance manifest_dump to print each individual edit. Summary: The manifest file contains a series of edits. If the verbose option is switched on, then print each individual edit in the manifest file. This helps in debugging. Test Plan: make clean manifest_dump Reviewers: emayanke, sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6807 19 November 2012, 20:04:35 UTC
661dc15 Fix LDB dumpwal to print the messages as in the file. Summary: StringStream.clear() does not clear the stream. It sets some flags. Who knew? Fixing that stops the messages from being printed again and again. Test Plan: ran it on a local db Reviewers: dhruba, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6795 19 November 2012, 20:04:35 UTC
65b035a Fix a coding error in db_test.cc Summary: The new function MinLevelToCompress in db_test.cc was incomplete. It needs to tell the calling TEST function whether the test has to be skipped or not. Test Plan: make all; ./db_test Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: sheki Differential Revision: https://reviews.facebook.net/D6771 19 November 2012, 20:04:35 UTC
30742e1 LDB can read WAL. Summary: Add an option to read the WAL and print a summary for each record. facebook task => #1885013 E.g. output: ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header Sequence,Count,ByteSize 49981,1,100033 49981,1,100033 49982,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49984,1,100033 49981,1,100033 49982,1,100033 Test Plan: Works. Run ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: emayanke, leveldb, zshao Differential Revision: https://reviews.facebook.net/D6675 19 November 2012, 20:04:34 UTC
4b622ab Enhance manifest_dump to print each individual edit. Summary: The manifest file contains a series of edits. If the verbose option is switched on, then print each individual edit in the manifest file. This helps in debugging. Test Plan: make clean manifest_dump Reviewers: emayanke, sheki Reviewed By: sheki CC: leveldb Differential Revision: https://reviews.facebook.net/D6807 19 November 2012, 20:02:27 UTC
b648401 Fix LDB dumpwal to print the messages as in the file. Summary: StringStream.clear() does not clear the stream. It sets some flags. Who knew? Fixing that stops the messages from being printed again and again. Test Plan: ran it on a local db Reviewers: dhruba, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6795 19 November 2012, 19:14:07 UTC
62e7583 enhance dbstress to simulate hard crash Summary: dbstress has an option to reopen the database. Make it such that the previous handle is not closed before we reopen; this simulates a situation similar to a process crash. Added a new api to DBImpl to remove the lock file. Test Plan: run db_stress Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D6777 19 November 2012, 07:16:17 UTC
de278a6 Fix a coding error in db_test.cc Summary: The new function MinLevelToCompress in db_test.cc was incomplete. It needs to tell the calling TEST function whether the test has to be skipped or not. Test Plan: make all; ./db_test Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: sheki Differential Revision: https://reviews.facebook.net/D6771 16 November 2012, 22:56:50 UTC
f5cdf93 LDB can read WAL. Summary: Add an option to read the WAL and print a summary for each record. facebook task => #1885013 E.g. output: ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header Sequence,Count,ByteSize 49981,1,100033 49981,1,100033 49982,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49981,1,100033 49982,1,100033 49983,1,100033 49984,1,100033 49981,1,100033 49982,1,100033 Test Plan: Works. Run ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header Reviewers: dhruba, heyongqiang Reviewed By: dhruba CC: emayanke, leveldb, zshao Differential Revision: https://reviews.facebook.net/D6675 16 November 2012, 17:09:00 UTC
c3392c9 The db_stress test should also test multi-threaded compaction. Summary: Create more than one background compaction thread if specified. This code piece is similar to what exists in db_bench. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6753 15 November 2012, 06:01:39 UTC
6c5a4d6 Merge branch 'master' into performance Conflicts: db/db_impl.h 15 November 2012, 05:39:52 UTC
e988c11 Enhance db_bench to be able to specify a grandparent_overlap_factor. Summary: The value specified in max_grandparent_overlap_factor is used to limit the file size in a compaction run. This patch makes it configurable when using db_bench. Test Plan: make clean db_bench Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D6729 15 November 2012, 00:20:13 UTC
0f590af Push release 1.5.5.fb. Summary: Test Plan: Reviewers: CC: Task ID: # Blame Rev: 14 November 2012, 00:28:11 UTC
33cf6f3 Make sse compilation optional. Summary: The fbcode compilation was always switching on msse by default. This patch keeps the same behaviour but allows the compilation process to switch off msse if needed. If one does not want to use sse, then do the following: export USE_SSE=0 make clean all Test Plan: make clean all Reviewers: heyongqiang Reviewed By: heyongqiang CC: leveldb Differential Revision: https://reviews.facebook.net/D6717 14 November 2012, 00:25:57 UTC
5d16e50 Improved CompactionFilter api: pass in an opaque argument to the CompactionFilter invocation. Summary: There are applications that operate on multiple leveldb instances. These applications would like to pass in an opaque type for each leveldb instance, and this type should be passed back to the application with every invocation of the CompactionFilter api. Test Plan: Enhanced unit test for opaque parameter to CompactionFilter. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, sheki, emayanke Differential Revision: https://reviews.facebook.net/D6711 14 November 2012, 00:22:26 UTC
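A hedged sketch of the callback shape this enables, with an opaque pointer registered per DB instance and handed back on every filter invocation (names and signature are hypothetical, not the exact 2012 API):

    #include <string>

    // Hypothetical filter signature: 'arg' is whatever opaque pointer the
    // application registered for this leveldb instance; returning true drops
    // the key from the compaction output.
    typedef bool (*CompactionFilterFn)(void* arg, int level,
                                       const std::string& key,
                                       const std::string& existing_value);

    struct PerDbState {
      int ttl_seconds;  // example of per-instance state carried through 'arg'
    };

    bool KeepEverything(void* arg, int /*level*/, const std::string& /*key*/,
                        const std::string& /*existing_value*/) {
      PerDbState* state = static_cast<PerDbState*>(arg);
      (void)state;   // a real filter would consult the per-instance state here
      return false;  // false == keep the key
    }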
43d9a82 Fix asserts so that "make check OPT=-g" works on performance branch Summary: Compilation used to fail with the error: db/version_set.cc:1773: error: ‘number_of_files_to_sort_’ is not a member of ‘leveldb::VersionSet’ I created a new method called CheckConsistencyForDeletes() so that all the high cost checking is done only when OPT=-g is specified. I also fixed a bug in PickCompactionBySize that was triggered when OPT=-g was switched on. The base_index in the compaction record was not set correctly. Test Plan: make check OPT=-g Differential Revision: https://reviews.facebook.net/D6687 13 November 2012, 18:40:52 UTC
a785e02 The db_bench utility was broken in 1.5.4.fb because of a signed-unsigned comparison. Summary: The db_bench utility was broken in 1.5.4.fb because of a signed-unsigned comparison. The static variable FLAGS_min_level_to_compress was recently changed from int to 'unsigned int' but it is initialized to a negative value, -1. The segfault is of this type: Program received signal SIGSEGV, Segmentation fault. Open (this=0x7fffffffdee0) at db/db_bench.cc:939 939 db/db_bench.cc: No such file or directory. (gdb) where Test Plan: run db_bench with no options. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6663 12 November 2012, 21:59:35 UTC
e626261 Introducing "database reopens" into the stress test. The database will reopen after a specified (configurable) number of iterations of each thread, at which point all threads wait for the database to reopen. Summary: FLAGS_reopen (configurable) specifies the number of times the database is to be reopened. FLAGS_ops_per_thread is divided into points based on that reopen field. At these points all threads come together to wait for the database to reopen. Each thread "votes" for the database to reopen and when all have voted, the database reopens. Test Plan: make all; ./db_stress Reviewers: dhruba, MarkCallaghan, sheki, asad, heyongqiang Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6627 12 November 2012, 20:26:32 UTC
c64796f Fix test failure of reduce_num_levels Summary: I changed the reduce_num_levels logic to avoid the "compactRange()" call if the current number of levels in use (levels that contain files) is smaller than the new number of levels. That change broke the assert in reduce_levels_test. Test Plan: run reduce_levels_test Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: emayanke, sheki Differential Revision: https://reviews.facebook.net/D6651 12 November 2012, 20:05:38 UTC
9c6c232 Compilation error while compiling with OPT=-g Summary: make clean check OPT=-g fails with: leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’: ./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope util/ldb_cmd.cc:255: warning: left shift count >= width of type Test Plan: make clean check OPT=-g Reviewers: CC: Task ID: # Blame Rev: 11 November 2012, 08:20:40 UTC
0f8e472 Metrics: record compaction drops and bloom filter effectiveness Summary: Record BloomFilter hits and drop-off reasons during compaction. Test Plan: Unit tests work. Reviewers: dhruba, heyongqiang Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6591 09 November 2012, 19:38:45 UTC
20d18a8 disable size compaction in ldb reduce_levels and add compression and file size parameters to it Summary: Disable size compaction in ldb reduce_levels; this avoids compactions other than the manual compaction. Added --compression=none|snappy|zlib|bzip2 and --file_size= (per-file size) to the ldb reduce_levels command. Test Plan: run ldb Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: sheki, emayanke Differential Revision: https://reviews.facebook.net/D6597 09 November 2012, 18:14:47 UTC
e00c709 Preparing for new release 1.5.4.fb Summary: Preparing for new release 1.5.4.fb Test Plan: Reviewers: CC: Task ID: # Blame Rev: 09 November 2012, 17:21:11 UTC
9e97bfd Introducing deletes for stress test Summary: Stress test modified to do deletes and later verify them Test Plan: running the test: db_stress Reviewers: dhruba, heyongqiang, asad, sheki, MarkCallaghan Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6567 09 November 2012, 00:55:18 UTC
391885c Stats collection in leveldb Summary: Prototype stats collection. The diff is a good estimate of what the final code will look like. A few assumptions: * Used a global static instance of the statistics object. Plan to pass it to each internal function. Static allows metrics only at the app level. * The Tickers do not do any locking. Depend on the mutex at each function of LevelDB. If we ever remove the mutex, we should change this too. The other option is to use atomic objects anyway, as there won't be any contention because they will always be acquired by only one thread. * The counters are dumb; they increment through the lifecycle. Plan to use ods etc. to get last-5-min stats etc. Test Plan: made changes in db_bench. Ran ./db_bench --statistics=1 --num=10000 --cache_size=5000. This will print the cache hit/miss stats. Reviewers: dhruba, heyongqiang Differential Revision: https://reviews.facebook.net/D6441 08 November 2012, 21:55:49 UTC
95dda37 Move filesize-based-sorting to outside the Mutex Summary: When a new version is created, we sort all the files at every level based on their size. This is necessary because we want to compact the largest file first. The sorting takes quite a bit of CPU. Moved the sorting code to be outside the mutex. Also, the earlier code was sorting files at all levels but we do not need to sort the highest-number level because those files are never the cause of any compaction. To reduce sorting costs, we sort only the first few files in each level because it is likely that those are the only files in that level that will be picked for compaction. At steady state, I have seen that this patch increases throughput from 1500 writes/sec to 1700 writes/sec at the end of a 72 hour run. The cpu saving by not sorting the last level was not noticeable in this test run because there were only 100K files in the highest numbered level. I expect the cpu saving to be significant when the number of files is much higher. This is mostly an early preview and not ready for rigorous review. With this patch, the writes/sec is now bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs. Test Plan: make check Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D6411 07 November 2012, 23:39:44 UTC
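The cost-saving idea (order only the first few files by size, outside the mutex) can be sketched with std::partial_sort; types here are illustrative, not the real FileMetaData:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct FileInfo { uint64_t number; uint64_t file_size; };

    static bool BySizeDescending(const FileInfo* a, const FileInfo* b) {
      return a->file_size > b->file_size;
    }

    // Sort only the 'prefix' largest files to the front of the level; the
    // rest stay unordered because they are unlikely to be picked for
    // compaction. Intended to run outside the DB mutex.
    void PartialSortBySize(std::vector<FileInfo*>* files, size_t prefix) {
      if (prefix > files->size()) prefix = files->size();
      std::partial_sort(files->begin(), files->begin() + prefix, files->end(),
                        BySizeDescending);
    }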
18cb600 Fixed compilation error in previous merge. Summary: Fixed compilation error in previous merge. Test Plan: Reviewers: CC: Task ID: # Blame Rev: 07 November 2012, 23:24:47 UTC
8143062 Merge branch 'master' into performance Conflicts: db/db_impl.cc db/version_set.cc util/options.cc 07 November 2012, 23:11:37 UTC
3fcf533 Add a readonly db Summary: as subject Test Plan: run db_bench readrandom Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6495 07 November 2012, 22:19:48 UTC
9b87a2b Avoid doing an exhaustive search when looking for overlapping files. Summary: Version::GetOverlappingInputs() is called multiple times in the compaction code path. Each invocation does a binary search for overlapping files in the specified key range. This patch remembers the offset of an overlapped file when GetOverlappingInputs() is called the first time within a compaction run. Succeeding calls to GetOverlappingInputs() use the remembered index to avoid the binary search. I measured that 1000 iterations of GetOverlappingInputs takes around 4500 microseconds without this patch. If I use this patch with the hint on every invocation, then 1000 iterations take about 3900 microseconds. Test Plan: make check OPT=-g Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6513 07 November 2012, 19:47:17 UTC
4e413df Flush data at object destruction if disableWal is used. Summary: Added a conditional flush in ~DBImpl. There is still a chance of writes not being persisted if there is a crash (not a clean shutdown) before the DBImpl instance is destroyed. Test Plan: modified db_test to meet the new expectations. Reviewers: dhruba, heyongqiang Differential Revision: https://reviews.facebook.net/D6519 06 November 2012, 23:04:42 UTC
aa42c66 Fix all warnings generated by -Wall option to the compiler. Summary: The default compilation process now uses "-Wall" to compile. Fix all compilation error generated by gcc. Test Plan: make all check Reviewers: heyongqiang, emayanke, sheki Reviewed By: heyongqiang CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6525 06 November 2012, 22:07:31 UTC
5f91868 Merge branch 'master' into performance Conflicts: db/version_set.cc util/options.cc 06 November 2012, 00:51:55 UTC
cb7a002 The method GetOverlappingInputs should use binary search. Summary: The method Version::GetOverlappingInputs used a sequential search to map a key-range to a set of files. But the files are arranged in ascending order of key, so a binary search is more effective. This patch implements Version::GetOverlappingInputsBinarySearch that finds one file that corresponds to the specified key range and then iterates backwards and forwards to find all overlapping files. This patch is critical for making compactions efficient, especially when there are thousands of files in a single level. I measured that 1000 iterations of TEST_MaxNextLevelOverlappingBytes takes 16000 microseconds without this patch. With this patch, the same method takes about 4600 microseconds. Test Plan: Almost all unit tests in db_test use this method to look up keys. Reviewers: heyongqiang Reviewed By: heyongqiang CC: MarkCallaghan, emayanke, sheki Differential Revision: https://reviews.facebook.net/D6465 06 November 2012, 00:08:01 UTC
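A minimal sketch of the search-then-expand idea, assuming the level's files are sorted by key and non-overlapping (simplified: it finds the first candidate by binary search and then scans forward, rather than walking in both directions like the real method):

    #include <string>
    #include <vector>

    struct FileRange { std::string smallest; std::string largest; };

    // Return the indices of files whose ranges intersect [begin, end].
    std::vector<size_t> OverlappingFiles(const std::vector<FileRange>& files,
                                         const std::string& begin,
                                         const std::string& end) {
      size_t lo = 0, hi = files.size();
      while (lo < hi) {  // binary search: first file whose largest key reaches 'begin'
        size_t mid = (lo + hi) / 2;
        if (files[mid].largest < begin) lo = mid + 1; else hi = mid;
      }
      std::vector<size_t> result;
      for (size_t i = lo; i < files.size() && files[i].smallest <= end; i++) {
        result.push_back(i);
      }
      return result;
    }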
5273c81 Ability to invoke application hook for every key during compaction. Summary: There are certain use-cases where the application intends to delete older keys after they have expired past a certain time period. One option for those applications is to periodically scan the entire database and delete appropriate keys. A better way is to allow the application to hook into the compaction process. This patch allows the application to set a method callback for every key that is being compacted. If this method returns true, then the key is not preserved in the output of the compaction. Test Plan: This is mostly to preview the proposed new public api. Since it is a public api, please do due diligence on reviewing it. I will be writing test cases for this api in my next version of this patch. Reviewers: MarkCallaghan, heyongqiang Reviewed By: heyongqiang CC: sheki, adsharma Differential Revision: https://reviews.facebook.net/D6285 06 November 2012, 00:02:13 UTC
f1a7c73 fix compile error Summary: as subject Test Plan: n/a 05 November 2012, 18:30:19 UTC
d55c2ba Add a tool to change number of levels Summary: as subject. Test Plan: manually test it, will add a testcase Reviewers: dhruba, MarkCallaghan Differential Revision: https://reviews.facebook.net/D6345 05 November 2012, 18:17:39 UTC
81f735d Merge branch 'master' into performance Conflicts: db/db_impl.cc util/options.cc 05 November 2012, 17:41:38 UTC
a1bd5b7 Compilation problem introduced by previous commit 854c66b089bef5d27f79750884f70f6e2c8c69da. Summary: Compilation problem introduced by previous commit 854c66b089bef5d27f79750884f70f6e2c8c69da. Test Plan: make check 05 November 2012, 06:04:14 UTC
854c66b Make compression options configurable. These include window-bits, level, and strategy for ZlibCompression. Summary: Leveldb currently uses windowBits=-14 when using zlib compression (it was earlier 15). This makes the setting configurable. Related changes here: https://reviews.facebook.net/D6105 Test Plan: make all check Reviewers: dhruba, MarkCallaghan, sheki, heyongqiang Differential Revision: https://reviews.facebook.net/D6393 02 November 2012, 18:26:39 UTC
3096fa7 Add two more options: disable block cache and make table cache shard number configurable Summary: as subject Test Plan: run db_bench and db_test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6111 01 November 2012, 20:23:21 UTC
3e7e269 Use timer to measure sleep rather than assume it is 1000 usecs Summary: This makes the stall timers in MakeRoomForWrite more accurate by timing the sleeps. From looking at the logs the real sleep times are usually about 2000 usecs each when SleepForMicros(1000) is called. The modified LOG messages are: 2012/10/29-12:06:33.271984 2b3cc872f700 delaying write 13 usecs for level0_slowdown_writes_trigger 2012/10/29-12:06:34.688939 2b3cc872f700 delaying write 1728 usecs for rate limits with max score 3.83 Task ID: # Blame Rev: Test Plan: run db_bench, look at DB/LOG Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6297 30 October 2012, 14:21:37 UTC
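The measured-sleep idea can be sketched by wrapping the sleep call with a wall-clock timer (illustrative only, not the MakeRoomForWrite code itself):

    #include <stdint.h>
    #include <sys/time.h>
    #include <unistd.h>

    static uint64_t NowMicros() {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      return static_cast<uint64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
    }

    // Sleep for roughly 'micros' and report how long the sleep actually took,
    // so the stall counters can record real delay instead of assuming 1000us.
    uint64_t TimedSleepMicros(uint64_t micros) {
      uint64_t start = NowMicros();
      usleep(static_cast<useconds_t>(micros));
      return NowMicros() - start;
    }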
fb8d437 fix test failure Summary: as subject Test Plan: db_test Reviewers: dhruba, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6309 30 October 2012, 01:55:52 UTC
925f60d add a test case to make sure changing num_levels will fail Summary: as subject Test Plan: db_test Reviewers: dhruba, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6303 29 October 2012, 22:27:07 UTC
53e0431 Merge branch 'master' into performance Conflicts: db/db_bench.cc util/options.cc 29 October 2012, 21:18:00 UTC
321dfdc Allow having different compression algorithms on different levels. Summary: The leveldb API is enhanced to support different compression algorithms at different levels. This adds the option min_level_to_compress to db_bench that specifies the minimum level for which compression should be done when compression is enabled. This can be used to disable compression for levels 0 and 1 which are likely to suffer from stalls because of the CPU load for memtable flushes and (L0,L1) compaction. Level 0 is special as it gets frequent memtable flushes. Level 1 is special as it frequently gets all:all file compactions between it and level 0. But all other levels could be the same. For any level N where N > 1, the rate of sequential IO for that level should be the same. The last level is the exception because it might not be full and because files from it are not read to compact with the next larger level. The same amount of time will be spent doing compaction at any level N excluding N=0, 1 or the last level. By this standard all of those levels should use the same compression. The difference is that the loss (using more disk space) from a faster compression algorithm is less significant for N=2 than for N=3. So we might be willing to trade disk space for faster write rates with no compression for L0 and L1, snappy for L2, zlib for L3. Using a faster compression algorithm for the mid levels also allows us to reclaim some cpu without trading off much loss in disk space overhead. Also note that little is to be gained by compressing levels 0 and 1. For a 4-level tree they account for 10% of the data. For a 5-level tree they account for 1% of the data. With compression enabled: * memtable flush rate is ~18MB/second * (L0,L1) compaction rate is ~30MB/second With compression enabled but min_level_to_compress=2 * memtable flush rate is ~320MB/second * (L0,L1) compaction rate is ~560MB/second This practically takes the same code from https://reviews.facebook.net/D6225 but makes the leveldb api more general purpose with a few additional lines of code. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6261 29 October 2012, 18:48:09 UTC
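A hedged sketch of the per-level choice with a min_level_to_compress threshold like the db_bench flag described above (the enum and helper are illustrative, not the exact API added by this patch):

    enum CompressionType { kNoCompression, kSnappyCompression, kZlibCompression };

    // Levels below 'min_level_to_compress' stay uncompressed so memtable
    // flushes and (L0,L1) compactions are not CPU bound; deeper levels trade
    // CPU for disk space.
    CompressionType CompressionForLevel(int level, int min_level_to_compress) {
      if (min_level_to_compress >= 0 && level < min_level_to_compress) {
        return kNoCompression;
      }
      return kSnappyCompression;  // or kZlibCompression for the largest levels
    }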
acc8567 Add more rates to db_bench output Summary: Adds the "MB/sec in" and "MB/sec out" to this line: Amplification: 1.7 rate, 0.01 GB in, 0.02 GB out, 8.24 MB/sec in, 13.75 MB/sec out Changes all values to be reported per interval and since test start for this line: ... thread 0: (10000,60000) ops and (19155.6,27307.5) ops/second in (0.522041,2.197198) seconds Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6291 29 October 2012, 18:30:07 UTC
de7689b Fix unit test failure caused by delaying deleting obsolete files. Summary: A previous commit 4c107587ed47af84633f8c61f65516a504d6cd98 introduced the idea that some version updates might not delete obsolete files. This means that if a unit test blindly counts the number of files in the db directory it might not represent the true state of the database. Use GetLiveFiles() instead to count the number of live files in the database. Test Plan: make check 29 October 2012, 18:12:24 UTC
70c42bf Adds DB::GetNextCompaction and then uses that for rate limiting db_bench Summary: Adds a method that returns the score for the next level that most needs compaction. That method is then used by db_bench to rate limit threads. Threads are put to sleep at the end of each stats interval until the score is less than the limit. The limit is set via the --rate_limit=$double option. The specified value must be > 1.0. Also adds the option --stats_per_interval to enable additional metrics reported every stats interval. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6243 29 October 2012, 17:17:43 UTC
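The rate-limiting loop in db_bench can be sketched like this; the score accessor is passed in as a stand-in for the new DB method (hypothetical shape, not the exact signature):

    #include <unistd.h>

    // 'score_fn' stands in for the accessor that reports the compaction score
    // of the level most in need of compaction. At the end of each stats
    // interval the writer sleeps until the score drops below --rate_limit,
    // which must be > 1.0.
    void ThrottleWriter(double rate_limit, double (*score_fn)()) {
      while (score_fn() >= rate_limit) {
        usleep(100 * 1000);  // back off 100ms, then re-check
      }
    }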
8965c8d Add the missing util/auto_split_logger.h Summary: Test Plan: Reviewers: CC: Task ID: 1803577 Blame Rev: 26 October 2012, 22:23:50 UTC
d50f8eb Enable LevelDb to create a new log file if current log file is too large. Summary: Enable LevelDb to create a new log file if current log file is too large. Test Plan: Write a script and manually check the generated info LOG. Task ID: 1803577 Blame Rev: Reviewers: dhruba, heyongqiang Reviewed By: heyongqiang CC: zshao Differential Revision: https://reviews.facebook.net/D6003 26 October 2012, 21:55:02 UTC
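A minimal sketch of the size-based rollover (names and error handling are illustrative; the real implementation appears to live in the auto_split_logger header added in the entry above):

    #include <stdint.h>
    #include <stdio.h>
    #include <string>

    // Illustrative rollover: once the current info LOG grows past max_size,
    // close it and start a new file with an incremented numeric suffix.
    class RollingLogger {
     public:
      RollingLogger(const std::string& base, uint64_t max_size)
          : base_(base), max_size_(max_size), index_(0), written_(0) {
        file_ = fopen(FileName().c_str(), "w");
      }
      ~RollingLogger() { if (file_ != NULL) fclose(file_); }

      void Append(const std::string& line) {
        if (file_ == NULL) return;               // error handling elided
        if (written_ + line.size() > max_size_) {
          fclose(file_);
          ++index_;
          written_ = 0;
          file_ = fopen(FileName().c_str(), "w");
          if (file_ == NULL) return;
        }
        fputs(line.c_str(), file_);
        written_ += line.size();
      }

     private:
      std::string FileName() const {
        char buf[32];
        snprintf(buf, sizeof(buf), ".%d", index_);
        return base_ + buf;
      }
      std::string base_;
      uint64_t max_size_;
      int index_;
      uint64_t written_;
      FILE* file_;
    };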
3a91b78 Keep build_detect_platform portable Summary: AFAIK proper /bin/sh does not support "+=". Note that only our changes use "+=". The Google code does A="$A + $B" rather than A+=$B. Task ID: # Blame Rev: Test Plan: build Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6231 26 October 2012, 21:20:04 UTC
65855dd Normalize compaction stats by time in compaction Summary: I used server uptime to compute per-level IO throughput rates. I intended to use time spent doing compaction at that level. This fixes that. Task ID: # Blame Rev: Test Plan: run db_bench, look at results Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6237 26 October 2012, 21:19:13 UTC
ea9e087 Merge branch 'master' into performance Conflicts: db/db_bench.cc db/db_impl.cc db/db_test.cc 26 October 2012, 15:57:56 UTC
8eedf13 Fix unit test failure caused by delaying deleting obsolete files. Summary: A previous commit 4c107587ed47af84633f8c61f65516a504d6cd98 introduced the idea that some version updates might not delete obsolete files. This means that if a unit test blindly counts the number of files in the db directory it might not represent the true state of the database. Use GetLiveFiles() instead to count the number of live files in the database. Test Plan: make check Reviewers: heyongqiang, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6207 26 October 2012, 15:42:05 UTC
5b0fe6c Greedy algorithm for picking files to compact. Summary: It is best if we pick the largest file to compact in a level. This reduces the write amplification factor for compactions. Each level has an auxiliary data structure called files_by_size_ that sorts all files by their size. This data structure is updated when a new version is created. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6195 26 October 2012, 01:27:53 UTC
8fb5f40 firstIndex fix for multi-threaded compaction code. Summary: Prior to multi-threaded compaction, wrap-around would be done by using current_->files_[level][0]. With this change we should be using the first file for which f->being_compacted is not true. https://github.com/facebook/leveldb/commit/1ca0584345af85d2dccc434f451218119626d36e#commitcomment-2041516 Test Plan: make check Differential Revision: https://reviews.facebook.net/D6165 25 October 2012, 15:44:47 UTC
e7206f4 Improve statistics Summary: This adds more statistics to be reported by GetProperty("leveldb.stats"). The new stats include time spent waiting on stalls in MakeRoomForWrite. This also includes the total amplification rate where that is: (#bytes of sequential IO during compaction) / (#bytes from Put) This also includes a lot more data for the per-level compaction report. * Rn(MB) - MB read from level N during compaction between levels N and N+1 * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1 * Wnew(MB) - new data written to the level during compaction * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB) * Rn - files read from level N during compaction between levels N and N+1 * Rnp1 - files read from level N+1 during compaction between levels N and N+1 * Wnp1 - files written to level N+1 during compaction between levels N and N+1 * NewW - new files written to level N+1 during compaction * Count - number of compactions done for this level This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB) Compactions Level Files Size(MB) Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count ------------------------------------------------------------------------------------------------------------------------------------- 0 3 6 33 0 576 0 0 576 -1.0 0.0 1.3 0 0 0 0 290 1 127 242 351 5316 5314 570 4747 567 17.0 12.1 12.1 287 2399 2685 286 32 2 161 328 54 822 824 326 496 328 4.0 1.9 1.9 160 251 411 160 161 Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out Uptime(secs): 439.8 Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8) Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D6153 24 October 2012, 21:21:38 UTC
47bce26 Merge branch 'master' into performance 24 October 2012, 05:32:54 UTC
3b06f94 Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_impl.h db/version_set.cc 24 October 2012, 05:30:07 UTC
51d2adf Fix broken build. Add stdint.h to get uint64_t Summary: I still get failures from this. Not sure whether there was a fix in progress. Task ID: # Blame Rev: Test Plan: compile Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin *PUBLIC* platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D6147 23 October 2012, 21:58:53 UTC
4c10758 Delete files outside the mutex. Summary: The compaction process deletes a large number of files. This takes quite a bit of time and is best done outside the mutex lock. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6123 22 October 2012, 18:53:23 UTC
5010daa add "seek_compaction" to log for better debugging Summary: as subject Test Plan: compile Reviewers: dhruba Reviewed By: dhruba CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6117 22 October 2012, 17:00:25 UTC
3489cd6 Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_impl.h 21 October 2012, 09:15:19 UTC
f95219f Delete files outside the mutex. Summary: The compaction process deletes a large number of files. This takes quite a bit of time and is best done outside the mutex lock. Test Plan: make check Differential Revision: https://reviews.facebook.net/D6123 21 October 2012, 09:03:00 UTC
98f23cf Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_impl.h 21 October 2012, 08:55:19 UTC
64c4b9f Delete files outside the mutex. Summary: The compaction process deletes a large number of files. This takes quite a bit of time and is best done outside the mutex lock. Test Plan: Reviewers: CC: Task ID: # Blame Rev: 21 October 2012, 08:49:48 UTC
5016699 Merge branch 'master' into performance 19 October 2012, 23:08:04 UTC
507f5aa Do not enable checksums for zlib compression. Summary: Leveldb code already calculates checksums for each block. There is no need to generate checksums inside zlib. This patch switches off checksum generation/checking in the zlib library. (The Inno support for zlib uses windowsBits=14 as well.) phabricator marks this file as binary, but here is the diff: diff --git a/port/port_posix.h b/port/port_posix.h index 86a0927..db4e0b8 100644 --- a/port/port_posix.h +++ b/port/port_posix.h @@ -163,7 +163,7 @@ inline bool Snappy_Uncompress(const char* input, size_t length, } inline bool Zlib_Compress(const char* input, size_t length, - ::std::string* output, int windowBits = 15, int level = -1, + ::std::string* output, int windowBits = -14, int level = -1, int strategy = 0) { #ifdef ZLIB // The memLevel parameter specifies how much memory should be allocated for @@ -223,7 +223,7 @@ inline bool Zlib_Compress(const char* input, size_t length, } inline char* Zlib_Uncompress(const char* input_data, size_t input_length, - int* decompress_size, int windowBits = 15) { + int* decompress_size, int windowBits = -14) { #ifdef ZLIB z_stream _stream; memset(&_stream, 0, sizeof(z_stream)); Test Plan: run db_bench with zlib compression. Reviewers: heyongqiang, MarkCallaghan Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D6105 19 October 2012, 23:06:33 UTC
e982f5a Merge branch 'master' into performance Conflicts: util/options.cc 19 October 2012, 22:16:42 UTC
cf5adc8 db_bench was not correctly initializing the value for the delete_obsolete_files_period_micros option. Summary: The parameter delete_obsolete_files_period_micros controls the periodicity of deleting obsolete files. db_bench was reading this parameter into a local variable called 'l' but was incorrectly using another local variable called 'n' while setting it in the db.options data structure. This patch also logs the value of delete_obsolete_files_period_micros in the LOG file at db startup time. I am hoping that this will improve the overall write throughput drastically. Test Plan: run db_bench Reviewers: MarkCallaghan, heyongqiang Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6099 19 October 2012, 22:10:12 UTC
1ca0584 This is the mega-patch for multi-threaded compaction published in https://reviews.facebook.net/D5997. Summary: This patch allows compaction to occur in multiple background threads concurrently. If a manual compaction is issued, the system falls back to a single-compaction-thread model. This is done to ensure correctness and simplicity of code. When the manual compaction is finished, the system resumes its concurrent-compaction mode automatically. The updates to the manifest are done via a group-commit approach. Test Plan: run db_bench 19 October 2012, 21:00:53 UTC
cd93e82 Enable SSE when building with fbcode support. Summary: fbcode build now support SSE instructions. Delete older version of the compile-helper fbcode.sh. This is subsumed by fbcode.gcc471.sh. Test Plan: run make check Reviewers: heyongqiang, MarkCallaghan Reviewed By: heyongqiang Differential Revision: https://reviews.facebook.net/D6057 18 October 2012, 15:43:25 UTC
aa73538 The deletion of obsolete files should not occur very frequently. Summary: The method DeleteObsoleteFiles is a very costly method, especially when the number of files in a system is large. It makes a list of all live files and then scans the directory to compute the diff. By default, this method is executed after every compaction run. This patch makes it such that DeleteObsoleteFiles is never invoked twice within a configured period. Test Plan: run all unit tests Reviewers: heyongqiang, MarkCallaghan Reviewed By: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6045 16 October 2012, 17:26:10 UTC
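A minimal sketch of the throttling check (field names are hypothetical; the configured period corresponds to the delete_obsolete_files_period_micros option mentioned in the db_bench entry above):

    #include <stdint.h>

    // Skip the expensive live-file scan if it ran less than period_micros ago.
    // '*last_run_micros' would live on the DB object; 'now_micros' would come
    // from the environment's clock.
    bool ShouldDeleteObsoleteFiles(uint64_t now_micros,
                                   uint64_t period_micros,
                                   uint64_t* last_run_micros) {
      if (now_micros - *last_run_micros < period_micros) {
        return false;  // ran too recently; defer the directory scan
      }
      *last_run_micros = now_micros;
      return true;
    }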
0230866 Enhance db_bench to allow setting the number of levels in a database. Summary: Enhance db_bench to allow setting the number of levels in a database. Test Plan: run db_bench and look at LOG Reviewers: heyongqiang, MarkCallaghan Reviewed By: MarkCallaghan CC: MarkCallaghan Differential Revision: https://reviews.facebook.net/D6027 15 October 2012, 17:18:49 UTC
5dc784c Fix compilation problem with db_stress when using C11 compiler. Summary: Test Plan: Reviewers: CC: Task ID: # Blame Rev: 13 October 2012, 00:00:25 UTC
24f7983 [tools] Add a tool to stress test concurrent writing to levelDB Summary: Created a tool that runs multiple threads that concurrently read and write to levelDB. All writes to the DB are stored in an in-memory hashtable and verified at the end of the test. All writes for a given key are serialized. Test Plan: - Verified by writing only a few keys and logging all writes and verifying that values read and written are correct. - Verified correctness of value generator. - Ran with various parameters of number of keys, locks, and threads. Reviewers: dhruba, MarkCallaghan, heyongqiang Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D5829 10 October 2012, 19:12:55 UTC
696b290 Add LevelDb's JNI wrapper Summary: This implements the Java interface using JNI Test Plan: compile test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D5925 05 October 2012, 20:13:49 UTC
fc23714 Add LevelDb's Java interface Summary: See the wiki below https://our.intern.facebook.com/intern/wiki/index.php/Database/leveldb/Java Test Plan: compile test Reviewers: dhruba Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D5919 05 October 2012, 20:11:31 UTC