swh:1:snp:764524290233376de64947417267228741ce2485

Revision Author Date Message Commit Date
9ac9f31 Releasing Voldemort 1.10.25 27 July 2017, 19:10:27 UTC
52f3cea Updated Gobblin dependency from 0.10.0 to 0.11.0 27 July 2017, 19:10:27 UTC
5dbfcec Release Voldemort 1.10.24 21 July 2017, 23:17:25 UTC
1feb85f Added an optional CDN feature to the BnP pipeline. Currently, the Voldemort Build and Push (BnP) plugin tells the Voldemort clusters to fetch twice from the source HDFS cluster. This optional CDN feature will instead copy the files to dedicated CDN clusters, and therefore reduce the bandwidth required from the source. The following new attributes are related to this release: 1. push.cdn.enabled The global switch. Example: true Default: false 2. push.cdn.cluster A comma-separated list of "destination|cdn" pairs, where "destination" is a Voldemort cluster and "cdn" is the corresponding HDFS cluster used as a CDN. If "cdn" is "null", the Voldemort cluster will fetch directly from the source HDFS cluster instead, in which case the behavior is identical to previous versions of Voldemort. Example: tcp://v-cluster1:6666|hdfs://cdn1:9000,tcp://v-cluster2:6666|webhdfs://cdn2:50070,tcp://v-cluster3:6666|null Default: null 3. push.cdn.prefix A directory on the CDN cluster used as the root for all distcp-copied files. Example: /jobs/VoldemortBnP Default: null 4. push.cdn.readByGroup Set to true if CDN files are read by a different user in the same group. Example: true Default: true 5. push.cdn.readByOther Set to true if CDN files are read by a different user in a different group. Example: true Default: true 6. push.cdn.writtenByGroup Set to true if CDN files are written by a different user in the same group. Example: true Default: true 7. push.cdn.writtenByOther Set to true if CDN files are written by a different user in a different group. Example: true Default: true 8. push.cdn.storeWhitelist A comma-separated list of Voldemort store names to which the CDN feature will apply (for testing purposes). Default: null (meaning it applies to every store) This release is compatible with existing job configurations. The default behavior is identical to the previous version. 21 July 2017, 21:04:28 UTC
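The push.cdn.cluster value described above is a comma-separated list of "destination|cdn" pairs. A minimal Java sketch of how such a value could be parsed is shown here; CdnClusterConfig and parse() are hypothetical names, not the BnP plugin's actual API, and this sketch maps the literal "null" CDN entry to a Java null meaning "fetch directly from the source HDFS cluster".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the push.cdn.cluster format; not the BnP plugin's code.
class CdnClusterConfig {
    /**
     * Parses "destination|cdn" pairs separated by commas into destination -> CDN URL.
     * A CDN of "null" becomes a Java null: that destination fetches directly from the
     * source HDFS cluster, as in previous Voldemort versions.
     */
    static Map<String, String> parse(String value) {
        Map<String, String> cdnByDestination = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty() || value.trim().equals("null")) {
            return cdnByDestination; // property unset (default: null)
        }
        for (String pair : value.split(",")) {
            String[] parts = pair.trim().split("\\|"); // '|' must be escaped in a regex
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected destination|cdn but got: " + pair);
            }
            String cdn = parts[1].trim();
            cdnByDestination.put(parts[0].trim(), cdn.equals("null") ? null : cdn);
        }
        return cdnByDestination;
    }
}
```

A LinkedHashMap keeps the destinations in the order they were configured, which makes log output and debugging predictable.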
3787b4f Merge pull request #476 from FelixGV/fix_client_shell_list_printing_logic Fixed the client shell's list parsing logic 09 June 2017, 22:02:24 UTC
d3e65e6 Fixed the client shell's list parsing logic 09 June 2017, 22:00:20 UTC
c920c4b Fix the connection leaking issue. (#469) 13 January 2017, 22:29:46 UTC
da4a1bf Fixed a compilation error introduced by last commit. (#468) 07 January 2017, 00:28:40 UTC
c66660f Added a build.replica.factor check when deciding whether Voldemort should fail the data push when data fetches fail on some nodes in HA mode. 06 January 2017, 23:57:30 UTC
e5fe3fb Replace Unicode quotes in scripts with ASCII quotes 29 November 2016, 01:01:55 UTC
2f155a2 Adhere to target/source Java compatibility for contrib projects We are using Java 1.7 on our Hadoop cluster, but locally I use JDK 1.8 for building, so contrib got built with 1.8 by default because its source and target versions are not specified explicitly. Consequently, the resulting jar didn't work in our cluster environment. This reconfiguration fixes this. Since the contrib projects no longer build with 1.6 source compatibility, I bumped the configured javac.version up to 1.7, even though two other options are available: 1. Fix contrib to make it 1.6 compatible again 2. Specify different versions for the main and contrib projects But since Java 1.6 has been EOL for quite a while, I suggest that no Voldemort servers should be running on a Java 1.6 runtime anymore, so it should be safe to upgrade and keep the configuration simple. 28 November 2016, 22:49:34 UTC
1d15aa8 Fix detection for fetcher protocol warning The fetcher protocol warning was shown even when the recommended protocol was specified explicitly, which is unnecessarily confusing. 28 November 2016, 22:47:53 UTC
50027dd Make the nodes option work again for generate_cluster_xml.py Option -n or --nodes has been broken since d2452a971e5691f3f9712826d333050180282695, when the input file check was added. 24 November 2016, 07:37:34 UTC
293dbee Releasing Voldemort 1.10.23 10 November 2016, 23:35:13 UTC
9917bdb BnP now retries fetches when cluster.xml is stale. Previously, there was a race condition where a BnP job would initialize its AdminClients at the beginning of the job, and then hang on to that Cluster state throughout the job. If maintenance is going on while the job is running, then by the time the job gets to the "Push" phase, the Cluster representation may be stale. In those cases, a BnP job may attempt to push to a node which has been swapped out of the cluster. This may cause BnP HA to trigger even though the cluster is actually healthy at that time. In order to fix this, two changes are made in this commit: 1. In the VoldemortSwapJob, the AdminClient is constructed from scratch rather than being created based on the previous Cluster state. This should minimize the window during which it is possible to change the cluster.xml and make BnP hit the wrong node, but it does not completely eliminate the race condition. 2. In the AdminStoreSwapper, the invokeFetch() code will check if an exception is caused by a stale cluster state. If it is, it will get a fresh AdminClient and retry the operation. This should totally prevent the race condition. The BnP job will do a limited number of fetch retries (10 attempts with 30 seconds of wait time between each), and only when hitting soft errors (e.g., connection failures). 10 November 2016, 02:08:44 UTC
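The soft-error retry described in point 2 can be sketched as a generic loop. This is an illustration under stated assumptions, not the AdminStoreSwapper code: StaleClusterRetry and fetchWithRetry() are hypothetical names, soft errors are modeled as IOException, and refreshing the AdminClient is represented by a caller-supplied Runnable.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch of "retry soft errors with a fresh AdminClient";
// not the actual AdminStoreSwapper.invokeFetch() code.
class StaleClusterRetry {

    /**
     * Calls fetch; on a soft error (modeled here as IOException, e.g. a connection
     * failure caused by a stale cluster.xml) waits, rebuilds the client and retries.
     * Any other exception is treated as a hard error and propagates immediately.
     */
    static <T> T fetchWithRetry(Callable<T> fetch,
                                Runnable refreshAdminClient,
                                int maxAttempts,
                                long waitMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException lastSoftError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (IOException softError) {
                lastSoftError = softError;
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis);
                    refreshAdminClient.run(); // e.g. build a new AdminClient from a fresh cluster.xml
                }
            }
        }
        throw lastSoftError; // all attempts hit soft errors
    }
}
```

With the limits from this commit message, a caller would pass maxAttempts = 10 and waitMillis = 30_000L; exposing the wait as a parameter is a choice of this sketch that makes it testable without real delays.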
2084dbc Tweaked the AdminClient's currentVersion so that it is not stale. Previously, there could be a case where an AdminClient is created from a stale Cluster instance, which would lead to isClusterModified() not returning the correct result. This was because the AdminClient would always set its currentVersion to the current time, no matter how long ago the passed-in Cluster instance was originally generated. If the cluster.xml configuration is altered after the Cluster instance is constructed but before the AdminClient is constructed, there is a potentially very long window during which the wrong currentVersion would be set. 09 November 2016, 21:40:13 UTC
2663510 BnP now kills an async job that it is waiting on if that job times out. Previously, BnP would just leave the async job running if it timed out, which is wasteful and could cause a subsequent job retry to fail if there are two fetch jobs running concurrently for the same store. 08 November 2016, 19:33:09 UTC
f22e991 Changed the DeleteAllFailedFetchStrategy so that it affects all nodes. Previously, the DeleteAllFailedFetchStrategy would only attempt to delete data from nodes which succeeded in their fetch. In some failure modes, this is appropriate, but in other cases, it isn't. In any case, there is no harm in trying to delete data on all nodes, even those that failed their fetch. This commit makes it so. 07 November 2016, 23:46:26 UTC
a36d1fe Made admin connection/socket timeout configurable in BnP. Also changed the default socket timeout to 180 seconds. This fixes the following problem: when a node is unreachable and completely shut down, requests to it will time out, which takes 60 seconds. When BnP notices this, it will reach one of the live nodes in the cluster and ask it to deal with the failure. The live node will try to talk to the dead node, which will also take 60 seconds to time out. By the time the live node decides that the dead node is unreachable and responds to the BnP job, the BnP job will have already timed out. Then, the BnP job will think that the HandleFailedFetchRequest could not complete successfully (even though it did in fact complete successfully) and BnP HA will be aborted. The solution is that the BnP job's socket timeout must be greater than the server's default connection timeout. This was not an issue before, when we had extremely long timeouts, but those timeouts were reduced considerably in commit 34debd34c5896b6c2a01b1012e89dd1a3a0a0242. This is likely when we regressed on the handling of this failure mode. 04 November 2016, 22:48:16 UTC
faca38e Fix lots of typos and spelling mistakes This shouldn't entail any functional changes (besides some corrected log or assertion error messages) 04 November 2016, 20:00:05 UTC
7637af8 Added some extra logging when OOM occurs in BnP. The AvroStoreBuilderMapper can OOM when manipulating certain bad Avro records. This change does not actually prevent the OOM, but merely prints some useful info before dying. 05 October 2016, 00:03:05 UTC
37a5a28 Replaced the following instances with http://www.project-voldemort.com since the URLs don't resolve from some places: $ grep -r 'http://project-voldemort' . ./clients/python/setup.py: url='http://project-voldemort.com', ./NOTES:For the most up-to-date information see http://project-voldemort.com ./contrib/collections/src/java/voldemort/collections/VStack.java: * voldemort JSON formats: http://project-voldemort.com/design.php 26 September 2016, 21:43:19 UTC
3e4d4b0 Releasing Voldemort 1.10.22 20 September 2016, 17:03:36 UTC
ea37ef6 The BnP job should be resilient to colo failures, but this regressed. This commit adds a safeguard to bring back resilience to full colo failures. Now, if a colo is unreachable, the BnP job will still push to the other (healthy) colos, but it will fail the job afterwards with a message saying which colo failed. 13 September 2016, 00:46:22 UTC
70d0404 Python client has an issue with inconsistent indentation (#446) The indentation in the code is mostly spaces, while the offending line is tab-indented. Hence, importing and initializing the client fails with an IndentationError. 06 September 2016, 20:49:51 UTC
30fc9fc Provide chunk size suggestion for BnP jobs with chunk overflow exceptions and fix num chunks algorithm to round up 30 August 2016, 18:09:42 UTC
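Rounding up matters because truncating integer division undercounts chunks: for example, 2,500 MB of input with a 1,000 MB per-chunk limit truncates to 2 chunks of 1,250 MB each, both over the limit, while rounding up yields 3. The sketch below assumes the chunk count is derived from the total input size and a maximum chunk size; ChunkMath and numChunks() are hypothetical names, not the BnP code.

```java
// Hypothetical helper illustrating "round up" when computing the number of chunks;
// not the actual BnP implementation.
class ChunkMath {
    /** Smallest chunk count such that no chunk exceeds maxChunkBytes (always at least 1). */
    static int numChunks(long totalBytes, long maxChunkBytes) {
        if (totalBytes < 0 || maxChunkBytes <= 0) {
            throw new IllegalArgumentException("invalid sizes: " + totalBytes + ", " + maxChunkBytes);
        }
        // Ceiling division, written without (a + b - 1) / b so it cannot overflow a long.
        long chunks = totalBytes / maxChunkBytes + (totalBytes % maxChunkBytes == 0 ? 0 : 1);
        return Math.toIntExact(Math.max(1L, chunks)); // throws if the count does not fit in an int
    }
}
```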
0734ec4 Introduced new boolean "readonly.omit.port" server configuration. When set to true, the port will be removed from the fetch URI. In this case, the already-existing "readonly.modify.port" setting is ignored. When set to false (which is the default), then the port will be left as part of the fetch URI (according to the already-existing "readonly.modify.port" setting). 29 August 2016, 23:35:02 UTC
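The readonly.omit.port=true case amounts to rebuilding the fetch URI with port -1 (undefined), which java.net.URI supports directly. FetchUris and its methods are hypothetical names for this sketch, and the interaction with readonly.modify.port is deliberately not modeled.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the readonly.omit.port behavior; not Voldemort's actual code.
class FetchUris {
    /** Rebuilds the URI with every component except the port (-1 means "no port"). */
    static URI withoutPort(URI fetchUri) throws URISyntaxException {
        return new URI(fetchUri.getScheme(), fetchUri.getUserInfo(), fetchUri.getHost(), -1,
                fetchUri.getPath(), fetchUri.getQuery(), fetchUri.getFragment());
    }

    /** omit=false returns the URI unchanged (readonly.modify.port handling is not modeled here). */
    static URI forFetch(URI fetchUri, boolean omit) throws URISyntaxException {
        return omit ? withoutPort(fetchUri) : fetchUri;
    }
}
```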
298ef65 vadmin.sh stream support for system stores The commands bin/vadmin.sh stream fetch-entries and bin/vadmin.sh stream fetch-keys do not work on system stores like voldsys$_client_registry. There is a client-side check for valid stores, which only validates the user stores. Added a check to include the system stores as well. 17 August 2016, 20:14:17 UTC
d7c98ab Data Cleanup job does not run on system stores 1) The client registry system store is an in-memory store and is supposed to be cleaned up after 7 days. The last change to the DataCleanupJob made the system stores fail with a missing store exception. Clients re-use the same client id, so unless lots of clients die and are removed, this will not cause a leak of server resources; the effect is negligible. Now the DataCleanupJob checks both system stores and normal stores for a store definition. 2) If a store's retention days setting is modified to zero, the store will delete all its records. But if the store is started with 0 retention days, data retention is not enabled. Fixed the discrepancy. 17 August 2016, 17:59:26 UTC
34052d6 Revert "Provide chunk size suggestion for BnP jobs with chunk overflow exceptions and fix num chunks algorithm to round up" This reverts commit fdd2ca9fd60c8c6c421465875f74a4c848813046. 11 August 2016, 19:11:38 UTC
afc2024 Releasing Voldemort 1.10.21 Fix release notes. 10 August 2016, 16:14:50 UTC
c7b4c1b Merge pull request #436 from squarY/timeoutfix Fix: Extend the timeout of admin request 10 August 2016, 16:02:13 UTC
bdb660a Fix: extend admin request timeout from 1min to 5min. Add more logs when handling failed fetch requests. Fix issues based on RB. 10 August 2016, 15:55:03 UTC
fdd2ca9 Provide chunk size suggestion for BnP jobs with chunk overflow exceptions and fix num chunks algorithm to round up 05 August 2016, 21:59:40 UTC
6e5e96e Releasing Voldemort 1.10.20 28 July 2016, 23:19:00 UTC
890ad42 RO Store Create floods the log with error messages Creating an RO store queries for an existing store, which fails with StoreNotFoundException. This exception is logged with a call stack, which floods the logs on every store creation and may trick the alerting system into treating it as an error. Such exceptions are now logged at info level without a call stack. 28 July 2016, 22:08:25 UTC
069bdac Releasing Voldemort 1.10.19 25 July 2016, 19:10:15 UTC
3bed365 Log verifyOrAddStore time. Currently the time spent in verifyOrAddStore is not tracked. This change introduces a log line to track this time. The following log message will be added for each call: [18:36:23,113 voldemort.client.protocol.admin.AdminClient] INFO verifyOrAddStore() BootStrapUrls: [tcp://localhost:48150] Store: abc-xyz-read-only Verification Time: 10 ms, Creation Time: 39 ms [main] 21 July 2016, 01:37:07 UTC
498f37e Rest Server Port is not serialized correctly Problems 1) Running ./gradlew clean build exits the process, and the build fails midway. 2) When Voldemort server REST validation fails, it exits the process. 3) Cluster does not serialize the REST port correctly, which caused the REST port validation to fail. 4) Before auto node detection, the tests were using an in-memory cluster instead of the cluster in the metadata. Now both tests and product code use the same code path, which caused the tests to fail. Fix: 1) Cluster serializes the REST port if it is greater than zero. 2) When REST server validation fails, it throws an exception instead of exiting the process. (Searched the code for System.exit; the Coordinator Server does the same, but that is saved for a different day.) 3) The node state string contains the REST port, if present. 4) Let RestServiceR2StoreTest fail with the actual error message instead of a boilerplate error message, which made debugging harder. 20 July 2016, 21:01:49 UTC
74fa5bb InsufficientOperationalNodes concurrent exception From time to time, InsufficientOperationalNodes can throw a ConcurrentModificationException, because the failures list is not thread-safe. Changed the list to a CopyOnWriteArrayList; the code path is only used when nodes fail, so there should not be any noticeable impact on performance. 20 July 2016, 20:58:01 UTC
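CopyOnWriteArrayList avoids the ConcurrentModificationException because its iterators traverse an immutable snapshot while each write copies the backing array. The sketch below exercises that property with several threads writing and reading at once; FailureCollector is a hypothetical stand-in for a shared failures list, not Voldemort's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a failures list shared by routing threads.
class FailureCollector {
    // Iterators work on a snapshot, so reading while other threads add()
    // cannot throw ConcurrentModificationException.
    private final List<Exception> failures = new CopyOnWriteArrayList<>();

    void recordFailure(Exception e) {
        failures.add(e);
    }

    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Exception e : failures) { // iterates over a snapshot
            sb.append(e.getMessage()).append(';');
        }
        return sb.toString();
    }

    int count() {
        return failures.size();
    }

    /** Several threads record failures and build messages concurrently; returns the final count. */
    static int recordConcurrently(int threads, int failuresPerThread)
            throws InterruptedException, ExecutionException {
        FailureCollector collector = new FailureCollector();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < failuresPerThread; i++) {
                        collector.recordFailure(new RuntimeException("node unavailable"));
                        collector.describe(); // concurrent read during writes is safe
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any exception a worker hit
            }
        } finally {
            pool.shutdown();
        }
        return collector.count();
    }
}
```

The trade-off matches the commit's reasoning: every add() copies the array, which is acceptable on a path that only runs when nodes fail.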
1014969 Fix the HintedHandOff flaky tests 1) I made some changes to the metadata store for auto-detecting the node id and noticed these tests were failing. On investigation, the test failures turned out to be caused by static variables, some of which are modified, so depending on the ordering the tests may or may not fail. 2) Removed most of the static usage and turned most of it into parameters. I still don't completely understand the test, as it is quite complicated, but I sprinkled in some sleeps to make sure that slops are registered. Tests passed successfully on 50 continuous runs. 20 July 2016, 20:58:01 UTC
436e5c7 Clean Up State after tests Problems : 1) Voldemort Servers are stopped using the server.stop that does not delete the home directory. 2) Store files are created in the /tmp directory which are left behind after the tests. Solution : 1) Use ServerTestUtils.stopServer which deletes the home directory 2) Use ServerTestUtils.createTempFile which sets deleteFileOnExit which deletes the file during the JVM exit stage. All changes are only in test files. No changes to the Product code so there is no risk for the Product code. 20 July 2016, 20:57:53 UTC
4a12dfc VoldemortServerTest fails from time to time BouncyCastleProvider changes the state of the JVM. When it was used in two different tests, the execution order was not predictable, and the test failed when they ran in a different order. Made one new test with the right ordering to fix the test. 19 July 2016, 06:37:08 UTC
6e51826 A few more debug fixes Added some debug logging to trace the socket destination for disconnected sockets. Logged the error in the clientTrafficVerifier instead of calling printStackTrace, whose output gets lost. 19 July 2016, 06:37:01 UTC
12b0923 Update Node Id and Cluster for Node Detection 1) When the node id is updated, both the metadata and the VoldemortServer are updated. Previously, a node id update only updated the metadata. There could still be edge cases with updating the node id, as the node id is used in far too many places and cached in a few of them. But client reads and writes are expected to work. (Will update the replace node test to verify the same.) 2) When node id detection is enabled, updating the cluster.xml will update the node id. As before, Voldemort will accept a new cluster update, even if it changed the client and admin ports. After updating, it will error out, though the cluster will be in an inconsistent state at this point. The behavior is the same as before, except that after completing, it errors out, thereby notifying the admin. Pre-checks and validation are quite difficult because of cyclic dependencies. Added integration tests for the above 2. 19 July 2016, 06:27:30 UTC
fe58130 Validation and Generate Script Utility added for testing purposes The GenerateScript is not secure, and it depends on the user to give sanitized input and a reasonable script. For the same reason, it only generates the output file, which must be manually reviewed before executing the script. GenerateScript is more of a use-it-at-your-own-risk tool. ValidateNodeIdCLI helps to validate the auto-detected node id before removing the node id from all the configs. ValidateNodeIdCLI, when combined with the GenerateScript, can be used to safely validate the result for the entire cluster before removing all the node ids. 19 July 2016, 06:03:19 UTC
28bee82 Detect and Validate node Ids Problem: The Voldemort server takes the node id as a configuration parameter. It relies on the node id to identify its role in the cluster. But most production deployments have only one Voldemort server per host in a cluster. Under these conditions, the deduction of node ids can be automated. In most typical production deployments, only the node id changes across the Voldemort server configurations. This causes configuration duplication and makes the configurations difficult to manage. Fix: Node auto-detection can be enabled (disabled by default) by the property enable.node.id.detection. When enabled, the host names in cluster.xml will be matched against the FQDN (InetAddress.getLocalHost().getCanonicalHostName()). Validation can also be enabled by the property validate.node.id. Note that when auto-detection is enabled, validation is always run, so enable the validation config only if you want to run the validation without auto-detection. The matching implementation is also customizable; Windows and other operating systems are not considered, but it should work. It is customizable only for the purpose of writing tests, though not much work would be required to support those use cases. Note: Auto-detection and validation will both fail when more than one node is hosted on the same machine. In such cases, both parameters should be left in the default disabled state. Tests will follow in the next commit. 19 July 2016, 05:39:52 UTC
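The detection rule above (match cluster.xml host names against the local FQDN, and fail when several nodes share the machine) can be sketched as follows. NodeIdDetector and detect() are hypothetical names; the local FQDN is passed in rather than read from InetAddress so the logic is testable, and case-insensitive matching is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of host-name-based node id detection; not Voldemort's implementation.
class NodeIdDetector {
    /**
     * Returns the id of the single node whose cluster.xml host matches localFqdn.
     * A real server would obtain localFqdn from
     * InetAddress.getLocalHost().getCanonicalHostName().
     */
    static int detect(Map<Integer, String> hostByNodeId, String localFqdn) {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : hostByNodeId.entrySet()) {
            if (entry.getValue().equalsIgnoreCase(localFqdn)) { // case-insensitive: an assumption
                matches.add(entry.getKey());
            }
        }
        if (matches.size() != 1) {
            // Zero matches: this host is not in cluster.xml. More than one: several nodes
            // on the same machine, the case where detection must stay disabled.
            throw new IllegalStateException("Expected exactly one node for host " + localFqdn
                    + " but found " + matches);
        }
        return matches.get(0);
    }
}
```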
3c864ee Run store/schema verification in parallel to reduce BnP job running time. 15 July 2016, 17:32:15 UTC
7615628 Fix the log message when Fetch is disabled When Fetch is disabled the log message is confusing to the user. 12 July 2016, 00:59:49 UTC
0139f86 GetAll support for Quotas 1) GetAll support for Quotas is added. 2) All admin clients use a node-based store, so that they get consistent results. 3) Tests for the existing and new functionality. 12 July 2016, 00:59:49 UTC
e4463c9 Releasing Voldemort 1.10.18 28 June 2016, 20:28:46 UTC
415624f Utility method for retrieving a storeDefinition 1) Two new methods for retrieving a store from random node or from a particular node. 2) Enhanced the unit tests to test this new method and made the failure node at random to increase the effectiveness of the test case. 3) Fixed the executorService shutdown in teardown. 28 June 2016, 19:03:03 UTC
cb42bd2 Fetch Single Store only for BnP Store creation BnP supports an option for querying only the current store. By default it queries the full stores.xml, as this option requires some server-side changes that will go into 1.10.18. Once the server-side changes are deployed, using this option on a cluster with a large number of stores will speed up the pre-processing. 28 June 2016, 06:29:43 UTC
14818c2 Parallel Operation support for AddStore and SetQuota 1) The network operation on AddStore and setQuota can be parallelized by providing the ExecutorService. 2) By default they are done on the caller thread, if no executorService is provided. 3) refactored verifyOrAddStore into smaller methods and made it more manageable. 4) Added tests for the parallel/executorService support. 5) Added Utility method in QuotaUtils for converting to byte array. 28 June 2016, 05:40:17 UTC
79a0aa3 AdminClientPool, to easily pool AdminClient AdminClientPool is added for managing pools of AdminClients. Unlike StoreClient, AdminClient can't be used across Cluster modifications, so previously an AdminClient needed to be created every time. This was costly, as the connections needed to be re-established every time. AdminClientPool solves this problem by discarding an AdminClient if the cluster is modified. AdminClientPool still does not solve the problem of operations failing during a cluster modification, but it will work correctly after the cluster is modified. 28 June 2016, 05:30:48 UTC
807ca49 Add a check in the AdminClient's updateRemoteStoreDefList to verify that breaking changes to stores do not get pushed This way when a store is live, we cannot change (for example) the keySerializer or valueSerializer 27 June 2016, 22:47:44 UTC
a41645d Improve verifyOrAddStore for RO stores Problem: Adding an RO store fetches all stores from each node. For Voldemort clusters with lots of stores, retrieving all the stores takes a long time, especially when creating stores across data centers. Fix: Rely on the ClientConfig fetch-all-stores-xml property to see if the server supports retrieving a single store's XML. Earlier Voldemort servers do not throw a unique exception when a store is missing, so this fix also requires the server-side change to work correctly. 27 June 2016, 19:44:23 UTC
68b0d54 Add method to make re-use of the AdminClient easier When AdminClient is created newly for each operation, AdminClient needs to re-establish every connection to the Voldemort Server. This makes AdminClient operations take a longer period of time. But if AdminClient is reused across cluster modification, then AdminClient will cause inconsistent operations. This new method will help the caller to identify if the cached AdminClient is still valid and can be re-used. 24 June 2016, 07:29:08 UTC
84372f2 Increase the timeout of vadmin tool Increased the timeout of Vadmin tool to 5 seconds from 500 ms default. 23 June 2016, 22:48:13 UTC
05b8e78 Logging Changes for BnP 1) Previously there were 2 logging statements for each directory processed. Once the build-primary-replicas-only mode was introduced, this created a large number of logs. 2) With this change, logging is time or count based: a log line is generated either after 30 seconds or after processing 100 directories. 3) The total time for directory processing and the list of empty directories are output at the end of the processing. Sample output from the run: Processed 0 out of 540 directories. Processed 100 out of 540 directories. Processed 200 out of 540 directories. Processed 300 out of 540 directories. Processed 400 out of 540 directories. Processed 500 out of 540 directories. Total Processed directories: 540. Elapsed Time (Seconds):43 Empty directories: [5, 11, 15, 17, 39, 50, 55, 58, 82, 88, 113, 117, 119, 120, 125, 126, 127, 183, 184, 199, 203, 212, 213, 223, 232, 250, 266, 269, 270, 288, 293, 302, 317, 318, 323, 324, 332, 337, 339, 362, 363, 375, 381, 382, 392, 394, 403, 407, 412, 415, 420, 425, 430, 440, 441, 448, 458, 462, 469, 472, 481, 483, 496, 500, 503, 508, 510, 512, 517, 522, 526, 529] 14 June 2016, 01:04:06 UTC
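The "every 30 seconds or every 100 directories, whichever comes first" rule can be sketched as a small stateful helper. ProgressLogger and its methods are hypothetical, and the clock is passed in explicitly so the behavior is deterministic. Replaying 540 directories with a frozen clock produces the same six "Processed N out of 540 directories." lines as the sample output.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "log every N items or every T milliseconds, whichever comes first".
class ProgressLogger {
    private final long intervalMillis;
    private final int intervalCount;
    private long lastLogMillis;
    private int lastLogCount;

    ProgressLogger(long intervalMillis, int intervalCount, long startMillis) {
        this.intervalMillis = intervalMillis;
        this.intervalCount = intervalCount;
        this.lastLogMillis = startMillis;
        this.lastLogCount = 0;
    }

    /** True if a progress line is due; resets both triggers whenever it returns true. */
    boolean shouldLog(int processed, long nowMillis) {
        if (processed - lastLogCount >= intervalCount || nowMillis - lastLogMillis >= intervalMillis) {
            lastLogCount = processed;
            lastLogMillis = nowMillis;
            return true;
        }
        return false;
    }

    /** Replays a run with a frozen clock, so only the count trigger (every 100) fires. */
    static List<String> replayCountOnly(int totalDirectories) {
        ProgressLogger logger = new ProgressLogger(30_000L, 100, 0L);
        List<String> lines = new ArrayList<>();
        lines.add("Processed 0 out of " + totalDirectories + " directories.");
        for (int processed = 1; processed <= totalDirectories; processed++) {
            if (logger.shouldLog(processed, 0L)) {
                lines.add("Processed " + processed + " out of " + totalDirectories + " directories.");
            }
        }
        return lines;
    }
}
```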
b331a06 When store is missing, error message is not clear When a store is missing on the server, the Voldemort error message on the client used to say: Failed to read metadata key:"XXX" delete config/.temp config/.version directories and restart. Now it says: store XXX does not exist on node YY. This is easier to reason about from the client's perspective. 14 June 2016, 01:03:59 UTC
fc96940 Null Pointer Exception in diff message When comparing stores for equality, if one store has a null value and the other does not, appending to the diff message throws an NPE. Used String.valueOf, which handles null. 09 June 2016, 17:20:06 UTC
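The fix relies on String.valueOf(Object) returning the string "null" for a null reference, where calling toString() on that reference would throw. StoreDiffMessage and describeDiff() below are hypothetical names used only to illustrate the pattern.

```java
// Hypothetical sketch of building a store-diff message when either side may be null.
class StoreDiffMessage {
    static String describeDiff(String field, Object left, Object right) {
        StringBuilder sb = new StringBuilder();
        sb.append(field).append(" differs: ");
        // left.toString() would throw NullPointerException when left is null;
        // String.valueOf(Object) returns "null" instead.
        sb.append(String.valueOf(left)).append(" vs ").append(String.valueOf(right));
        return sb.toString();
    }
}
```

One subtlety: the arguments here are typed as Object. A bare String.valueOf(null) literal resolves to the char[] overload at compile time and does throw a NullPointerException.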
baf3948 Standardize on the QuotaTypes 1) setQuota to try all nodes, remember the exception and throw the last Exception. 2) get and unset quota now takes in QuotaType enum instead of the string. 09 June 2016, 02:03:13 UTC
e253e1b Releasing Voldemort 1.10.17 02 June 2016, 01:01:16 UTC
5be43f3 Modify debug info on the Client bootstrap When the client bootstraps, it dumps the clientConfig parameters to the log. Removed a deprecated parameter and added 3 other parameters that are useful in debugging. 02 June 2016, 00:56:44 UTC
34debd3 Reduce the Admin Timeout from Build And Push Reduced the Voldemort Admin timeout for the Build And Push job to 60 seconds. Anything taking longer than 60 seconds should fail. Occasionally, Voldemort build and push jobs hang when a node crashes in a bad state. This should help recover those cases. 02 June 2016, 00:56:44 UTC
2b5a177 Add idle connection timeout for Client connections If a Voldemort client lives behind a firewall, the connections could be dropped by the firewall silently. The firewall also drops any future packets sent on the connection, which causes many timeouts for low-throughput Voldemort clients. The fix adds the timeout to the client config; by default, the idle connection timeout is disabled. 02 June 2016, 00:56:34 UTC
58a3fdf Don't use Properties(properties) constructor Properties(properties) constructor has a different behavior than the one intended. http://stackoverflow.com/questions/2004833/how-to-merge-two-java-util-properties-objects >>> copy/pasted text <<< However, if you treat it like a Map, you need to be very careful with this: new Properties(defaultProperties); This often catches people out, because it looks like a copy constructor, but it isn't. If you use that constructor, and then call something like keySet() (inherited from its Hashtable superclass), you'll get an empty set, because the Map methods of Properties do not take account of the default Properties object that you passed into the constructor. The defaults are only recognised if you use the methods defined in Properties itself, such as getProperty and propertyNames, among others. 01 June 2016, 06:14:16 UTC
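The quoted pitfall can be reproduced directly, along with one way to do a real merge. PropertiesMerge is a hypothetical helper; it relies on the standard Properties.stringPropertyNames(), which (unlike keySet()) also returns keys from the defaults list when keys and values are strings.

```java
import java.util.Properties;

// Hypothetical helper showing a real copy/merge, as opposed to new Properties(defaults).
class PropertiesMerge {
    /**
     * Returns a new Properties holding every string entry of base (including base's own
     * defaults, since stringPropertyNames() walks them), overridden by overrides.
     * All entries are real Map entries, so keySet(), size() and entrySet() see them.
     */
    static Properties merge(Properties base, Properties overrides) {
        Properties merged = new Properties();
        for (String name : base.stringPropertyNames()) {
            merged.setProperty(name, base.getProperty(name));
        }
        for (String name : overrides.stringPropertyNames()) {
            merged.setProperty(name, overrides.getProperty(name)); // later entries win
        }
        return merged;
    }
}
```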
5919e0d Releasing Voldemort 1.10.16 24 May 2016, 02:28:34 UTC
41eda49 Lower the node unavailable error to debug When an async connect fails, the selector reports the error and it is cached in memory. The next connect call will get the error if it happens within twice the timeout period. These errors were logged at info level, which spams the logs when the server is unavailable for extended periods of time. Now they are logged at debug level. 23 May 2016, 22:49:01 UTC
a733c94 Voldemort server cleans up HA state. Added code so that the Voldemort server automatically cleans the shared High Availability state from HDFS when appropriate. Currently, this new code runs when: 1. An old Read-Only store-version is deleted, which usually happens asynchronously after a new store-version is activated. 2. A server transitions from OFFLINE mode to ONLINE mode. This new behavior is disabled by default, but can be enabled via: push.ha.state.auto.cleanup=true Also added a bit of extra logging in the impacted code paths. 23 May 2016, 21:39:18 UTC
eaa2293 Option for fetching single or all stores during bootstrap Voldemort servers older than 1.8.1 only supported fetching the full stores.xml; 1.8.1 added support for fetching individual stores. During bootstrap, this property controls whether to fetch the full stores.xml or only the particular store. Also exposed the bootstrap retry time in seconds as a config option. Added tests for the new code. 20 May 2016, 20:23:16 UTC
3645040 Client shell throws error on closing The admin and client factories use the same suffix, so closing the shell tries to delete the already de-registered JMX. Set a different identifier for admin. This is just a minor annoyance: when you close the shell, it throws an exception. It is restricted to the Voldemort client shell. The metrics are not used by anyone, and overwriting the metrics is a non-issue. 20 May 2016, 20:23:16 UTC
81c3ec5 Print more info When cluster metadata check fails Print the metadata store version on each node, when the cluster metadata check fails. 20 May 2016, 20:23:16 UTC
a04d589 Route System Store to Same Zone System Store queries are not sent to the same zone. Hacked the PipelineRoutedStore to force the system stores with all routing to prefer the same zone routing first. 20 May 2016, 20:23:16 UTC
4b9849e Set Metadata version node by node Previously, the metadata version was set using a put operation for the entire cluster. As soon as one node succeeded, any failure was silently ignored. This caused the cluster to fall out of sync with clients. Now the metadata version is set node by node and errors are reported, so the operator knows of any issues. Fixed the Admin command line tools to use the new APIs. 20 May 2016, 20:23:16 UTC
021fb77 Last commit regressed the Hadoop Fetcher There were tests failing, and on investigation it turned out that negation was missed. My preference was to write == false instead of negation, but that is C++ style. 20 May 2016, 20:14:18 UTC
6d58d45 Add retry when calling fs.isFile. Change retryIsFile to isFile and add attempt information in error log. 17 May 2016, 02:02:08 UTC
70b0b00 Upgrade to BDB JE 5.0.104 This is rebased code for the Pull request https://github.com/voldemort/voldemort/pull/247 BDB JE 5.0.104 is available as maven artifact and this removes the last checked in jar from Voldemort. Once this is done, Voldemort new builds can be published to Maven automatically. 08 May 2016, 06:16:05 UTC
1b05d55 Added logging to mention the deprecation of 'num.chunks' in BnP. 06 May 2016, 00:08:43 UTC
1f57965 Removed 'num.chunks' config parameter from BnP. Setting this parameter introduced a subtle failure mode when there were too few records in the store. It's not worth fixing the other bug since 'num.chunks' is automatically calculated anyway, based on input data size. This change removes a bit of rope for users to hang themselves with (: 05 May 2016, 22:58:32 UTC
205a6d5 Renamed AdminStoreSwapper#swapStoreData() to #fetchAndSwapStoreData() Also clarified one of the logs so that BnP announces that it's about to fetch (previously, it mentioned "swap" instead). 05 May 2016, 19:17:09 UTC
b01ad7a Use empty stores for Replace Node CLI test This should have caught a bug, which is already fixed in the last commit. There are still a couple of bugs in the offline mode. AdminClient storeOps uses the socketPort, which will be down in offline mode. The AdminPort supports full client operations, and hence it should have used the AdminPort. The AdminClient on the Voldemort server uses the cluster for bootstrapping, which uses the client port again. This is problematic when the node is node 0. It should use the Admin Port for bootstrapping. Those bugs are in the backlog and will be fixed later. 28 April 2016, 00:49:58 UTC
701e976 Metadata check to include quota check 1) Reliably set Quota on all nodes. 2) Meta check will check for quota on all nodes. 3) Meta check will ignore 0 length RO files 4) Meta check will skip nodes with 0 partitions when verifying store can be fetched. 5) ReplaceNodeCLI node validation and ignore all errors from failing node. 6) Set Quota to report the correct node it is going to run against. 7) When a store is created, set the default quota directly instead of using an admin client to write to the same node. 8) Tests for the issue fixed above. All operability improvement for working with quota. 25 April 2016, 23:24:05 UTC
b128422 ContribJar is missing in the Tar/Zip Some shell commands fail because contribJar is missing from the tarball. Include contribJar and protobufJar from the dist directory. 18 April 2016, 21:13:15 UTC
7128033 DataCleanup Job dynamic retention days and deletion Problems fixed: 1) If a store with data retention is deleted, the job incorrectly holds the lock, thereby preventing other jobs from running. 2) If a store's data retention is modified, the cluster has to be bounced, because the retention days are read at startup and never updated. Problem not fixed: if the data cleanup job retention frequency is set for the store and the frequency is modified, a cluster bounce is still required. Fix: 1) Instead of taking the store retention time, the data cleanup job takes the store name and the metadata store, and computes the retention time dynamically on each run. 2) When a store can't be retrieved from the metadata store, the cleanup job is skipped. 3) All the code after the lock acquisition is moved inside the try/finally block. Previously beginBatchModification was outside of the lock, which threw an exception and caused the lock not to be freed. 4) Added unit tests for both problems fixed. 18 April 2016, 21:13:14 UTC
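The fix in 7128033 can be sketched as follows: retention is looked up on every run rather than captured once, a missing store makes the run skip, and everything after the lock is acquired sits inside try/finally so the lock is always released. This is a minimal illustration, not Voldemort's DataCleanupJob; the map standing in for the metadata store, the semaphore standing in for the lock, and the names `DataCleanupSketch`, `runOnce` and `purgeOlderThan` are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Sketch only: the metadata store, lock and purge step are hypothetical stand-ins.
final class DataCleanupSketch {
    // Stand-in for the metadata store: store name -> retention in days.
    private final Map<String, Integer> retentionDaysByStore = new ConcurrentHashMap<>();
    // Single permit, so at most one cleanup run holds the lock at a time.
    private final Semaphore cleanupLock = new Semaphore(1);
    // Records the last purge so the behavior can be observed.
    String lastPurge;

    void putStore(String store, int retentionDays) { retentionDaysByStore.put(store, retentionDays); }

    void deleteStore(String store) { retentionDaysByStore.remove(store); }

    boolean isLockFree() { return cleanupLock.availablePermits() == 1; }

    /** Runs one cleanup pass; returns false if the store no longer exists. */
    boolean runOnce(String storeName) {
        cleanupLock.acquireUninterruptibly();
        try {
            // Look up retention on every run instead of caching it when the job is created,
            // so a changed retention takes effect without bouncing the server.
            Integer days = retentionDaysByStore.get(storeName);
            if (days == null) {
                return false; // store was deleted: skip this run
            }
            purgeOlderThan(storeName, days);
            return true;
        } finally {
            // All code after acquisition is inside try, so the lock is always released.
            cleanupLock.release();
        }
    }

    private void purgeOlderThan(String storeName, int days) {
        // A real job would iterate over entries and delete those older than `days`.
        lastPurge = storeName + ":" + days;
    }
}
```

Deleting the store and running again returns false while leaving the lock free, which is the failure mode the commit describes.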
c8c457f Releasing Voldemort 1.10.15 18 April 2016, 19:56:32 UTC
43f344a The AdminClient's verifyOrAddStore is vulnerable to connectivity issues. This commit makes the following changes: - verifyOrAddStore() is now more resilient to various kinds of exceptions. - VoldemortBuildAndPushJob now logs verifyOrAddStore's exceptions. - ExceptionUtils.recursiveClassEquals() can now look for many exception types. - Added ExceptionUtilsTest. 16 April 2016, 00:00:32 UTC
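Commit 43f344a says ExceptionUtils.recursiveClassEquals() can now look for many exception types. The helper below is a hypothetical sketch of that idea (walk the cause chain and compare each exception's exact class against several candidates); its name and signature are assumptions, not Voldemort's actual API. An identity set guards against cyclic cause chains.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Sketch only: not Voldemort's ExceptionUtils, just the same idea.
final class ExceptionChainSketch {
    /**
     * Returns true if any exception in the cause chain of {@code t} has exactly
     * one of the given classes (class equality, not instanceof).
     */
    @SafeVarargs
    static boolean causeChainHasClass(Throwable t, Class<? extends Throwable>... types) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        // Stop at the end of the chain or when an exception repeats (cycle).
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            for (Class<? extends Throwable> type : types) {
                if (cur.getClass().equals(type)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller such as a verifyOrAddStore-style retry loop could then treat, say, a wrapped ConnectException or SocketTimeoutException as retryable with one call.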
64eaff0 Multi module gradle build for voldemort Each contrib directory now becomes a separate project (although no new build.gradle files have been created). This makes the import into IntelliJ better, as it can set up the contrib source roots correctly. It also opens the door to refactoring the project to be more conventional in its layout and artifact publishing. gradle 2.9 + fixing the eclipse project generation Had to rework the Eclipse project generation configuration, as it seems to disagree with the multi-module project structure. The upgrade to Gradle 2.9 fixes an issue with the Eclipse generator where the JDK would be inserted twice into the .classpath file. Note that you can use Eclipse's import Gradle project feature; the only gotcha is that it defaults the output directory to /bin, which means the script directory gets nuked by Eclipse on rebuild. As a workaround, if anybody prefers to use the native Gradle support in Eclipse, this config can be changed manually and the bin dir then restored from git. Have also added defaults for the test resource directory; this prevents IDEA from creating src/test/resources, which is a little confusing. Note this will only happen if you use the 'Create empty content roots' option in IDEA. Javadoc generation is disabled by default. The javadoc can be re-enabled by passing -Pjavadoc.enabled=true to Gradle. The fix for zip & tar is taken from https://github.com/arunthirupathi/voldemort/commit/2e0e9cf58a5fd9dd386dcbed5f1af2e497290e2c 09 April 2016, 12:50:43 UTC
464a348 Set-Metadata null Version error Set Metadata might result in the following error, as listFiles can return null: java.lang.NullPointerException at voldemort.store.configuration.ConfigurationStorageEngine.put(ConfigurationStorageEngine.java:146) at voldemort.store.configuration.ConfigurationStorageEngine.put(ConfigurationStorageEngine.java:50) at voldemort.store.metadata.MetadataStore.put(MetadataStore.java:355) 30 March 2016, 23:57:27 UTC
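java.io.File.listFiles() returns null, rather than an empty array, when the path is not a directory or an I/O error occurs, which is the source of the NPE above. A minimal guard looks like this; `ListFilesSketch.listFilesOrEmpty` is a hypothetical helper, not the code from the commit.

```java
import java.io.File;

// Sketch only: a null-safe wrapper around File.listFiles().
final class ListFilesSketch {
    /** Returns the directory's entries, or an empty array if listFiles() returns null. */
    static File[] listFilesOrEmpty(File dir) {
        File[] files = dir.listFiles(); // null if dir is not a directory or an I/O error occurs
        return files == null ? new File[0] : files;
    }
}
```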
6809fce Releasing Voldemort 1.10.14 30 March 2016, 00:29:17 UTC
74b2c09 Merge pull request #396 from gaojieliu/FileFetcher_Stats Expose data points in FileFetcher and create autometrics sensors for them 30 March 2016, 00:02:55 UTC
0820122 This change mainly exposes more aggregated metrics for HDFS data pushes: 1. totalBytesFetched: the total bytes transferred from HDFS so far; 2. totalFetchRetries: the total number of fetch retries so far; 3. totalCheckSumFailures: the total number of data file checksum failures so far; 4. totalAuthenticationFailures: the total number of authentication failures so far; 5. totalFileNotFoundFailures: the total number of file-not-found failures so far; 6. totalFileReadFailures: the total number of HDFS file read failures so far; 7. totalQuotaExceedFailures: the total number of quota-exceeded failures so far; 8. totalUnauthorizedStoreFailures: the total number of unauthorized store push failures so far; 9. parallelFetches: the number of currently active fetches; 10. totalFetches: the total number of HDFS fetches so far; 11. totalIncompleteFetches: the total number of incomplete fetches so far; 12. totalDataFetchRate: the current total data fetch rate. 29 March 2016, 23:41:58 UTC
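The cumulative counters listed in 0820122 are monotonic totals, while parallelFetches is a gauge that goes up and down. A minimal sketch of keeping a few of them thread-safe, assuming one fetch-start and one fetch-end callback per fetch; the class and method names are hypothetical and this is not the FileFetcher implementation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: a subset of the listed metrics, kept with JDK atomics.
final class FetchStatsSketch {
    // Monotonic totals: LongAdder handles contended increments cheaply.
    final LongAdder totalBytesFetched = new LongAdder();
    final LongAdder totalFetches = new LongAdder();
    final LongAdder totalIncompleteFetches = new LongAdder();
    // Gauge: number of fetches currently in flight.
    final AtomicInteger parallelFetches = new AtomicInteger();

    void onFetchStart() {
        totalFetches.increment();
        parallelFetches.incrementAndGet();
    }

    void onBytes(long n) {
        totalBytesFetched.add(n);
    }

    void onFetchEnd(boolean complete) {
        parallelFetches.decrementAndGet();
        if (!complete) {
            totalIncompleteFetches.increment();
        }
    }
}
```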
64726a9 update checkout from disk instead of memory buffer. 29 March 2016, 02:02:50 UTC
46651fa Update version after updating the data Currently the version is updated before the data, which causes clients to read wrong values. Update the version after updating the data, so that clients reading it get the correct value. 25 March 2016, 20:15:10 UTC
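Why the order matters in 46651fa can be sketched as a publish pattern, assuming readers first check the version and then read the data. With both fields volatile, writing the data before bumping the version means any reader that observes the new version also observes the new data. `VersionedConfigSketch` and its methods are hypothetical, not the Voldemort code that was changed.

```java
// Sketch only: readers check the version, then read the data.
final class VersionedConfigSketch {
    private volatile String data = "";
    private volatile long version = 0;

    /** Write the data first, then publish the new version. */
    synchronized void update(String newData) {
        data = newData;        // 1) new data becomes visible
        version = version + 1; // 2) only then announce it
    }

    long version() { return version; }

    /** Returns the data if at least {@code expectedVersion} has been published, else null. */
    String readIfAtLeast(long expectedVersion) {
        long v = version; // volatile read: happens-after the data write that preceded it
        return v >= expectedVersion ? data : null;
    }
}
```

Reversing the two writes in update() would let a reader see version 1 while still reading the old data, which is the bug the commit describes.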
d0364a3 minor change in StoreSwapperTest#testConcurrencyPush 25 March 2016, 00:51:25 UTC
66bfe3b Hdfs FileSystem handles are leaked in Fetch Issue: After the Hadoop FileSystem object is created, its validity is verified by performing a sample operation. If the operation fails, the Hadoop FileSystem object is leaked. The object would normally be reclaimed by garbage collection, but all FileSystem objects are cached, so it is leaked. When the Voldemort server is used with the secure webhdfs (swebhdfs) file system, it leaks enough memory to eventually kill the servers. Previously, webhdfs file system handles were also leaked in Voldemort, but webhdfs handles are apparently very cheap. SWebHdfs handles, however, embed the security certificate, which makes them very large. Heap dump analysis: WebHdfsFileSystem - 3768 objects - 80 MB; SWebHdfsFileSystem - 1748 objects - 3 GB. Solution: 1) Disable caching, for the following reasons. The Hadoop FileSystem class caches FileSystem objects based on the scheme, authority and UserGroupInformation. The default config generated a new UserGroupInformation for each call, so the cache was never hit; in that case, any FileSystem that is not closed correctly leaks a handle. But if the UserGroupInformation is reused, the FileSystem object is shared between HdfsFetcher and HdfsFailedFetchLock. Each Voldemort HdfsFetcher/HAFailedFetchLock closes the FileSystem object at the end, even though others might still be using it, which causes random failures. Since neither case works, caching is disabled. Caching should only be enabled if the UserGroupInformation is to be reused and the close bug is fixed. 2) Clean up the file handles in the error cases. Traced all the handles and cleaned them up on the error path. 24 March 2016, 18:29:35 UTC
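Part 2 of the solution in 66bfe3b (close the handle on the error path) can be sketched without a Hadoop dependency. `Handle` below is a hypothetical stand-in for an uncached FileSystem instance, and `probe` stands in for the validation operation; the counter makes leaks visible. In Hadoop itself, caching can be turned off per scheme with the `fs.<scheme>.impl.disable.cache` configuration key, but the sketch does not model that.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: Handle and probe() are hypothetical stand-ins for a FileSystem and its validation call.
final class FetchHandleSketch {
    // Number of handles opened but not yet closed, so leaks are observable.
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    static final class Handle implements Closeable {
        private boolean closed;

        Handle() { OPEN_HANDLES.incrementAndGet(); }

        void probe(boolean healthy) throws IOException {
            if (!healthy) {
                throw new IOException("validation probe failed");
            }
        }

        @Override
        public void close() {
            if (!closed) { // idempotent close
                closed = true;
                OPEN_HANDLES.decrementAndGet();
            }
        }
    }

    /** Opens and validates a handle; if validation fails, closes it before rethrowing. */
    static Handle openValidated(boolean healthy) throws IOException {
        Handle h = new Handle();
        try {
            h.probe(healthy);
            return h; // caller now owns the handle and must close it
        } catch (IOException | RuntimeException e) {
            h.close(); // the error path must not leak the handle
            throw e;
        }
    }
}
```

Because each caller gets its own uncached handle, one caller closing it cannot break another, which addresses the sharing problem described for the cached case.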
af3d605 Check server state when bringing nodes back Fixed an issue where Voldemort, when started in the offline state, actually went online and kept listening for client requests. 23 March 2016, 21:36:12 UTC
7f6ff63 Fixing an NPE thrown by the StorageEngineService on startup with views This is the result of the getCapability() implementation returning null instead of throwing the required NoSuchCapabilityException. Example of the exception: Exception in thread "main" java.lang.NullPointerException at voldemort.server.storage.StorageService.startInner(StorageService.java:424) at voldemort.common.service.AbstractService.start(AbstractService.java:62) at voldemort.server.VoldemortServer.startInner(VoldemortServer.java:374) at voldemort.common.service.AbstractService.start(AbstractService.java:62) at voldemort.server.VoldemortServer.main(VoldemortServer.java:437) 18 March 2016, 20:37:59 UTC
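The contract described in 7f6ff63 is that getCapability() never returns null and instead signals an unsupported capability by throwing. A minimal sketch of that contract follows; the `Capability` values and the local `NoSuchCapabilityException` class are hypothetical stand-ins, not Voldemort's own types.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch only: Capability values and this exception class are hypothetical stand-ins.
final class CapabilitySketch {
    enum Capability { VIEW_TARGET, LOGGER }

    static final class NoSuchCapabilityException extends RuntimeException {
        NoSuchCapabilityException(Capability c) {
            super("No such capability: " + c);
        }
    }

    private final Map<Capability, Object> capabilities = new EnumMap<>(Capability.class);

    void register(Capability c, Object value) {
        capabilities.put(c, value);
    }

    /** Never returns null: an unsupported capability is reported by throwing. */
    Object getCapability(Capability c) {
        Object value = capabilities.get(c);
        if (value == null) {
            throw new NoSuchCapabilityException(c);
        }
        return value;
    }

    /** Caller-side pattern: handle "not supported" explicitly instead of dereferencing null. */
    Object capabilityOrNull(Capability c) {
        try {
            return getCapability(c);
        } catch (NoSuchCapabilityException e) {
            return null;
        }
    }
}
```

A startup path that catches the exception, as capabilityOrNull does, makes the "not supported" case explicit instead of failing later with an NPE.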
394898e This update includes two parts: 1. the build.gradle change fixes a 'Wrong package statement' error when importing Voldemort into IntelliJ; 2. the VoldemortBuildAndPushJob.java change makes the log message clearer when deleting the temp files on the grid generated by the BnP job; 12 March 2016, 01:07:06 UTC