Revision history - refs/heads/adminClientFixes - origin: https://github.com/voldemort/voldemort

Revision	Author	Date	Message	Commit Date
415624f	Arunachalam Thirupathi	28 June 2016, 19:03:03 UTC	Utility method for retrieving a storeDefinition 1) Two new methods for retrieving a store from random node or from a particular node. 2) Enhanced the unit tests to test this new method and made the failure node at random to increase the effectiveness of the test case. 3) Fixed the executorService shutdown in teardown.	28 June 2016, 19:03:03 UTC
cb42bd2	ARUNACHALAM THIRUPATHI	28 June 2016, 06:29:43 UTC	Fetch Single Store only for BnP Store creation BnP supports option for querying only the current store. By default it queries full stores.xml as this option requires some server side changes that will go in 1.10.18 But once the server side changes are deployed, using this on a cluster with large number of stores will speed up the pre-processing.	28 June 2016, 06:29:43 UTC
14818c2	Arunachalam Thirupathi	28 June 2016, 05:27:08 UTC	Parallel Operation support for AddStore and SetQuota 1) The network operation on AddStore and setQuota can be parallelized by providing the ExecutorService. 2) By default they are done on the caller thread, if no executorService is provided. 3) refactored verifyOrAddStore into smaller methods and made it more manageable. 4) Added tests for the parallel/executorService support. 5) Added Utility method in QuotaUtils for converting to byte array.	28 June 2016, 05:40:17 UTC
79a0aa3	Arunachalam Thirupathi	27 June 2016, 22:37:38 UTC	AdminClientPool, to easily pool AdminClient AdminClientPool is added for managing the pools of AdminClient. AdminClient unlike StoreClient can't be used across Cluster modifications. So previously AdminClient needs to be created every time. this was costly as the connections need to be re-established every time. AdminClientPool solves this problem by discarding AdminClient if cluster is modified. AdminClientPool still does not solve the problem of failing operation during cluster modification. But it will work correctly after the cluster is modified.	28 June 2016, 05:30:48 UTC
807ca49	Matthew Wise	23 June 2016, 00:01:42 UTC	Add a check in the AdminClient's updateRemoteStoreDefList to verify that breaking changes to stores do not get pushed This way when a store is live, we cannot change (for example) the keySerializer or valueSerializer	27 June 2016, 22:47:44 UTC
a41645d	ARUNACHALAM THIRUPATHI	24 June 2016, 07:11:13 UTC	Improve the verify or Add store for RO stores Problem : Adding an RO store fetches all stores from each node. For Voldemort clusters with lots of stores, retrieving all the stores takes a long time, especially when creating stores across data centers. Fix : Rely on the ClientConfig fetch all stores xml property to see if the server supports retrieving single store XML. Earlier Voldemort servers do not throw a unique exception when a store is missing, so this fix requires the server side change as well to work correctly.	27 June 2016, 19:44:23 UTC
68b0d54	ARUNACHALAM THIRUPATHI	24 June 2016, 06:46:25 UTC	Add method to make re-use of the AdminClient easier When AdminClient is created newly for each operation, AdminClient needs to re-establish every connection to the Voldemort Server. This makes AdminClient operations take a longer period of time. But if AdminClient is reused across cluster modification, then AdminClient will cause inconsistent operations. This new method will help the caller to identify if the cached AdminClient is still valid and can be re-used.	24 June 2016, 07:29:08 UTC
84372f2	Arunachalam Thirupathi	23 June 2016, 22:48:13 UTC	Increase the timeout of vadmin tool Increased the timeout of Vadmin tool to 5 seconds from 500 ms default.	23 June 2016, 22:48:13 UTC
05b8e78	ARUNACHALAM THIRUPATHI	10 June 2016, 15:27:29 UTC	Logging Changes for BnP 1) Currently 2 logging statements for each directory processed. Once build primary replicas only was introduced this is creating large number of logs. 2) With this change it is converted to time or count based. The log will be generated either after 30 seconds or processing 100 directories. 3) Total time for directory processing and the empty directories are outputted at the end of the processing. Sample Output from the run: Processed 0 out of 540 directories. Processed 100 out of 540 directories. Processed 200 out of 540 directories. Processed 300 out of 540 directories. Processed 400 out of 540 directories. Processed 500 out of 540 directories. Total Processed directories: 540. Elapsed Time (Seconds):43 Empty directories: [5, 11, 15, 17, 39, 50, 55, 58, 82, 88, 113, 117, 119, 120, 125, 126, 127, 183, 184, 199, 203, 212, 213, 223, 232, 250, 266, 269, 270, 288, 293, 302, 317, 318, 323, 324, 332, 337, 339, 362, 363, 375, 381, 382, 392, 394, 403, 407, 412, 415, 420, 425, 430, 440, 441, 448, 458, 462, 469, 472, 481, 483, 496, 500, 503, 508, 510, 512, 517, 522, 526, 529]	14 June 2016, 01:04:06 UTC
b331a06	ARUNACHALAM THIRUPATHI	03 June 2016, 06:15:44 UTC	When store is missing, error message is not clear When a store is missing on the Server, Voldemort error message on the client used to say Failed to read metadata key:"XXX" delete config/.temp config/.version directories and restart. Now it says store XXX does not exist on node YY This will be easier to reason from the client perspective.	14 June 2016, 01:03:59 UTC
fc96940	Arunachalam Thirupathi	09 June 2016, 17:20:06 UTC	Null Pointer Exception in diff message When comparing for store equality, if one store has null, other has not while trying to append the message it throws NPE. Used String.valueOf which handles null.	09 June 2016, 17:20:06 UTC
baf3948	Arunachalam Thirupathi	09 June 2016, 02:03:13 UTC	Standardize on the QuotaTypes 1) setQuota to try all nodes, remember the exception and throw the last Exception. 2) get and unset quota now takes in QuotaType enum instead of the string.	09 June 2016, 02:03:13 UTC
e253e1b	Arunachalam Thirupathi	02 June 2016, 01:01:16 UTC	Releasing Voldemort 1.10.17	02 June 2016, 01:01:16 UTC
5be43f3	Arunachalam Thirupathi	01 June 2016, 18:26:42 UTC	Modify debug info on the Client bootstrap When the client bootstraps, it dumps the clientConfig parameters to the log. Removed a deprecated parameter and added other 3 parameters which are useful in debugging.	02 June 2016, 00:56:44 UTC
34debd3	ARUNACHALAM THIRUPATHI	01 June 2016, 15:05:47 UTC	Reduce the Admin Timeout from Build And Push Reduced the Voldemort Admin timeout to 60 seconds from Build And Push Job. Anything greater than 60 seconds should fail. Occasionally Voldemort build and push jobs hang when a node crashes in a bad state. This should help recover from those cases.	02 June 2016, 00:56:44 UTC
2b5a177	Arunachalam Thirupathi	01 June 2016, 06:18:40 UTC	Add idle connection timeout for Client connections If Voldemort client lives behind a firewall, the connections could be dropped by the firewall silently. The firewall also drops any future packets sent on the connection, which causes lot of timeout for low throughput voldemort clients. The fix adds the timeout to the client config, by default the idle connection timeout is disabled.	02 June 2016, 00:56:34 UTC
58a3fdf	Arunachalam Thirupathi	01 June 2016, 06:14:16 UTC	Don't use Properties(properties) constructor Properties(properties) constructor has a different behavior than the one intended. http://stackoverflow.com/questions/2004833/how-to-merge-two-java-util-properties-objects >>> copy/pasted text <<< However, if you treat it like a Map, you need to be very careful with this: new Properties(defaultProperties); This often catches people out, because it looks like a copy constructor, but it isn't. If you use that constructor, and then call something like keySet() (inherited from its Hashtable superclass), you'll get an empty set, because the Map methods of Properties do not take account of the default Properties object that you passed into the constructor. The defaults are only recognised if you use the methods defined in Properties itself, such as getProperty and propertyNames, among others.	01 June 2016, 06:14:16 UTC
5919e0d	Arunachalam Thirupathi	24 May 2016, 02:28:34 UTC	Releasing Voldemort 1.10.16	24 May 2016, 02:28:34 UTC
41eda49	Arunachalam Thirupathi	23 May 2016, 22:49:01 UTC	Lower the node unavailable error to debug When async connect fails, selector reports the error and it is cached in memory. Next connect call will get the error if it happens within twice the timeout period. The logs are logged at the info level, which spams the logs when the server is unavailable for extended period of times. Now it is dropped down to debug level.	23 May 2016, 22:49:01 UTC
a733c94	Felix GV	17 May 2016, 01:24:50 UTC	Voldemort server cleans up HA state. Added code so that the Voldemort server automatically cleans the shared High Availability state from HDFS when appropriate. Currently, this new code runs when: 1. An old Read-Only store-version is deleted, which usually happens asynchronously after a new store-version is activated. 2. When a server transitions from OFFLINE mode to ONLINE mode. This new behavior is disabled by default, but can be enabled via: push.ha.state.auto.cleanup=true Also added a bit of extra logging in the impacted code paths.	23 May 2016, 21:39:18 UTC
eaa2293	ARUNACHALAM THIRUPATHI	12 May 2016, 15:44:00 UTC	Option for fetching single or all stores during bootstrap Voldemort servers older than 1.8.1 supported only fetching all stores.xml 1.8.1 supported fetching individual stores. During bootstrap this property controls whether to fetch all stores.xml or only the particular store. Exposed the bootstrap retry time in seconds as a config option as well. Added tests for the new code.	20 May 2016, 20:23:16 UTC
3645040	ARUNACHALAM THIRUPATHI	12 May 2016, 15:44:00 UTC	Client shell throws error on closing Admin and the Client factory use the same suffix, so closing tries to delete the already de-registered JMX. Set a different identifier for admin. This is just a minor annoyance: when you close the shell, it throws an exception. It is restricted to the Voldemort client shell. The metrics are not used by anyone and overwriting the metrics is a non-issue.	20 May 2016, 20:23:16 UTC
81c3ec5	Arunachalam Thirupathi	09 May 2016, 23:30:59 UTC	Print more info When cluster metadata check fails Print the metadata store version on each node, when the cluster metadata check fails.	20 May 2016, 20:23:16 UTC
a04d589	Arunachalam Thirupathi	09 May 2016, 23:28:22 UTC	Route System Store to Same Zone System Store queries are not sent to the same zone. Hacked the PipelineRoutedStore to force the system stores with all routing to prefer the same zone routing first.	20 May 2016, 20:23:16 UTC
4b9849e	Arunachalam Thirupathi	09 May 2016, 23:23:06 UTC	Set Metadata version node by node Previously Metadata version is set using put operation for entire cluster. As soon as one node succeeds, any failure is silently ignored. This caused the cluster to fall out of sync with clients. Now the Metadata version is set node by node and errors are reported so the operator knows of any issues. Fixed the Admin Command line tools to use the New APIs.	20 May 2016, 20:23:16 UTC
021fb77	Arunachalam Thirupathi	20 May 2016, 20:14:18 UTC	Last commit regressed the Hadoop Fetcher There were tests failing, and on investigation it turned out that negation was missed. My preference was to write == false instead of negation, but that is C++ style.	20 May 2016, 20:14:18 UTC
6d58d45	Yan Yan	16 May 2016, 23:46:33 UTC	Add retry when calling fs.isFile. Change retryIsFile to isFile and add attempt information in error log.	17 May 2016, 02:02:08 UTC
70b0b00	ARUNACHALAM THIRUPATHI	08 May 2016, 06:16:05 UTC	Upgrade to BDB JE 5.0.104 This is rebased code for the Pull request https://github.com/voldemort/voldemort/pull/247 BDB JE 5.0.104 is available as maven artifact and this removes the last checked in jar from Voldemort. Once this is done, Voldemort new builds can be published to Maven automatically.	08 May 2016, 06:16:05 UTC
1b05d55	Felix GV	06 May 2016, 00:08:43 UTC	Added logging to mention the deprecation of 'num.chunks' in BnP.	06 May 2016, 00:08:43 UTC
1f57965	Felix GV	05 May 2016, 22:52:50 UTC	Removed 'num.chunks' config parameter from BnP. Setting this parameter introduced a subtle failure mode when there were too few records in the store. It's not worth fixing the other bug since 'num.chunks' is automatically calculated anyway, based on input data size. This change removes a bit of rope for users to hang themselves with (:	05 May 2016, 22:58:32 UTC
205a6d5	Felix GV	05 May 2016, 19:17:09 UTC	Renamed AdminStoreSwapper#swapStoreData() to #fetchAndSwapStoreData() Also clarified one of the logs so that BnP announces that it's about to fetch (previously, it mentioned "swap" instead).	05 May 2016, 19:17:09 UTC
b01ad7a	Arunachalam Thirupathi	28 April 2016, 00:49:58 UTC	Use empty stores for Replace Node CLI test This should have caught a bug, which is already fixed in the last commit. There are still couple of bugs in the Offline Mode. AdminClient storeOps uses socketPort, which will be down in the offline mode. AdminPort supports full client operations and hence it should have used the AdminPort. AdminClient on the voldemort server uses cluster for bootstrapping which uses the client port again. This is problematic when the node is node 0. It should use the Admin Port for bootstrapping. Those bugs are in backlog will fix them later.	28 April 2016, 00:49:58 UTC
701e976	Arunachalam Thirupathi	23 April 2016, 01:21:47 UTC	Metadata check to include quota check 1) Reliably set Quota on all nodes. 2) Meta check will check for quota on all nodes. 3) Meta check will ignore 0 length RO files 4) Meta check will skip nodes with 0 partitions when verifying store can be fetched. 5) ReplaceNodeCLI node validation and ignore all errors from failing node. 6) Set Quota to report the correct node it is going to run against. 7) When a store is created, set the default quota directly instead of using an admin client to write to the same node. 8) Tests for the issue fixed above. All operability improvement for working with quota.	25 April 2016, 23:24:05 UTC
b128422	ARUNACHALAM THIRUPATHI	18 April 2016, 00:27:10 UTC	ContribJar is missing in the Tar/Zip Some of the commands from the shell fails because of missing contribJar in the tar ball. Include contribJar and protobufJar from the dist directory.	18 April 2016, 21:13:15 UTC
7128033	ARUNACHALAM THIRUPATHI	18 April 2016, 00:12:54 UTC	DataCleanup Job dynamic retention days and deletion Problems Fixed: 1) If a store with data retention is deleted, it incorrectly holds the lock, thereby preventing other jobs from running. 2) If a store's data retention is modified, it requires the cluster to be bounced as the retention day is read at the start and never altered. Problem not fixed: If the data cleanup job retention frequency is set for the store and if the frequency is modified, it would still require a cluster bounce. Fix: 1) Instead of taking the store retention time, data cleanup job takes in the store name and metadata store and computes the retention time dynamically for each run. 2) When a store can't be retrieved from the metadata store the cleanup job is skipped. 3) All the code after the lock acquisition is moved inside the try/finally block. Previously beginBatchModification was outside the try/finally block, so an exception thrown there left the lock unreleased. 4) Added unit tests for both the problems fixed.	18 April 2016, 21:13:14 UTC
c8c457f	Sidian Wu	18 April 2016, 19:56:32 UTC	Releasing Voldemort 1.10.15	18 April 2016, 19:56:32 UTC
43f344a	Felix GV	15 April 2016, 18:32:40 UTC	The AdminClient's verifyOrAddStore is vulnerable to connectivity issues. This commit makes the following changes: - verifyOrAddStore() is now more resilient to various kinds of exceptions. - VoldemortBuildAndPushJob now logs verifyOrAddStore's exceptions. - ExceptionUtils.recursiveClassEquals() can now look for many exception types. - Added ExceptionUtilsTest.	16 April 2016, 00:00:32 UTC
64eaff0	Gareth Davis	23 November 2015, 11:42:10 UTC	Multi module gradle build for voldemort Each contrib directory now becomes a separate project (although no new build.gradle files have been created). This makes the import into Intellij better as it can set up the contrib source roots correctly. It also opens the door to refactoring the project to be more conventional in its layout and artifact publishing. gradle 2.9 + fixing the eclipse project generation Had to rework the eclipse project generation configuration as it seems to disagree with the multi module project structure. The upgrade to gradle 2.9 fixes an issue with the eclipse generator where the JDK would be inserted twice into the .classpath file. Note that you can use the eclipse import gradle project feature, the only gotcha is that it defaults the output directory to /bin, which means the script directory gets nuked by eclipse on rebuild. As a workaround this config can be changed manually and then the bin dir restored from git if anybody prefers to use the native gradle support in eclipse. Have also added defaults for the test resource directory, this prevents IDEA from creating src/test/resources which is a little confusing. Note this will only happen if you use 'Create empty content roots' option in IDEA. Javadoc generation is disabled by default. The javadoc can be reenabled using -Pjavadoc.enabled=true The fix for zip & tar is picked from https://github.com/arunthirupathi/voldemort/commit/2e0e9cf58a5fd9dd386dcbed5f1af2e497290e2c	09 April 2016, 12:50:43 UTC
464a348	Arunachalam Thirupathi	25 March 2016, 21:27:25 UTC	Set-Metadata null Version error Set Metadata might result in the following error as listFiles can return null java.lang.NullPointerException at voldemort.store.configuration.ConfigurationStorageEngine.put(ConfigurationStorageEngine.java:146) at voldemort.store.configuration.ConfigurationStorageEngine.put(ConfigurationStorageEngine.java:50) at voldemort.store.metadata.MetadataStore.put(MetadataStore.java:355)	30 March 2016, 23:57:27 UTC
6809fce	Sidian Wu	30 March 2016, 00:12:13 UTC	Releasing Voldemort 1.10.14	30 March 2016, 00:29:17 UTC
74b2c09	gaojieliu	30 March 2016, 00:02:55 UTC	Merge pull request #396 from gaojieliu/FileFetcher_Stats Expose data points in FileFetcher and create autometrics sensors for them	30 March 2016, 00:02:55 UTC
0820122	Gaojie Liu	29 March 2016, 23:20:26 UTC	This change is mostly to expose more aggregated metrics for HDFS data pushes: 1. totalBytesFetched : the total bytes transferred from HDFS so far; 2. totalFetchRetries : the total fetch retry number so far; 3. totalCheckSumFailures : the total data file checksum failures happened so far; 4. totalAuthenticationFailures : the total authentication failures happened so far; 5. totalFileNotFoundFailures : the total file-not-found failures happened so far; 6. totalFileReadFailures : the total HDFS file read failures happened so far; 7. totalQuotaExceedFailures : the total quota exceed failures happened so far; 8. totalUnauthorizedStoreFailures : the total unauthorized store push failures happened so far; 9. parallelFetches : the total number of active fetches right now; 10. totalFetches : the total HDFS fetch number so far; 11. totalIncompleteFetches : the total incomplete fetch number so far; 12. totalDataFetchRate : the total data fetch rate right now;	29 March 2016, 23:41:58 UTC
64726a9	Sidian Wu	28 March 2016, 19:10:19 UTC	update checkout from disk instead of memory buffer.	29 March 2016, 02:02:50 UTC
46651fa	Arunachalam Thirupathi	19 March 2016, 02:14:19 UTC	Update version after updating the data Currently the version is updated before the data and it makes the client to read the wrong values. Update the Version after updating the data, so when the client reads they have the correct value.	25 March 2016, 20:15:10 UTC
d0364a3	Sidian Wu	25 March 2016, 00:37:25 UTC	minor change in StoreSwapperTest#testConcurrencyPush	25 March 2016, 00:51:25 UTC
66bfe3b	Arunachalam Thirupathi	22 March 2016, 18:19:37 UTC	Hdfs FileSystem handles are leaked in Fetch Issue : After the HadoopFileSystem object is created, the validity of the fileSystem is verified by doing a sample operation. If the operation fails the Hadoop FileSystem object is leaked. This object should be cleaned up by the Garbage collection, but all the FileSystem objects are cached, so this is leaked. When voldemort server is used with secure webhdfs (swebhdfs) file system it leaks enough memory to kill the servers eventually. Previously in voldemort webhdfs file system handles were leaked. Apparently webhdfs file system handles are very cheap. But in SwebHdfs they have the security certificate embedded in them. This causes them to be very big. Heap Dump analysis WebHdfsFileSystem - 3768 Objects - 80 MB SWebHdfsFileSystem - 1748 Objects - 3 GB Solution : 1) Disable the caching for the following reason Hadoop FileSystem class caches the FileSystem objects based on the scheme , authority and UserGroupInformation. The default config was to generate new UserGroupInformation for each call, so the cache will be never hit. In the case where the FileSystem is not closed correctly, it will leak handles. But if the UserGroupInformation is re-used, it will cause the FileSystem object to be shared between HdfsFetcher / HdfsFailedFetchLock. Each Voldemort HdfsFetcher/HAFailedFetchLock lock closes the fileSystem object at the end, though others might still be using it. This causes random failures. Since it does not work in both the cases, the Caching is disabled. The caching should be only enabled if the UserGroupInformation is to be re-used and the close bug is fixed. 2) Clean up the file handles on the error cases. Traced down all the handles and cleaned them up on the error path.	24 March 2016, 18:29:35 UTC
af3d605	Sidian Wu	22 March 2016, 01:51:10 UTC	Check server state when bringing nodes back Fixed the issue where Voldemort, when started in offline state, actually goes online and keeps listening for client requests.	23 March 2016, 21:36:12 UTC
7f6ff63	Gareth Davis	01 December 2015, 15:46:47 UTC	fixing a NPE thrown by the StorageEngineService on startup with views Result of the getCapability() implementation returning a null instead of throwing the required NoSuchCapabilityException. Example of the exception: Exception in thread "main" java.lang.NullPointerException at voldemort.server.storage.StorageService.startInner(StorageService.java:424) at voldemort.common.service.AbstractService.start(AbstractService.java:62) at voldemort.server.VoldemortServer.startInner(VoldemortServer.java:374) at voldemort.common.service.AbstractService.start(AbstractService.java:62) at voldemort.server.VoldemortServer.main(VoldemortServer.java:437)	18 March 2016, 20:37:59 UTC
394898e	Gaojie Liu	11 March 2016, 23:36:37 UTC	This update includes two parts: 1. build.gradle change is to fix 'Wrong package statement' error when import voldemort to IntelliJ; 2. VoldemortBuildAndPushJob.java change is to make log message clearer when delete the temp files in grid generated by b&p job;	12 March 2016, 01:07:06 UTC
114e88a	Sidian Wu	11 March 2016, 00:56:55 UTC	Disable store creation on the BnP side Refactored Admin.storeMgmtOps#verifyOrAddStore. It now takes a new boolean argument "creationStore" to decide whether or not adds new stores into the cluster if they are not found. Add a boolean value eableStoreCreation in VoldemortBuildAndPushJob.	11 March 2016, 00:56:55 UTC
3cfec9e	Sidian Wu	02 March 2016, 21:23:56 UTC	Resolved concurrency push conflict and add unit test, refactor quirky AdminClient used in HdfsFetcher, and add hyperlink in README.md 1. Content conflicts were very likely when multiple jobs tried to push to the same store simultaneously. We add a hashmap in AdminRequestHandler to check whether a store is currently fetching or not. If so, we block the rest of the fetch requests and throw an exception. 2. Added a unit test case under StoreSwapperTest. 3. The HdfsFetcher used to create an AdminClient instance and ask itself for "diskQuotaSizeInKB". This was a quirky use of AdminClient. We removed it and passed the quota size directly from AdminRequestHandler. 4. Add a hyperlink in README.md to "A quick git guide for people who want to make contributions to Voldemort".	10 March 2016, 19:51:21 UTC
17f0def	Sidian Wu	28 February 2016, 11:11:42 UTC	Better error message when pushing to a store with storage quota 0 Modified AdminStoreSwapper#nvokeFetch() to throw the original sub-thread exception and changed InvalidBostrapURLException to make it clearer.	02 March 2016, 01:16:28 UTC
179347a	Arunachalam Thirupathi	01 March 2016, 01:06:21 UTC	Report metrics for Scheduler and Async Service Scheduler Service reports tasks currently running and number of tasks in the queue Async Service reports cumulative Wait time and number of tasks waiting in the queue.	01 March 2016, 01:06:21 UTC
9f37c86	Arunachalam Thirupathi	09 February 2016, 02:18:58 UTC	Releasing Voldemort 1.10.13	09 February 2016, 02:18:58 UTC
abcf36a	Arunachalam Thirupathi	08 February 2016, 21:50:03 UTC	Gradle Protobuf shadowed jar Gradle Protobuf shadowed jar	09 February 2016, 02:11:05 UTC
ca2c427	Sidian Wu	22 January 2016, 19:40:37 UTC	ClientRequestExecutor Pool test unit test failure Previously by mistake it was assumed that JDK changed the exception behavior. But the problem was host/domain unknown.host was registered which caused the code to fail. Now since the host is changed to unknown.invalid the test is returned to the previous stage.	22 January 2016, 19:40:37 UTC
7e104c1	Felix GV	19 January 2016, 19:14:06 UTC	Better exception handling in HdfsFetcher#fetchFromSource(). Previously, a FileSystem#close() operation was guarded by a try/catch looking for IOExceptions only. It turns out this close() function can sometimes throw NPEs as well. This commit expands the catch clause to any exception type, and updates the logged message accordingly.	19 January 2016, 19:59:09 UTC
8da3c67	Felix GV	12 January 2016, 06:59:04 UTC	Releasing Voldemort 1.10.12	12 January 2016, 06:59:21 UTC
5ee62b6	Felix GV	12 January 2016, 03:08:56 UTC	Changed HDFS directory size measurement in HdfsFetcher. This fixes two issues in 'build.primary.replicas.only': 1. The progress report in the AsyncTask was wrong, which also resulted in erroneous logs in the BnP job. 2. The previous implementation was DDOSing the NameNode with a furious amount of ListStatus operation, most of which it did not even need to get the result of. Also added retry with random back off when the ListStatus operations fail, and cleaned up HdfsFetcherAdvancedTest a little bit.	12 January 2016, 06:56:22 UTC
1fd44c9	Felix GV	09 January 2016, 02:18:04 UTC	Cleaned up the HadoopStoreWriterTest. It now leverages the BuildAndPushMapper code in order to generate files with the appropriate Read-Only formats. The tests in this class were broken since the more stringent sanity checks introduced in 421ada5e52c4db9330776beb2d3c4e7dbf720e2b.	09 January 2016, 02:48:44 UTC
490d2a1	Arunachalam Thirupathi	08 January 2016, 19:35:41 UTC	Releasing Voldemort 1.10.11	08 January 2016, 19:35:41 UTC
b46c97f	Gareth Davis	08 January 2016, 14:43:28 UTC	fixing the 'unknown.host' test in ClientRequestExecutorPoolTest it turns out that somebody has registered 'unknown.host': $ host unknown.host unknown.host has address 188.68.51.215 unknown.host mail is handled by 10 serpens.uberspace.de. which is kinda of annoying. Have switched to using unknown.invalid as this is reserved and should never resolve.	08 January 2016, 19:16:37 UTC
421ada5	Felix GV	08 January 2016, 10:22:07 UTC	Bug fixes and improvements for Build and Push. - HadoopStoreWriter did not handle multiple chunks correctly which caused reduce tasks to step on each others' toes. Fixed. - Changed default 'reducer.per.bucket' config to true. - AbstractStoreBuilderConfigurable.getPartition() was broken when 'build.primary.replicas.only' was enabled. Fixed it and added a lot more comments. - Added defensive code in HadoopStoreWriter to catch future regressions in the partitioning (shuffling) code. - Improved logging in Props and HadoopStoreBuilder.	08 January 2016, 18:55:27 UTC
b8d4f03	Felix GV	08 January 2016, 06:56:06 UTC	Fixed a bug where the Read-Only data directory config is not honored properly. This fixes a regression introduced in a7061474f905fe60a9581dbf4e14431d00b130a2.	08 January 2016, 07:03:50 UTC
6aec0c1	Arunachalam Thirupathi	07 January 2016, 02:25:21 UTC	Releasing Voldemort 1.10.10	07 January 2016, 02:25:21 UTC
2b65a88	ARUNACHALAM THIRUPATHI	06 January 2016, 08:24:51 UTC	AdminClient does not obey the Client Timeout 1) For BuildAndPush If the Voldemort Cluster is far away from the Hadoop Cluster, the connection timeout causes the Job to fail. Increased the connection timeout in Azkaban.* files for this issue. 2) ClientConfig timeout is ignored by the AdminClient bootstrap methods and it constructs an arbitrary ClientConfig. Fixed that. 3) AdminClient passes in empty AdminClientConfig and ClientConfig at multiple places. Removed that.	07 January 2016, 02:07:48 UTC
47e62bb	Felix GV	06 January 2016, 06:01:19 UTC	Releasing Voldemort 1.10.9	06 January 2016, 06:01:19 UTC
c6f689e	Felix GV	06 January 2016, 00:45:03 UTC	Fix for a regression introduced in a7061474f905fe60a9581dbf4e14431d00b130a2. Under certain partition assignment configurations, servers would skip the initialization of some partitions, which would then result in NPEs getting thrown during get requests. This issue is now fixed. In addition to that, the following changes are also included in this commit: - More comprehensive tests in ReadOnlyStorageEngineTest to catch the edge case. - Skipped the canGetGoodCompressedKeys() test in ReadOnlyStorageEngineTest since key compression is not supported in Read-Only / Build and Push. - Better error reporting in ChunkedFileSet.getChunkForKey(byte[] key).	06 January 2016, 05:42:53 UTC
fa5bf0e	ARUNACHALAM THIRUPATHI	05 January 2016, 22:51:24 UTC	UnknownHostException is different between JDK 1.6 and JDK 1.8 http://docs.oracle.com/javase/7/docs/api/java/nio/channels/UnresolvedAddressException.html has a bad class hierarchy in JDK 1.6, which seems to be corrected in JDK 1.8. It is not an instance of IO or Connect Exceptions. This broke the test; now the test looks for one or the other exception. The intention of the test is to keep the catastrophic errors up to date, so it is hacky, but OK.	05 January 2016, 22:51:24 UTC
75fb6c8	Felix GV	17 December 2015, 02:39:45 UTC	Made vadmin.sh more resilient to node failures.	05 January 2016, 19:32:06 UTC
bbb7e99	Arunachalam Thirupathi	05 January 2016, 01:35:43 UTC	Releasing Voldemort 1.10.8	05 January 2016, 01:35:43 UTC
412614d	Arunachalam Thirupathi	23 December 2015, 00:41:05 UTC	BnP job corrupts store when compression mismatches BnP job does not compare compression when making the schema check. One store was created with compression enabled on the first run and the next run removed it. The store silently corrupted the data. Now compression is compared and if compression does not match, it errors out instead of corrupting the data.	05 January 2016, 01:01:00 UTC
445b240	Arunachalam Thirupathi	18 December 2015, 22:27:06 UTC	Disabling Fetch Fails BnP HA When a node is in offline mode, it responds with fetch disabled error. Currently the error is not treated as a soft error and this fails the HA BnP. With this code change, FetchDisabled is considered as a soft error and the fetch will continue as normal.	05 January 2016, 00:58:09 UTC
8159584	Arunachalam Thirupathi	18 December 2015, 02:38:10 UTC	Add File List and Length based check for RO 1) Currently Stores are validated only for the definition and by doing a random get against any of the partitions. Now for RO stores, the files are validated by their replication factor. If RF=1, a line will appear saying that this store's consistency can't be validated. Store xyz has replication factor of 1, skipping consistency check across nodes If RF > 1, a file should have exactly as many copies as RF. If the file lengths do not match, a warning is printed. Store abc File ReadOnlyFile [name=97_0_0.data, size=0] is expected in 2 nodes, but present only in [Node localhost:6666 [id 0]] Store abc File ReadOnlyFile [name=97_1_0.data, size=408351] is expected in 2 nodes, but present only in [Node localhost:6669 [id 1]] If the file is missing the following warning will appear but only once. The error reporting is very noisy, this is to alert for the presence of error. So thrown together quickly without considering the ease of use. Verified backward and forward compatibility of the change.	05 January 2016, 00:51:28 UTC
50f8eed	Arunachalam Thirupathi	16 December 2015, 08:14:46 UTC	Adding file Length to GetROFileListRepsonse This allows building validators for the RO files being fetched.	05 January 2016, 00:49:20 UTC
c4d65a8	Arunachalam Thirupathi	23 December 2015, 23:33:47 UTC	Fix the testNodeDownReplacement intermittent failure SchedulerService does not wait for the scheduled jobs to shut down and proceeds with killing the BDB, which causes a cursor exception to be thrown. This will fix the test.	23 December 2015, 23:33:47 UTC
cd3b9e0	Arunachalam Thirupathi	23 December 2015, 20:14:54 UTC	Fix RO JMX register/Unregister 1) Stores JMX are registered by JmxService class. 2) Previously ReadOnlyStorageConfiguration registered the same metrics by prepending the NodeId. As part of the commit https://github.com/voldemort/voldemort/commit/2aec46a1edfbb6ab1eac4529d59ceb13f51e9fec#diff-005b79e324515c9e1045a61d0aee6d07 I fixed it and removed the NodeId. But I did not realize that this caused a name collision and overwriting. [18:50:45,445 voldemort.server.jmx.JmxService] WARN Overwriting mbean voldemort.store.readonly:type=test2 [main] [18:50:45,447 voldemort.server.jmx.JmxService] WARN Overwriting mbean voldemort.store.readonly:type=test1 [main] Now ReadOnlyStorageConfiguration does not register any JMX Metric at all. Used the JConsole to verify that it is the same object registered under two different names. Now the duplicate one is gone and the warnings on the shutdown of Read Only server will be gone as well.	23 December 2015, 20:14:54 UTC
bc38518	Felix GV	16 December 2015, 02:29:11 UTC	Fixed regression introduced in fe5b01d71f85dac7bb431f12c4ccef3d26aaee47.	16 December 2015, 02:29:11 UTC
3dfcc66	Felix GV	16 December 2015, 01:05:13 UTC	Releasing Voldemort 1.10.7	16 December 2015, 01:05:13 UTC
fe5b01d	Felix GV	16 December 2015, 00:26:42 UTC	Read-Only fetches now abort immediately when killed. Also improved vadmin.sh's output format and error handling.	16 December 2015, 00:59:35 UTC
fc97f04	Arunachalam Thirupathi	15 December 2015, 21:15:45 UTC	Enhanced Admin Meta Check 1) When no parameter is passed in, it silently ignores all the checks. Now the default is changed to check all. 2) Random key is generated to probe the store. Previously it always passed in byte 0, which failed with InvalidMetadataException.	15 December 2015, 21:17:13 UTC
5a7ae63	ARUNACHALAM THIRUPATHI	14 December 2015, 18:45:58 UTC	BouncyCastle jar is required even when not enabled BouncyCastle jar is referenced from VoldemortServer and when VoldemortServer class is loaded, all its references are resolved which causes it to fail with ClassLoader error. Added one more level of indirection to avoid the direct reference and hence BouncyCastle is not required to be available in the class path when not enabled.	15 December 2015, 03:13:10 UTC
ca1d52c	Jiaan	08 November 2015, 03:13:19 UTC	add a customized workload that supports key size distribution and value size distribution from external files	15 December 2015, 03:08:41 UTC
a706147	Felix GV	12 December 2015, 01:26:55 UTC	Introduced 'build.primary.replicas.only' mode in BnP. Summary: This new mode provides the capability of pushing to multiple clusters with different number of nodes and different partition assignments. Compatibility: Although this new mode only works if both the BnP job and the Voldemort servers are upgraded, the change can be rolled out gradually without breaking anything. There is a negotiation phase at the beginning of the BnP job which determines if all servers of all clusters are capable of and willing (i.e.: configured) to use the new mode. If not all servers are upgraded and enabled, then the BnP job falls back to its old behavior. Likewise, if a server gets a fetch request from a non-upgraded BnP job, it will work just like before. By default, servers answer the negotiation by saying they support the new mode. The old behavior can be forced with the following server-side configuration: readonly.build.primary.replicas.only=false Running in this new mode has several implications: 1. When running in the new mode, store files are stored in the BnP output directory under nested partition directories, rather than in nested node directories. 2. The MR job uses half as many reducers and half as much shuffle bandwidth compared to before. 3. The meta checksum is now done per partition, rather than per node. 4. Instead of having one .metadata file per partition, there is now only a single full-store.metadata file at the root of the output directory. 5. The server-side HdfsFetcher code inspects the metadata file and determines if it should operate in 'build.primary.replicas.only' mode or not. If yes, then the server determines which partitions it needs to fetch on its own, rather than relying on what the BnP job placed in a node-specific directory. 6. The replica type number contained in Read-Only V2 file names is now useless, but we are keeping it in there just to avoid unnecessary changes. 7. When initializing a Read-Only V2 store directory, the server now looks for files named with the incorrect replica type, and if it finds any, it renames them to the replica type expected by this server. Other changes: 1. Added socket port to Node's toString functions. Also made the output of the Node's toString(), briefToString() and getStateString() functions more consistent. 2. Introduced new Protobuf message for the GetConfig admin request. This new message is intended to be a generic way to retrieve any server config. 3. Refactored VoldemortConfig to provide access to any config by its string key. Also cleaned up a lot of hard-coded strings, which are constants now. 4. Various minor refactorings in BnP code.	15 December 2015, 02:45:06 UTC
69fcd3f	Felix GV	06 November 2015, 02:52:03 UTC	Lots of BnP code clean up. - Refactored common Mapper and Collector code via the new BuildAndPushMapper and AbstractCollectorWrapper classes. - Refactored common 'reducer.per.bucket' code. HadoopStoreWriterPerBucket offers a superset of the HadoopStoreWriter's functionality, so it's not worth keeping the latter at all. - Changed some hard-coded strings to constants. - Deleted the following classes completely, which were either dead code, duplicate code, or useless abstractions: contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/AvroStoreBuilderReducerPerBucket.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilderReducerPerBucket.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/IdentityJsonMapper.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/IdentityJsonReducer.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/JobState.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/VoldemortStoreBuilderMapper.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/azkaban/VoldemortStoreBuilderJob.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/serialization/JsonConfigurable.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/serialization/JsonDeserializerComparator.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/serialization/JsonMapper.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/serialization/JsonOutputCollector.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/serialization/JsonReducer.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/serialization/JsonSequenceFileOutputFormat.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/utils/EmailMessage.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/utils/KeyValuePartitioner.java contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/utils/MapperKeyValueWriter.java	12 December 2015, 01:25:37 UTC
0e583be	Felix GV	11 December 2015, 17:48:27 UTC	Better error message for unknown admin operations. Previously, the AdminServiceRequestHandler returned an error message which said "Metadata Key passed '' is not handled yet" whenever attempting to parse VProtoAdmin messages serialized by an AdminClient with more/newer capabilities than the server's.	12 December 2015, 00:24:21 UTC
22be8bc	Arunachalam Thirupathi	09 December 2015, 03:21:01 UTC	Releasing Voldemort 1.10.6	09 December 2015, 03:21:01 UTC
36d3655	Arunachalam Thirupathi	08 December 2015, 22:32:12 UTC	Swap Only on IOException,UnreachableStoreException Currently Swap is attempted for any failure. Now the swap will be attempted only if the failure is an IO or UnreachableStoreException. Other exceptions will cause the push to fail.	08 December 2015, 23:36:33 UTC
33f48ed	ARUNACHALAM THIRUPATHI	08 December 2015, 07:51:08 UTC	Create JMX mbeans for tracking multiple server states Following states are tracked Server 0->normal, 1->offline, 2->rebalancing SlopStreaming 0->enabled,1->disabled PartitionStreaming 0->enabled,1->disabled ReadOnlyFetching 0->enabled,1->disabled QuotaEnforcing 0->enabled,1->disabled	08 December 2015, 07:51:08 UTC
aa11fd1	Yan	08 December 2015, 02:05:18 UTC	Merge pull request #352 from squarY/bouncycastle Let read only server use bouncy castle as JCE provider.	08 December 2015, 02:05:18 UTC
695ae80	Yan Yan(Data Infrastructure)	03 December 2015, 19:42:00 UTC	Let read only server use bouncy castle as JCE provider. Initialize BouncyCastleProvider in VoldemortServer constructor if it's enabled. Fix minor format issue. Fix unnecessary format changing. Fix unnecessary format issue. Remove useless parameter.	05 December 2015, 00:17:47 UTC
c927688	Felix GV	04 December 2015, 01:24:36 UTC	BnP logging improvements: - BnP job now emits config properties in one entry per line. - Bumped up some useful logs to INFO level in the HttpHook.	04 December 2015, 19:08:41 UTC
5b09c8e	Yan	25 November 2015, 23:14:16 UTC	Merge pull request #350 from squarY/master Releasing Voldemort 1.10.5	25 November 2015, 23:14:16 UTC
fb23553	Yan Yan(Data Infrastructure)	25 November 2015, 02:39:22 UTC	Releasing Voldemort 1.10.5 Fix typos.	25 November 2015, 23:12:37 UTC
d53d534	Yan	23 November 2015, 22:32:48 UTC	Merge pull request #349 from squarY/sslbnpmerged Squash commits. Let Voldemort node modify URL before fetching file.	23 November 2015, 22:32:48 UTC
f88adc4	Yan Yan	19 November 2015, 07:47:05 UTC	Add 3 properties in Voldemort configuration. Let Voldemort node turn SSL on or off when fetching file from HDFS. Fix the issue where, when the URL does not contain a protocol (e.g. local file path), parsing the URL causes a String index out of range exception. Roll back the format change to the original code. Make the modify URL feature more generic and use java.net.URL instead of parsing the URL manually. Move the modify URL feature to the Utils class, and let VoldemortSwapJob invoke this method to replace the URL. Let Voldemort node modify URL separately before fetching file.	21 November 2015, 01:03:33 UTC
71cc55c	Arunachalam Thirupathi	17 November 2015, 02:57:09 UTC	Update metadata version cluster, stores At times the metadata versions of cluster.xml and stores.xml drift. The next metadata update, instead of consolidating these versions, updates them in a few places and ignores them in others. This causes the client to not re-bootstrap correctly when the cluster.xml or stores.xml is changed. Now when a cluster.xml is changed, the version is synchronized across the cluster to let the clients auto rebootstrap. This fix merges the VectorClock on all the nodes to be updated so that the Stores version will be updated correctly. The old methods which do not take nodes as parameters are removed and the public method exposes the nodes as parameters.	18 November 2015, 02:50:45 UTC
b30fdfe	Greg Banks	17 November 2015, 06:27:25 UTC	Merge pull request #345 from gnb/VOLDENG-2171bis Extend shell "preflist" command to show partitions, v2	17 November 2015, 06:27:25 UTC
24c8290	Greg Banks	17 November 2015, 01:07:51 UTC	Extend shell "preflist" command to show partitions, v2 After valuable feedback from athirupthi	17 November 2015, 01:07:51 UTC
64c90eb	Greg Banks	16 November 2015, 22:10:09 UTC	Merge pull request #343 from gnb/VOLDENG-2170 Fix shell "preflist" command key parsing	16 November 2015, 22:10:09 UTC
