https://github.com/voldemort/voldemort

4622df9 Releasing Voldemort 1.9.21 24 September 2015, 23:19:32 UTC
26e1f48 New server config to determine HDFS login interval. Default: fetcher.login.interval.ms=-1 (re-login every time) 24 September 2015, 21:59:21 UTC
711cc40 Releasing Voldemort 1.9.20 15 September 2015, 01:40:58 UTC
ebbc44f Fix the RocksDB iteration logic for keys and entries. The problems addressed include: 1) Key and entry iterators failed to strip the prefix from keys (when required). 2) RocksdbStorageEngineTest failed to test BdbStorageEngine (all tests). 3) Both key iterators would miss the first entry and then iterate off the end. With these changes I was able to remove the @Ignore annotation from 2 tests. 15 September 2015, 00:18:45 UTC
2738b7c Fixed HdfsFetcher test code and trimmed some fat. 14 September 2015, 22:54:40 UTC
2aec46a Avoid duplicate JMX for Store and Pipeline stats The current code creates duplicate JMX beans for the store client and pipeline stats when the getStoreClient method is called more than once. This fix avoids the duplicate JMX registration by caching the first creation and registering only once. The read-only storage engine registers the last-swapped bean with the node id in the name; this creates multiple counters, which clutters the log and makes the last-swap time difficult to track across multiple nodes. 14 September 2015, 18:35:28 UTC
7975559 BnP HA improvements. - HdfsFailedFetchLock now happens on the server-side. - BnP no longer relies on any specific node (the node.id config parameter is eliminated). - Moved some authentication-related code out of HdfsFetcher and into HadoopUtils. 14 September 2015, 04:22:10 UTC
5c3d158 Graceful recovery from incomplete BnP store creation. 08 September 2015, 22:33:46 UTC
8563062 Added long retries to ServerTestUtils.startVoldemortServer It will now retry for up to 120 attempts, and up to 5 minutes before giving up. This is a workaround for the flaky non-deterministic issues we see in a lot of tests. 02 September 2015, 17:33:25 UTC
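The bounded retry workaround described above (a cap on both attempts and elapsed time) can be sketched generically; the names below are illustrative, not ServerTestUtils' actual API:

```java
import java.util.function.Supplier;

// Generic sketch of a bounded retry loop: give up after maxAttempts tries
// or after maxElapsedMillis, whichever comes first. Illustrative only,
// not Voldemort's actual code.
public class RetryLoop {
    public static <T> T retry(Supplier<T> op, int maxAttempts, long maxElapsedMillis) {
        long deadline = System.currentTimeMillis() + maxElapsedMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and retry
            }
            if (System.currentTimeMillis() >= deadline) {
                break; // time budget exhausted
            }
        }
        throw last != null ? last : new IllegalStateException("no attempts made");
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        String result = retry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("flaky startup");
            return "started";
        }, 120, 5 * 60 * 1000L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "started after 3 attempts"
    }
}
```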
4b0fc58 Marked some RocksDB tests with @Ignore. The corresponding functionality is not implemented, so these failures are expected. 31 August 2015, 20:58:35 UTC
54164ff Changed routing strategy for system stores to "all-routing". Previously, the metadata_version_persistence and store_quotas stores were using the "local-pref-all-routing" strategy, which leads to nondeterministic behavior. 28 August 2015, 21:23:54 UTC
4280c14 Fixed broken test and marked HTTP Service as deprecated. 25 August 2015, 00:37:19 UTC
9e06265 Added warning messages on the stream admin commands. These commands are not considered production-ready. They are intended for debugging purposes. Also cleaned up a bit of dead code in admin commands. 24 August 2015, 23:22:16 UTC
8997512 Add a 'binary' format to the fetch-entries admin command so that it will create a file compatible with the update-entries command. For consistency's sake, add this same format to the fetch-keys command. 24 August 2015, 22:38:28 UTC
ef29e5d Fork lift corrupts the data on schema mismatch 1) If the source and destination schemas do not match, forklift currently corrupts the destination data by streaming in bytes from the source. After this commit, forklift will fail when a schema mismatch is detected. The old behavior, if required, can be achieved via the undocumented parameter ignore-schema-mismatch. The default of forklift, which forklifted all stores, is changed to fail when no store name is specified. I can't imagine a situation where you would often want to forklift from one cluster to another in its entirety; if an admin forgets to specify this parameter, they would forklift the entire cluster, which is definitely not the intended default. Added 5 unit tests (3 for key mismatch and 2 for value mismatch). Added pretty-print functions to Compression and SerializerDefintion. 24 August 2015, 22:35:59 UTC
fce83e7 Make avro utf8 and bytes readable in shell output 1) The fetch-keys and fetch-entries streaming options for utf8 and bytes are not human readable. This is a problem if you want to sample and read them using the shell. 2) voldemort-shell.sh does not output avro bytes in a readable format. The previous output was some internal state and did not convey the actual value. 24 August 2015, 22:35:59 UTC
eec454a Improving debuggability of read-only fetches. - All SchedulerService threads now have a unique name, instead of all being called "java.util.concurrent.ThreadPoolExecutor$Worker". - AsyncOperation instances will now override their current thread's name to provide even more detail about what's running, and then restore the original thread name. - The fetcher loop in the server will report slightly more useful info to the BnP job via the AsyncOperation's status message. - The fetcher loop in the server will also print local logs which are more similar to the BnP-side log, for easier log correlation. - HdfsCopyStats flushes lines as they happen, in order to improve debuggability of stalled or abruptly interrupted fetches... 24 August 2015, 22:27:48 UTC
fd345f7 Releasing Voldemort 1.9.19 15 August 2015, 00:20:02 UTC
07f0f72 Removed MemLock class and related configs. It turns out our mlock call always failed and we've been running fine without it anyway, so there's no point in keeping that code around. 14 August 2015, 22:33:10 UTC
9e5fe19 Cleanup formatting of system store schema. 13 August 2015, 14:45:24 UTC
a6294ce Revert "Clean Set Quota Fix" This reverts commit 3ff1cbe47a2206834c2cd6b77878dcccb3f0be67. The unit tests are broken. On a deeper look, the file-backed caching storage engine does not have a version per key, only at the file level. Any update to the file with a lower version will be rejected, so we have to go with super clocks for now. I am reverting the commit to unblock the release of a new version. 12 August 2015, 22:27:04 UTC
fa0acf6 Ensure that all the AbstractStorageEngineTest tests get run by all the subclasses. Add the @Test annotation to several tests. Without that annotation it appears that those subclasses that are parameterized do not run these specific test cases. This is most likely a base gradle issue. Perhaps somewhat related to: https://issues.gradle.org/browse/GRADLE-3112 12 August 2015, 19:03:44 UTC
da0ec6d Removed HadoopStoreJobRunner and related scripts. This is to avoid confusion. There is no point in running those scripts. - For building and pushing a read-only store, VoldemortBuildAndPushJobRunner and ./bin/run-bnp.sh should be used. If someone wants to just build without pushing, that can be done by passing push=false to the BnP config. - For swapping a store version, it can be done via the vadmin.sh script. 12 August 2015, 17:54:11 UTC
00ad250 Changes to @elad's shadowJar build, to ensure all dependencies are bundled up. 12 August 2015, 17:54:11 UTC
a4bc78c Minimal config for read-only store, with Hadoop (BnP) hints. 12 August 2015, 17:54:11 UTC
3add1aa Print number of partitions, useful for debugging. 12 August 2015, 17:54:11 UTC
e5eee78 Add BnP job and script from @FelixGV with slight changes. 12 August 2015, 17:54:11 UTC
c974058 Make this work on Hadoop 2.x by shading a few dependencies. Shade avro (to 1.4.0) and protobuf (to 2.3.0) and always include jdom (1.1). Mostly from #274, also see #284. 12 August 2015, 17:54:11 UTC
e3113e3 Addressing stylistic comment. 12 August 2015, 17:41:35 UTC
98cd411 Merge pull request #296 from arunthirupathi/setQuota Clean Set Quota Fix 12 August 2015, 01:40:33 UTC
8c068ac Allow RO servers to run without Kerberos enabled. 12 August 2015, 00:46:46 UTC
3ff1cbe Clean Set Quota Fix Clean up the Set quota to not generate super clocks. Wait for all the nodes to complete, before returning success. 11 August 2015, 19:35:49 UTC
2a0c81e Remove unused members and methods from MemLock public setFile() is unused, and also dangerous because there is no point in the lifetime of the object when it can have a useful effect. The descriptor member is not needed after the ctor. 11 August 2015, 18:30:03 UTC
dc91e04 Factor out some common code in ChunkedFileSet into a new mapAndRememberIndexFile() method. 11 August 2015, 18:30:03 UTC
faa2ca0 Remove unused mapFile() method in ChunkedFileSet. 11 August 2015, 18:30:03 UTC
8647f82 Simplify MappedFileReader, close Unix fd early Remove most of the members and make them local variables in the map() function which is the only place that uses them. This simplifies the c'tor and means it cannot throw IOException anymore. It also means we close the Unix file descriptor used to create the mapping much earlier. This will help reduce the load on file descriptors. 11 August 2015, 18:30:03 UTC
1eafb88 Remove unused fields from MappedFileReader - fadvise was an unused leftover from previous code - offset was used but its value was always zero, stop pretending otherwise 11 August 2015, 18:30:03 UTC
a4fe60a Merge BaseMappedFile into its only subclass. It had precisely one subclass, did not define any semantically meaningful behavior, and nobody used the base class. It will not be missed. 11 August 2015, 18:30:03 UTC
4b74809 Report mmap/munmap errors using IOException which the signature claims can be thrown but never was. Ensure that the only callchain (MappedFileReader -> MemLock -> mman) will actually handle and log IOException correctly. Also, stop claiming that mlock() and munlock() throw IOException. They never did and it would not be helpful, in fact we want to ignore all errors from those functions because their normal behavior in unprivileged processes is to fail. 11 August 2015, 18:30:03 UTC
9fb2689 Remove all traces of MAP_ALIGN which doesn't exist on Linux (the binary value we were passing into the kernel was some other harmless option) and even if it did exist it has no effect because we're hardcoding the alignment to 0 which means "let the kernel choose" i.e. the default behavior. 11 August 2015, 18:30:03 UTC
be018e3 Remove all traces of MAP_LOCKED which we didn't actually use except in some stale comments, and which is not implemented on any current OS anyway. 11 August 2015, 18:30:03 UTC
9202548 Merge pull request #295 from FelixGV/fixes_to_DataCleanupJobTest Bump EventThrotller window + fixes to data cleanup job test 11 August 2015, 06:22:10 UTC
d25d169 Upgrade to Hadoop 2.3.0-cdh5.1.5 and Kerberos clean up. 11 August 2015, 05:13:55 UTC
9a48c7e Added STORAGE_SPACE quota and cleaned up some vadmin stuff. 11 August 2015, 04:40:56 UTC
e65b0e0 Fixed DataCleanupJobTest. 11 August 2015, 03:03:05 UTC
3c6be4b Bumped up EventThrottler's default window time to 1000 ms. 11 August 2015, 03:02:49 UTC
8d6f06a Remove sleep after deprecated warning 07 August 2015, 00:11:19 UTC
df46fdc Merge pull request #288 from FelixGV/remove_scala_and_ec2_testing Remove scala, ec2 testing and public-lib directory 29 July 2015, 19:28:58 UTC
b38aabe Removed public-lib as it was only used by the deprecated ant build. 29 July 2015, 18:46:01 UTC
3bfc934 Deleted unused cruft (scala shell and ec2-testing contrib). 29 July 2015, 18:39:54 UTC
105d1ed Removed a duplicate log in AdminClient.waitForCompletion 27 July 2015, 17:33:45 UTC
dcbd8c9 Improved BnP logging. 23 July 2015, 20:35:15 UTC
3a935aa Releasing Voldemort 1.9.18 20 July 2015, 22:54:37 UTC
724596d Voldemort BnP pushes to all colos in parallel. Also contains many logging improvements to discriminate between hosts and clusters. 20 July 2015, 19:43:28 UTC
9c61ada Rewrite of the EventThrottler code to use Tehuti. - Makes throttling less vulnerable to spiky traffic sneaking in "between the interval". - Also fixes throttling for the HdfsFetcher when compression is enabled. 16 July 2015, 01:18:31 UTC
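The windowed-throttling idea can be illustrated with a small deterministic helper; this is a sketch under assumed semantics, not Tehuti's or the EventThrottler's actual API. Given the bytes observed since the window started and a target rate, it computes how long the caller should pause so spiky traffic cannot sneak in between intervals:

```java
// Sketch of window-based rate throttling (illustrative, not the real
// EventThrottler API): if more bytes were observed in the current window
// than the target rate allows, return the delay needed to absorb the excess.
public class ThrottleMath {
    public static long requiredDelayMillis(long bytesInWindow, long windowStartMs,
                                           long nowMs, long bytesPerSecond) {
        long elapsedMs = Math.max(0, nowMs - windowStartMs);
        long allowedSoFar = elapsedMs * bytesPerSecond / 1000;
        if (bytesInWindow <= allowedSoFar) {
            return 0; // under quota, no throttling needed
        }
        long excessBytes = bytesInWindow - allowedSoFar;
        return excessBytes * 1000 / bytesPerSecond; // time to absorb the excess
    }

    public static void main(String[] args) {
        // 2000 bytes observed in the first second at a 1000 B/s target:
        // 1000 bytes over quota -> pause 1000 ms.
        System.out.println(requiredDelayMillis(2000, 0, 1000, 1000)); // prints 1000
    }
}
```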
82f80b6 Fix the white space changes The previous refactor was done from my MacBook, which did not replace tabs with spaces. This messed up a lot of the editing. Instead of re-doing the change with spaces, I just formatted the code, which is easier and requires no re-verification. You can review the commit by adding ?w=1 to the GitHub URL, or use git diff -w on the command line, to ignore the whitespace; there are not many changes. 10 July 2015, 20:25:23 UTC
168cb69 Pass in additional parameters to fetch 1) Currently a single AsyncOperationStatus is set for HdfsFetcher; if 2 or more fetches are going on, this produces erroneous results. 2) Add StoreName, Version, and Metadatastore for use in future fetches. 3) Enabled the Hadoop* tests; I don't know why they were not run in the ant tests. When I ported them, for parity reasons I disabled them too, but I am now enabling them as the tests seem valid. 4) Made the fetch throw IOException instead of Throwable, which was less reliable and caught more than intended. 01 July 2015, 06:41:08 UTC
d70ed85 Refactor file fetcher to Strategy interface/class Refactored the file fetcher to a Strategy interface and class. In the future this lets you modify the file-fetching strategy, e.g. having BuildAndPush build only one copy per partition and chunk while the fetcher fetches them under different names. There is no logic change; the code is just refactored. 01 July 2015, 06:41:08 UTC
6a42f59 Improved path handling and validation in VoldemortSwapJob 01 July 2015, 01:04:12 UTC
aa51b0b Merge pull request #271 from dallasmarlow/coordinator-class Thanks for the fix @dallasmarlow update coordinator class name in server script 30 June 2015, 21:47:13 UTC
a45fc83 Fixed voldemort.cluster.ClusterTest 30 June 2015, 21:06:29 UTC
c2db8fd First-cut implementation of Build and Push High Availability. This commit introduces a limited form of HA for BnP. The new functionality is disabled by default and can be enabled via the following server-side configurations, all of which are necessary: push.ha.enabled=true push.ha.cluster.id=<some arbitrary name which is unique per physical cluster> push.ha.lock.path=<some arbitrary HDFS path used for shared state> push.ha.lock.implementation=voldemort.store.readonly.swapper.HdfsFailedFetchLock push.ha.max.node.failure=1 The Build and Push job will interrogate each cluster it pushes to and honor each cluster's individual settings (i.e.: one can enable HA on one cluster at a time, if desired). However, even if the server settings enable HA, this should be considered best-effort behavior, since some BnP users may be running older versions of BnP which will not honor HA settings. Furthermore, up-to-date BnP users can also set the following config to disable HA, regardless of server-side settings: push.ha.enabled=false Below is a description of the behavior of BnP HA, when enabled. When a Voldemort server fails some fetch(es), the BnP job attempts to acquire a lock by moving a file into a shared directory in HDFS. Once the lock is acquired, it checks the state in HDFS to see if any nodes have already been marked as disabled by other BnP jobs. It then determines whether the Voldemort node(s) which failed the current BnP job would bring the total number of unique failed nodes above the configured maximum, with the following outcome in each case: - If the total number of failed nodes is equal to or lower than the max allowed, then metadata is added to HDFS to mark the store/version currently being pushed as disabled on the problematic node. Afterwards, if the Voldemort server that failed the fetch is still online, it will be asked to go into offline mode (this is best effort, as the server could be down). Finally, BnP proceeds with swapping the new data set version on, as if all nodes had fetched properly. - If, on the other hand, the total number of unique failed nodes is above the configured max, then the BnP job will fail and the nodes that succeeded the fetch will be asked to delete the new data, just like before. In either case, BnP then releases the shared lock by moving the lock file outside of the lock directory, so that other BnP instances can go through the same process one at a time, in a globally coordinated (mutually exclusive) fashion. All HA-related HDFS operations are retried every 10 seconds, up to 90 times (thus for a total of 15 minutes); these are configurable in the BnP job via push.ha.lock.hdfs.timeout and push.ha.lock.hdfs.retries respectively. When a Voldemort server is in offline mode, in order for BnP to continue working properly, the BnP jobs must be configured so that push.cluster points to the admin port, not the socket port. Configured in this way, transient HDFS issues may lead to the Voldemort server being put in offline mode, but won't prevent future pushes from populating the newer data organically. External systems can be notified of occurrences of the BnP HA code getting triggered via two new BuildAndPushStatus values passed to the custom BuildAndPushHooks registered with the job: SWAPPED (when things work normally) and SWAPPED_WITH_FAILURES (when a swap occurred despite some failed Voldemort node(s)). BnP jobs that failed because the maximum number of failed Voldemort nodes would have been exceeded still fail normally and trigger the FAILED hook. Future work: - Auto-recovery: transitioning the server from offline to online mode, as well as cleaning up the shared metadata in HDFS, is not handled automatically as part of this commit (which is the main reason why BnP HA should not be enabled by default). The recovery process currently needs to be handled manually, though it could be automated (at least for the common cases) as part of future work. - Support non-HDFS-based locking mechanisms: the HdfsFailedFetchLock is an implementation of a new FailedFetchLock interface, which can serve as the basis for other distributed state/locking mechanisms (such as ZooKeeper, or a native Voldemort-based solution). Unrelated minor fixes and clean-ups included in this commit: - Cleaned up some dead code. - Cleaned up abusive admin client instantiations in BnP. - Cleaned up the closing of resources at the end of the BnP job. - Fixed an NPE in the ReadOnlyStorageEngine. - Fixed a broken sanity check in Cluster.getNumberOfTags(). - Improved some server-side logging statements. - Fixed the exception type thrown in ConfigurationStorageEngine's and FileBackedCachingStorageEngine's getCapability(). 30 June 2015, 18:11:45 UTC
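Collected from the commit message above, the server-side settings that enable BnP HA might look like this in a server properties file (the placeholder values for cluster id and lock path are illustrative):

```properties
# Enable Build-and-Push High Availability (disabled by default).
push.ha.enabled=true
# Arbitrary name, unique per physical cluster (placeholder value).
push.ha.cluster.id=prod-cluster-1
# Arbitrary HDFS path used for shared lock/state (placeholder value).
push.ha.lock.path=/voldemort/bnp-ha-locks
push.ha.lock.implementation=voldemort.store.readonly.swapper.HdfsFailedFetchLock
# Maximum number of unique failed nodes tolerated per push.
push.ha.max.node.failure=1
```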
050ec92 Merge pull request #273 from bitti/master @bitti thanks for the fix, merged it in. Fix SecurityException when running HadoopStoreJobRunner in an oozie java action 29 June 2015, 21:18:22 UTC
fb9cab6 Fix SecurityException when running HadoopStoreJobRunner in oozie 17 June 2015, 16:24:58 UTC
f3801cf Releasing voldemort 1.9.17 12 June 2015, 23:48:22 UTC
88fcf8d ConnectionException is not catastrophic 1) If a connection times out or fails during protocol negotiation, it is treated as a normal error instead of a catastrophic one. The connection timeout was a regression from the NIO connect fix; the protocol negotiation timeout is a new change to detect failed servers faster. 2) When a node is marked down, the outstanding queued requests are not failed and are instead let through the connection creation cycle; otherwise, when there are no outstanding requests, they could wait indefinitely until the next request comes up. 3) UnreachableStoreException is sometimes double-wrapped, which causes catastrophic errors to not be detected accurately. Created a utility method: when you are not sure whether a thrown exception could be an UnreachableStoreException, use this method, which handles this case correctly. 4) In a non-blocking connect, if DNS does not resolve, Java throws UnresolvedAddressException instead of UnknownHostException (probably a Java issue). Also, UnresolvedAddressException is derived not from IOException but from IllegalArgumentException, which is weird. Fixed the code to handle this. 5) Tuned the remembered-exceptions timeout to twice the connection timeout. Previously it was hardcoded to 3 seconds, which was too aggressive for use cases where the connection timeout was set to more than 5 seconds. Added unit tests to verify all the above cases. 12 June 2015, 23:23:22 UTC
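The double-wrapping problem in (3) is the classic cause-chain issue. A generic sketch of the unwrap idea (a hypothetical helper, not the actual Voldemort utility) walks getCause() looking for the exception type of interest:

```java
import java.io.IOException;

// Sketch (hypothetical helper, not Voldemort's actual utility): detect whether
// an exception of a given type appears anywhere in the cause chain, so a
// double-wrapped exception is still classified correctly.
public class CauseChain {
    public static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        // Walk the chain; the ternary guards against self-referential causes.
        for (Throwable cur = t; cur != null; cur = cur.getCause() == cur ? null : cur.getCause()) {
            if (type.isInstance(cur)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An IOException wrapped twice, as in the double-wrapping case above.
        Throwable wrapped = new RuntimeException(new RuntimeException(new IOException("node down")));
        System.out.println(hasCause(wrapped, IOException.class));          // prints true
        System.out.println(hasCause(wrapped, IllegalStateException.class)); // prints false
    }
}
```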
2b95f0d update coordinator class name in server script 12 June 2015, 16:37:50 UTC
d65f7db Releasing Voldemort 1.9.16 09 June 2015, 13:46:34 UTC
c574a37 Standardized recent release_notes formatting. 09 June 2015, 13:41:07 UTC
97d8694 Some more AvroUtils and BnP clean ups. 09 June 2015, 13:33:27 UTC
e13f6a2 Fix error reporting in AvroUtils.getSchemaFromPath() - report errors with an exception - report errors exactly once - provide the failing pathname - don't generate spurious cascading NPE failures 09 June 2015, 01:56:31 UTC
037a0dc Merge pull request #269 from FelixGV/VoldemortConfig_bug Fixed VoldemortConfig bug introduced in 3692fa3. 08 June 2015, 23:01:48 UTC
c7e6cec Fixed VoldemortConfig bug introduced in 3692fa3f493acf717b1431d624af4c997df4f2fd. 08 June 2015, 22:38:57 UTC
5f0cd8b Merge pull request #265 from gnb/VOLDENG-1912 Unregister the "-streaming-stats" mbean correctly 06 June 2015, 00:28:12 UTC
924c72f Unregister the "-streaming-stats" mbean correctly This avoids littering up the logs with JMX exceptions like this 2015/06/04 23:55:58.105 ERROR [JmxUtils] [voldemort-admin-server-t21] [voldemort] [] Error unregistering mbean javax.management.InstanceNotFoundException: voldemort.server.StoreRepository:type=cmp_comparative_insights at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415) at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546) at voldemort.utils.JmxUtils.unregisterMbean(JmxUtils.java:348) at voldemort.server.StoreRepository.removeStorageEngine(StoreRepository.java:187) at voldemort.server.storage.StorageService.removeEngine(StorageService.java:749) at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleDeleteStore(AdminServiceRequestHandler.java:1487) at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleRequest(AdminServiceRequestHandler.java:238) at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:190) at voldemort.common.nio.SelectorManagerWorker.run(SelectorManagerWorker.java:105) at voldemort.common.nio.SelectorManager.run(SelectorManager.java:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 06 June 2015, 00:22:19 UTC
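The defensive pattern implied by this fix can be sketched with the standard JMX API; the ObjectName below is illustrative, not the exact name Voldemort registers:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of defensive MBean unregistration: check isRegistered() first so a
// name mismatch becomes a quiet no-op instead of a logged
// InstanceNotFoundException. The ObjectName here is illustrative.
public class SafeUnregister {
    public static boolean unregisterIfPresent(String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName = new ObjectName(name);
        if (server.isRegistered(objectName)) {
            server.unregisterMBean(objectName);
            return true;
        }
        return false; // nothing registered under that name: skip quietly
    }

    public static void main(String[] args) throws Exception {
        // Never registered, so this is a quiet no-op instead of an exception.
        System.out.println(unregisterIfPresent(
                "voldemort.server.StoreRepository:type=example-streaming-stats")); // prints false
    }
}
```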
e63bc53 Releasing Voldemort build 1.9.15 06 June 2015, 00:07:15 UTC
139e441 Fix Log message HdfsFile did not have a toString method, which caused the object id to be printed in the log message; this broke the script we had for collecting the download speed. (The speed can now be calculated better using the stats file, but that is a separate project.) Added the number of directories being downloaded and the number of files, in addition to the size. This will help track some more details, as dummy files are created in place for files that do not exist. Renamed HDFSFetcherAdvancedTest to HdfsFetcherAdvancedTest to keep it in sync with other naming conventions. 05 June 2015, 23:58:49 UTC
1592db0 Merge pull request #263 from FelixGV/hung_async_task_mitigation Added SO_TIMEOUT config (default 30 mins) in ConfigurableSocketFactory. Looks good. 04 June 2015, 18:49:05 UTC
3692fa3 Added SO_TIMEOUT config (default 30 mins) in ConfigurableSocketFactory and VoldemortConfig. Added logging to detect hung async jobs in AdminClient.waitForCompletion 04 June 2015, 18:24:00 UTC
13a4b81 HdfsCopyStatsTest fails intermittently The OS returns the expected files in random order. Use set instead of list. 31 May 2015, 16:32:33 UTC
b8d9525 Add more testing for Serialization I was doing some tests on the expected input and output for the serializers, and thought it would be a good idea, instead of just documenting them, to write unit tests to validate them. Most serializers have very poor testing, so I decided to add these unit tests. I will add more as I start working more on the expected input/output. 27 May 2015, 22:50:09 UTC
b540533 Release 1.9.14 Release version 1.9.14 22 May 2015, 16:49:55 UTC
df12409 RO Hdfs fetcher allocates too much memory 1) The Hdfs fetcher in 1.0.4 uses ByteRangeInputStream. This class does not override the method read(byte[], int, int), so it defaults to the implementation from InputStream, which reads one byte at a time from the input stream. HttpInputStream creates a byte array for each of these reads, so if you download 2 TB of data, the server will allocate and free 2 TB before the download completes. This creates too much garbage: the new generation fills up in a few milliseconds and GC happens. Though each GC is fast, the frequency causes latency spikes and can cause the JVM to run out of memory. 2) http://svn.apache.org/viewvc?view=revision&revision=1330500 fixed this issue in April 2012, knowingly or not. I tried upgrading to the latest Hadoop, but it brings in ProtoBuf 2.5.0 and Avro 1.7. When I disabled those dependencies it failed at runtime expecting protobuf 2.5.0; I enabled only protobuf, and it has no runtime dependency on Avro 1.7. But I am saving that fix for a later day; the branch is hadoop_Version_Upgrade, which uses Hadoop 2.6.0 and ProtoBuf 2.6.1. 18 May 2015, 18:49:17 UTC
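The per-byte fallback described in (1) is easy to demonstrate: InputStream's default read(byte[], int, int) loops over the single-byte read() method. The stream class below is a stand-in written for illustration, not ByteRangeInputStream itself:

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative stand-in (not ByteRangeInputStream): a stream that only
// implements read() inherits InputStream.read(byte[], int, int), which
// loops calling read() once per byte -- the allocation-heavy behavior
// described in the commit message above.
class OneByteAtATimeStream extends InputStream {
    int singleByteReads = 0;
    private int remaining;

    OneByteAtATimeStream(int size) {
        this.remaining = size;
    }

    @Override
    public int read() {
        if (remaining == 0) return -1;
        remaining--;
        singleByteReads++; // count how often the one-byte path is hit
        return 0;
    }
}

public class BulkReadDemo {
    public static void main(String[] args) throws IOException {
        OneByteAtATimeStream in = new OneByteAtATimeStream(8192);
        byte[] buf = new byte[4096];
        while (in.read(buf, 0, buf.length) != -1) {
            // drain the stream through the "bulk" read path
        }
        // Every byte of the 8 KB stream still went through read():
        System.out.println(in.singleByteReads); // prints 8192
    }
}
```

Overriding read(byte[], int, int) to copy in bulk is what eliminates the per-byte garbage.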
e2d845c Output stats file for RO files download .stats directory will be created and will contain last X (default: 50) stats file. If a version-X is fetched a file with the same name as this directory name will contain the stats for this download. The stats file will contain the individual file name, time it took to download and few other information. Added unit tests for the HdfsCopyStatsTest 18 May 2015, 18:49:17 UTC
b5db5ed fix slop pusher unit test 15 May 2015, 21:22:29 UTC
45cce9e fix store-delete command 13 May 2015, 22:54:19 UTC
705b6ff add admin command for meta get-ro and add test config for readonly-two-nodes-cluster 13 May 2015, 20:49:30 UTC
eabb057 Add Admin API to list/stop/enable scheduled jobs 13 May 2015, 17:48:54 UTC
20f1037 add storeops.delete and deleteQuotaForNode, fix vector clock for setQuotaForNode 12 May 2015, 21:39:11 UTC
b4fa1cb Refactor HdfsFetcher 1) Created directory and File class to help me in the future. 2) Cleaned up some code to make for easier readability. 12 May 2015, 17:54:57 UTC
3378d6c Code compiled on Java8 fails to run on Java6 Ever witnessed Exception in thread "main" java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView; at voldemort.store.metadata.MetadataStore.updateRoutingStrategies(MetadataStore.java:855) at voldemort.store.metadata.MetadataStore.init(MetadataStore.java:1189) This is because of the issue documented here https://gist.github.com/AlainODea/1375759b8720a3f9f094 11 May 2015, 22:30:00 UTC
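The pitfall from the linked gist can be sketched directly: on JDK 8, ConcurrentHashMap.keySet() returns the covariant ConcurrentHashMap.KeySetView type, so javac compiles the call site against that descriptor, and a Java 6 runtime (which has no such method) throws NoSuchMethodError. Calling keySet() through the Map interface pins the call site to the old signature:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the Java 8 -> Java 6 compatibility pitfall described above
// (illustrative, not Voldemort code).
public class KeySetCompat {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> chm = new ConcurrentHashMap<>();
        chm.put("a", 1);

        // Risky: with a receiver statically typed as ConcurrentHashMap,
        // JDK 8's javac emits a call to keySet()Ljava/...$KeySetView;
        // which does not exist on a Java 6 runtime.
        Set<String> direct = chm.keySet();

        // Portable: the receiver's static type is Map, so the bytecode
        // references Map.keySet()Ljava/util/Set; which exists everywhere.
        Map<String, Integer> asMap = chm;
        Set<String> portable = asMap.keySet();

        System.out.println(direct.equals(portable)); // prints true
    }
}
```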
20455c7 Releasing Voldemort 1.9.13 11 May 2015, 21:45:23 UTC
b4674b5 Suppress ObsoleteVersionException in logs During the refactoring of the server buffers, all errors from the storage engine were logged, whereas the previous code did not log any errors on writes. I looked at the exception stack and could not see other errors that need to be suppressed. Verified that ProtocolBuffer does not log any error, so only the Voldemort native request handler is affected. 11 May 2015, 21:09:02 UTC
1c8e0d4 NIO style connect Problems: 1) Connect blocks the selector. This causes other operations (read/write) queued on the selector to incur additional latency or time out. This is worse when you have data centers that are far away. 2) The ProtocolNegotiation request is done after connection establishment, which blocks the selector in the same manner. 3) If exceptions are encountered while getting connections from the queue, they are ignored. Solutions: Connection creation is now async. The create method is changed to createAsync and takes in the pool object. For NIO, createAsync triggers an async operation which checks the connection in when it is ready. For blocking connections, createAsync blocks, creates the connection, and checks the connection in to the pool before returning. As connection creation is now async, exceptions are remembered (for 5 seconds) in the pool; when a thread asks for a connection while exceptions are remembered, it will get an exception. There is no ordering in the way connections are handed out: one thread can request a connection and, before it can wait, another thread could steal it. This is avoided to a certain extent by splitting the blocking wait into two halves and creating a connection if required, instead of doing one blocking wait. This should not be a problem in the real world, as once you reach steady state (the required number of connections has been created) this can't happen. Also upgraded the source compatibility from Java 5 to 6. Most of the code is written with the assumption of Java 6, and I don't believe you can run this code on Java 5, so the impact should be minimal; but if it goes into the Client V2 branch, it will get the benefit of additional testing. 06 May 2015, 22:06:42 UTC
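The async-connect pattern this entry describes can be sketched with plain java.nio primitives (a minimal illustration, not Voldemort's actual ClientRequestExecutorPool code): register the channel for OP_CONNECT and finish the handshake from the selector loop, so no connect ever blocks reads and writes on other channels.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Minimal sketch of non-blocking connect (illustrative, not Voldemort code):
// connect() returns immediately, and the handshake completes via OP_CONNECT
// plus finishConnect() inside the selector loop.
public class AsyncConnectSketch {
    public static void main(String[] args) throws Exception {
        // Local listener standing in for a remote node.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));

        Selector selector = Selector.open();
        SocketChannel client = SocketChannel.open();
        client.configureBlocking(false);

        // Returns immediately; on loopback it may even complete right here.
        if (!client.connect(server.getLocalAddress())) {
            client.register(selector, SelectionKey.OP_CONNECT);
            while (!client.isConnected() && selector.select(5000) > 0) {
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isConnectable() && ((SocketChannel) key.channel()).finishConnect()) {
                        key.interestOps(0); // connected: a pool would check it in here
                    }
                }
                selector.selectedKeys().clear();
            }
        }
        System.out.println(client.isConnected()); // prints true

        client.close();
        server.close();
        selector.close();
    }
}
```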
5c03ea6 Releasing Voldemort 1.9.12 01 May 2015, 22:27:47 UTC
706a1f3 Add more tests and fix buffer size of GZIP Streams 01 May 2015, 21:09:59 UTC
495a234 Merge pull request #256 from FelixGV/disable_ant_build Fully disabled the Ant build in favor of the Gradle one. Though the docs task is not yet ported to gradle, we can always fetch the build.xml from an older trunk and generate the docs. Given the amount of confusion it causes, I will merge this change in. 30 April 2015, 18:18:37 UTC
d310bf2 Fully disabled the Ant build in favor of the Gradle one. 30 April 2015, 18:11:43 UTC
8addbf7 Fix the Readme to use Gradle Removed ant and fixed the Readme to use Gradle. 30 April 2015, 17:44:09 UTC
c01f26e Rebalance unit tests fail intermittently There are 2 issues. 1) Put is asynchronous, so there needs to be wait time before the put is verified on all the nodes. 2) Repeated puts need to generate different vector clocks. 27 April 2015, 22:01:41 UTC