fd345f7 | Felix GV | 15 August 2015, 00:20:02 UTC | Releasing Voldemort 1.9.19 | 15 August 2015, 00:20:02 UTC |
07f0f72 | Felix GV | 14 August 2015, 22:33:10 UTC | Removed MemLock class and related configs. It turns out our mlock call always failed and we've been running fine without it anyway, so there's no point in keeping that code around. | 14 August 2015, 22:33:10 UTC |
9e5fe19 | Greg Banks | 13 August 2015, 14:45:24 UTC | Cleanup formatting of system store schema. | 13 August 2015, 14:45:24 UTC |
a6294ce | Arunachalam Thirupathi | 12 August 2015, 22:26:15 UTC | Revert "Clean Set Quota Fix" This reverts commit 3ff1cbe47a2206834c2cd6b77878dcccb3f0be67. The unit tests are broken. On a deeper look, the file-backed caching storage engine does not have a version per key, but only at the file level. Any update to the file with a lower version will be rejected. So we have to go with super clocks for now. I am reverting the commit to unblock the release of a new version. | 12 August 2015, 22:27:04 UTC |
fa0acf6 | James Lent | 22 July 2015, 19:19:38 UTC | Ensure that all the AbstractStorageEngineTest tests get run by all the subclasses. Add the @Test annotation to several tests. Without that annotation it appears that those subclasses that are parameterized do not run these specific test cases. This is most likely a base gradle issue. Perhaps somewhat related to: https://issues.gradle.org/browse/GRADLE-3112 | 12 August 2015, 19:03:44 UTC |
da0ec6d | Felix GV | 12 August 2015, 01:35:40 UTC | Removed HadoopStoreJobRunner and related scripts. This is to avoid confusion. There is no point in running those scripts. - For building and pushing a read-only store, VoldemortBuildAndPushJobRunner and ./bin/run-bnp.sh should be used. If someone wants to just build without pushing, that can be done by passing push=false to the BnP config. - For swapping a store version, it can be done via the vadmin.sh script. | 12 August 2015, 17:54:11 UTC |
00ad250 | Felix GV | 12 August 2015, 00:09:29 UTC | Changes to @elad's shadowJar build, to ensure all dependencies are bundled up. | 12 August 2015, 17:54:11 UTC |
a4bc78c | Elad Efrat | 04 August 2015, 13:22:17 UTC | Minimal config for read-only store, with Hadoop (BnP) hints. | 12 August 2015, 17:54:11 UTC |
3add1aa | Elad Efrat | 04 August 2015, 13:16:06 UTC | Print number of partitions, useful for debugging. | 12 August 2015, 17:54:11 UTC |
e5eee78 | Elad Efrat | 04 August 2015, 13:12:45 UTC | Add BnP job and script from @FelixGV with slight changes. | 12 August 2015, 17:54:11 UTC |
c974058 | Elad Efrat | 04 August 2015, 13:05:25 UTC | Make this work on Hadoop 2.x by shading a few dependencies. Shade avro (to 1.4.0) and protobuf (to 2.3.0) and always include jdom (1.1). Mostly from #274, also see #284. | 12 August 2015, 17:54:11 UTC |
e3113e3 | Felix GV | 12 August 2015, 01:56:04 UTC | Addressing stylistic comment. | 12 August 2015, 17:41:35 UTC |
98cd411 | ARUNACHALAM THIRUPATHI | 12 August 2015, 01:40:33 UTC | Merge pull request #296 from arunthirupathi/setQuota Clean Set Quota Fix | 12 August 2015, 01:40:33 UTC |
8c068ac | Felix GV | 12 August 2015, 00:46:46 UTC | Allow RO servers to run without Kerberos enabled. | 12 August 2015, 00:46:46 UTC |
3ff1cbe | Arunachalam Thirupathi | 11 August 2015, 19:35:49 UTC | Clean Set Quota Fix Clean up the Set quota to not generate super clocks. Wait for all the nodes to complete, before returning success. | 11 August 2015, 19:35:49 UTC |
2a0c81e | Greg Banks | 06 August 2015, 15:49:56 UTC | Remove unused members and methods from MemLock public setFile() is unused, and also dangerous because there is no point in the lifetime of the object when it can have a useful effect. The descriptor member is not needed after the ctor. | 11 August 2015, 18:30:03 UTC |
dc91e04 | Greg Banks | 06 August 2015, 15:41:08 UTC | Factor out some common code in ChunkedFileSet into a new mapAndRememberIndexFile() method. | 11 August 2015, 18:30:03 UTC |
faa2ca0 | Greg Banks | 06 August 2015, 15:32:14 UTC | Remove unused mapFile() method in ChunkedFileSet. | 11 August 2015, 18:30:03 UTC |
8647f82 | Greg Banks | 06 August 2015, 15:30:11 UTC | Simplify MappedFileReader, close Unix fd early Remove most of the members and make them local variables in the map() function which is the only place that uses them. This simplifies the c'tor and means it cannot throw IOException anymore. It also means we close the Unix file descriptor used to create the mapping much earlier. This will help reduce the load on file descriptors. | 11 August 2015, 18:30:03 UTC |
1eafb88 | Greg Banks | 06 August 2015, 07:41:41 UTC | Remove unused fields from MappedFileReader - fadvise was an unused leftover from previous code - offset was used but its value was always zero; stop pretending otherwise | 11 August 2015, 18:30:03 UTC |
a4fe60a | Greg Banks | 06 August 2015, 07:38:03 UTC | Merge BaseMappedFile into its only subclass. It had precisely one subclass, did not define any semantically meaningful behavior, and nobody used the base class. It will not be missed. | 11 August 2015, 18:30:03 UTC |
4b74809 | Greg Banks | 06 August 2015, 06:59:32 UTC | Report mmap/munmap errors using IOException which the signature claims can be thrown but never was. Ensure that the only callchain (MappedFileReader -> MemLock -> mman) will actually handle and log IOException correctly. Also, stop claiming that mlock() and munlock() throw IOException. They never did and it would not be helpful, in fact we want to ignore all errors from those functions because their normal behavior in unprivileged processes is to fail. | 11 August 2015, 18:30:03 UTC |
9fb2689 | Greg Banks | 06 August 2015, 05:33:50 UTC | Remove all traces of MAP_ALIGN which doesn't exist on Linux (the binary value we were passing into the kernel was some other harmless option) and even if it did exist it has no effect because we're hardcoding the alignment to 0 which means "let the kernel choose" i.e. the default behavior. | 11 August 2015, 18:30:03 UTC |
be018e3 | Greg Banks | 06 August 2015, 05:31:35 UTC | Remove all traces of MAP_LOCKED which we didn't actually use except in some stale comments, and which is not implemented on any current OS anyway. | 11 August 2015, 18:30:03 UTC |
9202548 | singhsiddharth | 11 August 2015, 06:22:10 UTC | Merge pull request #295 from FelixGV/fixes_to_DataCleanupJobTest Bump EventThrotller window + fixes to data cleanup job test | 11 August 2015, 06:22:10 UTC |
d25d169 | Felix GV | 11 August 2015, 04:51:19 UTC | Upgrade to Hadoop 2.3.0-cdh5.1.5 and Kerberos clean up. | 11 August 2015, 05:13:55 UTC |
9a48c7e | Felix GV | 11 August 2015, 00:51:59 UTC | Added STORAGE_SPACE quota and cleaned up some vadmin stuff. | 11 August 2015, 04:40:56 UTC |
e65b0e0 | Felix GV | 11 August 2015, 03:01:12 UTC | Fixed DataCleanupJobTest. | 11 August 2015, 03:03:05 UTC |
3c6be4b | Felix GV | 11 August 2015, 02:59:28 UTC | Bumped up EventThrottler's default window time to 1000 ms. | 11 August 2015, 03:02:49 UTC |
8d6f06a | singhsiddharth | 07 August 2015, 00:11:19 UTC | Remove sleep after deprecated warning | 07 August 2015, 00:11:19 UTC |
df46fdc | singhsiddharth | 29 July 2015, 19:28:58 UTC | Merge pull request #288 from FelixGV/remove_scala_and_ec2_testing Remove scala, ec2 testing and public-lib directory | 29 July 2015, 19:28:58 UTC |
b38aabe | Felix GV | 29 July 2015, 18:46:01 UTC | Removed public-lib as it was only used by the deprecated ant build. | 29 July 2015, 18:46:01 UTC |
3bfc934 | Felix GV | 29 July 2015, 18:39:54 UTC | Deleted unused cruft (scala shell and ec2-testing contrib). | 29 July 2015, 18:39:54 UTC |
105d1ed | Felix GV | 27 July 2015, 17:33:45 UTC | Removed a duplicate log in AdminClient.waitForCompletion | 27 July 2015, 17:33:45 UTC |
dcbd8c9 | Felix GV | 23 July 2015, 20:35:15 UTC | Improved BnP logging. | 23 July 2015, 20:35:15 UTC |
3a935aa | Felix GV | 20 July 2015, 22:54:37 UTC | Releasing Voldemort 1.9.18 | 20 July 2015, 22:54:37 UTC |
724596d | Felix GV | 20 July 2015, 19:43:28 UTC | Voldemort BnP pushes to all colos in parallel. Also contains many logging improvements to discriminate between hosts and clusters. | 20 July 2015, 19:43:28 UTC |
9c61ada | Felix GV | 14 July 2015, 20:55:01 UTC | Rewrite of the EventThrottler code to use Tehuti. - Makes throttling less vulnerable to spiky traffic sneaking in "between the interval". - Also fixes throttling for the HdfsFetcher when compression is enabled. | 16 July 2015, 01:18:31 UTC |
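The sliding-window idea behind this rewrite can be illustrated with a minimal sketch (hypothetical class name, not the actual Tehuti-based EventThrottler): by evicting timestamps older than the window before counting, a burst can no longer sneak in right at an interval boundary the way it can with a fixed, resetting interval counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of sliding-window throttling; not the actual
// Tehuti-based EventThrottler from this commit.
class SlidingWindowThrottler {
    private final long windowMs;
    private final long maxEventsPerWindow;
    // Timestamps (ms) of events recorded within the current window.
    private final Deque<Long> events = new ArrayDeque<Long>();

    SlidingWindowThrottler(long windowMs, long maxEventsPerWindow) {
        this.windowMs = windowMs;
        this.maxEventsPerWindow = maxEventsPerWindow;
    }

    // Record an event and report whether the caller should back off.
    synchronized boolean shouldThrottle(long nowMs) {
        // Evict events that have fallen out of the sliding window.
        while (!events.isEmpty() && nowMs - events.peekFirst() >= windowMs) {
            events.pollFirst();
        }
        events.addLast(nowMs);
        return events.size() > maxEventsPerWindow;
    }
}
```

Because the window slides instead of resetting, two half-windows of traffic on either side of a boundary are counted together rather than separately.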
82f80b6 | Arunachalam Thirupathi | 10 July 2015, 20:25:23 UTC | Fix the whitespace changes The previous refactor was done from my MacBook, which did not replace tabs with spaces. This messed up a lot of the editing. Instead of re-doing the change with spaces, I just formatted the code, which is easier and requires no re-verification. You can review the commit by adding ?w=1 to the GitHub URL, or by using git diff -w on the command line, to ignore the whitespace; there are not many real changes. | 10 July 2015, 20:25:23 UTC |
168cb69 | ARUNACHALAM THIRUPATHI | 28 June 2015, 05:16:29 UTC | Pass in additional parameters to fetch 1) Currently the AsyncOperationStatus is set for HdfsFetcher; if 2 or more fetches are going on, this produces erroneous results. 2) Add StoreName, Version, Metadatastore for use in future fetches. 3) Enabled the Hadoop* tests; it is unclear why they were not run in the ant tests. When I ported them for parity reasons I disabled them too, but I am enabling them now as the tests seem valid. 4) Made the fetch throw IOException instead of Throwable; catching Throwable is less reliable and catches more than intended. | 01 July 2015, 06:41:08 UTC |
d70ed85 | ARUNACHALAM THIRUPATHI | 28 June 2015, 03:23:40 UTC | Refactor file fetcher to Strategy Interface/class Refactor the file fetcher into a Strategy interface and class. In the future this lets you modify the file fetching strategy, e.g. having BuildAndPush build only one copy per partition and chunk while the fetcher fetches them under different names. There is no logic change; the code is just refactored. | 01 July 2015, 06:41:08 UTC |
6a42f59 | Felix GV | 01 July 2015, 00:04:33 UTC | Improved path handling and validation in VoldemortSwapJob | 01 July 2015, 01:04:12 UTC |
aa51b0b | ARUNACHALAM THIRUPATHI | 30 June 2015, 21:47:13 UTC | Merge pull request #271 from dallasmarlow/coordinator-class Thanks for the fix @dallasmarlow update coordinator class name in server script | 30 June 2015, 21:47:13 UTC |
a45fc83 | Felix GV | 30 June 2015, 21:06:29 UTC | Fixed voldemort.cluster.ClusterTest | 30 June 2015, 21:06:29 UTC |
c2db8fd | Felix GV | 30 June 2015, 18:11:45 UTC | First-cut implementation of Build and Push High Availability. This commit introduces a limited form of HA for BnP. The new functionality is disabled by default and can be enabled via the following server-side configurations, all of which are necessary: push.ha.enabled=true push.ha.cluster.id=<some arbitrary name which is unique per physical cluster> push.ha.lock.path=<some arbitrary HDFS path used for shared state> push.ha.lock.implementation=voldemort.store.readonly.swapper.HdfsFailedFetchLock push.ha.max.node.failure=1 The Build and Push job will interrogate each cluster it pushes to and honor each cluster's individual settings (i.e.: one can enable HA on one cluster at a time, if desired). However, even if the server settings enable HA, this should be considered best-effort behavior, since some BnP users may be running older versions of BnP which will not honor the HA settings. Furthermore, up-to-date BnP users can also set the following config to disable HA, regardless of server-side settings: push.ha.enabled=false Below is a description of the behavior of BnP HA, when enabled. When a Voldemort server fails to do some fetch(es), the BnP job attempts to acquire a lock by moving a file into a shared directory in HDFS. Once the lock is acquired, it will check the state in HDFS to see if any nodes have already been marked as disabled by other BnP jobs. It then determines if the Voldemort node(s) which failed the current BnP job would bring the total number of unique failed nodes above the configured maximum, with the following outcome in each case: - If the total number of failed nodes is equal to or lower than the max allowed, then metadata is added to HDFS to mark the store/version currently being pushed as disabled on the problematic node. Afterwards, if the Voldemort server that failed the fetch is still online, it will be asked to go into offline mode (this is best effort, as the server could be down). 
Finally, BnP proceeds with swapping the new data set version on, as if all nodes had fetched properly. - If, on the other hand, the total number of unique failed nodes is above the configured max, then the BnP job will fail and the nodes that succeeded the fetch will be asked to delete the new data, just like before. In either case, BnP will then release the shared lock by moving the lock file outside of the lock directory, so that other BnP instances can go through the same process one at a time, in a globally coordinated (mutually exclusive) fashion. All HA-related HDFS operations are retried every 10 seconds up to 90 times (thus for a total of 15 minutes). These are configurable in the BnP job via push.ha.lock.hdfs.timeout and push.ha.lock.hdfs.retries respectively. When a Voldemort server is in offline mode, in order for BnP to continue working properly, the BnP jobs must be configured so that push.cluster points to the admin port, not the socket port. Configured in this way, transient HDFS issues may lead to the Voldemort server being put in offline mode, but wouldn't prevent future pushes from populating the newer data organically. External systems can be notified of the occurrences of the BnP HA code getting triggered via two new BuildAndPushStatus values passed to the custom BuildAndPushHooks registered with the job: SWAPPED (when things work normally) and SWAPPED_WITH_FAILURES (when a swap occurred despite some failed Voldemort node(s)). BnP jobs that failed because the maximum number of failed Voldemort nodes would have been exceeded still fail normally and trigger the FAILED hook. Future work: - Auto-recovery: Transitioning the server from offline to online mode, as well as cleaning up the shared metadata in HDFS, is not handled automatically as part of this commit (which is the main reason why BnP HA should not be enabled by default). 
The recovery process currently needs to be handled manually, though it could be automated (at least for the common cases) as part of future work. - Support non-HDFS based locking mechanisms: the HdfsFailedFetchLock is an implementation of a new FailedFetchLock interface, which can serve as the basis for other distributed state/locking mechanisms (such as Zookeeper, or a native Voldemort-based solution). Unrelated minor fixes and clean ups included in this commit: - Cleaned up some dead code. - Cleaned up abusive admin client instantiations in BnP. - Cleaned up the closing of resources at the end of the BnP job. - Fixed an NPE in the ReadOnlyStorageEngine. - Fixed a broken sanity check in Cluster.getNumberOfTags(). - Improved some server-side logging statements. - Fixed the exception type thrown in ConfigurationStorageEngine's and FileBackedCachingStorageEngine's getCapability(). | 30 June 2015, 18:11:45 UTC |
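The lock-by-rename protocol this commit describes can be sketched with local files standing in for HDFS. One caveat: HDFS's rename fails when the destination already exists, which is what the real HdfsFailedFetchLock exploits, whereas a local POSIX rename silently overwrites the destination. This sketch therefore models mutual exclusion with a movable token file instead (all helper names are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of coordinating via atomic renames, in the spirit of
// the commit's "move a file into a shared directory" lock. A token file
// present at its well-known path means the lock is free; whoever manages to
// rename the token to a private path holds the lock.
class RenameLockSketch {
    // Try to acquire: atomically move the shared token to our private path.
    // If the token is gone, another instance won the race.
    static boolean tryAcquire(Path lockToken, Path privateName) {
        try {
            Files.move(lockToken, privateName, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (IOException lost) {
            return false; // token already taken by someone else
        }
    }

    // Release: move the token back so the next instance can take it.
    static void release(Path privateName, Path lockToken) throws IOException {
        Files.move(privateName, lockToken, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

The real implementation additionally retries these operations (every 10 seconds, up to 90 times) because HDFS calls can fail transiently.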
050ec92 | ARUNACHALAM THIRUPATHI | 29 June 2015, 21:18:22 UTC | Merge pull request #273 from bitti/master @bitti thanks for the fix, merged it in. Fix SecurityException when running HadoopStoreJobRunner in an oozie java action | 29 June 2015, 21:18:22 UTC |
fb9cab6 | David Ongaro | 17 June 2015, 16:24:58 UTC | Fix SecurityException when running HadoopStoreJobRunner in oozie | 17 June 2015, 16:24:58 UTC |
f3801cf | Arunachalam Thirupathi | 12 June 2015, 23:48:22 UTC | Releasing voldemort 1.9.17 | 12 June 2015, 23:48:22 UTC |
88fcf8d | Arunachalam Thirupathi | 31 May 2015, 08:51:47 UTC | ConnectionException is not catastrophic 1) If a connection times out or fails during protocol negotiation, it is treated as a normal error instead of a catastrophic one. The connection timeout was a regression from the NIO connect fix. The protocol negotiation timeout is a new change to detect failed servers faster. 2) When a node is marked down, the outstanding queued requests are not failed; they are allowed to go through the connection creation cycle. When there are no outstanding requests they could otherwise wait infinitely until the next request comes up. 3) UnreachableStoreException is sometimes double wrapped. This prevents catastrophic errors from being detected accurately. Created a utility method: when you are not sure whether the thrown exception could be an UnreachableStoreException, use this method, which handles the case correctly. 4) In a non-blocking connect, if DNS does not resolve, Java throws UnresolvedAddressException instead of UnknownHostException. Probably an issue in Java. Also, UnresolvedAddressException is derived not from IOException but from IllegalArgumentException, which is weird. Fixed the code to handle this. 5) Tuned the remembered-exceptions timeout to twice the connection timeout. Previously it was hardcoded to 3 seconds, which was too aggressive when the connection timeout for some use cases was set to more than 5 seconds. Added unit tests to verify all the above cases. | 12 June 2015, 23:23:22 UTC |
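The double-wrapping fix in item 3 comes down to one guard: only wrap a throwable when it is not already of the target type. A minimal sketch (hypothetical names; the real Voldemort utility and exception hierarchy differ):

```java
// Hypothetical sketch of the "wrap only if needed" utility from item 3.
class ExceptionWrapSketch {
    // Stand-in for voldemort's UnreachableStoreException.
    static class UnreachableStoreException extends RuntimeException {
        UnreachableStoreException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // If the throwable is already an UnreachableStoreException, return it
    // as-is instead of nesting it inside another one. Double wrapping hides
    // the original from instanceof-based catastrophic-error detection.
    static UnreachableStoreException wrapIfNeeded(String msg, Throwable t) {
        if (t instanceof UnreachableStoreException) {
            return (UnreachableStoreException) t;
        }
        return new UnreachableStoreException(msg, t);
    }
}
```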
2b95f0d | Dallas Marlow | 12 June 2015, 16:37:50 UTC | update coordinator class name in server script | 12 June 2015, 16:37:50 UTC |
d65f7db | Felix GV | 09 June 2015, 13:46:34 UTC | Releasing Voldemort 1.9.16 | 09 June 2015, 13:46:34 UTC |
c574a37 | Felix GV | 09 June 2015, 13:41:07 UTC | Standardized recent release_notes formatting. | 09 June 2015, 13:41:07 UTC |
97d8694 | Felix GV | 09 June 2015, 01:34:42 UTC | Some more AvroUtils and BnP clean ups. | 09 June 2015, 13:33:27 UTC |
e13f6a2 | Greg Banks | 07 June 2015, 20:20:21 UTC | Fix error reporting in AvroUtils.getSchemaFromPath() - report errors with an exception - report errors exactly once - provide the failing pathname - don't generate spurious cascading NPE failures | 09 June 2015, 01:56:31 UTC |
037a0dc | ARUNACHALAM THIRUPATHI | 08 June 2015, 23:01:48 UTC | Merge pull request #269 from FelixGV/VoldemortConfig_bug Fixed VoldemortConfig bug introduced in 3692fa3. | 08 June 2015, 23:01:48 UTC |
c7e6cec | Felix GV | 08 June 2015, 22:38:57 UTC | Fixed VoldemortConfig bug introduced in 3692fa3f493acf717b1431d624af4c997df4f2fd. | 08 June 2015, 22:38:57 UTC |
5f0cd8b | ARUNACHALAM THIRUPATHI | 06 June 2015, 00:28:12 UTC | Merge pull request #265 from gnb/VOLDENG-1912 Unregister the "-streaming-stats" mbean correctly | 06 June 2015, 00:28:12 UTC |
924c72f | Greg Banks | 06 June 2015, 00:17:01 UTC | Unregister the "-streaming-stats" mbean correctly This avoids littering up the logs with JMX exceptions like this 2015/06/04 23:55:58.105 ERROR [JmxUtils] [voldemort-admin-server-t21] [voldemort] [] Error unregistering mbean javax.management.InstanceNotFoundException: voldemort.server.StoreRepository:type=cmp_comparative_insights at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1095) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:427) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:415) at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:546) at voldemort.utils.JmxUtils.unregisterMbean(JmxUtils.java:348) at voldemort.server.StoreRepository.removeStorageEngine(StoreRepository.java:187) at voldemort.server.storage.StorageService.removeEngine(StorageService.java:749) at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleDeleteStore(AdminServiceRequestHandler.java:1487) at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleRequest(AdminServiceRequestHandler.java:238) at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:190) at voldemort.common.nio.SelectorManagerWorker.run(SelectorManagerWorker.java:105) at voldemort.common.nio.SelectorManager.run(SelectorManager.java:214) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) | 06 June 2015, 00:22:19 UTC |
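A common defensive pattern for avoiding this kind of log noise (shown here as a general sketch, not necessarily the exact fix in this commit, which centered on using the correct ObjectName) is to check registration before unregistering:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of unregistering an MBean without spraying
// InstanceNotFoundException stack traces into the logs.
class JmxUnregisterSketch {
    // Returns true if an MBean was actually unregistered.
    static boolean unregisterQuietly(ObjectName name) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
                return true;
            }
            return false; // nothing registered under that name; nothing logged
        } catch (Exception e) {
            return false; // swallow races between isRegistered and unregister
        }
    }
}
```

Note this only suppresses the symptom; if the ObjectName being unregistered does not match the one used at registration time (as in this commit's "-streaming-stats" suffix), the stale MBean would still leak.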
e63bc53 | ARUNACHALAM THIRUPATHI | 06 June 2015, 00:07:15 UTC | Releasing Voldemort build 1.9.15 | 06 June 2015, 00:07:15 UTC |
139e441 | Arunachalam Thirupathi | 05 June 2015, 23:30:17 UTC | Fix log message HdfsFile does not have a toString method, which causes the object id to be printed in the log message; this broke the script we had for collecting the download speed. Speed can now be calculated better using the stats file, but that is a separate project. Added the number of directories being downloaded, and the number of files in addition to size. This will help track some more details, since dummy files are created in place for files that do not exist. Renamed HDFSFetcherAdvancedTest to HdfsFetcherAdvancedTest to keep it in sync with other naming conventions. | 05 June 2015, 23:58:49 UTC |
1592db0 | ARUNACHALAM THIRUPATHI | 04 June 2015, 18:49:05 UTC | Merge pull request #263 from FelixGV/hung_async_task_mitigation Added SO_TIMEOUT config (default 30 mins) in ConfigurableSocketFactory. Looks good. | 04 June 2015, 18:49:05 UTC |
3692fa3 | Felix GV | 03 June 2015, 17:42:52 UTC | Added SO_TIMEOUT config (default 30 mins) in ConfigurableSocketFactory and VoldemortConfig. Added logging to detect hung async jobs in AdminClient.waitForCompletion | 04 June 2015, 18:24:00 UTC |
13a4b81 | ARUNACHALAM THIRUPATHI | 31 May 2015, 16:32:33 UTC | HdfsCopyStatsTest fails intermittently The OS returns the expected files in random order. Use set instead of list. | 31 May 2015, 16:32:33 UTC |
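The fix described here is a standard one for flaky directory-listing tests: neither the OS nor `File.list()` guarantees any ordering, so expected file names should be compared as a set. A small sketch (hypothetical helper, not the actual HdfsCopyStatsTest code):

```java
import java.io.File;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the test fix: directory entries come back in no guaranteed
// order, so assertions should compare names as a set, not a list.
class DirListingSketch {
    static Set<String> namesIn(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return new HashSet<String>(); // not a directory, or I/O error
        }
        return new HashSet<String>(Arrays.asList(names));
    }
}
```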
b8d9525 | Arunachalam Thirupathi | 27 May 2015, 22:50:09 UTC | Add more testing for Serialization. Added more testing for Serialization. I was doing some tests on what is the expected input for the serializers and expected output. I thought it will be a good idea instead of just documenting, if i can write unit tests to validate them. Most of them have very poor testing, so decided to add the unit tests. I will add more testing as I start working more on the expected input/output. | 27 May 2015, 22:50:09 UTC |
b540533 | Arunachalam Thirupathi | 22 May 2015, 16:49:55 UTC | Release 1.9.14 Release version 1.9.14 | 22 May 2015, 16:49:55 UTC |
df12409 | Arunachalam Thirupathi | 18 May 2015, 18:14:36 UTC | RO Hdfs fetcher allocates too much memory 1) The Hdfs Fetcher in 1.0.4 uses ByteRangeInputStream. This class does not override the method read(byte[], int, int), so it defaults to the implementation from InputStream, which reads one byte at a time from the input stream. HttpInputStream creates byte arrays for each such read. So if you are downloading 2 TB of data, the server will allocate/free 2 TB of data before the download completes. This creates too much garbage: the new gen gets full in a few milliseconds and GC happens. Though the GCs are fast, so much GC causes the latency to spike and causes the JVM to run out of memory. 2) http://svn.apache.org/viewvc?view=revision&revision=1330500 fixed this issue in April 2012, knowingly or unknowingly. I tried upgrading to the latest Hadoop but it brings in ProtoBuf 2.5.0 and Avro 1.7. When I disabled the dependencies it failed at runtime expecting protobuf 2.5.0. I enabled only protobuf and it has no runtime dependency on Avro 1.7. But I am saving that fix for a later day. The branch is hadoop_Version_Upgrade, which uses Hadoop 2.6.0 and ProtoBuf 2.6.1 | 18 May 2015, 18:49:17 UTC |
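The degradation described in item 1 is easy to demonstrate: `InputStream`'s default `read(byte[], int, int)` loops over the single-byte `read()` once per byte. This sketch counts those calls for a stream that, like the old ByteRangeInputStream, fails to override the bulk-read method (the counting class is a hypothetical stand-in):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the problem: a stream that does not override
// read(byte[], int, int) falls back to InputStream's default, which calls
// the single-byte read() once per byte. In the commit's case each such
// call also allocated a fresh byte array, producing massive GC churn.
class SingleByteReadSketch extends InputStream {
    private final ByteArrayInputStream data;
    int singleByteReads = 0;

    SingleByteReadSketch(byte[] bytes) {
        this.data = new ByteArrayInputStream(bytes);
    }

    @Override
    public int read() throws IOException {
        singleByteReads++; // each call could allocate, as in the old fetcher
        return data.read();
    }
    // Deliberately no read(byte[], int, int) override: bulk reads degrade.
}
```

Overriding the bulk read (or fixing it upstream, as the cited SVN revision did) turns those per-byte calls into a single array copy.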
e2d845c | ARUNACHALAM THIRUPATHI | 14 May 2015, 17:24:42 UTC | Output stats file for RO files download .stats directory will be created and will contain last X (default: 50) stats file. If a version-X is fetched a file with the same name as this directory name will contain the stats for this download. The stats file will contain the individual file name, time it took to download and few other information. Added unit tests for the HdfsCopyStatsTest | 18 May 2015, 18:49:17 UTC |
b5db5ed | Xu Ha | 13 May 2015, 20:49:12 UTC | fix slop pusher unit test | 15 May 2015, 21:22:29 UTC |
45cce9e | Xu Ha | 13 May 2015, 22:54:19 UTC | fix store-delete command | 13 May 2015, 22:54:19 UTC |
705b6ff | Xu Ha | 13 May 2015, 17:34:57 UTC | add admin command for meta get-ro and add test config for readonly-two-nodes-cluster | 13 May 2015, 20:49:30 UTC |
eabb057 | Xu Ha | 23 January 2015, 21:52:59 UTC | Add Admin API to list/stop/enable scheduled jobs | 13 May 2015, 17:48:54 UTC |
20f1037 | Xu Ha | 12 May 2015, 17:52:55 UTC | add storeops.delete and deleteQuotaForNode, fix vector clock for setQuotaForNode | 12 May 2015, 21:39:11 UTC |
b4fa1cb | ARUNACHALAM THIRUPATHI | 11 May 2015, 18:06:21 UTC | Refactor HdfsFetcher 1) Created directory and File class to help me in the future. 2) Cleaned up some code to make for easier readability. | 12 May 2015, 17:54:57 UTC |
3378d6c | Arunachalam Thirupathi | 11 May 2015, 22:30:00 UTC | Code compiled on Java8 fails to run on Java6 Ever witnessed Exception in thread "main" java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView; at voldemort.store.metadata.MetadataStore.updateRoutingStrategies(MetadataStore.java:855) at voldemort.store.metadata.MetadataStore.init(MetadataStore.java:1189) This is because of the issue documented here https://gist.github.com/AlainODea/1375759b8720a3f9f094 | 11 May 2015, 22:30:00 UTC |
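The pitfall here is binary, not source-level: on JDK 8, `ConcurrentHashMap.keySet()` returns the covariant `KeySetView` type, so javac records that method descriptor at the call site, and an older JRE that only has `Set keySet()` throws `NoSuchMethodError` at runtime. One portable workaround (a sketch, not necessarily what the commit did) is to make the call through the `Map` interface:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the cross-compilation pitfall's workaround: on JDK 8, calling
// keySet() on a ConcurrentHashMap reference binds the call site to the
// KeySetView-returning overload, which older JREs do not have. Calling
// through the Map interface keeps the recorded descriptor portable.
class KeySetPortabilitySketch {
    static Set<String> portableKeys(ConcurrentHashMap<String, Integer> chm) {
        Map<String, Integer> asMap = chm; // bind the call site to Map.keySet()
        return asMap.keySet();
    }
}
```

The more robust fix is compiling with the target platform's bootclasspath (or `-release` on later JDKs), so such mismatches fail at compile time.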
20455c7 | Arunachalam Thirupathi | 11 May 2015, 21:45:23 UTC | Releasing Voldemort 1.9.13 | 11 May 2015, 21:45:23 UTC |
b4674b5 | Arunachalam Thirupathi | 11 May 2015, 21:09:02 UTC | Suppress obsoleteVersionException on logs During the refactoring of the server buffers, all errors from the storage engine were logged. The previous code did not log any errors on writes. I looked at the exception stack and could not see other errors that need to be suppressed. Verified that ProtocolBuffer does not log any error, so only the Voldemort native request handler is affected. | 11 May 2015, 21:09:02 UTC |
1c8e0d4 | ARUNACHALAM THIRUPATHI | 07 August 2014, 07:50:57 UTC | NIO style connect Problems: 1) Connect blocks the selector. This causes other operations (read/write) queued on the selector to incur additional latency or timeouts. This is worse when you have data centers that are far away. 2) The ProtocolNegotiation request is done after the connection establishment, which blocks the selector in the same manner. 3) If exceptions are encountered while getting connections from the queue, they are ignored. Solutions: The connection creation is async. The create method is modified to createAsync and it takes in the pool object. For NIO, createAsync triggers an async operation which checks in the connection when it is ready. For blocking connections, createAsync blocks, creates the connection and checks the connection in to the pool before returning. As the connection creation is async now, exceptions are remembered (for 5 seconds) in the pool. When some thread asks for a connection and exceptions are remembered, it will get an exception. There is no ordering in the way connections are handed out: one thread can request a connection and, before it can wait, another thread can steal the connection. This is avoided to a certain extent by having the thread split the blocking wait into two halves and create a connection if required, instead of doing one blocking wait. This should not be a problem in the real world, as once you reach steady state (the required number of connections has been created) this can't happen. Upgrade the source compatibility from Java 5 to 6. Most of the code is written with the assumption of Java 6; I don't believe you can run this code on Java 5. So the impact should be minimal, but if it goes in the Client V2 branch, it will get the benefit of additional testing. | 06 May 2015, 22:06:42 UTC |
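The core NIO mechanics behind "connect without blocking the selector" can be sketched in a few lines (hypothetical helpers, not the actual Voldemort connection-pool code): initiate a non-blocking connect, register interest in OP_CONNECT, and finish the handshake only when the selector says the channel is ready.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Minimal sketch of the async connect pattern: the selector thread never
// waits on a remote (possibly far-away) data center.
class AsyncConnectSketch {
    static SocketChannel beginConnect(Selector selector, InetSocketAddress addr)
            throws IOException {
        SocketChannel channel = SocketChannel.open();
        channel.configureBlocking(false);
        if (channel.connect(addr)) {
            // Connected immediately (possible on loopback): go straight to reads.
            channel.register(selector, SelectionKey.OP_READ);
        } else {
            // Connection pending: the selector will tell us when to finish it.
            channel.register(selector, SelectionKey.OP_CONNECT);
        }
        return channel;
    }

    // Invoked when the selector reports the key as connectable.
    static boolean completeConnect(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        if (channel.finishConnect()) {
            // Connected; protocol negotiation would be queued here, also async.
            key.interestOps(SelectionKey.OP_READ);
            return true;
        }
        return false;
    }
}
```

The commit builds on this by also making protocol negotiation asynchronous and by remembering connect failures in the pool so waiting threads fail fast instead of silently losing the error.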
5c03ea6 | Bhavani Sudha Saktheeswaran | 01 May 2015, 22:21:44 UTC | Releasing Voldemort 1.9.12 | 01 May 2015, 22:27:47 UTC |
706a1f3 | Bhavani Sudha Saktheeswaran | 01 May 2015, 01:54:19 UTC | Add more tests and fix buffer size of GZIP Streams | 01 May 2015, 21:09:59 UTC |
495a234 | ARUNACHALAM THIRUPATHI | 30 April 2015, 18:18:37 UTC | Merge pull request #256 from FelixGV/disable_ant_build Fully disabled the Ant build in favor of the Gradle one. Though the docs task is not yet ported to gradle, we can always fetch the build.xml from an older trunk and generate the docs. Given the amount of confusion it causes, I will merge this change in. | 30 April 2015, 18:18:37 UTC |
d310bf2 | Felix GV | 30 April 2015, 18:11:43 UTC | Fully disabled the Ant build in favor of the Gradle one. | 30 April 2015, 18:11:43 UTC |
8addbf7 | Arunachalam Thirupathi | 30 April 2015, 17:44:09 UTC | Fix the Readme to use Gradle Remove the ant and fix the readme to use Gradle. | 30 April 2015, 17:44:09 UTC |
c01f26e | Arunachalam Thirupathi | 27 April 2015, 21:42:36 UTC | Rebalance unit tests fail intermittently There are 2 issues. 1) Put is asynchronous, so there needs to be wait time before the put is verified on all the nodes. 2) Repeated puts need to generate different vector clocks. | 27 April 2015, 22:01:41 UTC |
9af4da4 | Xu Ha | 24 April 2015, 06:58:42 UTC | turn on reset-quota by default for rebalance-controller-cli | 25 April 2015, 01:22:27 UTC |
a831610 | Xu Ha | 24 April 2015, 06:50:54 UTC | split quota-resetting logic to QuotaResetter class and add unit test | 25 April 2015, 01:22:27 UTC |
8e39e55 | Xu Ha | 21 April 2015, 23:36:21 UTC | add reset-quota logic in RebalanceControllerCLI | 25 April 2015, 01:22:27 UTC |
a103fca | Bhavani Sudha Saktheeswaran | 24 April 2015, 00:43:21 UTC | Releasing Voldemort 1.9.11 | 24 April 2015, 00:43:21 UTC |
2ec72c4 | Bhavani Sudha Saktheeswaran | 15 April 2015, 01:00:16 UTC | Adding compression to RO path - first pass commit VoldemortConfig - Added a new config for compression codec. Default value for this property is GZIP. This is used by the AdminServiceRequestHandler to respond to the VoldemortBuildAndPushJob on what codec is supported. VAdminProto - Added a new request type for getting the supported compression codecs from the RO Voldemort Server AdminServiceRequestHandler - New method to handle the above request type. AdminClient - Provides a method - getSupportedROStorageCompressionCodecs - that supports the above request type. VoldemortBuildAndPushJob - inside run(), immediately after the cluster equality checks, an admin request is issued to the VoldemortServer (specified by the property "push.node") to fetch the RO compression codec supported by the server. - If any of the supported codecs match COMPRESSION_CODEC, then compression specific properties are set. Else no compression is enabled. AbstractHadoopJob - This is where the RO compression specific properties are set in JobConf inside the createJobConf() method HadoopStoreWriter and HadoopStoreWriterPerBucket - Adding dummy test-only constructors - Creating index and value file streams based on compression settings - Got rid of some unused variables - Minor movement of code HdfsFetcher - Changed copyFileWithCheckSum() to check if the files end with ".gz" and create a GZIPInputStream based on that. - The GZIPInputStream (if compression is enabled) wraps the original FSDataInputStream Tests for HadoopStoreWriter and HadoopStoreWriterPerBucket - These are parameterized tests - take in a boolean to either save keys or not - Run two tests - compressed and uncompressed - have tighter assumptions and use the test specific constructors in the corresponding classes | 23 April 2015, 23:14:37 UTC |
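The fetcher-side decision described in the HdfsFetcher bullet, wrapping the raw stream in a GZIPInputStream only when the file name ends with ".gz", can be sketched with plain java.util.zip (the helper name is hypothetical; the real code wraps Hadoop's FSDataInputStream):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Sketch of suffix-based decompression selection during a fetch.
class CompressionAwareOpenSketch {
    static InputStream maybeDecompress(String fileName, InputStream raw)
            throws IOException {
        // Compressed RO chunk files carry a ".gz" suffix; everything else
        // is read as-is.
        return fileName.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
    }
}
```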
9de1042 | Siddharth Singh | 20 April 2015, 23:55:36 UTC | Fix mode option in cluster fork lift | 21 April 2015, 22:39:15 UTC |
9e21ccb | Xu Ha | 17 April 2015, 18:32:40 UTC | create admin api for quota operations 1. Get quota by node id 2. Set quota by node id 3. Rebalance quota 4. Unit test for the new admin apis | 20 April 2015, 21:26:50 UTC |
b83e3e7 | Xu Ha | 17 April 2015, 01:38:23 UTC | add metadata key for quota.enforcement.enabled | 20 April 2015, 21:26:50 UTC |
c8e583e | Arunachalam Thirupathi | 16 April 2015, 01:10:50 UTC | Releasing Voldemort 1.9.10 | 16 April 2015, 01:10:50 UTC |
32e2e0b | ARUNACHALAM THIRUPATHI | 30 March 2015, 05:44:38 UTC | Client buffer cleanup and isCompleteResponse 1) The client's isCompleteResponse for Get and GetAll used to allocate the entire key and value, only to discard them immediately. Now the byte array is not de-serialized; validity is verified by advancing the pointers. 2) The Put request size is calculated up front and the buffer is grown to the required size to avoid double allocation. | 16 April 2015, 00:00:40 UTC |
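The "advance the pointers instead of deserializing" trick from commit 32e2e0b can be sketched against a simple length-prefixed payload (this is an illustrative format and class, not Voldemort's actual wire protocol):

```java
import java.nio.ByteBuffer;

public class CompleteResponseSketch {

    // Returns true if the buffer holds a complete 4-byte-length-prefixed
    // payload. The value bytes are skipped by moving the position, never
    // copied into a throwaway byte[]; the buffer is left untouched.
    public static boolean isComplete(ByteBuffer buf) {
        int start = buf.position();
        try {
            if (buf.remaining() < 4) {
                return false; // length prefix not fully received yet
            }
            int size = buf.getInt();
            if (buf.remaining() < size) {
                return false; // value bytes still in flight
            }
            buf.position(buf.position() + size); // skip, don't allocate
            return true;
        } finally {
            buf.position(start); // restore for the real parser
        }
    }
}
```

The same walk-without-copy pattern generalizes to multi-field responses: skip each sized field in turn and report incomplete as soon as a size check fails.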
3f425ef | ARUNACHALAM THIRUPATHI | 26 March 2015, 15:06:26 UTC | Vector clock deserializer from InputStream Avoid double-allocating the value for puts, which can potentially be a few kilobytes. The vector clock now has a deserializer that reads from an InputStream, and it is used to avoid the double allocation on the hot path. | 16 April 2015, 00:00:40 UTC |
94be1a5 | Arunachalam Thirupathi | 25 March 2015, 21:04:40 UTC | ShareBuffer Refactoring Refactored the shared buffer code to eliminate the separate read and write buffers. Now a common buffer is used and the code is refactored into its own classes. Verified by running the unit tests. | 16 April 2015, 00:00:40 UTC |
b3becf3 | Arunachalam Thirupathi | 23 March 2015, 18:43:01 UTC | Separate Client and Admin Request Handler Separated the Admin and Client Request Handlers. Currently the client port will answer admin requests and the admin port will answer client requests. You can bootstrap from either of these ports, and the client after bootstrapping sends the queries to the correct ports. This is dangerous, as most security deployments of Voldemort rely on blocking the admin port via a firewall, and an attacker could change the Voldemort source code to send admin requests to the client port. My intention for the fix was to make sure that the client port answers only client requests. This will help me make the client request handler share the read and write buffer without touching the admin request handler. Though it could be done for both client and admin, admin requests are too few and there are too many places to touch, so I will fix only the client request handler. The AdminClient expects both the client and admin request handlers: the admin client does some get-remote-metadata calls which use the Voldemort native v1 requests on the admin port. So the admin request handler is left unchanged; I just moved some code so that the client request handlers are isolated. | 16 April 2015, 00:00:40 UTC |
4a87d69 | ARUNACHALAM THIRUPATHI | 24 August 2014, 19:58:38 UTC | client sharing read/write buffer The client either writes to or reads from the socket, never both at once. So the buffer can be shared, which will bring down the memory requirement for the client by half. But the client has to watch for 2 things: 1) On write, the buffer expands as necessary, so the buffer needs to be reinitialized if it grows. 2) On read, if the buffer can't accommodate the data it grows as necessary; this case also needs to be handled. This works as expected and the unit tests are passing. Will put it through VPL to measure the efficiency of the fixes. Created a new class to hold the buffer reference. This helps to share the buffer between the input and output streams easily. Previously you had to watch out for places where one buffer moved away from the other and call an explicit method to update it. Also moved much of the buffer growing and resetting logic into common code, so it is more readable and understandable. Should I rename ByteBufferContainer to MutableByteBuffer? This fits the MutableInt pattern nicely, where a single int can be shared by multiple classes and an update by one is visible to the others. | 16 April 2015, 00:00:40 UTC |
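The shared-reference idea in commit 4a87d69 can be sketched as a tiny holder class (a hypothetical, stripped-down version of the ByteBufferContainer it describes, not the actual implementation): both streams keep the container, so when a write grows the buffer, the reader sees the replacement through get() without any explicit update call.

```java
import java.nio.ByteBuffer;

public class ByteBufferContainer {

    private ByteBuffer buffer;

    public ByteBufferContainer(int capacity) {
        this.buffer = ByteBuffer.allocate(capacity);
    }

    // Both the input and output stream read the current buffer through
    // this single mutable reference.
    public ByteBuffer get() {
        return buffer;
    }

    // Grow the backing buffer when a write (or an oversized read) does
    // not fit, preserving any bytes already written.
    public void ensureCapacity(int required) {
        if (buffer.capacity() < required) {
            ByteBuffer bigger = ByteBuffer.allocate(Math.max(required, buffer.capacity() * 2));
            buffer.flip();      // expose the written bytes for copying
            bigger.put(buffer); // carry them into the replacement
            buffer = bigger;
        }
    }
}
```

Centralizing the grow-and-copy here is what removes the "one buffer moves away from the other" hazard the commit message describes.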
298bdc1 | Arunachalam Thirupathi | 13 April 2015, 22:01:53 UTC | Increase the heap size for Tests Increase the heap size for tests to 8GB. The ZoneShrinkage tests fail from time to time with errors, as they run out of heap. | 13 April 2015, 22:01:53 UTC |
d546a02 | Felix GV | 10 April 2015, 18:34:22 UTC | Releasing Voldemort 1.9.9 | 10 April 2015, 18:34:22 UTC |
ca08a06 | Greg Banks | 09 April 2015, 21:15:01 UTC | Merge pull request #251 from voldemort/revert-223-master Revert "Steps towards automating cluster zone expansion" | 09 April 2015, 21:15:01 UTC |