https://github.com/voldemort/voldemort

Revision Message Commit Date
a569a2a Merge branch 'coadmin' of github.com:voldemort/voldemort into coadmin 16 September 2014, 01:13:52 UTC
fc8a3c8 some fix on admin client 16 September 2014, 01:13:45 UTC
912e2e7 More progress on the remote coordinator admin capabilities. GET all configs now works. 16 September 2014, 01:09:33 UTC
d62f503 Work in progress of the server-side Coordinator admin work. 12 September 2014, 00:54:56 UTC
8358888 Merge branch 'coadmin' of https://github.com/voldemort/voldemort into coordinator_admin Conflicts: src/java/voldemort/rest/coordinator/CoordinatorProxyService.java src/java/voldemort/rest/coordinator/admin/CoordinatorAdminRequestHandler.java 11 September 2014, 00:10:11 UTC
c5995a7 create coordinator-admin-client 10 September 2014, 21:18:28 UTC
c2e9181 create coord-admin-tool 10 September 2014, 21:18:28 UTC
f4d4e03 Remove unneeded code and add a TODO for sending the response 10 September 2014, 21:18:28 UTC
3691116 Fix typos, copy paste error 10 September 2014, 21:18:28 UTC
51d2d0f More refactoring - rename classes etc 10 September 2014, 21:18:28 UTC
16255e6 Refactor code for coordinator admin 10 September 2014, 21:18:28 UTC
b8dbbe7 ZoneShrinkage endToendTest fails The zone is shrunk while the nodes receive get/put traffic. The bootstrap URL is set to node 1 and node 0 is removed at the end of zone shrinkage. Zone shrinkage metadata is written to nodes in order, so node 0 first receives the update that it is no longer part of the cluster. But before node 1 is updated, the bootstrap code retries to bootstrap from node 1. This is a race condition between the admin client setting correct metadata on all nodes and the client threads refreshing the old value from other nodes. The problem is that updating the metadata version properties fails because node 0 is no longer part of the cluster. The race condition is avoided if the bootstrap URL and the node to which the metadata change is first written are one and the same. This fix ensures that by making sure node 0 is not removed during zone shrinkage. 05 September 2014, 14:55:41 UTC
e06bd37 Releasing Voldemort 1.8.16 Releasing voldemort 1.8.16 05 September 2014, 00:08:04 UTC
a0ca5c8 QuotaException causes delete failure 1) When QuotaException is thrown by delete, it fails the delete instead of letting the quorum decide the failure. 2) Added unit test to cover some part of the delete test case. 05 September 2014, 00:02:45 UTC
19a97db bdb native backup fixups 1. add message for native backup status 2. add bdb env config param: bdb.recovery.force.checkpoint 03 September 2014, 19:30:30 UTC
c2c1473 Releasing Voldemort 1.8.15 02 September 2014, 23:28:40 UTC
60e99e4 Made the read quota take all GET_ALL keys into account 30 August 2014, 00:17:42 UTC
3e16d36 Merge Read and Write Quota Combined GET and GETALL to GET Combined PUT and DELETE to PUT It would be cleaner if we rename the new GET to READ and PUT to WRITE But it might break the backward compatibility so leaving it like this for now. 29 August 2014, 00:10:49 UTC
d5c048b Merge branch 'master' of https://github.com/voldemort/voldemort 28 August 2014, 20:53:16 UTC
c080cfc Removed old parent delegating code in StoreStats which would cause double counting. Conflicts: src/java/voldemort/store/stats/StoreStats.java 28 August 2014, 17:52:24 UTC
d7d7349 Slop for delete operations Problems: 1) Exception handling of delete is very different at 4 places (on normal response, required failure, quorum failure, and after the pipeline is finished). 2) The exceptions are reported again and again (they are not removed from the map). 3) Some places ignore obsoleteVersionException, some others report it. 4) There is a zombie state abort; there is no way to reach this state. 5) Multiple slops could be sent because of issue 2. When the pipeline is aborted, no slops could be sent. 6) Refactored QuotaLimitingStore test to add delete test cases. 7) Combined the PUT and GET quotas into 2 quotas. Solution: Defined a common method, so that all 4 places call into the same method. Only the condition for calling is different. Race conditions still exist: after the zone failure check but before the pipeline finishes, the exception will go missing and slops will not happen, but the chances are reduced. Got rid of the state PerformDeletedHintHandoff as QuotaExceededExceptions will not be reported as failures. Now doing it in place. 27 August 2014, 02:41:21 UTC
58ab1dc Merge pull request #208 from readams/master Fix bitrot and make it build again with newer toolchain 27 August 2014, 01:40:09 UTC
453ddc1 Fix the lib.private section of pkg-config file 27 August 2014, 01:05:22 UTC
37e18e7 Fix bitrot and make it build again with newer toolchain 27 August 2014, 00:58:31 UTC
8724f64 Releasing Voldemort 1.8.14 27 August 2014, 00:37:25 UTC
ed86e11 Upgraded to Tehuti 0.5 Tehuti 0.5 includes a fix about Histograms not clearing their samples properly. Also added some trace logging code in StoreStats. 26 August 2014, 23:20:21 UTC
b762a80 Merge pull request #212 from arunthirupathi/CleanUpRebalance DataCleanup job should not run during rebalance 25 August 2014, 23:12:41 UTC
3f0fc7b Releasing Voldemort 1.8.13 25 August 2014, 22:40:42 UTC
eaf9b6f Merge pull request #210 from arunthirupathi/deleteGetVersion Delete should use GetVersion instead of Get 25 August 2014, 22:18:12 UTC
0371771 Log warning for GET ALL and DELETE Log warnings for other than GET and PUT. 24 August 2014, 16:41:13 UTC
90cd6a8 Merge pull request #214 from zhongjiewu/example-config-cleanup Migrate from stores.xml to STORES folder in all example configs 24 August 2014, 14:33:31 UTC
62aa593 Migrate from stores.xml to STORES folder in all example configs 24 August 2014, 00:14:38 UTC
9bbe12a Merge pull request #213 from zhongjiewu/better-example Polish up the Java Client example 23 August 2014, 06:22:20 UTC
5307345 Polish up the Java Client example 23 August 2014, 00:59:16 UTC
8281397 DataCleanup job should not run during rebalance 1) There is a check to not start the Data Cleanup job during rebalance, but after it is started, there is no periodic check. This causes the data cleanup to compete with rebalance. 2) Check every 10,000 entries scanned to make sure that there is no rebalance in progress. 3) Added an error condition around the data retention to make sure that if it is set to 0 or fewer days, the value is ignored and an error is logged. Previously, setting it to 0 would wipe out all your data. 22 August 2014, 23:01:38 UTC
d8c9353 Merge pull request #211 from arunthirupathi/node0Issue Tools depend on Node 0 to be available 22 August 2014, 21:20:06 UTC
b1b68a3 Delete should use GetVersion instead of Get 1) Currently Delete uses Get instead of GetVersion before doing delete. This can cause the Get to count towards Quota and is also unnecessary overhead when you want to delete things. 2) Now GetVersion is attempted and if multiple versions are encountered, then a Get is done to do the read repair. 3) Put uses duplicated code, the code is consolidated. Made the methods private. I think I should replace the GetVersion method with the new method I wrote, but not sure what is the use case and saving that for a later date. 22 August 2014, 19:35:20 UTC
105915d Tools depend on Node 0 to be available Added a new overload for getRemoteStoreDefList which takes no parameters and identifies one of the nodes to use. Fixed all the tools code (not used in the production code path but could be used by SREs) which used NodeId 0 as default, to use the new overload. VoldemortMultiStoreBuildAndPushJob seems to have the issue, but not sure whether it is used in production, so just added a comment. 22 August 2014, 00:06:13 UTC
5aabfbe Fixed the test code in CoordinatorAdminRequestHandler so that it responds to requests. Still just test code, not very advanced yet. 21 August 2014, 01:07:54 UTC
c6d419a Merge branch 'coadmin' of https://github.com/voldemort/voldemort into coordinator_admin 21 August 2014, 01:00:40 UTC
c51a598 Fix typos, copy paste error 21 August 2014, 00:49:13 UTC
f73707a Merge branch 'coadmin' of https://github.com/voldemort/voldemort into coordinator_admin 19 August 2014, 22:46:22 UTC
c451ed4 More refactoring - rename classes etc 19 August 2014, 03:21:11 UTC
9bda8f0 Merge branch 'master' of https://github.com/singhsiddharth/voldemort into coordinator_admin 19 August 2014, 00:40:53 UTC
de1ab01 Refactor code for coordinator admin 19 August 2014, 00:31:44 UTC
c07b777 fix duplicate error message bug and add large value size test for coordinator 18 August 2014, 21:23:59 UTC
6644d32 Merge pull request #207 from FelixGV/master Fixed the Show Spurious Values test... 14 August 2014, 21:30:23 UTC
7e774cb Fixed the Show Spurious Values test... 14 August 2014, 20:26:32 UTC
265b4c9 Rename variable 14 August 2014, 17:09:38 UTC
25b1cf7 Remove tehuti from lib as it is auto generated The jar should be copied from private-lib; for some weird reason I thought the lib was checked in and hence I checked it in, but it is not required. 14 August 2014, 00:41:34 UTC
28e19b2 Merge pull request #206 from arunthirupathi/zoneCheck Zone check 14 August 2014, 00:40:52 UTC
6d8c05d Zone check, store name in error, unit tests 1) When Server fails to start because of an invalid store, the store is logged in the error message. So that it is actionable. 2) Added checks to zone proximity list to avoid same zone and duplicate zone ids. 3) Modified Zones from LinkedList to ArrayList, as get operation is more efficient in ArrayList than the LinkedList. 4) Refactor the common code in zone calculations to common functions. 5) Added unit tests to cover the new checks added to zone proximity list. 13 August 2014, 22:47:55 UTC
044e96f Merge pull request #205 from FelixGV/master Releasing Voldemort 1.8.12 12 August 2014, 01:56:46 UTC
f29a973 Releasing Voldemort 1.8.12 12 August 2014, 01:30:04 UTC
56e199d Tools are failing because of the missing files Added tehuti to the lib directory. 12 August 2014, 00:49:45 UTC
f5e886d Merge conflict was incorrectly resolved Fixed the merge conflict 12 August 2014, 00:39:00 UTC
5b7f23a Merge conflict on StatsTest Resolved the merge conflict 12 August 2014, 00:32:38 UTC
2df417d Upgraded to latest Tehuti, 0.4 12 August 2014, 00:12:21 UTC
d8fd80a Various fixes for the stats code... 12 August 2014, 00:12:21 UTC
9f5b30e Fixed bugs related to ClientSocketStatsTest... 12 August 2014, 00:12:20 UTC
651c67a Fixed a problem with RequestCounter.getNumEmptyResponses(). It required a SampledTotal, not a SampledCount. 12 August 2014, 00:12:20 UTC
56daa0c Fixed a problem in StoreStatsJmxTest where Tehuti's Max would return NEGATIVE_INFINITY. 12 August 2014, 00:12:20 UTC
add55a9 Fixed a test in StatsTest to use the new RequestCounter API. 12 August 2014, 00:12:20 UTC
53d6b0e Initial commit of the Tehuti stats integration. Some unit tests are failing. Probably a few bugs need squashing... 12 August 2014, 00:12:20 UTC
c9a5397 Skewed reporting 12 August 2014, 00:12:19 UTC
c3773f4 Fix a vadmin error 11 August 2014, 22:01:06 UTC
6cab7b1 C++ Client: bump maintenance version number. 07 August 2014, 04:33:55 UTC
e3c4375 Makes Store::get and Store::getName const functions. Also changes InconsistencyResolvingStore, RoutedStore and SocketStore implementations to be const. 07 August 2014, 04:33:55 UTC
559d202 C++ client: Changes VersionedValue::getVersion to return a const Version*. By returning a non-const Version*, the semantic const-correctness was broken. You could not modify the pointer, but that didn't matter. 07 August 2014, 04:33:54 UTC
45bc4ea C++ client: Adds VOLDEMORT_ prefix to all include guards. This should minimize conflicts with other libraries. 07 August 2014, 04:33:54 UTC
9f40015 Changes exception message in Cluster::getNodeById to use boost::format. The previous code would cause a segmentation fault if nodeId was greater than the string length, and the message now actually includes the ID. 07 August 2014, 04:33:54 UTC
a51f328 Changes Cluster constructor to take string by const reference. 07 August 2014, 04:33:54 UTC
f3c6fdb Four node and two node configs two configurations for local testing. I found it helpful for testing some of the scenarios. Editing them on multiple machines is painful and others could use this for testing. 04 August 2014, 21:15:48 UTC
acd1117 gradle build fails on mac Some Macs are forced to use Java 8, and the Scala plugin on Gradle 1.12 does not work with Java 8. Gradle has fixed the issue in version 2.0. Added a task to clean up the config directory files so that changes to stores.xml will take effect. The Scala build intermittently fails on Mac with the error FAILURE: Build failed with an exception. What went wrong: Could not resolve all dependencies for configuration ':zinc'. > Could not find org.scala-lang:scala-library:2.10.2. This is caused by incremental zinc compilation. Given that Scala is very rarely used, the intermittent failure is very hard to work around. 04 August 2014, 21:15:48 UTC
ec2fcfd Reuse objectmapper object to make vector clock serialization faster 04 August 2014, 18:32:49 UTC
0a0ef0c Releasing voldemort 1.8.11 01 August 2014, 22:02:49 UTC
970877c Exceptions go uncounted Get does parallel requests first and, if they fail or time out, moves on to the serial requests. Exceptions are tracked if the parallel requests fail or time out after the pipeline is finished. But if the serial requests are processing, the exception goes unnoticed. This causes the node to stay available for a long time, affecting the latency and error rates. This fix reduces the time window in which an exception goes unnoticed. I thought about moving the error handling inside the callback so that it is not in 2 places, but I am not sure what the perf impact of the error tracking code will be; it could be negligible, but saving that for a later time. 30 July 2014, 17:52:00 UTC
51b2395 ClientRequest Executor fix and log fixes 1) The protocol negotiation request does not have a timeout associated with it. So if the connection fails, the input/output buffers are leaked and cause Out Of Memory errors. Setting the timeout for the protocol negotiation requests, as the caller expects a timeout. No new logs are introduced. Only the existing log is made more informative. 2) When protocol negotiation times out, an obscure error, IllegalStateException, is thrown. This happens because the code tries to read the result in case of a timeout. Handle this case and report a meaningful error. 3) When a serial request times out on a connection, the error message does not mention the socket that timed out. 4) Fix other minor logging information. 30 July 2014, 17:41:25 UTC
5708c0c Threshold failure detector issue 1) The threshold failure detector marks a node as available when it receives a non-catastrophic error after the configured number of catastrophic errors. When a node goes unavailable, generally there will first be lots of connectExceptions (catastrophic) and timeouts for already established connections. This causes the failure detector to treat a node that is down as up, and affects the client latency, as every get waits for the connection timeout to happen and then goes to the serial requests. 2) The threshold failure detector marks a node as unavailable after a window rolls over. The threshold failure detector tracks successes/failures in a window (default 5 minutes), and if the successes drop below a configured percentage (default 95) and there are more errors than configured, it marks the node as down. When the window rolls over, and the first request succeeds or a node comes back up, the failure count is not reset. This causes an available node to be marked as down. 3) Enhance the unit tests to cover the above 2 cases. 4) Dump the statistics when a node goes down to reason about it in the logs. The new code is written with the following reasoning: do the bookkeeping first, make the decision next. The new code intentionally does not reset the catastrophic errors on a window rollover, as they will be reset by the first success anyway. The node can still flap when the failure count is above the minimum and the percentage oscillates around the configured percentage. But I don't see any good workaround, nor was it an actual issue in the 20+ repetitions I created for an internal repro. So leaving it as it is. Previous code mixed both of these and led to many issues. The previous code left many loose ends (Boolean represented as int, generic set methods when it required reset, set methods at individual variable level when only a reset method is required, copy-pasted code).
PS: There is a more serious third issue: the Selector drains the parallel queue. When the request requires connection establishment, the Selector handles the connection too. If the node is dead, the Selector is going to wait the configured time for the connection to time out (default 5 seconds). In this time the Selector is not pumping read/write, and hence all these requests eventually time out. We are discussing the potential issues for this fix and will address this in the next fixes. There are many other minor issues I uncovered as well. 29 July 2014, 21:17:38 UTC
4fd732f Expose http decoder parameters as config 28 July 2014, 19:53:41 UTC
a86898e Releasing Voldemort 1.8.10 23 July 2014, 23:49:52 UTC
ee09314 Register slops if async writes fail due to quota Fix Async puts to succeed even if quota exceeded Adding unit test for Quota failed Parallel puts Fixing unit tests for Async write quota failures Updating PipelineRoutedStats JMXGetters for QuotaExceededException And minor code changes 23 July 2014, 23:34:51 UTC
493f500 Remove dated inaccurate comment 16 July 2014, 20:08:03 UTC
30185d4 Releasing Voldemort 1.8.9 02 July 2014, 22:05:25 UTC
0a9b7ed minor code review comments 02 July 2014, 20:49:38 UTC
28f5d9f Adding schema backwards compatibility in AdminClient for these methods: * updateRemoteMetadata * updateRemoteMetadataPair * addStore 02 July 2014, 18:40:52 UTC
a42ac35 Adding backwards compatibility checks in AdminClient For the methods: * fetchAndUpdateRemoteStore(int nodeId, List<StoreDefinition> updatedStores) * updateRemoteStoreDefList(int nodeId, List<StoreDefinition> storesList) so the exceptions are thrown even before hitting the server. 02 July 2014, 00:18:37 UTC
7ec06ad Add schema backward compatibility check when updating store definitions 25 June 2014, 00:20:17 UTC
cd4c627 Releasing Voldemort 1.8.8 24 June 2014, 20:50:41 UTC
8dbdb63 Incorporating review comments Changing timeout for HttpClientFactory shutdown to 10 seconds 24 June 2014, 20:09:41 UTC
ca6d63c move transport client instantiation to getRawStore adding timeout to HttpClientFactory shutdown added minor log messages 24 June 2014, 20:09:41 UTC
de7cae0 Merge pull request #199 from arunthirupathi/eclipseJunit Junit tests are not running inside eclipse 24 June 2014, 19:17:13 UTC
7c9af85 Junit tests are not running inside eclipse The Java builder property is not set in the .project file, which causes all the tests to fail with ClassNotFoundException. Some tests use GetContent, which causes CoordinatorRestAPITest to fail. 24 June 2014, 17:12:08 UTC
11c8664 Update CONTRIBUTORS Add Arun and Xu 20 June 2014, 20:08:56 UTC
30b0c32 Wrap mime multipart into a mime message to ensure headers are updated 19 June 2014, 22:46:18 UTC
674b152 add socket connection refresh message in AdminClient 19 June 2014, 22:22:44 UTC
2d531e2 Releasing Voldemort 1.8.7 with the Quota metrics fix 19 June 2014, 18:44:14 UTC
9065ec1 Merge pull request #198 from icefury71/mbean_quota_fix Registering individual store quota stats instead of aggregate to the Mbe... 19 June 2014, 18:36:49 UTC
3d63974 hudson run fails to copy mysql 1) There is no libs directory anymore, so the hudson run fails to copy the mysql.jar to the lib directory. Added mysql as testRuntime so the jar will be downloaded automatically. 2) Added war task to update the documentation 19 June 2014, 17:43:40 UTC
cf92ed7 Registering individual store quota stats instead of aggregate to the Mbean server 19 June 2014, 17:25:32 UTC