https://github.com/voldemort/voldemort
Revision c2db8fd9b0afe714f6f88908c769b66485d28825 authored by Felix GV on 30 June 2015, 18:11:45 UTC, committed by Felix GV on 30 June 2015, 18:11:45 UTC
First-cut implementation of Build and Push High Availability.

This commit introduces a limited form of high availability (HA) for the Build and Push (BnP) job. The new functionality is disabled by default and can be enabled via the following server-side configurations, all of which are required:

push.ha.enabled=true
push.ha.cluster.id=<some arbitrary name which is unique per physical cluster>
push.ha.lock.path=<some arbitrary HDFS path used for shared state>
push.ha.lock.implementation=voldemort.store.readonly.swapper.HdfsFailedFetchLock
push.ha.max.node.failure=1

The Build and Push job will interrogate each cluster it pushes to and honor each cluster's individual settings (i.e., one can enable HA on one cluster at a time, if desired). However, even if the server settings enable HA, this should be considered best-effort behavior, since some BnP users may be running older versions of BnP which do not honor the HA settings. Furthermore, up-to-date BnP users can also set the following config to disable HA, regardless of the server-side settings:

push.ha.enabled=false

Below is a description of the behavior of BnP HA, when enabled.

When a Voldemort server fails one or more fetches, the BnP job attempts to acquire a lock by moving a file into a shared directory in HDFS. Once the lock is acquired, it checks the state in HDFS to see whether any nodes have already been marked as disabled by other BnP jobs. It then determines whether the Voldemort node(s) that failed the current BnP job would bring the total number of unique failed nodes above the configured maximum, with the following outcome in each case (a sketch of this decision follows the list):

- If the total number of failed nodes is equal to or lower than the maximum allowed, then metadata is added to HDFS to mark the store/version currently being pushed as disabled on the problematic node. Afterwards, if the Voldemort server that failed the fetch is still online, it is asked to go into offline mode (this is best effort, as the server could be down). Finally, BnP proceeds with swapping the new data set version on, as if all nodes had fetched properly.
- If, on the other hand, the total number of unique failed nodes is above the configured maximum, then the BnP job fails and the nodes that fetched successfully are asked to delete the new data, just like before.
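
A minimal sketch of that decision, in Java-like pseudocode (all names below are illustrative, not the actual BnP classes or methods):

// Sketch only: lock, the helper methods and variable names are hypothetical.
Set<Integer> failedNodes = new HashSet<>(lock.getDisabledNodes()); // state already recorded in HDFS
failedNodes.addAll(nodesThatFailedThisPush);

if (failedNodes.size() <= maxNodeFailure) { // push.ha.max.node.failure
    for (int nodeId : nodesThatFailedThisPush) {
        lock.addDisabledNode(nodeId, storeName, pushVersion); // metadata added to HDFS
        tryToPutServerInOfflineMode(nodeId); // best effort: the server could be down
    }
    swapOnAllNodes(); // proceed as if all nodes had fetched properly
} else {
    deleteNewDataOnSuccessfulNodes(); // just like a non-HA failed push
    throw new VoldemortException("Too many unique failed nodes; failing the BnP job.");
}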

In either case, BnP then releases the shared lock by moving the lock file out of the lock directory, so that other BnP instances can go through the same process one at a time, in a globally coordinated (mutually exclusive) fashion. All HA-related HDFS operations are retried every 10 seconds, up to 90 times (thus for a total of 15 minutes); the interval and retry count are configurable in the BnP job via push.ha.lock.hdfs.timeout and push.ha.lock.hdfs.retries, respectively.
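
Roughly, each such HDFS operation is wrapped in a retry loop along these lines (a sketch only; the helper names, and the use of an atomic HDFS rename as the lock primitive, are assumptions based on the description above):

// Acquire the lock by atomically moving our lock file into the shared lock directory.
// retries and timeoutMs come from push.ha.lock.hdfs.retries (90) and
// push.ha.lock.hdfs.timeout (10 seconds), respectively.
for (int attempt = 1; attempt <= retries; attempt++) {
    if (fileSystem.rename(privateLockFile, sharedLockPath)) {
        return; // lock acquired
    }
    Thread.sleep(timeoutMs); // another BnP instance holds the lock; wait and retry
}
throw new VoldemortException("Could not acquire the HDFS lock after " + retries + " attempts.");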

When a Voldemort server is in offline mode, BnP can continue working properly only if the BnP job's push.cluster property points to the admin port, not the socket port. Configured this way, a transient HDFS issue may put a Voldemort server into offline mode, but will not prevent future pushes from populating the newer data organically.
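
For example, assuming a node whose cluster.xml declares a socket port of 6666 and an admin port of 6667 (host and ports are illustrative):

push.cluster=tcp://voldemort-host.example.com:6667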

External systems can be notified when the BnP HA code is triggered via two new BuildAndPushStatus values passed to the custom BuildAndPushHooks registered with the job: SWAPPED (when things work normally) and SWAPPED_WITH_FAILURES (when a swap occurred despite one or more failed Voldemort nodes). BnP jobs that fail because the maximum number of failed Voldemort nodes would have been exceeded still fail normally and trigger the FAILED hook.
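
A minimal sketch of a hook reacting to the new statuses (the callback signature below is assumed for illustration; consult the BuildAndPushHook interface for the actual contract):

public class AlertingHook implements BuildAndPushHook {
    @Override
    public void invoke(BuildAndPushStatus status) {
        switch (status) {
            case SWAPPED:
                break; // normal completion: nothing to do
            case SWAPPED_WITH_FAILURES:
                // The swap went through, but some node(s) need manual recovery.
                alertOnCall("BnP swapped with failures"); // hypothetical helper
                break;
            case FAILED:
                alertOnCall("BnP job failed"); // hypothetical helper
                break;
            default:
                break;
        }
    }
}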

Future work:

- Auto-recovery: transitioning the server from offline to online mode, as well as cleaning up the shared metadata in HDFS, is not handled automatically as part of this commit (which is the main reason why BnP HA should not be enabled by default). The recovery process currently needs to be handled manually, though it could be automated (at least for the common cases) as part of future work.
- Support non-HDFS-based locking mechanisms: the HdfsFailedFetchLock is an implementation of a new FailedFetchLock interface, which can serve as the basis for other distributed state/locking mechanisms (such as ZooKeeper, or a native Voldemort-based solution); a rough sketch of the interface follows this list.
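
Its rough shape (method names are inferred from the behavior described above; see voldemort.store.readonly.swapper.FailedFetchLock for the actual contract):

public interface FailedFetchLock extends Closeable {
    void acquireLock() throws Exception;
    void releaseLock() throws Exception;
    Set<Integer> getDisabledNodes() throws Exception;
    void addDisabledNode(int nodeId, String storeName, long storeVersion) throws Exception;
}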

Unrelated minor fixes and clean ups included in this commit:

- Cleaned up some dead code.
- Cleaned up abusive admin client instantiations in BnP.
- Cleaned up the closing of resources at the end of the BnP job.
- Fixed an NPE in the ReadOnlyStorageEngine.
- Fixed a broken sanity check in Cluster.getNumberOfTags().
- Improved some server-side logging statements.
- Fixed the exception type thrown in ConfigurationStorageEngine's and FileBackedCachingStorageEngine's getCapability().