Revision aba52cc0832dc24330807747ec057bdaeb5f3a7c authored by Boaz Leskes on 17 April 2014, 09:12:36 UTC, committed by Boaz Leskes on 18 April 2014, 16:58:27 UTC
When a replication operation (index/delete/update) fails to execute properly on a replica, we fail the replica and let the master allocate a new copy of it. At the moment, the node hosting the primary shard is responsible for notifying the master of a failed replica. However, if the replica shard is still initializing (`POST_RECOVERY` state), there is a race condition between the failed-shard message and moving the shard into the `STARTED` state. If the latter happens first, the master will fail to resolve the failed-shard message. This commit builds on #5800 and fails the engine of the replica shard if a replication operation fails. This protects us against the race above, because the shard will then reject the `STARTED` command from the master. It also makes us more resilient to other race conditions in this area.

Closes #5847
1 parent b18114b
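The fix described in the commit message can be illustrated with a small state-machine model: once a replicated operation fails on the replica, the replica records an engine failure locally, and a later `STARTED` command from the master is refused instead of promoting a broken copy. This is a minimal sketch under stated assumptions, not the Elasticsearch implementation: `ReplicaShardSketch`, `ReplicaShard`, `markPostRecovery`, `failEngine`, and `applyStarted` are hypothetical names, and "rejecting" is modelled here as returning `false` rather than whatever the real shard code does.

```java
/**
 * Hypothetical model of a replica shard's lifecycle (not an Elasticsearch API).
 * It shows why failing the engine on the replica closes the race between the
 * failed-shard message and the master's STARTED command.
 */
class ReplicaShardSketch {

    enum State { RECOVERING, POST_RECOVERY, STARTED }

    static final class ReplicaShard {
        private State state = State.RECOVERING;
        private String engineFailure; // null while the engine is healthy

        /** Recovery finished; the shard waits for the master's STARTED command. */
        synchronized void markPostRecovery() {
            if (state != State.RECOVERING) {
                throw new IllegalStateException("expected RECOVERING but was " + state);
            }
            state = State.POST_RECOVERY;
        }

        /** Hypothetical hook: called locally when a replicated index/delete/update fails. */
        synchronized void failEngine(String reason) {
            if (engineFailure == null) {
                engineFailure = reason; // keep the first failure reason
            }
        }

        /**
         * Handles the STARTED command from the master. Returns false (reject) if the
         * engine has already failed, so a failed copy is never marked as started.
         */
        synchronized boolean applyStarted() {
            if (engineFailure != null || state != State.POST_RECOVERY) {
                return false;
            }
            state = State.STARTED;
            return true;
        }

        synchronized State state() { return state; }

        synchronized boolean engineFailed() { return engineFailure != null; }
    }

    public static void main(String[] args) {
        // A replicated operation fails while the replica is in POST_RECOVERY,
        // and only afterwards does the master's STARTED command arrive.
        ReplicaShard failed = new ReplicaShard();
        failed.markPostRecovery();
        failed.failEngine("replicated index operation failed");
        System.out.println(failed.applyStarted() + " " + failed.state()); // false POST_RECOVERY

        // For contrast, a healthy replica accepts STARTED.
        ReplicaShard healthy = new ReplicaShard();
        healthy.markPostRecovery();
        System.out.println(healthy.applyStarted() + " " + healthy.state()); // true STARTED
    }
}
```

The design choice being modelled is that the replica no longer depends solely on the primary's message reaching the master in time: the replica's own state blocks the `STARTED` transition, whichever message the master processes first.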
File | Mode | Size |
---|---|---|
.settings | ||
bin | ||
config | ||
dev-tools | ||
docs | ||
lib | ||
rest-api-spec | ||
src | ||
.gitignore | -rw-r--r-- | 816 bytes |
.travis.yml | -rw-r--r-- | 145 bytes |
CONTRIBUTING.md | -rw-r--r-- | 6.1 KB |
LICENSE.txt | -rw-r--r-- | 11.1 KB |
NOTICE.txt | -rw-r--r-- | 150 bytes |
README.textile | -rw-r--r-- | 8.2 KB |
TESTING.asciidoc | -rw-r--r-- | 6.9 KB |
core-signatures.txt | -rw-r--r-- | 2.6 KB |
pom.xml | -rw-r--r-- | 65.8 KB |