https://github.com/apache/spark
Revision 86bf93e65481b8fe5d7532ca6d4cd29cafc9e9dd authored by Shixiong Zhu on 29 February 2016, 19:02:45 UTC, committed by Andrew Or on 11 May 2016, 18:29:06 UTC
## What changes were proposed in this pull request?

Sometimes the network disconnection event is never triggered, because of other race conditions we may not have anticipated. In that case the executor keeps sending heartbeats to the driver and never exits.

This PR adds a new configuration, `spark.executor.heartbeat.maxFailures`, which makes the executor kill itself when it fails to send a heartbeat to the driver more than `spark.executor.heartbeat.maxFailures` times.
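A minimal sketch of the counting logic described above, written in Python for illustration only (Spark's executor itself is written in Scala). The class `HeartbeatFailureTracker` and its `send_heartbeat` / `exit_executor` callbacks are hypothetical and are not Spark's API. The sketch also assumes that a successful heartbeat resets the counter and that the executor exits as soon as the number of consecutive failures reaches `max_failures`; the exact boundary is an assumption.

```python
class HeartbeatFailureTracker:
    """Hypothetical sketch: count consecutive heartbeat failures and ask the
    executor to exit once the count reaches max_failures."""

    def __init__(self, max_failures, send_heartbeat, exit_executor):
        self.max_failures = max_failures      # plays the role of spark.executor.heartbeat.maxFailures
        self.send_heartbeat = send_heartbeat  # callable that raises if the driver is unreachable
        self.exit_executor = exit_executor    # callable that terminates the executor
        self.failures = 0                     # consecutive failures seen so far

    def report(self):
        try:
            self.send_heartbeat()
            self.failures = 0  # assumption: any successful heartbeat resets the counter
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Give up instead of heartbeating forever to a driver we cannot reach.
                self.exit_executor()
```

In a real executor, `report` would be called on every heartbeat interval and `exit_executor` would terminate the process; injecting both callbacks here keeps the sketch testable without a driver.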

## How was this patch tested?

unit tests

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #11401 from zsxwing/SPARK-13522.
[SPARK-13522][CORE] Executor should kill itself when it's unable to heartbeat to driver more than N times