https://github.com/apache/spark
Revision 432ea6924142c9688d8b6c64b46a531810691a8c authored by Liupengcheng on 12 March 2019, 20:53:42 UTC, committed by Marcelo Vanzin on 12 March 2019, 21:13:20 UTC
There is a race condition in the `ExecutorAllocationManager`: the `SparkListenerExecutorRemoved` event can be posted before the `SparkListenerTaskStart` event, which leaves `executorIds` in an incorrect state. Then, when some executor becomes idle, real executors are removed even though the actual executor count is equal to `minNumExecutors`, because `newExecutorTotal` is computed incorrectly (it may be greater than `minNumExecutors`). This can eventually leave zero available executors while a wrong, positive number of `executorIds` is kept in memory.

What's more, even the `SparkListenerTaskEnd` event cannot release the fake `executorIds`: a later idle event for the fake executors cannot trigger their real removal, because they have already been removed and no longer exist in the `executorDataMap` of `CoarseGrainedSchedulerBackend`, so the `onExecutorRemoved` method is never called again.

For details see https://issues.apache.org/jira/browse/SPARK-26927

This PR fixes this problem.

Tested with the existing UTs and an added UT.

Closes #23842 from liupc/Fix-race-condition-that-casues-dyanmic-allocation-not-working.

Lead-authored-by: Liupengcheng <liupengcheng@xiaomi.com>
Co-authored-by: liupengcheng <liupengcheng@xiaomi.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(cherry picked from commit d5cfe08fdc7ad07e948f329c0bdeeca5c2574a18)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
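The event ordering described in the commit message can be illustrated with a small toy model. This is a sketch in Python, not Spark's Scala code: the class `ToyAllocationManager`, its method names, and the `check_active` flag are all hypothetical. The flag stands in for the "ensure executor is active" check named in the commit title; how the real patch performs that check is not shown here.

```python
class ToyAllocationManager:
    """Toy model of the bookkeeping described above; not Spark's actual code."""

    def __init__(self, min_num_executors, check_active):
        self.min_num_executors = min_num_executors
        self.check_active = check_active  # hypothetical switch: apply the check or not
        self.executor_ids = set()         # executors the manager believes exist
        self.live = set()                 # executors that actually exist

    def on_executor_added(self, executor_id):
        self.live.add(executor_id)
        self.executor_ids.add(executor_id)

    def on_executor_removed(self, executor_id):
        self.live.discard(executor_id)
        self.executor_ids.discard(executor_id)

    def on_task_start(self, executor_id):
        # With the check enabled, a task-start event for a dead executor is ignored.
        if self.check_active and executor_id not in self.live:
            return
        # Without it, a late task-start event re-registers the dead executor.
        self.executor_ids.add(executor_id)

    def try_remove_idle(self, executor_id):
        # Removal is allowed only if the believed total stays at or above the minimum.
        new_executor_total = len(self.executor_ids) - 1
        if new_executor_total < self.min_num_executors:
            return False
        self.on_executor_removed(executor_id)
        return True


def run(check_active):
    """Replay the racy event order; return (removed, live count, believed count)."""
    mgr = ToyAllocationManager(min_num_executors=1, check_active=check_active)
    mgr.on_executor_added("exec-1")
    mgr.on_executor_added("exec-2")
    mgr.on_executor_removed("exec-1")  # the removal event arrives first
    mgr.on_task_start("exec-1")        # the stale task-start event arrives afterwards
    removed = mgr.try_remove_idle("exec-2")
    return removed, len(mgr.live), len(mgr.executor_ids)
```

In this model, `run(False)` ends with no live executors while one fake id is still tracked, and `run(True)` keeps the last live executor because the stale task-start event is ignored.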
1 parent dba5bac
[SPARK-26927][CORE] Ensure executor is active when processing events in dynamic allocation manager.
File | Mode | Size |
---|---|---|
.github | ||
R | ||
assembly | ||
bin | ||
build | ||
common | ||
conf | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
external | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.gitattributes | -rw-r--r-- | 40 bytes |
.gitignore | -rw-r--r-- | 1.3 KB |
CONTRIBUTING.md | -rw-r--r-- | 995 bytes |
LICENSE | -rw-r--r-- | 13.0 KB |
LICENSE-binary | -rw-r--r-- | 20.9 KB |
NOTICE | -rw-r--r-- | 1.5 KB |
NOTICE-binary | -rw-r--r-- | 41.9 KB |
README.md | -rw-r--r-- | 3.9 KB |
appveyor.yml | -rw-r--r-- | 2.2 KB |
pom.xml | -rw-r--r-- | 100.6 KB |
scalastyle-config.xml | -rw-r--r-- | 18.0 KB |