Revision 432ea6924142c9688d8b6c64b46a531810691a8c authored by Liupengcheng on 12 March 2019, 20:53:42 UTC, committed by Marcelo Vanzin on 12 March 2019, 21:13:20 UTC
There is a race condition in `ExecutorAllocationManager`: the `SparkListenerExecutorRemoved` event can be posted before the `SparkListenerTaskStart` event, which leaves `executorIds` in an incorrect state. Then, when some executors become idle, real executors can be removed even when the actual number of executors is already equal to `minNumExecutors`, because `newExecutorTotal` is computed incorrectly (it may be greater than `minNumExecutors`). This can eventually leave zero available executors while a wrong, positive number of `executorIds` is kept in memory.
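The ordering problem can be illustrated with a small, self-contained model of the bookkeeping. This is a hypothetical Python sketch, not Spark's Scala code: the class `AllocationModel`, its methods, and the simplified removal guard are assumptions chosen to mirror the behaviour described above, namely that a task-start event registers an executor it does not know about, and that a removal is refused only when the total computed from `executorIds` would drop below the minimum.

```python
class AllocationModel:
    """Hypothetical, simplified model of executor bookkeeping (not Spark's API)."""

    def __init__(self, min_executors):
        self.min_executors = min_executors
        self.executor_ids = set()      # executors the manager *believes* exist
        self.live_executors = set()    # executors that actually exist
        self.pending_removal = set()

    def on_executor_added(self, executor_id):
        self.executor_ids.add(executor_id)
        self.live_executors.add(executor_id)

    def on_executor_removed(self, executor_id):
        # The removal event is posted once the executor is really gone,
        # so this model updates both views together.
        self.executor_ids.discard(executor_id)
        self.live_executors.discard(executor_id)
        self.pending_removal.discard(executor_id)

    def on_task_start(self, executor_id):
        # Assumed guard: an unknown executor seen in a task-start event is
        # registered, even if it has in fact already been removed.
        if executor_id not in self.executor_ids:
            self.executor_ids.add(executor_id)

    def try_remove(self, executor_id):
        # Simplified newExecutorTotal check: the total comes from the
        # bookkeeping, not from the real cluster state.
        new_total = len(self.executor_ids) - len(self.pending_removal) - 1
        if new_total < self.min_executors:
            return False
        self.pending_removal.add(executor_id)
        self.on_executor_removed(executor_id)  # the executor is killed
        return True


m = AllocationModel(min_executors=1)
m.on_executor_added("exec-1")
m.on_executor_added("exec-2")

# exec-1 is lost and its ExecutorRemoved event is processed first ...
m.on_executor_removed("exec-1")
# ... then a late TaskStart event for exec-1 arrives.
m.on_task_start("exec-1")

print(sorted(m.executor_ids))    # ['exec-1', 'exec-2']  (exec-1 is a phantom)
print(m.try_remove("exec-2"))    # True: the guard counts the phantom
print(sorted(m.live_executors))  # []  no real executors are left
print(sorted(m.executor_ids))    # ['exec-1']
```

Without the late task-start event, `try_remove("exec-2")` would return `False`, because the bookkeeping would correctly show a single executor at the minimum.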

What's more, even the `SparkListenerTaskEnd` event cannot release the fake `executorIds`. A later idle event for a fake executor cannot trigger its actual removal, because the executor has already been removed and no longer exists in the `executorDataMap` of `CoarseGrainedSchedulerBackend`, so the `onExecutorRemoved` method is never called for it again.
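Why the phantom entry is never cleared can be sketched with another hypothetical model. `BackendModel`, `ManagerModel`, and `kill_executor` are illustrative names, not Spark APIs; `executor_data_map` only stands in for the `executorDataMap` mentioned above, and the real path goes through Spark's listener bus and scheduler backend.

```python
class BackendModel:
    """Hypothetical stand-in for the scheduler backend's executor registry."""

    def __init__(self):
        self.executor_data_map = {}

    def register(self, executor_id):
        self.executor_data_map[executor_id] = object()

    def kill_executor(self, executor_id, on_removed):
        # Only executors still in the registry can be killed, so the removal
        # callback never fires for an executor that is already gone.
        if executor_id not in self.executor_data_map:
            return False
        del self.executor_data_map[executor_id]
        on_removed(executor_id)
        return True


class ManagerModel:
    """Hypothetical stand-in for the allocation manager's bookkeeping."""

    def __init__(self, backend):
        self.backend = backend
        self.executor_ids = set()

    def on_executor_removed(self, executor_id):
        self.executor_ids.discard(executor_id)

    def on_executor_idle(self, executor_id):
        # An idle executor is removed by asking the backend to kill it.
        return self.backend.kill_executor(executor_id, self.on_executor_removed)


backend = BackendModel()
manager = ManagerModel(backend)

# Phantom state left by the race: the manager still tracks "exec-1",
# but the backend has already dropped it from its registry.
manager.executor_ids.add("exec-1")

# After TaskEnd, exec-1 goes idle, but the kill request finds nothing to remove.
print(manager.on_executor_idle("exec-1"))  # False
print(manager.executor_ids)                # {'exec-1'}: never cleared
```

For an executor that the backend still knows about, the same idle path succeeds and the callback removes it from `executor_ids`.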

For details see https://issues.apache.org/jira/browse/SPARK-26927

This PR fixes this problem.

Tested with existing and newly added unit tests.

Closes #23842 from liupc/Fix-race-condition-that-casues-dyanmic-allocation-not-working.

Lead-authored-by: Liupengcheng <liupengcheng@xiaomi.com>
Co-authored-by: liupengcheng <liupengcheng@xiaomi.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(cherry picked from commit d5cfe08fdc7ad07e948f329c0bdeeca5c2574a18)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
1 parent dba5bac