https://github.com/apache/spark
Revision 4259a28588a4dceb55d7bf1bf9327065dd751863 authored by Josh Rosen on 03 June 2016, 00:47:31 UTC, committed by Andrew Or on 03 June 2016, 00:47:31 UTC

[SPARK-15736][CORE][BRANCH-1.6] Gracefully handle loss of DiskStore files
If an RDD partition is cached on disk and the DiskStore file is lost, then reads of that cached partition fail and the missing partition is supposed to be recomputed by a new task attempt. In the current BlockManager implementation, however, the missing file neither triggers a metadata update nor invalidates the cache entry, so subsequent task attempts are scheduled on the same executor and the doomed read is retried over and over, leading to repeated task failures and, eventually, a total job failure.
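Schematically, the pre-fix read path in branch-1.6 fails the read without touching any block metadata (a paraphrase of the relevant `BlockManager` logic, not a verbatim excerpt):

```scala
// Inside BlockManager's local-read path (schematic, pre-fix). The block is
// still registered with the BlockManagerMaster, so even though the read
// throws, the master keeps steering retries onto this same executor.
val bytes: ByteBuffer = diskStore.getBytes(blockId) match {
  case Some(b) => b
  case None =>
    // No removeBlock() and no report to the master here: the stale cache
    // entry survives even though its backing file is gone.
    throw new BlockException(blockId, s"Block $blockId not found on disk, though it should be")
}
```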

To fix this problem, the executor that discovers the missing file needs to mark the corresponding block as missing so that it stops advertising itself as a cache location for that block.
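The shape of that invalidation can be sketched as follows (`handleLocalReadFailure` is an illustrative name for the helper, not necessarily the exact code that landed):

```scala
// Sketch of the fixed recovery path inside BlockManager. When a local read
// discovers that a disk-cached block's file has vanished, drop the block's
// metadata and report the removal, so this executor stops advertising
// itself as a cache location for the block.
private def handleLocalReadFailure(blockId: BlockId): Nothing = {
  // tellMaster = true propagates the removal to the driver, so a fresh
  // task attempt can recompute the partition instead of re-reading here.
  removeBlock(blockId, tellMaster = true)
  throw new SparkException(s"Block $blockId not found on disk, though it should be")
}

// At the call site, the missing-file branch now invalidates before failing:
diskStore.getBytes(blockId) match {
  case Some(bytes) => bytes                    // normal read
  case None => handleLocalReadFailure(blockId) // file lost: unregister, then fail fast
}
```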

This patch fixes this bug and adds an end-to-end regression test (in `FailureSuite`) and a set of unit tests (in `BlockManagerSuite`).
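The end-to-end test might look roughly like the following (a sketch in the style of `FailureSuite`, which manages `sc` via `LocalSparkContext`; the test name, partition counts, and the use of `getAllFiles()` to simulate file loss are assumptions):

```scala
test("failure because cached RDD files are missing") {
  // "local[1,2]" allows up to 2 attempts per task, giving the
  // recomputation a chance to run after the first read fails.
  sc = new SparkContext("local[1,2]", "test")
  val rdd = sc.parallelize(1 to 10, 2).persist(StorageLevel.DISK_ONLY)
  rdd.count() // materialize the on-disk cache

  // Delete the DiskStore files out from under the block manager to
  // simulate losing them (disk failure, cleaned local dirs, etc.).
  SparkEnv.get.blockManager.diskBlockManager.getAllFiles().foreach(_.delete())

  // With the fix, the lost partitions are marked missing and recomputed;
  // without it, retries pile up on the same executor and the job fails.
  assert(rdd.count() === 10)
}
```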

This is a branch-1.6 backport of #13473.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #13479 from JoshRosen/handle-missing-cache-files-branch-1.6.
Parent: 0a13e4c