https://github.com/apache/spark
Revision 9ed64048a740fbcd15d2b830b1edbb728f87c423 authored by Sergei Lebedev on 25 October 2017, 21:15:44 UTC, committed by Wenchen Fan on 25 October 2017, 21:17:40 UTC
Prior to this commit, getAllBlocks implicitly assumed that the directories
managed by the DiskBlockManager contain only the files corresponding to
valid block IDs. In reality, this assumption was violated during shuffle,
which produces temporary files in the same directory as the resulting
blocks. As a result, calls to getAllBlocks during shuffle were unreliable.
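
The idea behind the fix can be illustrated with a minimal sketch: when listing a block directory, skip any file whose name does not parse as a block ID instead of failing on it. This is not Spark's Scala implementation. The helper names `parse_block_id` and `get_all_blocks`, the simplified name patterns, and the temp-file name used in the usage note are all hypothetical, chosen only for illustration.

```python
import os
import re

# Hypothetical, simplified block-name patterns for illustration only;
# the real block ID grammar in Spark covers more cases.
_BLOCK_NAME_PATTERNS = [
    re.compile(r"rdd_\d+_\d+"),
    re.compile(r"shuffle_\d+_\d+_\d+(\.data|\.index)?"),
]


def parse_block_id(name):
    """Return the name if it looks like a block ID, otherwise None."""
    for pattern in _BLOCK_NAME_PATTERNS:
        if pattern.fullmatch(name):
            return name
    return None


def get_all_blocks(directory):
    """List block IDs found in a directory, ignoring unrecognized files."""
    blocks = []
    for name in sorted(os.listdir(directory)):
        block_id = parse_block_id(name)
        # Temporary files do not parse as block IDs, so they are
        # skipped rather than treated as an error.
        if block_id is not None:
            blocks.append(block_id)
    return blocks
```

Under these assumptions, a directory holding `rdd_1_2`, `shuffle_0_1_0.data`, and a leftover file such as `shuffle_0_1_0.data.tmp-1234` yields only the first two names. Returning `None` for unrecognized names keeps the listing usable while a shuffle is still writing temporary files.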

The fix could be made more efficient, but this is probably good enough.

Tested with `DiskBlockManagerSuite`.

Author: Sergei Lebedev <s.lebedev@criteo.com>

Closes #19458 from superbobry/block-id-option.

(cherry picked from commit b377ef133cdc38d49b460b2cc6ece0b5892804cc)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 4c1a868
[SPARK-22227][CORE] DiskBlockManager.getAllBlocks now tolerates temp files