Revision cd51e2c321d793ac714566efb30f8957f9c5e117 authored by Eyal Farago on 12 October 2017, 12:56:04 UTC, committed by Herman van Hovell on 12 October 2017, 12:56:04 UTC
back-port #19181 to branch-2.2.

## What changes were proposed in this pull request?
1. a test reproducing [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907)
2. a fix for the root cause of the issue.

`org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill` calls `org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset` which may trigger another spill,
when this happens the `array` member is already de-allocated but still referenced by the code, this causes the nested spill to fail with an NPE in `org.apache.spark.memory.TaskMemoryManager.getPage`.
This patch introduces a reproduction in a test case and a fix, the fix simply sets the in-mem sorter's array member to an empty array before actually performing the allocation. This prevents the spilling code from 'touching' the de-allocated array.

## How was this patch tested?
introduced a new test case: `org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorterSuite#testOOMDuringSpill`.

Author: Eyal Farago <eyal@nrgene.com>

Closes #19481 from eyalfa/SPARK-21907__oom_during_spill__BACKPORT-2.2.
1 parent c5889b5
Raw File
.gitignore
*#*#
*.#*
*.iml
*.ipr
*.iws
*.pyc
*.pyo
*.swp
*~
.DS_Store
.cache
.classpath
.ensime
.ensime_cache/
.ensime_lucene
.generated-mima*
.idea/
.idea_modules/
.project
.pydevproject
.scala_dependencies
.settings
/lib/
R-unit-tests.log
R/unit-tests.out
R/cran-check.out
R/pkg/vignettes/sparkr-vignettes.html
build/*.jar
build/apache-maven*
build/scala*
build/zinc*
cache
checkpoint
conf/*.cmd
conf/*.conf
conf/*.properties
conf/*.sh
conf/*.xml
conf/java-opts
conf/slaves
dependency-reduced-pom.xml
derby.log
dev/create-release/*final
dev/create-release/*txt
dev/pr-deps/
dist/
docs/_site
docs/api
lib_managed/
lint-r-report.log
log/
logs/
out/
project/boot/
project/build/target/
project/plugins/lib_managed/
project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
scalastyle.txt
spark-*-bin-*.tgz
spark-tests.log
src_managed/
streaming-tests.log
target/
unit-tests.log
work/

# For Hive
TempStatsStore/
metastore/
metastore_db/
sql/hive-thriftserver/test_warehouses
warehouse/
spark-warehouse/

# For R session data
.RData
.RHistory
.Rhistory
*.Rproj
*.Rproj.*

.Rproj.user
back to top