https://github.com/apache/spark
Revision 9caf3a9659821a3b4fd4394c9f4134adff9caf88 authored by Hossein on 26 July 2014, 08:04:56 UTC, committed by Matei Zaharia on 26 July 2014, 08:05:05 UTC
The current default value of spark.serializer.objectStreamReset is 10,000. When re-partitioning a large file (e.g., 500 MB made up of 1 MB records) into, say, 64 partitions, each task's serializer stream caches references to the records it has written; with the old default this can retain roughly 10,000 × 1 MB × 64 ≈ 640 GB, causing out-of-memory errors. This patch lowers the default to a more reasonable value (100).

Author: Hossein <hossein@databricks.com>

Closes #1595 from falaki/objectStreamReset and squashes the following commits:

650a935 [Hossein] Updated documentation
1aa0df8 [Hossein] Reduce default value of spark.serializer.objectStreamReset

(cherry picked from commit 66f26a4610aede57322cb7e193a50aecb6c57d22)
Signed-off-by: Matei Zaharia <matei@databricks.com>
1 parent 1d4103a
Tip revision: 9caf3a9659821a3b4fd4394c9f4134adff9caf88 authored by Hossein on 26 July 2014, 08:04:56 UTC
[SPARK-2696] Reduce default value of spark.serializer.objectStreamReset
| File | Mode | Size |
|---|---|---|
| assembly | | |
| bagel | | |
| bin | | |
| conf | | |
| core | | |
| data | | |
| dev | | |
| docker | | |
| docs | | |
| ec2 | | |
| examples | | |
| external | | |
| extras | | |
| graphx | | |
| mllib | | |
| project | | |
| python | | |
| repl | | |
| sbin | | |
| sbt | | |
| sql | | |
| streaming | | |
| tools | | |
| yarn | | |
| .gitignore | -rw-r--r-- | 810 bytes |
| .rat-excludes | -rw-r--r-- | 671 bytes |
| .travis.yml | -rw-r--r-- | 1.1 KB |
| CHANGES.txt | -rw-r--r-- | 306.8 KB |
| LICENSE | -rw-r--r-- | 29.3 KB |
| NOTICE | -rw-r--r-- | 22.0 KB |
| README.md | -rw-r--r-- | 4.1 KB |
| make-distribution.sh | -rwxr-xr-x | 7.9 KB |
| pom.xml | -rw-r--r-- | 35.2 KB |
| scalastyle-config.xml | -rw-r--r-- | 7.5 KB |
| tox.ini | -rw-r--r-- | 805 bytes |