https://github.com/apache/spark
Revision 9caf3a9659821a3b4fd4394c9f4134adff9caf88 authored by Hossein on 26 July 2014, 08:04:56 UTC, committed by Matei Zaharia on 26 July 2014, 08:05:05 UTC
The current default value of spark.serializer.objectStreamReset is 10,000.
When re-partitioning a large file (e.g., 500 MB of 1 MB records) into, say, 64 partitions, the serializer will cache 10,000 × 1 MB × 64 ≈ 640 GB of records, which causes out-of-memory errors.

This patch lowers the default to a more reasonable value (100).
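For workloads with a different record-size profile, the interval can be overridden explicitly instead of relying on the default. A minimal configuration sketch; the property name is the one this commit changes, the value shown is illustrative, and `conf/spark-defaults.conf` is Spark's conventional properties file:

```
# conf/spark-defaults.conf
# Number of objects written before the Java serializer resets its
# object-reference cache; smaller values bound memory held per stream.
spark.serializer.objectStreamReset  100
```

The same property can also be set programmatically with `SparkConf.set("spark.serializer.objectStreamReset", "100")` before the SparkContext is created.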

Author: Hossein <hossein@databricks.com>

Closes #1595 from falaki/objectStreamReset and squashes the following commits:

650a935 [Hossein] Updated documentation
1aa0df8 [Hossein] Reduce default value of spark.serializer.objectStreamReset

(cherry picked from commit 66f26a4610aede57322cb7e193a50aecb6c57d22)
Signed-off-by: Matei Zaharia <matei@databricks.com>
[SPARK-2696] Reduce default value of spark.serializer.objectStreamReset
