Revision 7076ab40f86fe606cd9b813dad506e921501383e authored by Yash Sharma on 16 May 2017, 22:08:05 UTC, committed by Burak Yavuz on 16 May 2017, 22:08:46 UTC
## What changes were proposed in this pull request?

This pull request proposes removing the hardcoded Amazon Kinesis retry values `MIN_RETRY_WAIT_TIME_MS` and `MAX_RETRIES`.

This change is critical for Kinesis checkpoint recovery when the Kinesis-backed RDD is large.
The following happens during a typical Kinesis recovery:
- Kinesis throttles a large number of requests while recovering.
- Retries after throttling fail to recover because the wait period between them is too short.
- Kinesis throttles on a per-second basis, so the wait period should be configurable for recovery.
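To illustrate why the wait period matters under per-second throttling, here is a minimal Python sketch of a retry loop whose wait time and attempt limit are parameters instead of constants. It is not the Spark implementation: `ThrottledError` is a hypothetical stand-in for the Kinesis throttling exception, `make_flaky_reader` is a hypothetical simulation, and doubling the wait after each failed attempt is an assumption made for illustration.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a Kinesis throttling error."""


def retry_on_throttle(
    body: Callable[[], T],
    max_attempts: int,
    wait_time_ms: int,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `body`, retrying only when it raises ThrottledError.

    Sleeps `wait_time_ms` before the first retry and doubles the wait
    after every further failure (an assumption for this sketch).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    wait_ms = wait_time_ms
    last_error: Optional[ThrottledError] = None
    for attempt in range(max_attempts):
        if attempt > 0:
            sleep(wait_ms / 1000.0)  # back off before retrying
            wait_ms *= 2
        try:
            return body()
        except ThrottledError as err:
            last_error = err  # throttled: remember the error and retry
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def make_flaky_reader(failures: int) -> Callable[[], str]:
    """Hypothetical reader that is throttled `failures` times, then succeeds."""
    calls = 0

    def read() -> str:
        nonlocal calls
        calls += 1
        if calls <= failures:
            raise ThrottledError("rate exceeded")
        return "records"

    return read


# Record the requested sleeps instead of actually sleeping.
waits: List[float] = []
result = retry_on_throttle(make_flaky_reader(2), max_attempts=3,
                           wait_time_ms=100, sleep=waits.append)
print(result, waits)  # records [0.1, 0.2]
```

The `sleep` parameter is injectable so the backoff can be observed without real delays. With a reader throttled three times and `max_attempts=3`, the loop gives up with a `RuntimeError`; raising either parameter is what lets a throttled recovery eventually succeed.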

The patch reads the Spark Kinesis retry settings from:
- spark.streaming.kinesis.retry.wait.time
- spark.streaming.kinesis.retry.max.attempts
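A minimal Python sketch of replacing the hardcoded constants with configurable values that fall back to defaults. The key strings are copied from this description and should be checked against the Spark Kinesis integration documentation for the release in use; the plain `dict` standing in for the Spark configuration, the integer-milliseconds format for the wait time, and the default values are all assumptions for illustration.

```python
from typing import Mapping, Tuple

# Keys as named in this pull request description.
WAIT_TIME_KEY = "spark.streaming.kinesis.retry.wait.time"
MAX_ATTEMPTS_KEY = "spark.streaming.kinesis.retry.max.attempts"

# Hypothetical fallbacks standing in for the former hardcoded constants.
DEFAULT_WAIT_TIME_MS = 100
DEFAULT_MAX_ATTEMPTS = 3


def read_retry_settings(conf: Mapping[str, str]) -> Tuple[int, int]:
    """Return (wait_time_ms, max_attempts), using defaults for missing keys.

    Assumes the wait time is a plain integer number of milliseconds.
    """
    wait_ms = int(conf.get(WAIT_TIME_KEY, DEFAULT_WAIT_TIME_MS))
    max_attempts = int(conf.get(MAX_ATTEMPTS_KEY, DEFAULT_MAX_ATTEMPTS))
    if wait_ms < 0 or max_attempts < 1:
        raise ValueError("wait time must be >= 0 and max attempts >= 1")
    return wait_ms, max_attempts


# No overrides: the defaults apply.
print(read_retry_settings({}))  # (100, 3)
# A longer wait and more attempts for a large recovery.
print(read_retry_settings({WAIT_TIME_KEY: "1000", MAX_ATTEMPTS_KEY: "10"}))  # (1000, 10)
```

Keeping the defaults equal to the old behavior means existing jobs are unaffected, while a job recovering a large Kinesis-backed RDD can opt into longer waits and more attempts.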

Jira : https://issues.apache.org/jira/browse/SPARK-20140

## How was this patch tested?

Modified KinesisBackedBlockRDDSuite.scala to run the Kinesis tests with the modified configurations. I was not able to test the patch against actual throttling.

Author: Yash Sharma <ysharma@atlassian.com>

Closes #17467 from yssharma/ysharma/spark-kinesis-retries.

(cherry picked from commit 38f4e8692ce3b6cbcfe0c1aff9b5e662f7a308b7)
Signed-off-by: Burak Yavuz <brkyvz@gmail.com>
1 parent 75e5ea2