https://github.com/apache/spark
Revision 46a9018b3b29c36f33e4113984a7f43f91ac12fc authored by Jungtaek Lim (HeartSaVioR) on 25 January 2019, 22:58:03 UTC, committed by Dongjoon Hyun on 25 January 2019, 23:25:38 UTC
## What changes were proposed in this pull request?

This patch fixes an issue that arises when `current_timestamp` / `current_date` is added to a streaming query.

The root cause is that Spark transforms `CurrentTimestamp`/`CurrentDate` into `CurrentBatchTimestamp` in MicroBatchExecution, which leaves the transformed attributes unresolved until IncrementalExecution resolves them.
(ContinuousExecution doesn't allow `current_timestamp` or `current_date`, so it is unaffected.)

This is fine for a DataSource V1 sink, because it simply uses the transformed logical plan and doesn't evaluate the attributes until they're resolved. For a DataSource V2 sink, however, Spark tries to extract the schema of the transformed logical plan before IncrementalExecution runs, and the unresolved attributes raise errors.

This patch fixes the issue by keeping a separate pre-resolved logical plan, so that the schema can be passed to StreamingWriteSupport safely.
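
The failure mode and the fix can be illustrated with a toy model. This is only a sketch in plain Python, not Spark's code: `Column`, `Plan`, `to_batch_plan`, and the string-typed expressions are hypothetical stand-ins for the logical plan, and the rewrite function only mimics the effect of the `CurrentTimestamp` to `CurrentBatchTimestamp` transformation described above.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple

# Hypothetical stand-ins for the expressions that get rewritten per batch.
TIME_EXPRS = {"current_timestamp", "current_date"}


@dataclass(frozen=True)
class Column:
    name: str
    expr: str
    data_type: Optional[str]  # None models a not-yet-resolved attribute


@dataclass(frozen=True)
class Plan:
    columns: Tuple[Column, ...]

    def schema(self) -> List[Tuple[str, str]]:
        # Extracting a schema requires every column to be resolved.
        unresolved = [c.name for c in self.columns if c.data_type is None]
        if unresolved:
            raise ValueError(f"unresolved columns: {unresolved}")
        return [(c.name, c.data_type) for c in self.columns]


def to_batch_plan(plan: Plan) -> Plan:
    # Toy version of the per-batch rewrite: time expressions are swapped
    # for a placeholder that stays unresolved until a later phase.
    return Plan(tuple(
        replace(c, expr="current_batch_timestamp", data_type=None)
        if c.expr in TIME_EXPRS else c
        for c in plan.columns
    ))


analyzed = Plan((
    Column("value", "input", "long"),
    Column("ts", "current_timestamp", "timestamp"),
))
batch = to_batch_plan(analyzed)

# Asking the rewritten plan for its schema fails in this model.
try:
    batch.schema()
    schema_error = None
except ValueError as err:
    schema_error = str(err)

# Taking the schema from the plan kept from before the rewrite is safe.
sink_schema = analyzed.schema()
```

In this model the sink reads its schema from `analyzed` while execution proceeds with `batch`, which mirrors the idea of keeping a pre-resolved plan alongside the transformed one.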

## How was this patch tested?

Added a unit test.

Closes #23609 from HeartSaVioR/SPARK-26379.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent 08b6379
[SPARK-26379][SS] Fix issue on adding current_timestamp/current_date to streaming query