Revision - 46a9018 - [SPARK-26379][SS] Fix issue on adding [...] - origin: https://github.com/apache/spark

https://github.com/apache/spark

17 June 2024, 00:28:55 UTC

Revision 46a9018b3b29c36f33e4113984a7f43f91ac12fc authored by Jungtaek Lim (HeartSaVioR) on 25 January 2019, 22:58:03 UTC, committed by Dongjoon Hyun on 25 January 2019, 23:25:38 UTC

[SPARK-26379][SS] Fix issue on adding current_timestamp/current_date to streaming query

## What changes were proposed in this pull request?

This patch fixes an issue that occurs when `current_timestamp` / `current_date` is added to a streaming query.

The root cause is that Spark transforms `CurrentTimestamp`/`CurrentDate` into `CurrentBatchTimestamp` in MicroBatchExecution, which leaves the transformed attributes unresolved until IncrementalExecution resolves them.
(ContinuousExecution does not allow `current_timestamp` or `current_date`, so it is unaffected.)

This is fine for a DataSource V1 sink, because it simply uses the transformed logical plan and does not evaluate it until the attributes are resolved. For a DataSource V2 sink, however, Spark tries to extract the schema of the transformed logical plan before IncrementalExecution runs, and the unresolved attributes raise errors.

This patch fixes the issue by keeping a separate pre-resolved logical plan, so that the schema can be passed to StreamingWriteSupport safely.
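
The ordering problem and the fix can be illustrated with a toy model. The sketch below is not Spark code: `Column`, `rewrite_for_batch`, `schema_of`, and the two `start_v2_sink_*` functions are hypothetical names invented for this illustration, and a logical plan is reduced to a plain list of columns. It only assumes what the description states, namely that the per-batch rewrite leaves attributes unresolved and that schema extraction fails on unresolved attributes.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Toy model only: none of these names exist in Spark.
TIME_EXPRS = {"current_timestamp", "current_date"}


class UnresolvedPlanError(Exception):
    """Raised when a schema is requested from a plan with unresolved columns."""


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    expr: str = "attribute"
    resolved: bool = True


Plan = List[Column]
Schema = List[Tuple[str, str]]


def rewrite_for_batch(plan: Plan) -> Plan:
    """Mimic the micro-batch rewrite: time expressions become a
    batch-timestamp placeholder that is not resolved yet."""
    return [
        replace(c, expr="current_batch_timestamp", resolved=False)
        if c.expr in TIME_EXPRS
        else c
        for c in plan
    ]


def schema_of(plan: Plan) -> Schema:
    """Extract (name, type) pairs; fail if any column is unresolved."""
    unresolved = [c.name for c in plan if not c.resolved]
    if unresolved:
        raise UnresolvedPlanError(f"unresolved columns: {unresolved}")
    return [(c.name, c.data_type) for c in plan]


def start_v2_sink_buggy(analyzed_plan: Plan) -> Schema:
    # Schema is taken from the rewritten plan, before resolution happens.
    return schema_of(rewrite_for_batch(analyzed_plan))


def start_v2_sink_fixed(analyzed_plan: Plan) -> Schema:
    # Schema is taken from the separate pre-resolved plan instead.
    schema = schema_of(analyzed_plan)
    _batch_plan = rewrite_for_batch(analyzed_plan)  # resolved later, per batch
    return schema
```

In this model, a plan containing a `current_timestamp` column makes `start_v2_sink_buggy` raise `UnresolvedPlanError`, while `start_v2_sink_fixed` returns the schema because it never asks the rewritten plan for one.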

## How was this patch tested?

Added a unit test.

Closes #23609 from HeartSaVioR/SPARK-26379.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

1 parent 08b6379


File	Mode	Size
.github
R
assembly
bin
build
common
conf
core
data
dev
docs
examples
external
graphx
hadoop-cloud
launcher
licenses
licenses-binary
mllib
mllib-local
project
python
repl
resource-managers
sbin
sql
streaming
tools
.gitattributes	-rw-r--r--	40 bytes
.gitignore	-rw-r--r--	1.3 KB
CONTRIBUTING.md	-rw-r--r--	995 bytes
LICENSE	-rw-r--r--	13.0 KB
LICENSE-binary	-rw-r--r--	20.9 KB
NOTICE	-rw-r--r--	1.5 KB
NOTICE-binary	-rw-r--r--	41.9 KB
README.md	-rw-r--r--	3.9 KB
appveyor.yml	-rw-r--r--	2.2 KB
pom.xml	-rw-r--r--	100.6 KB
scalastyle-config.xml	-rw-r--r--	18.0 KB
