Revision 25431d79f7daf2a68298701154eb505c2a4add80 authored by Jungtaek Lim (HeartSaVioR) on 06 December 2019, 05:46:28 UTC, committed by Shixiong Zhu on 06 December 2019, 05:46:28 UTC
### What changes were proposed in this pull request?

This patch prevents the cleanup operation in FileStreamSource when the source files belong to a FileStreamSink. This is needed because the output of a FileStreamSink can be read by multiple Spark queries, and those queries read the files based on the sink's metadata log, which does not reflect the cleanup.

To keep the logic simple, the patch only handles the case where the source path contains no glob pattern and refers to the output directory of a FileStreamSink. It detects this case by checking whether FileStreamSource uses the metadata directory to list the source files.
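
The check described above can be sketched roughly as follows. This is an illustrative model in Python, not the Scala code from the patch: the function names are hypothetical, and it assumes that a FileStreamSink output directory can be recognized by a `_spark_metadata` subdirectory holding the metadata log.

```python
import os
import tempfile

# Assumption: FileStreamSink keeps its metadata log in this subdirectory
# of the output path.
SINK_METADATA_DIR = "_spark_metadata"


def has_glob_pattern(path):
    """Return True if the path contains common glob metacharacters."""
    return any(ch in path for ch in "*?[{")


def reads_sink_output(source_path):
    """Hypothetical check: True when a glob-free source path contains a
    sink metadata directory, so files would be listed from the metadata log."""
    if has_glob_pattern(source_path):
        # Glob paths are outside the case this sketch models; treat them
        # as ordinary file listings.
        return False
    return os.path.isdir(os.path.join(source_path, SINK_METADATA_DIR))


def cleanup_allowed(source_path):
    """Cleanup is refused when the source reads the output of a sink."""
    return not reads_sink_output(source_path)


# Usage: one directory that looks like sink output, one plain directory.
with tempfile.TemporaryDirectory() as root:
    sink_out = os.path.join(root, "sink_out")
    plain = os.path.join(root, "plain")
    os.makedirs(os.path.join(sink_out, SINK_METADATA_DIR))
    os.makedirs(plain)
    print("sink output:", cleanup_allowed(sink_out))
    print("plain directory:", cleanup_allowed(plain))
```

The sketch only models the decision. How the real source reacts once it detects sink output (for example by skipping cleanup or by rejecting the option) is not shown here.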

### Why are the changes needed?

Without this patch, if end users turn on the cleanup option for a path that is the output of a FileStreamSink, the metadata log and the files actually available can get out of sync, which may break other queries reading that path.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added unit tests.

Closes #26590 from HeartSaVioR/SPARK-29953.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
1 parent 755d889
| File | Mode | Size |
| --- | --- | --- |
| create-release | | |
| deps | | |
| sparktestsupport | | |
| tests | | |
| .gitignore | -rw-r--r-- | 25 bytes |
| .rat-excludes | -rw-r--r-- | 1.8 KB |
| .scalafmt.conf | -rw-r--r-- | 988 bytes |
| README.md | -rw-r--r-- | 197 bytes |
| appveyor-guide.md | -rw-r--r-- | 8.7 KB |
| appveyor-install-dependencies.ps1 | -rw-r--r-- | 3.8 KB |
| change-scala-version.sh | -rwxr-xr-x | 2.0 KB |
| check-license | -rwxr-xr-x | 2.5 KB |
| checkstyle-suppressions.xml | -rw-r--r-- | 2.4 KB |
| checkstyle.xml | -rw-r--r-- | 8.1 KB |
| github_jira_sync.py | -rwxr-xr-x | 7.2 KB |
| lint-java | -rwxr-xr-x | 1.2 KB |
| lint-python | -rwxr-xr-x | 8.3 KB |
| lint-r | -rwxr-xr-x | 1.4 KB |
| lint-r.R | -rw-r--r-- | 1.5 KB |
| lint-scala | -rwxr-xr-x | 925 bytes |
| make-distribution.sh | -rwxr-xr-x | 8.7 KB |
| merge_spark_pr.py | -rwxr-xr-x | 23.1 KB |
| mima | -rwxr-xr-x | 1.7 KB |
| pip-sanity-check.py | -rw-r--r-- | 1.3 KB |
| requirements.txt | -rw-r--r-- | 85 bytes |
| run-pip-tests | -rwxr-xr-x | 4.4 KB |
| run-tests | -rwxr-xr-x | 1.1 KB |
| run-tests-jenkins | -rwxr-xr-x | 1.4 KB |
| run-tests-jenkins.py | -rwxr-xr-x | 9.6 KB |
| run-tests.py | -rwxr-xr-x | 26.2 KB |
| sbt-checkstyle | -rwxr-xr-x | 1.3 KB |
| scalafmt | -rwxr-xr-x | 992 bytes |
| scalastyle | -rwxr-xr-x | 1.4 KB |
| test-dependencies.sh | -rwxr-xr-x | 4.1 KB |
| tox.ini | -rw-r--r-- | 1.2 KB |
