Revision 25431d79f7daf2a68298701154eb505c2a4add80 authored by Jungtaek Lim (HeartSaVioR) on 06 December 2019, 05:46:28 UTC, committed by Shixiong Zhu on 06 December 2019, 05:46:28 UTC
### What changes were proposed in this pull request?

This patch prevents the cleanup operation in FileStreamSource if the source files belong to the FileStreamSink. This is needed because the output of FileStreamSink can be read with multiple Spark queries and queries will read the files based on the metadata log, which won't reflect the cleanup.

To simplify the logic, the patch only takes care of the case of when the source path without glob pattern refers to the output directory of FileStreamSink, via checking FileStreamSource to see whether it leverages metadata directory or not to list the source files.

### Why are the changes needed?

Without this patch, if end users turn on cleanup option with the path which is the output of FileStreamSink, there may be out of sync between metadata and available files which may break other queries reading the path.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added UT.

Closes #26590 from HeartSaVioR/SPARK-29953.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>
1 parent 755d889
Raw File
CONTRIBUTING.md
## Contributing to Spark

*Before opening a pull request*, review the 
[Contributing to Spark guide](https://spark.apache.org/contributing.html). 
It lists steps that are required before creating a PR. In particular, consider:

- Is the change important and ready enough to ask the community to spend time reviewing?
- Have you searched for existing, related JIRAs and pull requests?
- Is this a new feature that can stand alone as a [third party project](https://spark.apache.org/third-party-projects.html) ?
- Is the change being proposed clearly explained and motivated?

When you contribute code, you affirm that the contribution is your original work and that you 
license the work to the project under the project's open source license. Whether or not you 
state this explicitly, by submitting any copyrighted material via pull request, email, or 
other means you agree to license the material under the project's open source license and 
warrant that you have the legal authority to do so.
back to top