Revision - 075ae1e - [SPARK-29537][SQL] throw exception when user defined a wrong base path

Revision 075ae1eeaf198792650287cd5b3f607a05c574bf authored by wuyi on 03 December 2019, 09:02:50 UTC, committed by Wenchen Fan on 03 December 2019, 09:02:50 UTC

[SPARK-29537][SQL] throw exception when user defined a wrong base path

### What changes were proposed in this pull request?

When a user specifies a base path that is not an ancestor directory of all the input paths,
throw an exception immediately.
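The check can be pictured as follows. This is a minimal sketch, not the patch itself: `BasePathCheck` and `validateBasePath` are hypothetical names. It uses `java.nio.file.Path` in place of Hadoop's `Path` and assumes already-qualified local paths. The rule is that every input path must equal the base path or sit somewhere beneath it.

```scala
import java.nio.file.{Path, Paths}

object BasePathCheck {
  // Hypothetical helper: throws IllegalArgumentException (via `require`)
  // unless `basePath` is equal to, or an ancestor of, every input path.
  def validateBasePath(basePath: String, inputPaths: Seq[String]): Unit = {
    val base: Path = Paths.get(basePath).normalize()
    inputPaths.foreach { p =>
      val input: Path = Paths.get(p).normalize()
      // Path.startsWith compares whole name elements, so "/a/bc" does not
      // start with "/a/b" even though the strings share a prefix.
      require(input.startsWith(base), s"Wrong basePath $basePath for input path $p")
    }
  }
}
```

For example, `validateBasePath("/path/to/data", Seq("/path/to/data/c1=1"))` passes, while `validateBasePath("/path/from", Seq("/path/to/data/c1=1"))` fails fast instead of being silently ignored.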

### Why are the changes needed?

Assume that we have a DataFrame[c1, c2] written out in Parquet format and partitioned by c1.

When using `spark.read.parquet("/path/to/data/c1=1")` to read the data, we'll have a DataFrame with column c2 only.

But if we use `spark.read.option("basePath", "/path/from").parquet("/path/to/data/c1=1")` to
read the data, where `/path/from` is not an ancestor of the input path, we'll have a DataFrame with columns c1 and c2.

This happens because a wrong base path never matches any ancestor of the input path in `parsePartition()`, so parsing continues up the directory tree until it reaches a directory whose name does not contain "=".
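The walk described above can be modeled roughly like this. This is a simplified sketch, not Spark's actual `parsePartition()`: `PartitionWalk` and `discoverColumns` are hypothetical names, and paths are plain `java.nio.file.Path` values. It also assumes that, without the `basePath` option, the input directory itself acts as the base path, which is what the first read above implies.

```scala
import java.nio.file.{Path, Paths}

object PartitionWalk {
  // Simplified model: walk up from a leaf directory, collecting
  // "column=value" segments, and stop at a base path, at the filesystem
  // root, or at the first directory name without "=".
  def discoverColumns(leafDir: String, basePaths: Set[String]): List[(String, String)] = {
    val bases: Set[Path] = basePaths.map(b => Paths.get(b).normalize())
    var current: Path = Paths.get(leafDir).normalize()
    var columns = List.empty[(String, String)]
    var finished = false
    while (!finished) {
      if (current == null || current.getFileName == null || bases.contains(current)) {
        finished = true // reached a base path or the root
      } else {
        val name = current.getFileName.toString
        val eq = name.indexOf('=')
        if (eq <= 0) {
          finished = true // not a partition directory: stop parsing
        } else {
          // Prepend so that outer partition columns end up first.
          columns = (name.substring(0, eq) -> name.substring(eq + 1)) :: columns
          current = current.getParent
        }
      }
    }
    columns
  }
}
```

With the input path as the base path, the walk stops immediately and finds no partition columns. With `/path/from`, no ancestor ever matches the base path, so `c1=1` is parsed and the walk only stops at `data`.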

I think the result of the second read does not make sense.

### Does this PR introduce any user-facing change?

Yes. With this change, users hit an `IllegalArgumentException` when they specify a wrong base path, whereas previously no exception was thrown.

### How was this patch tested?

Added a unit test.

Closes #26195 from Ngone51/dev-wrong-basePath.

Lead-authored-by: wuyi <ngone_5451@163.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

1 parent 4021354
