https://github.com/apache/spark
Revision 7658f77a613c91364c4b6c986e1861c7bd5487db authored by Tigran Manasyan on 08 February 2024, 12:29:09 UTC, committed by Wenchen Fan on 08 February 2024, 12:30:05 UTC
In the current version, `DataSource#checkAndGlobPathIfNecessary` qualifies paths via `Path#makeQualified`, whereas `PartitioningAwareFileIndex` qualifies them via `FileSystem#makeQualified`. Most `FileSystem` implementations simply delegate to `Path#makeQualified`, but others, such as `HarFileSystem`, contain filesystem-specific logic that can produce a different result. This inconsistency can leave Spark unable to find the partitions of a source file, because the qualified paths built by `Path` and by `FileSystem` differ. Therefore, for uniformity, `DataSource#checkAndGlobPathIfNecessary` should use the `FileSystem` path qualification.

Why are the changes needed? Allow users to read files from Hadoop archives (.har) using the DataFrameReader API.

Does this PR introduce any user-facing change? No.

How was this patch tested? New tests were added in `DataSourceSuite` and `DataFrameReaderWriterSuite`.

Was this patch authored or co-authored using generative AI tooling? No.

Closes #43463 from tigrulya-exe/SPARK-39910-use-fs-path-qualification.

Authored-by: Tigran Manasyan <t.manasyan@arenadata.io>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b7edc5fac0f4e479cbc869d54a9490c553ba2613)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
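The mismatch described in the commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyPath`, `ToyFileSystem`, `ToyArchiveFileSystem`, the `toyar://` scheme, and the `underlying-` authority rewrite are all hypothetical stand-ins and do not reproduce the real Hadoop `Path`, `FileSystem`, or `HarFileSystem` behaviour. It shows only the general shape of the problem: when one component qualifies via the path and another via the file system, a lookup keyed on qualified paths can miss.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ToyPath:
    """Hypothetical stand-in for a path type with generic qualification."""
    uri: str

    def make_qualified(self, default_uri: str) -> "ToyPath":
        # Generic rule: keep already-qualified paths, otherwise prefix
        # the file system's default URI.
        if urlsplit(self.uri).scheme:
            return self
        return ToyPath(default_uri.rstrip("/") + "/" + self.uri.lstrip("/"))


class ToyFileSystem:
    """Hypothetical file system that simply delegates to the path."""

    def __init__(self, uri: str) -> None:
        self.uri = uri

    def make_qualified(self, path: ToyPath) -> ToyPath:
        return path.make_qualified(self.uri)


class ToyArchiveFileSystem(ToyFileSystem):
    """Hypothetical file system with its own qualification rule.

    The authority rewrite below is invented for illustration only.
    """

    def make_qualified(self, path: ToyPath) -> ToyPath:
        generic = super().make_qualified(path)
        parts = urlsplit(generic.uri)
        if parts.netloc.startswith("underlying-"):
            return generic  # already qualified by this file system
        return ToyPath(f"{parts.scheme}://underlying-{parts.netloc}{parts.path}")


fs = ToyArchiveFileSystem("toyar://archive")
raw = ToyPath("/data/t")

# A file index that stores partitions under the fs-qualified root path.
index = {fs.make_qualified(raw): ["part=1", "part=2"]}

via_path = raw.make_qualified(fs.uri)  # qualification through the path
via_fs = fs.make_qualified(raw)        # qualification through the file system

print(index.get(via_path))  # lookup misses: the keys differ
print(index.get(via_fs))    # lookup succeeds
```

In this model, a plain `ToyFileSystem` gives the same result either way, so the inconsistency only surfaces for file systems that override qualification. Using the file system's qualification on both sides removes the mismatch.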
1 parent 77f8b38
Tip revision: 7658f77a613c91364c4b6c986e1861c7bd5487db authored by Tigran Manasyan on 08 February 2024, 12:29:09 UTC
[SPARK-39910][SQL] Delegate path qualification to filesystem during DataSource file path globbing
File | Mode | Size |
---|---|---|
.github | ||
R | ||
assembly | ||
bin | ||
binder | ||
build | ||
common | ||
conf | ||
connector | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.asf.yaml | -rw-r--r-- | 1.3 KB |
.gitattributes | -rw-r--r-- | 130 bytes |
.gitignore | -rw-r--r-- | 1.8 KB |
CONTRIBUTING.md | -rw-r--r-- | 997 bytes |
LICENSE | -rw-r--r-- | 13.0 KB |
LICENSE-binary | -rw-r--r-- | 22.4 KB |
NOTICE | -rw-r--r-- | 2.0 KB |
NOTICE-binary | -rw-r--r-- | 56.5 KB |
README.md | -rw-r--r-- | 4.5 KB |
appveyor.yml | -rw-r--r-- | 2.8 KB |
pom.xml | -rw-r--r-- | 139.1 KB |
scalastyle-config.xml | -rw-r--r-- | 23.7 KB |