https://github.com/apache/spark
Revision 7658f77a613c91364c4b6c986e1861c7bd5487db authored by Tigran Manasyan on 08 February 2024, 12:29:09 UTC, committed by Wenchen Fan on 08 February 2024, 12:30:05 UTC
In the current version, `DataSource#checkAndGlobPathIfNecessary` qualifies paths via `Path#makeQualified`, while `PartitioningAwareFileIndex` qualifies them via `FileSystem#makeQualified`. Most `FileSystem` implementations simply delegate to `Path#makeQualified`, but others, such as `HarFileSystem`, contain fs-specific logic that can produce a different result. Such inconsistencies can lead to a situation where Spark can't find the partitions of the source file, because the qualified paths built by `Path` and by `FileSystem` differ. Therefore, for uniformity, `DataSource#checkAndGlobPathIfNecessary` should also use the `FileSystem` path qualification.
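The divergence can be sketched with a toy model. The classes below are hypothetical stand-ins, not the real Hadoop `Path`/`FileSystem` API: they only illustrate the delegation pattern in which most filesystems reuse the path's own qualification, while an archive-style filesystem (analogous to `HarFileSystem`) overrides it and yields a different qualified path.

```java
// Hypothetical sketch (not the actual Hadoop classes): shows why qualifying
// a path through the FileSystem can differ from qualifying it through the
// Path itself once a filesystem overrides the qualification logic.
public class PathQualificationSketch {
    // Stand-in for Path#makeQualified: prefix the default scheme.
    static String pathMakeQualified(String scheme, String raw) {
        return scheme + "://" + raw;
    }

    // Stand-in for FileSystem#makeQualified. Most implementations simply
    // delegate to the path-level qualification, as the default method does.
    interface SimpleFs {
        String scheme();
        default String makeQualified(String raw) {
            return pathMakeQualified(scheme(), raw);
        }
    }

    // An archive-style filesystem with fs-specific qualification logic
    // (analogous to HarFileSystem), which rewrites the authority.
    static class ArchiveFs implements SimpleFs {
        public String scheme() { return "har"; }
        @Override
        public String makeQualified(String raw) {
            return scheme() + "://archive/" + raw;
        }
    }

    public static void main(String[] args) {
        SimpleFs plain = () -> "file";
        String raw = "data/part-0";
        // For a delegating fs, both qualification routes agree...
        System.out.println(plain.makeQualified(raw)
            .equals(pathMakeQualified("file", raw))); // true
        // ...but for the archive fs they diverge, which is the root cause
        // of the missing-partitions bug this commit fixes.
        System.out.println(new ArchiveFs().makeQualified(raw)
            .equals(pathMakeQualified("har", raw))); // false
    }
}
```

In the real code, routing `DataSource#checkAndGlobPathIfNecessary` through `FileSystem#makeQualified` means both components always see the same qualified path, regardless of which `FileSystem` implementation is in use.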

Allow users to read files from hadoop archives (.har) using DataFrameReader API

No

New tests were added in `DataSourceSuite` and `DataFrameReaderWriterSuite`

No

Closes #43463 from tigrulya-exe/SPARK-39910-use-fs-path-qualification.

Authored-by: Tigran Manasyan <t.manasyan@arenadata.io>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit b7edc5fac0f4e479cbc869d54a9490c553ba2613)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
[SPARK-39910][SQL] Delegate path qualification to filesystem during DataSource file path globbing