Revision cc087a3ac5591c43d6b861b69b10647594d21b89 authored by HyukjinKwon on 12 December 2019, 16:33:33 UTC, committed by Dongjoon Hyun on 12 December 2019, 16:33:33 UTC
### What changes were proposed in this pull request?

This PR proposes to add `PushedFilters` into the metadata so that the pushed filters are shown in the Parquet DSv2 implementation. ORC already does this; see https://github.com/apache/spark/pull/24719/files#diff-0fc82694b20da3cd2cbb07206920eef7R62-R64.

### Why are the changes needed?

So that users can debug which filters were pushed down, and to match the ORC behaviour.

### Does this PR introduce any user-facing change?

Yes. The `explain()` output now includes the pushed filters:

```scala
spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").filter("5 > id").explain()
```

**Before:**

```
== Physical Plan ==
*(1) Project [id#20L]
+- *(1) Filter (isnotnull(id#20L) AND (5 > id#20L))
   +- *(1) ColumnarToRow
      +- BatchScan[id#20L] ParquetScan Location: InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>
```

**After:**

```
== Physical Plan ==
*(1) Project [id#13L]
+- *(1) Filter (isnotnull(id#13L) AND (5 > id#13L))
   +- *(1) ColumnarToRow
      +- BatchScan[id#13L] ParquetScan Location: InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>, PushedFilters: [IsNotNull(id), LessThan(id,5)]
```

### How was this patch tested?

Unit tests were added; also tested manually.

Closes #26857 from HyukjinKwon/SPARK-30162.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
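The idea behind the change, a scan that records which filters were pushed down and reports them in its description string, can be sketched with a toy model. This is plain Python, not Spark's actual DSv2 API; every class and method name here is illustrative only:

```python
# Toy model of a DSv2-style scan: filters pushed into the scan are
# both applied during reading and surfaced in the scan's description,
# analogous to the PushedFilters metadata this PR adds for ParquetScan.
# All names are illustrative, not Spark's real API.

class IsNotNull:
    def __init__(self, col):
        self.col = col
    def __call__(self, row):
        return row.get(self.col) is not None
    def __repr__(self):
        return f"IsNotNull({self.col})"

class LessThan:
    def __init__(self, col, value):
        self.col, self.value = col, value
    def __call__(self, row):
        v = row.get(self.col)
        return v is not None and v < self.value
    def __repr__(self):
        return f"LessThan({self.col},{self.value})"

class ToyParquetScan:
    def __init__(self, rows):
        self.rows = rows
        self.pushed_filters = []

    def push_filters(self, filters):
        # Pretend the source can evaluate every filter itself.
        self.pushed_filters = list(filters)

    def scan(self):
        # Only rows passing all pushed filters are "read".
        return [r for r in self.rows
                if all(f(r) for f in self.pushed_filters)]

    def description(self):
        # Before the change the metadata stopped at ReadSchema;
        # now it also reports the pushed filters.
        return ("ToyParquetScan ReadSchema: struct<id:bigint>, "
                f"PushedFilters: {self.pushed_filters}")

scan = ToyParquetScan([{"id": i} for i in range(10)])
scan.push_filters([IsNotNull("id"), LessThan("id", 5)])
print(scan.description())
print([r["id"] for r in scan.scan()])
```

Printing the description after pushdown mirrors the "After" plan above: the same `IsNotNull(id)` and `LessThan(id,5)` filters that prune rows are also visible to anyone debugging the scan.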
1 parent fd39b6d