Revision - cc087a3 - [SPARK-30162][SQL] Add PushedFilters to metadata in Parquet DSv2 [...]

Revision cc087a3ac5591c43d6b861b69b10647594d21b89 authored by HyukjinKwon on 12 December 2019, 16:33:33 UTC, committed by Dongjoon Hyun on 12 December 2019, 16:33:33 UTC

[SPARK-30162][SQL] Add PushedFilters to metadata in Parquet DSv2 implementation

### What changes were proposed in this pull request?

This PR proposes adding `PushedFilters` to the scan metadata so that the pushed-down filters are shown for the Parquet DSv2 implementation. ORC already does this; it was added in https://github.com/apache/spark/pull/24719/files#diff-0fc82694b20da3cd2cbb07206920eef7R62-R64
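To illustrate the idea, here is a minimal, self-contained Scala sketch. It does not use Spark's actual scan classes. `SketchParquetScan`, `Filter`, `IsNotNull` and `LessThan` are hypothetical stand-ins, and the last two only mirror the names of Spark's `org.apache.spark.sql.sources` filters so that the rendered text resembles the plan output in this PR. The sketch shows a scan that already reports `Location` and `ReadSchema` and, after the change, also appends a `PushedFilters` entry.

```scala
// Hypothetical stand-ins modeled on Spark's data source filters, defined
// locally so the sketch runs without a Spark dependency.
sealed trait Filter
final case class IsNotNull(attribute: String) extends Filter
final case class LessThan(attribute: String, value: Any) extends Filter

// Hypothetical, simplified scan: it records the filters pushed down to the
// reader and exposes them through its metadata.
final case class SketchParquetScan(
    location: String,
    readSchema: String,
    pushedFilters: Seq[Filter]) {

  // Metadata that was already shown before the change.
  private def baseMetadata: Seq[(String, String)] = Seq(
    "Location" -> location,
    "ReadSchema" -> readSchema)

  // The change: append a PushedFilters entry rendered as [f1, f2, ...].
  def metadata: Seq[(String, String)] =
    baseMetadata :+ ("PushedFilters" -> pushedFilters.mkString("[", ", ", "]"))

  // Render "key: value" pairs, comma-separated, like the text after
  // "ParquetScan" in the explain() output.
  def description: String =
    metadata.map { case (k, v) => s"$k: $v" }.mkString(", ")
}

object PushedFiltersSketch {
  // The filters from the example query: isnotnull(id) AND 5 > id.
  val scan: SketchParquetScan = SketchParquetScan(
    "InMemoryFileIndex[file:/tmp/foo]",
    "struct<id:bigint>",
    Seq(IsNotNull("id"), LessThan("id", 5)))

  def main(args: Array[String]): Unit = println(scan.description)
}
```

The sketch keeps the metadata as an ordered sequence of pairs so the rendered string is deterministic; it does not assume that Spark orders these entries the same way.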

### Why are the changes needed?

So that users can see which filters were pushed down when debugging, and to be consistent with ORC.

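### Does this PR introduce any user-facing change?

Yes. In the `explain()` output, the scan node of a Parquet DSv2 scan now also lists `PushedFilters`: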

```scala
spark.range(10).write.mode("overwrite").parquet("/tmp/foo")
spark.read.parquet("/tmp/foo").filter("5 > id").explain()
```

**Before:**

```
== Physical Plan ==
*(1) Project [id#20L]
+- *(1) Filter (isnotnull(id#20L) AND (5 > id#20L))
   +- *(1) ColumnarToRow
      +- BatchScan[id#20L] ParquetScan Location: InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>
```

**After:**

```
== Physical Plan ==
*(1) Project [id#13L]
+- *(1) Filter (isnotnull(id#13L) AND (5 > id#13L))
   +- *(1) ColumnarToRow
      +- BatchScan[id#13L] ParquetScan Location: InMemoryFileIndex[file:/tmp/foo], ReadSchema: struct<id:bigint>, PushedFilters: [IsNotNull(id), LessThan(id,5)]
```
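Since the stated motivation is debugging, one way to use the new entry is to read it back out of the plan text. `PushedFiltersParser` below is a hypothetical helper, not a Spark API. It extracts the filter list from a line such as the `BatchScan` line above and splits only on top-level commas, so entries like `LessThan(id,5)` stay intact. It relies solely on the textual format shown in this example, which is not a stable interface and may differ across Spark versions.

```scala
import scala.collection.mutable.ListBuffer

object PushedFiltersParser {
  private val Marker = "PushedFilters: ["

  /** Returns the pushed filters listed in one line of explain() output,
    * or None if the line has no PushedFilters entry. */
  def pushedFilters(planLine: String): Option[List[String]] = {
    val start = planLine.indexOf(Marker)
    if (start < 0) None
    else {
      val items = ListBuffer.empty[String]
      val current = new StringBuilder
      var depth = 0 // nesting level of (...) and [...] inside the list
      var i = start + Marker.length
      var done = false
      // Scan until the ']' that closes the PushedFilters list.
      while (i < planLine.length && !done) {
        val c = planLine.charAt(i)
        c match {
          case '(' | '[' => depth += 1; current += c
          case ')' => depth -= 1; current += c
          case ']' if depth == 0 => done = true
          case ']' => depth -= 1; current += c
          case ',' if depth == 0 =>
            // Top-level comma: one filter entry is complete.
            items += current.toString.trim
            current.clear()
          case other => current += other
        }
        i += 1
      }
      val last = current.toString.trim
      if (last.nonEmpty) items += last
      Some(items.toList)
    }
  }
}
```

For the "After" plan, the `BatchScan` line yields `Some(List("IsNotNull(id)", "LessThan(id,5)"))`, while the "Before" line yields `None`.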

### How was this patch tested?
Unit tests were added, and the change was also tested manually.

Closes #26857 from HyukjinKwon/SPARK-30162.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

1 parent fd39b6d
