Revision - c8922d9 - [SPARK-30113][SQL][PYTHON] Expose mergeSchema option in [...]

Revision c8922d9145a9bc60c0f423a6c1b7d4f0bfa2e585 authored by Nicholas Chammas on 04 December 2019, 02:44:24 UTC, committed by HyukjinKwon on 04 December 2019, 02:44:24 UTC

[SPARK-30113][SQL][PYTHON] Expose mergeSchema option in PySpark's ORC APIs

### What changes were proposed in this pull request?

This PR is a follow-up to #24043 and a cousin of #26730. It exposes the `mergeSchema` option directly as a keyword argument in PySpark's ORC APIs.

### Why are the changes needed?

So that the Python API matches the Scala API.

### Does this PR introduce any user-facing change?

Yes, it adds a new `mergeSchema` option directly to the ORC reader method signatures.

### How was this patch tested?

I tested this manually as follows:

```
>>> spark.range(3).write.orc('test-orc')
>>> spark.range(3).withColumnRenamed('id', 'name').write.orc('test-orc/nested')
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
>>> spark.conf.set('spark.sql.orc.mergeSchema', True)
>>> spark.read.orc('test-orc', recursiveFileLookup=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
```
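The session above suggests two rules: an explicit `mergeSchema` keyword takes precedence over the `spark.sql.orc.mergeSchema` session setting, and merging yields the union of the columns found across files. The following is a minimal pure-Python sketch of that behaviour, not Spark's implementation. The names `resolve_merge_schema` and `read_schema` are hypothetical, schemas are modelled as plain dicts, and using the first file's schema when merging is off is a simplifying assumption.

```python
from typing import Dict, List, Optional

CONF_KEY = "spark.sql.orc.mergeSchema"


def resolve_merge_schema(merge_schema: Optional[bool], conf: Dict[str, bool]) -> bool:
    """Return the effective setting (hypothetical helper).

    None means "not specified by the caller", so the session value is used;
    an explicit True/False always wins over the session value.
    """
    if merge_schema is not None:
        return merge_schema
    # Assumption: merging is off when the session setting is absent.
    return conf.get(CONF_KEY, False)


def read_schema(
    file_schemas: List[Dict[str, str]],
    merge_schema: Optional[bool] = None,
    conf: Optional[Dict[str, bool]] = None,
) -> Dict[str, str]:
    """Model the schema a read would report (hypothetical helper)."""
    conf = conf or {}
    if not resolve_merge_schema(merge_schema, conf):
        # Simplifying assumption: without merging, one file's schema is used.
        return dict(file_schemas[0])
    merged: Dict[str, str] = {}
    for schema in file_schemas:
        for name, dtype in schema.items():
            # Keep the first type seen for each column name.
            merged.setdefault(name, dtype)
    return merged


# Mirrors the two files written in the manual test.
files = [{"id": "bigint"}, {"name": "bigint"}]
session = {CONF_KEY: True}
print(read_schema(files, conf=session))                      # session setting applies
print(read_schema(files, merge_schema=False, conf=session))  # keyword overrides it
```

Treating `None` as "unspecified" is what lets the keyword default defer to the session configuration while still allowing an explicit `False` to turn merging off.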

Closes #26755 from nchammas/SPARK-30113-ORC-mergeSchema.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>

1 parent e766a32


