Revision c8922d9145a9bc60c0f423a6c1b7d4f0bfa2e585 authored by Nicholas Chammas on 04 December 2019, 02:44:24 UTC, committed by HyukjinKwon on 04 December 2019, 02:44:24 UTC
### What changes were proposed in this pull request?

This PR is a follow-up to #24043 and cousin of #26730. It exposes the `mergeSchema` option directly in the ORC APIs.

### Why are the changes needed?

So the Python API matches the Scala API.

### Does this PR introduce any user-facing change?

Yes, it adds a new option directly in the ORC reader method signatures.

### How was this patch tested?

I tested this manually as follows:

```
>>> spark.range(3).write.orc('test-orc')
>>> spark.range(3).withColumnRenamed('id', 'name').write.orc('test-orc/nested')
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
>>> spark.conf.set('spark.sql.orc.mergeSchema', True)
>>> spark.read.orc('test-orc', recursiveFileLookup=True)
DataFrame[id: bigint, name: bigint]
>>> spark.read.orc('test-orc', recursiveFileLookup=True, mergeSchema=False)
DataFrame[id: bigint]
```

Closes #26755 from nchammas/SPARK-30113-ORC-mergeSchema.

Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
1 parent e766a32
History

README.md

back to top