Revision - 30d5c9f - [SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark schema to [...] - origin: https://github.com/apache/spark

https://github.com/apache/spark

05 April 2024, 20:24:39 UTC

Revision 30d5c9fd8ae1944a94ddedae83433368a02e55e6 authored by Dongjoon Hyun on 13 October 2017, 15:09:12 UTC, committed by Wenchen Fan on 13 October 2017, 15:11:50 UTC

[SPARK-14387][SPARK-16628][SPARK-18355][SQL] Use Spark schema to read ORC table instead of ORC file schema

Before Hive 2.0, the ORC file schema has invalid column names like `_col1` and `_col2` instead of the real column names. This is a well-known limitation, and several Apache Spark issues have been reported when `spark.sql.hive.convertMetastoreOrc=true`. This PR ignores the ORC file schema and uses the Spark schema instead.
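
The idea of preferring the table schema over the file schema can be pictured with a small standalone sketch. It is not the code from this PR: it assumes the table columns and the file columns line up by position, and the helper names `resolve_by_position` and `project_rows` are hypothetical, introduced only for illustration.

```python
from typing import Any, Dict, List, Sequence


def resolve_by_position(
    table_columns: Sequence[str],
    file_columns: Sequence[str],
    requested: Sequence[str],
) -> Dict[str, str]:
    """Map each requested table column to the physical column name in the file.

    Assumption (illustrative only): the i-th table column corresponds to the
    i-th file column, so placeholder names in the file can be ignored.
    """
    if len(table_columns) != len(file_columns):
        raise ValueError("table schema and file schema differ in length")
    position = {name: i for i, name in enumerate(table_columns)}
    return {name: file_columns[position[name]] for name in requested}


def project_rows(
    rows: List[Dict[str, Any]], mapping: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Re-key rows read under file column names to the table column names."""
    return [{col: row[phys] for col, phys in mapping.items()} for row in rows]


# Hypothetical table schema and an old-style file schema with placeholder names.
table_columns = ["id", "name"]
file_columns = ["_col0", "_col1"]

mapping = resolve_by_position(table_columns, file_columns, ["name"])
rows = project_rows([{"_col0": 1, "_col1": "a"}], mapping)
```

Looking up `name` directly in the file schema would fail here, because the file only knows `_col0` and `_col1`; resolving through the table schema first avoids that lookup.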

Pass the newly added test case.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #19470 from dongjoon-hyun/SPARK-18355.

(cherry picked from commit e6e36004afc3f9fc8abea98542248e9de11b4435)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

1 parent c9187db

Files
File	Mode	Size
.github
R
assembly
bin
build
common
conf
core
data
dev
docs
examples
external
graphx
launcher
licenses
mllib
mllib-local
project
python
repl
resource-managers
sbin
sql
streaming
tools
.gitattributes	-rw-r--r--	40 bytes
.gitignore	-rw-r--r--	1.2 KB
.travis.yml	-rw-r--r--	1.7 KB
CONTRIBUTING.md	-rw-r--r--	995 bytes
LICENSE	-rw-r--r--	17.5 KB
NOTICE	-rw-r--r--	24.1 KB
README.md	-rw-r--r--	3.7 KB
appveyor.yml	-rw-r--r--	1.9 KB
pom.xml	-rw-r--r--	94.8 KB
scalastyle-config.xml	-rw-r--r--	17.4 KB

README.md