Revision 4f69c98ae95681cf972fa6701c94dbbb28e40d80 authored by sychen on 09 September 2022, 21:36:39 UTC, committed by Dongjoon Hyun on 09 September 2022, 21:36:39 UTC
### What changes were proposed in this pull request?
Increase ORC test coverage.
[ORC-1205](https://issues.apache.org/jira/browse/ORC-1205) Size of batches in some ConvertTreeReaders should be ensured before using

### Why are the changes needed?

When Spark reads an ORC file with type promotion, an `ArrayIndexOutOfBoundsException` may be thrown. This has been fixed in ORC 1.7.6 and 1.8.0.

```java
java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextVector(TreeReaderFactory.java:387)
        at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:740)
        at org.apache.orc.impl.ConvertTreeReaderFactory$StringGroupFromAnyIntegerTreeReader.nextVector(ConvertTreeReaderFactory.java:1069)
        at org.apache.orc.impl.reader.tree.StructBatchReader.readBatchColumn(StructBatchReader.java:65)
```
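The failure mode named in the ORC-1205 title (a batch used before its size is ensured) can be illustrated with a minimal, self-contained sketch. This is not the ORC code: `EnsureSizeSketch`, `ScratchVector`, and `fill` are hypothetical names invented for illustration, and the `ensureSize(size, preserveData)` method only mimics the general shape of such a guard.

```java
// Hypothetical stand-in for a reusable column vector; NOT the ORC API.
public class EnsureSizeSketch {
    public static final class ScratchVector {
        public long[] vector;

        public ScratchVector(int capacity) {
            vector = new long[capacity];
        }

        // Grow the backing array when the requested batch exceeds its capacity.
        public void ensureSize(int size, boolean preserveData) {
            if (size > vector.length) {
                long[] old = vector;
                vector = new long[size];
                if (preserveData) {
                    System.arraycopy(old, 0, vector, 0, old.length);
                }
            }
        }
    }

    // Writes batchSize values into the vector, optionally ensuring capacity first.
    public static void fill(ScratchVector v, int batchSize, boolean ensureFirst) {
        if (ensureFirst) {
            v.ensureSize(batchSize, false);
        }
        for (int i = 0; i < batchSize; i++) {
            v.vector[i] = i;
        }
    }

    public static void main(String[] args) {
        // Unguarded: a vector sized for 1 row receives a batch of 2 rows.
        try {
            fill(new ScratchVector(1), 2, false);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded: ArrayIndexOutOfBoundsException");
        }

        // Guarded: capacity is ensured before the vector is written.
        ScratchVector safe = new ScratchVector(1);
        fill(safe, 2, true);
        System.out.println("guarded: capacity=" + safe.vector.length);
    }
}
```

In this sketch the unguarded call fails at index 1 because the scratch array was sized for a smaller batch, while the guarded call grows the array before writing to it.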

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test.

Closes #37808 from cxzl25/SPARK-39830-3.3.

Authored-by: sychen <sychen@ctrip.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent aaa8292
.gitignore
*#*#
*.#*
*.iml
*.ipr
*.iws
*.pyc
*.pyo
*.swp
*~
.java-version
.DS_Store
.ammonite
.bloop
.bsp/
.cache
.classpath
.ensime
.ensime_cache/
.ensime_lucene
.generated-mima*
# All the files under .idea/ are ignored. To add new files under .idea/ that are not in the VCS yet, please use `git add -f`
.idea/
# SPARK-35223: Add IssueNavigationLink to make IDEA support hyperlink on JIRA Ticket and GitHub PR on Git plugin.
!.idea/vcs.xml
.idea_modules/
.metals
.project
.pydevproject
.scala_dependencies
.settings
.vscode
/lib/
R-unit-tests.log
R/unit-tests.out
R/cran-check.out
R/pkg/vignettes/sparkr-vignettes.html
R/pkg/tests/fulltests/Rplots.pdf
build/*.jar
build/apache-maven*
build/scala*
cache
checkpoint
conf/*.cmd
conf/*.conf
conf/*.properties
conf/*.sh
conf/*.xml
conf/java-opts
conf/slaves
dependency-reduced-pom.xml
derby.log
dev/create-release/*final
dev/create-release/*txt
dev/pr-deps/
dist/
docs/_site/
docs/api
docs/.local_ruby_bundle
sql/docs
sql/site
lib_managed/
lint-r-report.log
lint-js-report.log
log/
logs/
metals.sbt
out/
project/boot/
project/build/target/
project/plugins/lib_managed/
project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/.eggs/
python/coverage.xml
python/deps
python/docs/_site/
python/docs/source/reference/**/api/
python/test_coverage/coverage_data
python/test_coverage/htmlcov
python/pyspark/python
.mypy_cache/
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
scalastyle.txt
spark-*-bin-*.tgz
spark-tests.log
src_managed/
streaming-tests.log
target/
unit-tests.log
work/
docs/.jekyll-metadata
docs/.jekyll-cache

# For Hive
TempStatsStore/
metastore/
metastore_db/
sql/hive-thriftserver/test_warehouses
warehouse/
spark-warehouse/

# For R session data
.RData
.RHistory
.Rhistory
*.Rproj
*.Rproj.*

.Rproj.user

# For SBT
.jvmopts

# For Node.js
node_modules

# For Antlr
sql/catalyst/gen/
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseLexer.tokens
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/gen/