Revision - 90a14d5 - [SPARK-26422][R] Support to disable Hive support in SparkR even [...] - origin: https://github.com/apache/spark

visit type:

https://github.com/apache/spark

17 June 2024, 00:28:55 UTC

Revision 90a14d58b4e87a603e35a9ab679f6049b10e9c7b authored by Hyukjin Kwon on 21 December 2018, 08:09:30 UTC, committed by Hyukjin Kwon on 21 December 2018, 08:10:14 UTC

[SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork

## What changes were proposed in this pull request?

Currently,  even if I explicitly disable Hive support in SparkR session as below:

```r
sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"),
                               enableHiveSupport = FALSE)
```

produces when the Hadoop version is not supported by our Hive fork:

```
java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
	at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
	at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
	at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
	... 43 more
Error in handleErrors(returnStatus, conn) :
  java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
	at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
	at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
	at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```

The root cause is that:

```
SparkSession.hiveClassesArePresent
```

check if the class is loadable or not to check if that's in classpath but `org.apache.hadoop.hive.conf.HiveConf` has a check for Hadoop version as static logic which is executed right away. This throws an `IllegalArgumentException` and that's not caught:

https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L1113-L1121

So, currently, if users have a Hive built-in Spark with unsupported Hadoop version by our fork (namely 3+), there's no way to use SparkR even though it could work.

This PR just propose to change the order of bool comparison so that we can don't execute `SparkSession.hiveClassesArePresent` when:

  1. `enableHiveSupport` is explicitly disabled
  2. `spark.sql.catalogImplementation` is `in-memory`

so that we **only** check `SparkSession.hiveClassesArePresent` when Hive support is explicitly enabled by short circuiting.

## How was this patch tested?

It's difficult to write a test since we don't run tests against Hadoop 3 yet. See https://github.com/apache/spark/pull/21588. Manually tested.

Closes #23356 from HyukjinKwon/SPARK-26422.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 305e9b5ad22b428501fd42d3730d73d2e09ad4c5)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

1 parent c29f7e2

Files
Changes

Permalinks

Tip revision: 90a14d58b4e87a603e35a9ab679f6049b10e9c7b authored by Hyukjin Kwon on 21 December 2018, 08:09:30 UTC
[SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork

Tip revision: 90a14d5

File	Mode	Size
.github
R
assembly
bin
build
common
conf
core
data
dev
docs
examples
external
graphx
hadoop-cloud
launcher
licenses
licenses-binary
mllib
mllib-local
project
python
repl
resource-managers
sbin
sql
streaming
tools
.gitattributes	-rw-r--r--	40 bytes
.gitignore	-rw-r--r--	1.3 KB
CONTRIBUTING.md	-rw-r--r--	995 bytes
LICENSE	-rw-r--r--	13.0 KB
LICENSE-binary	-rw-r--r--	20.9 KB
NOTICE	-rw-r--r--	1.5 KB
NOTICE-binary	-rw-r--r--	41.9 KB
README.md	-rw-r--r--	3.9 KB
appveyor.yml	-rw-r--r--	2.2 KB
pom.xml	-rw-r--r--	100.4 KB
scalastyle-config.xml	-rw-r--r--	18.0 KB

Showing with 0 additions and 0 deletions (0 / 0 diffs computed)

Computing file changes ...

https://github.com/apache/spark

[SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork

README.md