https://github.com/apache/spark
Revision 90a14d58b4e87a603e35a9ab679f6049b10e9c7b authored by Hyukjin Kwon on 21 December 2018, 08:09:30 UTC, committed by Hyukjin Kwon on 21 December 2018, 08:10:14 UTC
## What changes were proposed in this pull request? Currently, even if I explicitly disable Hive support in SparkR session as below: ```r sparkSession <- sparkR.session("local[4]", "SparkR", Sys.getenv("SPARK_HOME"), enableHiveSupport = FALSE) ``` produces when the Hadoop version is not supported by our Hive fork: ``` java.lang.reflect.InvocationTargetException ... Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78 at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174) at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139) at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100) at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368) ... 43 more Error in handleErrors(returnStatus, conn) : java.lang.ExceptionInInitializerError at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:193) at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116) at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52) at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ``` The root cause is that: ``` SparkSession.hiveClassesArePresent ``` check if the class is loadable or not to check if that's in classpath but `org.apache.hadoop.hive.conf.HiveConf` has a check for Hadoop version as static logic which is executed right away. This throws an `IllegalArgumentException` and that's not caught: https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L1113-L1121 So, currently, if users have a Hive built-in Spark with unsupported Hadoop version by our fork (namely 3+), there's no way to use SparkR even though it could work. This PR just propose to change the order of bool comparison so that we can don't execute `SparkSession.hiveClassesArePresent` when: 1. `enableHiveSupport` is explicitly disabled 2. `spark.sql.catalogImplementation` is `in-memory` so that we **only** check `SparkSession.hiveClassesArePresent` when Hive support is explicitly enabled by short circuiting. ## How was this patch tested? It's difficult to write a test since we don't run tests against Hadoop 3 yet. See https://github.com/apache/spark/pull/21588. Manually tested. Closes #23356 from HyukjinKwon/SPARK-26422. Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 305e9b5ad22b428501fd42d3730d73d2e09ad4c5) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent c29f7e2
Tip revision: 90a14d58b4e87a603e35a9ab679f6049b10e9c7b authored by Hyukjin Kwon on 21 December 2018, 08:09:30 UTC
[SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork
[SPARK-26422][R] Support to disable Hive support in SparkR even for Hadoop versions unsupported by Hive fork
Tip revision: 90a14d5
File | Mode | Size |
---|---|---|
.github | ||
R | ||
assembly | ||
bin | ||
build | ||
common | ||
conf | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
external | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.gitattributes | -rw-r--r-- | 40 bytes |
.gitignore | -rw-r--r-- | 1.3 KB |
CONTRIBUTING.md | -rw-r--r-- | 995 bytes |
LICENSE | -rw-r--r-- | 13.0 KB |
LICENSE-binary | -rw-r--r-- | 20.9 KB |
NOTICE | -rw-r--r-- | 1.5 KB |
NOTICE-binary | -rw-r--r-- | 41.9 KB |
README.md | -rw-r--r-- | 3.9 KB |
appveyor.yml | -rw-r--r-- | 2.2 KB |
pom.xml | -rw-r--r-- | 100.4 KB |
scalastyle-config.xml | -rw-r--r-- | 18.0 KB |
![swh spinner](/static/img/swh-spinner.gif)
Computing file changes ...