Revision history – Software Heritage archive

Newer
Older

Revision	Author	Date	Message	Commit Date
a9f1809	Anton Okolnychyi	10 December 2019, 15:49:22 UTC	[SPARK-30206][SQL] Rename normalizeFilters in DataSourceStrategy to be generic ### What changes were proposed in this pull request? This PR renames `normalizeFilters` in `DataSourceStrategy` to be more generic as the logic is not specific to filters. ### Why are the changes needed? These changes are needed to support PR #26751. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests. Closes #26830 from aokolnychyi/rename-normalize-exprs. Authored-by: Anton Okolnychyi <aokolnychyi@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	10 December 2019, 15:49:22 UTC
1cac9b2	Huaxin Gao	10 December 2019, 15:33:06 UTC	[SPARK-29967][ML][PYTHON] KMeans support instance weighting ### What changes were proposed in this pull request? add weight support in KMeans ### Why are the changes needed? KMeans should support weighting ### Does this PR introduce any user-facing change? Yes. ```KMeans.setWeightCol``` ### How was this patch tested? Unit Tests Closes #26739 from huaxingao/spark-29967. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	10 December 2019, 15:33:06 UTC
aa9da93	yi.wu	10 December 2019, 12:56:21 UTC	[SPARK-30151][SQL] Issue better error message when user-specified schema mismatched ### What changes were proposed in this pull request? Issue better error message when user-specified schema and not match relation schema ### Why are the changes needed? Inspired by https://github.com/apache/spark/pull/25248#issuecomment-559594305, user could get a weird error message when type mapping behavior change between Spark schema and datasource schema(e.g. JDBC). Instead of saying "SomeProvider does not allow user-specified schemas.", we'd better tell user what is really happening here to make user be more clearly about the error. ### Does this PR introduce any user-facing change? Yes, user will see error message changes. ### How was this patch tested? Updated existed tests. Closes #26781 from Ngone51/dev-mismatch-schema. Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	10 December 2019, 12:56:21 UTC
be867e8	Takeshi Yamamuro	10 December 2019, 03:22:03 UTC	[SPARK-30196][BUILD] Bump lz4-java version to 1.7.0 ### What changes were proposed in this pull request? This pr intends to upgrade lz4-java from 1.6.0 to 1.7.0. ### Why are the changes needed? This release includes a performance bug (https://github.com/lz4/lz4-java/pull/143) fixed by JoshRosen and some improvements (e.g., LZ4 binary update). You can see the link below for the changes; https://github.com/lz4/lz4-java/blob/master/CHANGES.md#170 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26823 from maropu/LZ4_1_7_0. Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	10 December 2019, 03:22:03 UTC
3d98c9f	Luan	10 December 2019, 01:57:32 UTC	[SPARK-30179][SQL][TESTS] Improve test in SingleSessionSuite ### What changes were proposed in this pull request? improve the temporary functions test in SingleSessionSuite by verifying the result in a query ### Why are the changes needed? ### Does this PR introduce any user-facing change? ### How was this patch tested? Closes #26812 from leoluan2009/SPARK-30179. Authored-by: Luan <xuluan@ebay.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	10 December 2019, 01:57:32 UTC
36fa198	Sean Owen	09 December 2019, 20:41:48 UTC	[SPARK-30158][SQL][CORE] Seq -> Array for sc.parallelize for 2.13 compatibility; remove WrappedArray ### What changes were proposed in this pull request? Use Seq instead of Array in sc.parallelize, with reference types. Remove usage of WrappedArray. ### Why are the changes needed? These both enable building on Scala 2.13. ### Does this PR introduce any user-facing change? None ### How was this patch tested? Existing tests Closes #26787 from srowen/SPARK-30158. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Sean Owen <srowen@gmail.com>	09 December 2019, 20:41:48 UTC
8a9cccf	Huaxin Gao	09 December 2019, 19:39:33 UTC	[SPARK-30146][ML][PYSPARK] Add setWeightCol to GBTs in PySpark ### What changes were proposed in this pull request? add ```setWeightCol``` and ```setMinWeightFractionPerNode``` in Python side of ```GBTClassifier``` and ```GBTRegressor``` ### Why are the changes needed? https://github.com/apache/spark/pull/25926 added ```setWeightCol``` and ```setMinWeightFractionPerNode``` in GBTs on scala side. This PR will add ```setWeightCol``` and ```setMinWeightFractionPerNode``` in GBTs on python side ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? doc test Closes #26774 from huaxingao/spark-30146. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Sean Owen <srowen@gmail.com>	09 December 2019, 19:39:33 UTC
538b8d1	Jungtaek Lim (HeartSaVioR)	09 December 2019, 16:57:20 UTC	[SPARK-30159][SQL][FOLLOWUP] Fix lint-java via removing unnecessary imports ### What changes were proposed in this pull request? This patch fixes the Java code style violations in SPARK-30159 (#26788) which are caught by lint-java (Github Action caught it and I can reproduce it locally). Looks like Jenkins build may have different policy on checking Java style check or less accurate. ### Why are the changes needed? Java linter starts complaining. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? lint-java passed locally This closes #26819 Closes #26818 from HeartSaVioR/SPARK-30159-FOLLOWUP. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	09 December 2019, 16:57:20 UTC
729f43f	Luca Canali	09 December 2019, 14:55:30 UTC	[SPARK-27189][CORE] Add Executor metrics and memory usage instrumentation to the metrics system ## What changes were proposed in this pull request? This PR proposes to add instrumentation of memory usage via the Spark Dropwizard/Codahale metrics system. Memory usage metrics are available via the Executor metrics, recently implemented as detailed in https://issues.apache.org/jira/browse/SPARK-23206. Additional notes: This takes advantage of the metrics poller introduced in #23767. ## Why are the changes needed? Executor metrics bring have many useful insights on memory usage, in particular on the usage of storage memory and executor memory. This is useful for troubleshooting. Having the information in the metrics systems allows to add those metrics to Spark performance dashboards and study memory usage as a function of time, as in the example graph https://issues.apache.org/jira/secure/attachment/12962810/Example_dashboard_Spark_Memory_Metrics.PNG ## Does this PR introduce any user-facing change? Adds `ExecutorMetrics` source to publish executor metrics via the Dropwizard metrics system. Details of the available metrics in docs/monitoring.md Adds configuration parameter `spark.metrics.executormetrics.source.enabled` ## How was this patch tested? Tested on YARN cluster and with an existing setup for a Spark dashboard based on InfluxDB and Grafana. Closes #24132 from LucaCanali/memoryMetricsSource. Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Imran Rashid <irashid@cloudera.com>	09 December 2019, 14:55:30 UTC
a717d21	Gengliang Wang	09 December 2019, 13:19:08 UTC	[SPARK-30159][SQL][TESTS] Fix the method calls of `QueryTest.checkAnswer` ### What changes were proposed in this pull request? Before this PR, the method `checkAnswer` in Object `QueryTest` returns an optional string. It doesn't throw exceptions when errors happen. The actual exceptions are thrown in the trait `QueryTest`. However, there are some test suites(`StreamSuite`, `SessionStateSuite`, `BinaryFileFormatSuite`, etc.) that use the no-op method `QueryTest.checkAnswer` and expect it to fail test cases when the execution results don't match the expected answers. After this PR: 1. the method `checkAnswer` in Object `QueryTest` will fail tests on errors or unexpected results. 2. add a new method `getErrorMessageInCheckAnswer`, which is exactly the same as the previous version of `checkAnswer`. There are some test suites use this one to customize the test failure message. 3. for the test suites that extend the trait `QueryTest`, we should use the method `checkAnswer` directly, instead of calling the method from Object `QueryTest`. ### Why are the changes needed? We should fix these method calls to perform actual validations in test suites. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing unit tests. Closes #26788 from gengliangwang/fixCheckAnswer. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	09 December 2019, 13:19:08 UTC
c2f29d5	fuwhu	09 December 2019, 10:43:32 UTC	[SPARK-30138][SQL] Separate configuration key of max iterations for analyzer and optimizer ### What changes were proposed in this pull request? separate the configuration keys "spark.sql.optimizer.maxIterations" and "spark.sql.analyzer.maxIterations". ### Why are the changes needed? Currently, both Analyzer and Optimizer use conf "spark.sql.optimizer.maxIterations" to set the max iterations to run, which is a little confusing. It is clearer to add a new conf "spark.sql.analyzer.maxIterations" for analyzer max iterations. ### Does this PR introduce any user-facing change? no ### How was this patch tested? Existing unit tests. Closes #26766 from fuwhu/SPARK-30138. Authored-by: fuwhu <bestwwg@163.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	09 December 2019, 10:43:32 UTC
dcea7a4	Aman Omer	09 December 2019, 05:23:16 UTC	[SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or() ### What changes were proposed in this pull request? This PR introduces a method `expressionWithAlias` in class `FunctionRegistry` which is used to register function's constructor. Currently, `expressionWithAlias` is used to register `BoolAnd` & `BoolOr`. ### Why are the changes needed? Error message is wrong when alias name is used for `BoolAnd` & `BoolOr`. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Tested manually. For query, `select every('true');` Output before this PR, > Error in query: cannot resolve 'bool_and('true')' due to data type mismatch: Input to function 'bool_and' should have been boolean, but it's [string].; line 1 pos 7; After this PR, > Error in query: cannot resolve 'every('true')' due to data type mismatch: Input to function 'every' should have been boolean, but it's [string].; line 1 pos 7; Closes #26712 from amanomer/29883. Authored-by: Aman Omer <amanomer1996@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	09 December 2019, 05:23:16 UTC
a57bbf2	HyukjinKwon	09 December 2019, 04:15:49 UTC	[SPARK-30164][TESTS][DOCS] Exclude Hive domain in Unidoc build explicitly ### What changes were proposed in this pull request? This PR proposes to exclude Unidoc checking in Hive domain. We don't publish this as a part of Spark documentation (see also https://github.com/apache/spark/blob/master/docs/_plugins/copy_api_dirs.rb#L30) and most of them are copy of Hive thrift server so that we can officially use Hive 2.3 release. It doesn't much make sense to check the documentation generation against another domain, and that we don't use in documentation publish. ### Why are the changes needed? To avoid unnecessary computation. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? By Jenkins: ``` ======================================================================== Building Spark ======================================================================== [info] Building Spark using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive -Pmesos -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl -Pyarn test:package streaming-kinesis-asl-assembly/assembly ... ======================================================================== Building Unidoc API Documentation ======================================================================== [info] Building Spark unidoc using SBT with these arguments: -Phadoop-2.7 -Phive-2.3 -Phive -Pmesos -Pkubernetes -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pspark-ganglia-lgpl -Pyarn unidoc ... [info] Main Java API documentation successful. ... [info] Main Scala API documentation successful. ``` Closes #26800 from HyukjinKwon/do-not-merge. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	09 December 2019, 04:15:49 UTC
bca9de6	Pablo Langa	09 December 2019, 04:15:09 UTC	[SPARK-29922][SQL] SHOW FUNCTIONS should do multi-catalog resolution ### What changes were proposed in this pull request? Add ShowFunctionsStatement and make SHOW FUNCTIONS go through the same catalog/table resolution framework of v2 commands. We don’t have this methods in the catalog to implement an V2 command * catalog.listFunctions ### Why are the changes needed? It's important to make all the commands have the same table resolution behavior, to avoid confusing `SHOW FUNCTIONS LIKE namespace.function` ### Does this PR introduce any user-facing change? Yes. When running SHOW FUNCTIONS LIKE namespace.function Spark fails the command if the current catalog is set to a v2 catalog. ### How was this patch tested? Unit tests. Closes #26667 from planga82/feature/SPARK-29922_ShowFunctions_V2Catalog. Authored-by: Pablo Langa <soypab@gmail.com> Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>	09 December 2019, 04:15:09 UTC
16f1b23	Dongjoon Hyun	07 December 2019, 20:58:00 UTC	[SPARK-30163][INFRA][FOLLOWUP] Make `.m2` directory for cold start without cache ### What changes were proposed in this pull request? This PR is a follow-up of https://github.com/apache/spark/pull/26793 and aims to initialize `~/.m2` directory. ### Why are the changes needed? In case of cache reset, `~/.m2` directory doesn't exist. It causes a failure. - `master` branch has a cache as of now. So, we missed this. - `branch-2.4` has no cache as of now, and we hit this failure. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This PR is tested against personal `branch-2.4`. - https://github.com/dongjoon-hyun/spark/pull/12 Closes #26794 from dongjoon-hyun/SPARK-30163-2. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	07 December 2019, 20:58:00 UTC
1068b8b	Dongjoon Hyun	07 December 2019, 20:04:10 UTC	[SPARK-30163][INFRA] Use Google Maven mirror in GitHub Action ### What changes were proposed in this pull request? This PR aims to use [Google Maven mirror](https://cloudplatform.googleblog.com/2015/11/faster-builds-for-Java-developers-with-Maven-Central-mirror.html) in `GitHub Action` jobs to improve the stability. ```xml <settings> <mirrors> <mirror> <id>google-maven-central</id> <name>GCS Maven Central mirror</name> <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url> <mirrorOf>central</mirrorOf> </mirror> </mirrors> </settings> ``` ### Why are the changes needed? Although we added Maven cache inside `GitHub Action`, the timeouts happen too frequently during access `artifact descriptor`. ``` [ERROR] Failed to execute goal on project spark-mllib_2.12: ... Failed to read artifact descriptor for ... ... Connection timed out (Read failed) -> [Help 1] ``` ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This PR is irrelevant to Jenkins. This is tested on the personal repository first. `GitHub Action` of this PR should pass. - https://github.com/dongjoon-hyun/spark/pull/11 Closes #26793 from dongjoon-hyun/SPARK-30163. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	07 December 2019, 20:04:10 UTC
e88d740	Kent Yao	07 December 2019, 06:03:51 UTC	[SPARK-30147][SQL] Trim the string when cast string type to booleans ### What changes were proposed in this pull request? Now, we trim the string when casting string value to those `canCast` types values, e.g. int, double, decimal, interval, date, timestamps, except for boolean. This behavior makes type cast and coercion inconsistency in Spark. Not fitting ANSI SQL standard either. ``` If TD is boolean, then Case: a) If SD is character string, then SV is replaced by TRIM ( BOTH ' ' FROM VE ) Case: i) If the rules for literal in Subclause 5.3, “literal”, can be applied to SV to determine a valid value of the data type TD, then let TV be that value. ii) Otherwise, an exception condition is raised: data exception — invalid character value for cast. b) If SD is boolean, then TV is SV ``` In this pull request, we trim all the whitespaces from both ends of the string before converting it to a bool value. This behavior is as same as others, but a bit different from sql standard, which trim only spaces. ### Why are the changes needed? Type cast/coercion consistency ### Does this PR introduce any user-facing change? yes, string with whitespaces in both ends will be trimmed before converted to booleans. e.g. `select cast('\t true' as boolean)` results `true` now, before this pr it's `null` ### How was this patch tested? add unit tests Closes #26776 from yaooqinn/SPARK-30147. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>	07 December 2019, 06:03:51 UTC
afc4fa0	Dongjoon Hyun	07 December 2019, 02:49:43 UTC	[SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1 ### What changes were proposed in this pull request? This PR aims to upgrade `Jersey` from 2.29 to 2.29.1. ### Why are the changes needed? This will bring several bug fixes and important dependency upgrades. - https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26785 from dongjoon-hyun/SPARK-30156. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	07 December 2019, 02:49:43 UTC
1e0037b	Dongjoon Hyun	07 December 2019, 01:59:10 UTC	[SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12 ### What changes were proposed in this pull request? This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12. ### Why are the changes needed? `Apache HttpCore v4.4.11` is the first official release for JDK11. > This is a maintenance release that corrects a number of defects in non-blocking SSL session code that caused compatibility issues with TLSv1.3 protocol implementation shipped with Java 11. For the full release note, please see the following. - https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Pass the Jenkins. Closes #26786 from dongjoon-hyun/SPARK-30157. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	07 December 2019, 01:59:10 UTC
51aa7a9	Aman Omer	07 December 2019, 01:58:02 UTC	[SPARK-30148][SQL] Optimize writing plans if there is an analysis exception ### What changes were proposed in this pull request? Optimized QueryExecution.scala#writePlans(). ### Why are the changes needed? If any query fails in Analysis phase and gets AnalysisException, there is no need to execute further phases since those will return a same result i.e, AnalysisException. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manually Closes #26778 from amanomer/optExplain. Authored-by: Aman Omer <amanomer1996@gmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	07 December 2019, 01:58:02 UTC
a30ec19	Sean Owen	07 December 2019, 00:16:28 UTC	[SPARK-30155][SQL] Rename parse() to parseString() to avoid conflict in Scala 2.13 ### What changes were proposed in this pull request? Rename internal method LegacyTypeStringParser.parse() to parseString(). ### Why are the changes needed? In Scala 2.13, the parse() definition clashes with supertype declarations. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Existing tests. Closes #26784 from srowen/SPARK-30155. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	07 December 2019, 00:16:28 UTC
81996f9	Dongjoon Hyun	06 December 2019, 20:01:36 UTC	[SPARK-30152][INFRA] Enable Hadoop-2.7/JDK11 build at GitHub Action ### What changes were proposed in this pull request? This PR enables JDK11 build with `hadoop-2.7` profile at `GitHub Action`. BEFORE (6 jobs including one JDK11 job) ![before](https://user-images.githubusercontent.com/9700541/70342731-7763f300-180a-11ea-859f-69038b88451f.png) AFTER (7 jobs including two JDK11 jobs) ![after](https://user-images.githubusercontent.com/9700541/70342658-54d1da00-180a-11ea-9fba-507fc087dc62.png) ### Why are the changes needed? SPARK-29957 makes JDK11 test work with `hadoop-2.7` profile. We need to protect it. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? This is `GitHub Action` only PR. See the result of `GitHub Action` on this PR. Closes #26782 from dongjoon-hyun/SPARK-GHA-HADOOP-2.7. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	06 December 2019, 20:01:36 UTC
58be82a	wuyi	06 December 2019, 18:15:25 UTC	[SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax ### What changes were proposed in this pull request? In this PR, we propose to use the value of `spark.sql.source.default` as the provider for `CREATE TABLE` syntax instead of `hive` in Spark 3.0. And to help the migration, we introduce a legacy conf `spark.sql.legacy.respectHiveDefaultProvider.enabled` and set its default to `false`. ### Why are the changes needed? 1. Currently, `CREATE TABLE` syntax use hive provider to create table while `DataFrameWriter.saveAsTable` API using the value of `spark.sql.source.default` as a provider to create table. It would be better to make them consistent. 2. User may gets confused in some cases. For example: ``` CREATE TABLE t1 (c1 INT) USING PARQUET; CREATE TABLE t2 (c1 INT); ``` In these two DDLs, use may think that `t2` should also use parquet as default provider since Spark always advertise parquet as the default format. However, it's hive in this case. On the other hand, if we omit the USING clause in a CTAS statement, we do pick parquet by default if `spark.sql.hive.convertCATS=true`: ``` CREATE TABLE t3 USING PARQUET AS SELECT 1 AS VALUE; CREATE TABLE t4 AS SELECT 1 AS VALUE; ``` And these two cases together can be really confusing. 3. Now, Spark SQL is very independent and popular. We do not need to be fully consistent with Hive's behavior. ### Does this PR introduce any user-facing change? Yes, before this PR, using `CREATE TABLE` syntax will use hive provider. But now, it use the value of `spark.sql.source.default` as its provider. ### How was this patch tested? Added tests in `DDLParserSuite` and `HiveDDlSuite`. Closes #26736 from Ngone51/dev-create-table-using-parquet-by-default. Lead-authored-by: wuyi <yi.wu@databricks.com> Co-authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	06 December 2019, 18:15:25 UTC
c1a5f94	Liang-Chi Hsieh	06 December 2019, 17:22:16 UTC	[SPARK-30112][SQL] Allow insert overwrite same table if using dynamic partition overwrite ### What changes were proposed in this pull request? This patch proposes to allow insert overwrite same table if using dynamic partition overwrite. ### Why are the changes needed? Currently, Insert overwrite cannot overwrite to same table even it is dynamic partition overwrite. But for dynamic partition overwrite, we do not delete partition directories ahead. We write to staging directories and move data to final partition directories. We should be able to insert overwrite to same table under dynamic partition overwrite. This enables users to read data from a table and insert overwrite to same table by using dynamic partition overwrite. Because this is not allowed for now, users need to write to other temporary location and move it back to the table. ### Does this PR introduce any user-facing change? Yes. Users can insert overwrite same table if using dynamic partition overwrite. ### How was this patch tested? Unit test. Closes #26752 from viirya/dynamic-overwrite-same-table. Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	06 December 2019, 17:22:16 UTC
c8ed71b	Sean Owen	06 December 2019, 16:15:38 UTC	[SPARK-30011][SQL] Inline 2.12 "AsIfIntegral" classes, not present in 2.13 ### What changes were proposed in this pull request? Classes like DoubleAsIfIntegral are not found in Scala 2.13, but used in the current build. This change 'inlines' the 2.12 implementation and makes it work with both 2.12 and 2.13. ### Why are the changes needed? To cross-compile with 2.13. ### Does this PR introduce any user-facing change? It should not as it copies in 2.12's existing behavior. ### How was this patch tested? Existing tests. Closes #26769 from srowen/SPARK-30011. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	06 December 2019, 16:15:38 UTC
1595e46	Dongjoon Hyun	06 December 2019, 14:41:59 UTC	[SPARK-30142][TEST-MAVEN][BUILD] Upgrade Maven to 3.6.3 ### What changes were proposed in this pull request? This PR aims to upgrade Maven from 3.6.2 to 3.6.3. ### Why are the changes needed? This will bring bug fixes like the following. - MNG-6759 Maven fails to use <repositories> section from dependency when resolving transitive dependencies in some cases - MNG-6760 ExclusionArtifactFilter result invalid when wildcard exclusion is followed by other exclusions The following is the full release note. - https://maven.apache.org/docs/3.6.3/release-notes.html ### Does this PR introduce any user-facing change? No. (This is a dev-environment change.) ### How was this patch tested? Pass the Jenkins with both SBT and Maven. Closes #26770 from dongjoon-hyun/SPARK-30142. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>	06 December 2019, 14:41:59 UTC
187f3c1	gengjiaan	06 December 2019, 08:07:38 UTC	[SPARK-28083][SQL] Support LIKE ... ESCAPE syntax ## What changes were proposed in this pull request? The syntax 'LIKE predicate: ESCAPE clause' is a ANSI SQL. For example: ``` select 'abcSpark_13sd' LIKE '%Spark\\_%'; //true select 'abcSpark_13sd' LIKE '%Spark/_%'; //false select 'abcSpark_13sd' LIKE '%Spark"_%'; //false select 'abcSpark_13sd' LIKE '%Spark/_%' ESCAPE '/'; //true select 'abcSpark_13sd' LIKE '%Spark"_%' ESCAPE '"'; //true select 'abcSpark%13sd' LIKE '%Spark\\%%'; //true select 'abcSpark%13sd' LIKE '%Spark/%%'; //false select 'abcSpark%13sd' LIKE '%Spark"%%'; //false select 'abcSpark%13sd' LIKE '%Spark/%%' ESCAPE '/'; //true select 'abcSpark%13sd' LIKE '%Spark"%%' ESCAPE '"'; //true select 'abcSpark\\13sd' LIKE '%Spark\\\\_%'; //true select 'abcSpark/13sd' LIKE '%Spark//_%'; //false select 'abcSpark"13sd' LIKE '%Spark""_%'; //false select 'abcSpark/13sd' LIKE '%Spark//_%' ESCAPE '/'; //true select 'abcSpark"13sd' LIKE '%Spark""_%' ESCAPE '"'; //true ``` But Spark SQL only supports 'LIKE predicate'. Note: If the input string or pattern string is null, then the result is null too. There are some mainstream database support the syntax. PostgreSQL: https://www.postgresql.org/docs/11/functions-matching.html Vertica: https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/LanguageElements/Predicates/LIKE-predicate.htm?zoom_highlight=like%20escape MySQL: https://dev.mysql.com/doc/refman/5.6/en/string-comparison-functions.html Oracle: https://docs.oracle.com/en/database/oracle/oracle-database/19/jjdbc/JDBC-reference-information.html#GUID-5D371A5B-D7F6-42EB-8C0D-D317F3C53708 https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/Pattern-matching-Conditions.html#GUID-0779657B-06A8-441F-90C5-044B47862A0A ## How was this patch tested? Exists UT and new UT. This PR merged to my production environment and runs above sql: ``` spark-sql> select 'abcSpark_13sd' LIKE '%Spark\\_%'; true Time taken: 0.119 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark_13sd' LIKE '%Spark/_%'; false Time taken: 0.103 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark_13sd' LIKE '%Spark"_%'; false Time taken: 0.096 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark_13sd' LIKE '%Spark/_%' ESCAPE '/'; true Time taken: 0.096 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark_13sd' LIKE '%Spark"_%' ESCAPE '"'; true Time taken: 0.092 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark%13sd' LIKE '%Spark\\%%'; true Time taken: 0.109 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark%13sd' LIKE '%Spark/%%'; false Time taken: 0.1 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark%13sd' LIKE '%Spark"%%'; false Time taken: 0.081 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark%13sd' LIKE '%Spark/%%' ESCAPE '/'; true Time taken: 0.095 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark%13sd' LIKE '%Spark"%%' ESCAPE '"'; true Time taken: 0.113 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark\\13sd' LIKE '%Spark\\\\_%'; true Time taken: 0.078 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark/13sd' LIKE '%Spark//_%'; false Time taken: 0.067 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark"13sd' LIKE '%Spark""_%'; false Time taken: 0.084 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark/13sd' LIKE '%Spark//_%' ESCAPE '/'; true Time taken: 0.091 seconds, Fetched 1 row(s) spark-sql> select 'abcSpark"13sd' LIKE '%Spark""_%' ESCAPE '"'; true Time taken: 0.091 seconds, Fetched 1 row(s) ``` I create a table and its schema is: ``` spark-sql> desc formatted gja_test; key string NULL value string NULL other string NULL # Detailed Table Information Database test Table gja_test Owner test Created Time Wed Apr 10 11:06:15 CST 2019 Last Access Thu Jan 01 08:00:00 CST 1970 Created By Spark 2.4.1-SNAPSHOT Type MANAGED Provider hive Table Properties [transient_lastDdlTime=1563443838] Statistics 26 bytes Location hdfs://namenode.xxx:9000/home/test/hive/warehouse/test.db/gja_test Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat org.apache.hadoop.mapred.TextInputFormat OutputFormat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Storage Properties [field.delim= , serialization.format= ] Partition Provider Catalog Time taken: 0.642 seconds, Fetched 21 row(s) ``` Table `gja_test` exists three rows of data. ``` spark-sql> select * from gja_test; a A ao b B bo "__ """__ " Time taken: 0.665 seconds, Fetched 3 row(s) ``` At finally, I test this function: ``` spark-sql> select * from gja_test where key like value escape '"'; "__ """__ " Time taken: 0.687 seconds, Fetched 1 row(s) ``` Closes #25001 from beliefer/ansi-sql-like. Lead-authored-by: gengjiaan <gengjiaan@360.cn> Co-authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>	06 December 2019, 08:07:38 UTC
b86d4bb	Terry Kim	06 December 2019, 07:45:13 UTC	[SPARK-30001][SQL] ResolveRelations should handle both V1 and V2 tables ### What changes were proposed in this pull request? This PR makes `Analyzer.ResolveRelations` responsible for looking up both v1 and v2 tables from the session catalog and create an appropriate relation. ### Why are the changes needed? Currently there are two issues: 1. As described in [SPARK-29966](https://issues.apache.org/jira/browse/SPARK-29966), the logic for resolving relation can load a table twice, which is a perf regression (e.g., Hive metastore can be accessed twice). 2. As described in [SPARK-30001](https://issues.apache.org/jira/browse/SPARK-30001), if a catalog name is specified for v1 tables, the query fails: ``` scala> sql("create table t using csv as select 1 as i") res2: org.apache.spark.sql.DataFrame = [] scala> sql("select * from t").show +---+ \| i\| +---+ \| 1\| +---+ scala> sql("select * from spark_catalog.t").show org.apache.spark.sql.AnalysisException: Table or view not found: spark_catalog.t; line 1 pos 14; 'Project [] +- 'UnresolvedRelation [spark_catalog, t] ``` ### Does this PR introduce any user-facing change? Yes. Now the catalog name is resolved correctly: ``` scala> sql("create table t using csv as select 1 as i") res0: org.apache.spark.sql.DataFrame = [] scala> sql("select from t").show +---+ \| i\| +---+ \| 1\| +---+ scala> sql("select * from spark_catalog.t").show +---+ \| i\| +---+ \| 1\| +---+ ``` ### How was this patch tested? Added new tests. Closes #26684 from imback82/resolve_relation. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	06 December 2019, 07:45:13 UTC
a5ccbce	madianjun	06 December 2019, 07:39:49 UTC	[SPARK-30067][CORE] Fix fragment offset comparison in getBlockHosts ### What changes were proposed in this pull request? A bug fixed about the code in getBlockHosts() function. In the case "The fragment ends at a position within this block", the end of fragment should be before the end of block，where the "end of block" means `b.getOffset + b.getLength`，not `b.getLength`. ### Why are the changes needed? When comparing the fragment end and the block end，we should use fragment's `offset + length`，and then compare to the block's `b.getOffset + b.getLength`, not the block's length. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? No test. Closes #26650 from mdianjun/fix-getBlockHosts. Authored-by: madianjun <madianjun@jd.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	06 December 2019, 07:39:49 UTC
da27f91	angerszhu	06 December 2019, 07:12:45 UTC	[SPARK-29957][TEST] Reset MiniKDC's default enctypes to fit jdk8/jdk11 ### What changes were proposed in this pull request? Hadoop jira: https://issues.apache.org/jira/browse/HADOOP-12911 In this jira, the author said to replace origin Apache Directory project which is not maintained (but not said it won't work well in jdk11) to Apache Kerby which is java binding(fit java version). And in Flink: https://github.com/apache/flink/pull/9622 Author show the reason why hadoop-2.7.2's `MminiKdc` failed with jdk11. Because new encryption types of `es128-cts-hmac-sha256-128` and `aes256-cts-hmac-sha384-192` (for Kerberos 5) enabled by default were added in Java 11. Spark with `hadoop-2.7's MiniKdc`does not support these encryption types and does not work well when these encryption types are enabled, which results in the authentication failure. And when I test hadoop-2.7.2's minikdc in local, the kerberos 's debug error message is read message stream failed, message can't match. ### Why are the changes needed? Support jdk11 with hadoop-2.7 ### Does this PR introduce any user-facing change? NO ### How was this patch tested? Existed UT Closes #26594 from AngersZhuuuu/minikdc-3.2.0. Lead-authored-by: angerszhu <angers.zhu@gmail.com> Co-authored-by: AngersZhuuuu <angers.zhu@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	06 December 2019, 07:12:45 UTC
25431d7	Jungtaek Lim (HeartSaVioR)	06 December 2019, 05:46:28 UTC	[SPARK-29953][SS] Don't clean up source files for FileStreamSource if the files belong to the output of FileStreamSink ### What changes were proposed in this pull request? This patch prevents the cleanup operation in FileStreamSource if the source files belong to the FileStreamSink. This is needed because the output of FileStreamSink can be read with multiple Spark queries and queries will read the files based on the metadata log, which won't reflect the cleanup. To simplify the logic, the patch only takes care of the case of when the source path without glob pattern refers to the output directory of FileStreamSink, via checking FileStreamSource to see whether it leverages metadata directory or not to list the source files. ### Why are the changes needed? Without this patch, if end users turn on cleanup option with the path which is the output of FileStreamSink, there may be out of sync between metadata and available files which may break other queries reading the path. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Added UT. Closes #26590 from HeartSaVioR/SPARK-29953. Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com> Signed-off-by: Shixiong Zhu <zsxwing@gmail.com>	06 December 2019, 05:46:28 UTC
755d889	Liang-Chi Hsieh	06 December 2019, 00:32:33 UTC	[SPARK-24666][ML] Fix infinity vectors produced by Word2Vec when numIterations are large ### What changes were proposed in this pull request? This patch adds normalization to word vectors when fitting dataset in Word2Vec. ### Why are the changes needed? Running Word2Vec on some datasets, when numIterations is large, can produce infinity word vectors. ### Does this PR introduce any user-facing change? Yes. After this patch, Word2Vec won't produce infinity word vectors. ### How was this patch tested? Manually. This issue is not always reproducible on any dataset. The dataset known to reproduce it is too large (925M) to upload. ```scala case class Sentences(name: String, words: Array[String]) val dataset = spark.read .option("header", "true").option("sep", "\t") .option("quote", "").option("nullValue", "\\N") .csv("/tmp/title.akas.tsv") .filter("region = 'US' or language = 'en'") .select("title") .as[String] .map(s => Sentences(s, s.split(' '))) .persist() println("Training model...") val word2Vec = new Word2Vec() .setInputCol("words") .setOutputCol("vector") .setVectorSize(64) .setWindowSize(4) .setNumPartitions(50) .setMinCount(5) .setMaxIter(30) val model = word2Vec.fit(dataset) model.getVectors.show() ``` Before: ``` Training model... +-------------+--------------------+ \| word\| vector\| +-------------+--------------------+ \| Unspoken\|[-Infinity,-Infin...\| \| Talent\|[-Infinity,Infini...\| \| Hourglass\|[2.02805806500023...\| \|Nickelodeon's\|[-4.2918617120906...\| \| Priests\|[-1.3570403355926...\| \| Religion:\|[-6.7049072282803...\| \| Bu\|[5.05591774315586...\| \| Totoro:\|[-1.0539840178632...\| \| Trouble,\|[-3.5363592836003...\| \| Hatter\|[4.90413981352826...\| \| '79\|[7.50436471285412...\| \| Vile\|[-2.9147142985312...\| \| 9/11\|[-Infinity,Infini...\| \| Santino\|[1.30005911270850...\| \| Motives\|[-1.2538958306253...\| \| '13\|[-4.5040152427657...\| \| Fierce\|[Infinity,Infinit...\| \| Stover\|[-2.6326895394029...\| \| 'It\|[1.66574533864436...\| \| Butts\|[Infinity,Infinit...\| +-------------+--------------------+ only showing top 20 rows ``` After: ``` Training model... +-------------+--------------------+ \| word\| vector\| +-------------+--------------------+ \| Unspoken\|[-0.0454501919448...\| \| Talent\|[-0.2657704949378...\| \| Hourglass\|[-0.1399687677621...\| \|Nickelodeon's\|[-0.1767119318246...\| \| Priests\|[-0.0047509293071...\| \| Religion:\|[-0.0411605164408...\| \| Bu\|[0.11837736517190...\| \| Totoro:\|[0.05258282646536...\| \| Trouble,\|[0.09482011198997...\| \| Hatter\|[0.06040831282734...\| \| '79\|[0.04783720895648...\| \| Vile\|[-0.0017210749210...\| \| 9/11\|[-0.0713915303349...\| \| Santino\|[-0.0412711687386...\| \| Motives\|[-0.0492418706417...\| \| '13\|[-0.0073119504377...\| \| Fierce\|[-0.0565455369651...\| \| Stover\|[0.06938160210847...\| \| 'It\|[0.01117012929171...\| \| Butts\|[0.05374567210674...\| +-------------+--------------------+ only showing top 20 rows ``` Closes #26722 from viirya/SPARK-24666-2. Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Liang-Chi Hsieh <liangchi@uber.com>	06 December 2019, 00:32:33 UTC
7782b61	Sean Owen	05 December 2019, 21:48:29 UTC	[SPARK-29392][CORE][SQL][FOLLOWUP] Avoid deprecated (in 2.13) Symbol syntax 'foo in favor of simpler expression, where it generated deprecation warnings TL;DR - this is more of the same change in https://github.com/apache/spark/pull/26748 I told you it'd be iterative! Closes #26765 from srowen/SPARK-29392.3. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>	05 December 2019, 21:48:29 UTC
5892bbf	Aman Omer	05 December 2019, 17:54:45 UTC	[SPARK-30124][MLLIB] unnecessary persist in PythonMLLibAPI.scala ### What changes were proposed in this pull request? Removed unnecessary persist. ### Why are the changes needed? Persist in `PythonMLLibAPI.scala` is unnecessary because later in `run()` of `gmmAlg` is caching the data. https://github.com/apache/spark/blob/710ddab39e20f49e917311c3e27d142b5a2bcc71/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala#L167-L171 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manually Closes #26758 from amanomer/improperPersist. Authored-by: Aman Omer <amanomer1996@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	05 December 2019, 17:54:45 UTC
35bab33	Kent Yao	05 December 2019, 17:50:55 UTC	[SPARK-30121][BUILD] Fix memory usage in sbt build script ### What changes were proposed in this pull request? 1. the default memory setting is missing in usage instructions ``` build/sbt -h ``` before ``` -mem <integer> set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m) ``` after ``` -mem <integer> set memory options (default: 2048, which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m) ``` 2. the Perm space is not needed anymore, since java7 is removed. the changes in this pr are based on the main sbt script of the newest stable version 1.3.4. ### Why are the changes needed? bug fix ### Does this PR introduce any user-facing change? no ### How was this patch tested? manually Closes #26757 from yaooqinn/SPARK-30121. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>	05 December 2019, 17:50:55 UTC
b9cae37	Kent Yao	05 December 2019, 14:03:44 UTC	[SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres # What changes were proposed in this pull request? Add an analyzer rule to convert unresolved `Add`, `Subtract`, etc. to `TimeAdd`, `DateAdd`, etc. according to the following policy: ```scala /** * For [[Add]]: * 1. if both side are interval, stays the same; * 2. else if one side is interval, turns it to [[TimeAdd]]; * 3. else if one side is date, turns it to [[DateAdd]] ; * 4. else stays the same. * * For [[Subtract]]: * 1. if both side are interval, stays the same; * 2. else if the right side is an interval, turns it to [[TimeSub]]; * 3. else if one side is timestamp, turns it to [[SubtractTimestamps]]; * 4. else if the right side is date, turns it to [[DateDiff]]/[[SubtractDates]]; * 5. else if the left side is date, turns it to [[DateSub]]; * 6. else turns it to stays the same. * * For [[Multiply]]: * 1. If one side is interval, turns it to [[MultiplyInterval]]; * 2. otherwise, stays the same. * * For [[Divide]]: * 1. If the left side is interval, turns it to [[DivideInterval]]; * 2. otherwise, stays the same. */ ``` Besides, we change datetime functions from implicit cast types to strict ones, all available type coercions happen in `DateTimeOperations` coercion rule. ### Why are the changes needed? Feature Parity between PostgreSQL and Spark, and make the null semantic consistent with Spark. ### Does this PR introduce any user-facing change? 1. date_add/date_sub functions only accept int/tinynit/smallint as the second arg, double/string etc, are forbidden like hive, which produce weird results. ### How was this patch tested? add ut Closes #26412 from yaooqinn/SPARK-29774. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	05 December 2019, 14:03:44 UTC
332e252	Kent Yao	05 December 2019, 08:14:27 UTC	[SPARK-29425][SQL] The ownership of a database should be respected ### What changes were proposed in this pull request? Keep the owner of a database when executing alter database commands ### Why are the changes needed? Spark will inadvertently delete the owner of a database for executing databases ddls ### Does this PR introduce any user-facing change? NO ### How was this patch tested? add and modify uts Closes #26080 from yaooqinn/SPARK-29425. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	05 December 2019, 08:14:27 UTC
0ab922c	turbofei	05 December 2019, 08:00:16 UTC	[SPARK-29860][SQL] Fix dataType mismatch issue for InSubquery ### What changes were proposed in this pull request? There is an issue for InSubquery expression. For example, there are two tables `ta` and `tb` created by the below statements. ``` sql("create table ta(id Decimal(18,0)) using parquet") sql("create table tb(id Decimal(19,0)) using parquet") ``` This statement below would thrown dataType mismatch exception. ``` sql("select * from ta where id in (select id from tb)").show() ``` However, this similar statement could execute successfully. ``` sql("select * from ta where id in ((select id from tb))").show() ``` The root cause is that, for `InSubquery` expression, it does not find a common type for two decimalType like `In` expression. Besides that, for `InSubquery` expression, it also does not find a common type for DecimalType and double/float/bigInt. In this PR, I fix this issue by finding widerType for `InSubquery` expression when DecimalType is involved. ### Why are the changes needed? Some InSubquery would throw dataType mismatch exception. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Unit test. Closes #26485 from turboFei/SPARK-29860-in-subquery. Authored-by: turbofei <fwang12@ebay.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	05 December 2019, 08:00:16 UTC
0bd8b99	Aman Omer	05 December 2019, 07:28:07 UTC	[SPARK-30093][SQL] Improve error message for creating view ### What changes were proposed in this pull request? Improved error message while creating views. ### Why are the changes needed? Error message should suggest user to use TEMPORARY keyword while creating permanent view referred by temporary view. https://github.com/apache/spark/pull/26317#discussion_r352377363 ### Does this PR introduce any user-facing change? No ### How was this patch tested? Updated test case. Closes #26731 from amanomer/imp_err_msg. Authored-by: Aman Omer <amanomer1996@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	05 December 2019, 07:28:07 UTC
ebd83a5	Sean Owen	05 December 2019, 03:27:25 UTC	[SPARK-30009][CORE][SQL][FOLLOWUP] Remove OrderingUtil and Utils.nanSafeCompare{Doubles,Floats} and use java.lang.{Double,Float}.compare directly ### What changes were proposed in this pull request? Follow up on https://github.com/apache/spark/pull/26654#discussion_r353826162 Instead of OrderingUtil or Utils.nanSafeCompare{Doubles,Floats}, just use java.lang.{Double,Float}.compare directly. All work identically w.r.t. NaN when used to `compare`. ### Why are the changes needed? Simplification of the previous change, which existed to support Scala 2.13 migration. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Existing tests Closes #26761 from srowen/SPARK-30009.2. Authored-by: Sean Owen <sean.owen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>	05 December 2019, 03:27:25 UTC