https://github.com/apache/spark
Revision 53e2e7bdd618e2a7dec5a84b9d5ae965fb136179 authored by Bruce Robbins on 01 December 2023, 02:28:33 UTC, committed by Ruifeng Zheng on 01 December 2023, 02:29:00 UTC
### What changes were proposed in this pull request? In various Pandas aggregate functions, remove each comparison or arithmetic operation between `DoubleType` and `IntergerType` in `evaluateExpression` and replace with a comparison or arithmetic operation between `DoubleType` and `DoubleType`. Affected functions are `PandasStddev`, `PandasVariance`, `PandasSkewness`, `PandasKurtosis`, and `PandasCovar`. ### Why are the changes needed? These functions fail in interpreted mode. For example, `evaluateExpression` in `PandasKurtosis` compares a double to an integer: ``` If(n < 4, Literal.create(null, DoubleType) ... ``` This results in a boxed double and a boxed integer getting passed to `SQLOrderingUtil.compareDoubles` which expects two doubles as arguments. The scala runtime tries to unbox the boxed integer as a double, resulting in an error. Reproduction example: ``` spark.sql("set spark.sql.codegen.wholeStage=false") spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN") import numpy as np import pandas as pd import pyspark.pandas as ps pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a") psser = ps.from_pandas(pser) psser.kurt() ``` See Jira (SPARK-46189) for the other reproduction cases. This works fine in codegen mode because the integer is already unboxed and the Java runtime will implictly cast it to a double. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? New unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #44099 from bersprockets/unboxing_error. Authored-by: Bruce Robbins <bersprockets@gmail.com> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org> (cherry picked from commit 042d8546be5d160e203ad78a8aa2e12e74142338) Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
1 parent 00bb4ad
Tip revision: 53e2e7bdd618e2a7dec5a84b9d5ae965fb136179 authored by Bruce Robbins on 01 December 2023, 02:28:33 UTC
[SPARK-46189][PS][SQL] Perform comparisons and arithmetic between same types in various Pandas aggregate functions to avoid interpreted mode errors
[SPARK-46189][PS][SQL] Perform comparisons and arithmetic between same types in various Pandas aggregate functions to avoid interpreted mode errors
Tip revision: 53e2e7b
File | Mode | Size |
---|---|---|
.github | ||
R | ||
assembly | ||
bin | ||
binder | ||
build | ||
common | ||
conf | ||
connector | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.asf.yaml | -rw-r--r-- | 1.3 KB |
.gitattributes | -rw-r--r-- | 130 bytes |
.gitignore | -rw-r--r-- | 1.8 KB |
CONTRIBUTING.md | -rw-r--r-- | 997 bytes |
LICENSE | -rw-r--r-- | 13.0 KB |
LICENSE-binary | -rw-r--r-- | 22.4 KB |
NOTICE | -rw-r--r-- | 2.0 KB |
NOTICE-binary | -rw-r--r-- | 56.5 KB |
README.md | -rw-r--r-- | 4.5 KB |
appveyor.yml | -rw-r--r-- | 2.8 KB |
pom.xml | -rw-r--r-- | 139.1 KB |
scalastyle-config.xml | -rw-r--r-- | 23.7 KB |
![swh spinner](/static/img/swh-spinner.gif)
Computing file changes ...