https://github.com/apache/spark
Revision 53e2e7bdd618e2a7dec5a84b9d5ae965fb136179 authored by Bruce Robbins on 01 December 2023, 02:28:33 UTC, committed by Ruifeng Zheng on 01 December 2023, 02:29:00 UTC
### What changes were proposed in this pull request?

In various Pandas aggregate functions, remove each comparison or arithmetic operation between `DoubleType` and `IntegerType` in `evaluateExpression` and replace it with a comparison or arithmetic operation between `DoubleType` and `DoubleType`.

Affected functions are `PandasStddev`, `PandasVariance`, `PandasSkewness`, `PandasKurtosis`, and `PandasCovar`.

### Why are the changes needed?

These functions fail in interpreted mode. For example, `evaluateExpression` in `PandasKurtosis` compares a double to an integer:
```
If(n < 4, Literal.create(null, DoubleType) ...
```
This results in a boxed double and a boxed integer being passed to `SQLOrderingUtil.compareDoubles`, which expects two doubles as arguments. The Scala runtime tries to unbox the boxed integer as a double, resulting in an error.

Reproduction example:
```
spark.sql("set spark.sql.codegen.wholeStage=false")
spark.sql("set spark.sql.codegen.factoryMode=NO_CODEGEN")

import numpy as np
import pandas as pd

import pyspark.pandas as ps

pser = pd.Series([1, 2, 3, 7, 9, 8], index=np.random.rand(6), name="a")
psser = ps.from_pandas(pser)

psser.kurt()
```
See Jira (SPARK-46189) for the other reproduction cases.

This works fine in codegen mode because the integer is already unboxed and the Java runtime will implicitly cast it to a double.
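
The difference between the two modes can be illustrated outside Spark with a small Java sketch. This is not Spark's code: the class `UnboxingSketch` and both methods are hypothetical, and `compareDoubles` is only a stand-in for a method that takes two primitive doubles. The sketch assumes the interpreted path hands values around as boxed `Object`s, as described above.

```java
class UnboxingSketch {
    // Hypothetical stand-in for a comparison that expects two primitive doubles.
    static int compareDoubles(double x, double y) {
        return Double.compare(x, y);
    }

    // Interpreted-style path (assumption): operands arrive boxed as Object,
    // so each one must be cast to Double before it can be unboxed.
    static int compareBoxed(Object left, Object right) {
        return compareDoubles((Double) left, (Double) right);
    }

    public static void main(String[] args) {
        double n = 6.0;
        int four = 4;

        // Codegen-style path: the int is a primitive, so Java widens it
        // to double implicitly and the call succeeds.
        System.out.println(compareDoubles(n, four) > 0);

        // Boxed Double vs. boxed Integer: the cast to Double fails,
        // because an Integer object is not a Double.
        try {
            compareBoxed(Double.valueOf(n), Integer.valueOf(four));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException when unboxing Integer as Double");
        }

        // Same-type operands, analogous to the fix: both sides are doubles.
        System.out.println(compareBoxed(Double.valueOf(n), Double.valueOf(4.0)) > 0);
    }
}
```

Making both operands doubles, as this PR does in `evaluateExpression`, removes the dependence on implicit widening, so the boxed and unboxed paths behave the same.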

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44099 from bersprockets/unboxing_error.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit 042d8546be5d160e203ad78a8aa2e12e74142338)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
1 parent 00bb4ad
[SPARK-46189][PS][SQL] Perform comparisons and arithmetic between same types in various Pandas aggregate functions to avoid interpreted mode errors