https://github.com/apache/spark
Revision bd3f36f6626f0fb71ab0ceb9bbe7fa4d05c628f5 authored by Hyukjin Kwon on 03 August 2022, 07:11:20 UTC, committed by Hyukjin Kwon on 03 August 2022, 07:11:33 UTC
### What changes were proposed in this pull request?

This PR proposes applying a projection when the group attributes are empty, so that the aggregation respects the column order of its child even after the child's columns have been reordered.

### Why are the changes needed?

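To respect the column order in the child. Without the projection, an aggregation with no group attributes over a reordered child (for example after `select("c", "a")`) returns values under the wrong columns.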

### Does this PR introduce _any_ user-facing change?

Yes. It fixes the bug shown below, where aggregating over a reordered child returns the results under the wrong columns:

```python
import pandas as pd
from pyspark.sql import functions as f

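@f.pandas_udf("double")
def AVG(x: pd.Series) -> float:
    return x.mean()

# Assumes an active SparkSession bound to `spark` (as in the PySpark shell).
abc = spark.createDataFrame([(1.0, 5.0, 17.0)], schema=["a", "b", "c"])
abc.agg(AVG("a"), AVG("c")).show()
# Same aggregation over a child whose columns were reordered by select().
abc.select("c", "a").agg(AVG("a"), AVG("c")).show()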
```

**Before**

```
+------+------+
|AVG(a)|AVG(c)|
+------+------+
|  17.0|   1.0|
+------+------+
```

**After**

```
+------+------+
|AVG(a)|AVG(c)|
+------+------+
|   1.0|  17.0|
+------+------+
```
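
The swap between the two outputs can be reproduced in a toy model. The sketch below is plain Python, not Spark's planner code: `read_by_position` and `project` are hypothetical helpers, and the assumption that the aggregate inputs are consumed by position is an illustration of the symptom rather than a description of the actual implementation.

```python
# Toy model only: hypothetical helpers, not Spark internals.

def read_by_position(row, n):
    """Take the first n values of a row, assuming they are already in the requested order."""
    return list(row[:n])

def project(row, child_columns, wanted):
    """Reorder a row so that its values follow the `wanted` column order."""
    return [row[child_columns.index(name)] for name in wanted]

# Child output after select("c", "a"): columns and one row of values.
child_columns = ["c", "a"]
row = [17.0, 1.0]

# Columns requested by agg(AVG("a"), AVG("c")).
requested = ["a", "c"]

# No projection: values are read in child order, so "a" receives c's value.
without_projection = read_by_position(row, len(requested))

# With a projection: the row is reordered to match the requested columns first.
with_projection = project(row, child_columns, requested)

print(dict(zip(requested, without_projection)))  # {'a': 17.0, 'c': 1.0}
print(dict(zip(requested, with_projection)))     # {'a': 1.0, 'c': 17.0}
```

With a single input row the mean equals the value itself, which is why the toy results line up with the **Before** and **After** tables.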

### How was this patch tested?

Manually tested, and added a unit test.

Closes #37390 from HyukjinKwon/SPARK-39962.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5335c784ae76c9cc0aaa7a4b57b3cd6b3891ad9a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 2254240
[SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty