Revision - bd3f36f - [SPARK-39962][PYTHON][SQL] Apply projection when group [...] - origin: https://github.com/apache/spark

https://github.com/apache/spark

05 April 2024, 20:24:39 UTC

Revision bd3f36f6626f0fb71ab0ceb9bbe7fa4d05c628f5 authored by Hyukjin Kwon on 03 August 2022, 07:11:20 UTC, committed by Hyukjin Kwon on 03 August 2022, 07:11:33 UTC

[SPARK-39962][PYTHON][SQL] Apply projection when group attributes are empty

### What changes were proposed in this pull request?

This PR proposes to apply the projection when the grouping attributes are empty, so that a column reordering in the child plan is respected.
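The effect of a missing projection can be illustrated with a small toy model in plain Python. This is only a sketch under stated assumptions, not Spark's planner code: the helper names `project` and `aggregate_by_position` are hypothetical, and the model simply assumes that aggregate inputs are read from the child rows by position.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def project(rows: Sequence[Row], child_columns: List[str],
            wanted_columns: List[str]) -> List[Row]:
    """Reorder every row so its values follow wanted_columns (hypothetical helper)."""
    ordinals = [child_columns.index(name) for name in wanted_columns]
    return [tuple(row[i] for i in ordinals) for row in rows]


def aggregate_by_position(rows: Sequence[Row], n_inputs: int) -> List[float]:
    """Toy aggregation: the i-th aggregate averages the i-th input column."""
    return [sum(row[i] for row in rows) / len(rows) for i in range(n_inputs)]


# Child output after select("c", "a"): column "c" now comes first.
child_columns = ["c", "a"]
rows = [(17.0, 1.0)]

# The aggregates are requested as AVG("a"), AVG("c"), in that order.
udf_inputs = ["a", "c"]

# Without a projection, position 0 is read as "a" although it holds "c".
without_projection = aggregate_by_position(rows, len(udf_inputs))

# With a projection, rows are first reordered to match the requested inputs.
with_projection = aggregate_by_position(
    project(rows, child_columns, udf_inputs), len(udf_inputs)
)

print(without_projection)  # [17.0, 1.0]
print(with_projection)     # [1.0, 17.0]
```

In this toy model the two results mirror the swapped and corrected values shown in the **Before** and **After** tables of the bug report.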

### Why are the changes needed?

To respect the column order in the child.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the bug shown below, where the aggregated values come back swapped:

```python
import pandas as pd
from pyspark.sql import functions as f

@f.pandas_udf("double")
def AVG(x: pd.Series) -> float:
    return x.mean()

abc = spark.createDataFrame([(1.0, 5.0, 17.0)], schema=["a", "b", "c"])
abc.agg(AVG("a"), AVG("c")).show()
abc.select("c", "a").agg(AVG("a"), AVG("c")).show()
```

**Before**

```
+------+------+
|AVG(a)|AVG(c)|
+------+------+
|  17.0|   1.0|
+------+------+
```

**After**

```
+------+------+
|AVG(a)|AVG(c)|
+------+------+
|   1.0|  17.0|
+------+------+
```

### How was this patch tested?

Manually tested, and added a unit test.

Closes #37390 from HyukjinKwon/SPARK-39962.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 5335c784ae76c9cc0aaa7a4b57b3cd6b3891ad9a)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

1 parent 2254240

File	Mode	Size
.github
.idea
R
assembly
bin
binder
build
common
conf
core
data
dev
docs
examples
external
graphx
hadoop-cloud
launcher
licenses
licenses-binary
mllib
mllib-local
project
python
repl
resource-managers
sbin
sql
streaming
tools
.asf.yaml	-rw-r--r--	1.1 KB
.gitattributes	-rw-r--r--	130 bytes
.gitignore	-rw-r--r--	2.0 KB
CONTRIBUTING.md	-rw-r--r--	997 bytes
LICENSE	-rw-r--r--	13.1 KB
LICENSE-binary	-rw-r--r--	22.4 KB
NOTICE	-rw-r--r--	2.0 KB
NOTICE-binary	-rw-r--r--	56.5 KB
README.md	-rw-r--r--	4.4 KB
appveyor.yml	-rw-r--r--	2.7 KB
pom.xml	-rw-r--r--	137.4 KB
scalastyle-config.xml	-rw-r--r--	22.0 KB
