https://github.com/apache/spark
Revision de68152f01c13ff69d61dca31db1a516e7145bfe authored by Peter Toth on 15 August 2022, 13:45:01 UTC, committed by Wenchen Fan on 15 August 2022, 13:45:01 UTC
### What changes were proposed in this pull request?
Keep the output attributes of a `Union` node's first child in the `RemoveRedundantAliases` rule to avoid correctness issues.

### Why are the changes needed?
To fix the result of the following query:
```
SELECT a, b AS a FROM (
  SELECT a, a AS b FROM (SELECT a FROM VALUES (1) AS t(a))
  UNION ALL
  SELECT a, b FROM (SELECT a, b FROM VALUES (1, 2) AS t(a, b))
)
```
Before this PR, the query returns an incorrect result:
```
+---+---+
|  a|  a|
+---+---+
|  1|  1|
|  2|  2|
+---+---+
```
After this PR it returns the expected result:
```
+---+---+
|  a|  a|
+---+---+
|  1|  1|
|  1|  2|
+---+---+
```
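Why a dropped alias can change the rows coming from the second `Union` child can be sketched with a toy model. This is not Catalyst code. It assumes that a `Union` takes its output attributes from its first child and that some later step maps those attributes to each child's columns by expression ID. The names `Attr`, `rewrite_map`, and `eval_child`, and the IDs 1, 3, 4, and 5, are hypothetical and chosen only for illustration.

```python
# Toy model (NOT Spark/Catalyst code): an attribute is a (name, expr_id) pair.
from typing import NamedTuple


class Attr(NamedTuple):
    name: str
    expr_id: int


def rewrite_map(union_output, child_output):
    """Map each Union output attribute ID to the child's attribute at the
    same position. Duplicate IDs collide in the dict; the last one wins."""
    return {u.expr_id: c for u, c in zip(union_output, child_output)}


def eval_child(union_output, child_output, row):
    """Read one row of a child through the Union's output attributes."""
    mapping = rewrite_map(union_output, child_output)
    values = dict(zip(child_output, row))
    return tuple(values[mapping[u.expr_id]] for u in union_output)


# Second Union child: SELECT a, b FROM VALUES (1, 2) AS t(a, b)
second_child = [Attr("a", 3), Attr("b", 4)]
second_row = (1, 2)

# Alias kept: the first child's two outputs have distinct IDs (assumed a#1, a#5).
kept = [Attr("a", 1), Attr("a", 5)]
# Alias removed as "redundant": both outputs are the same attribute a#1.
removed = [Attr("a", 1), Attr("a", 1)]

print(eval_child(kept, second_child, second_row))     # (1, 2)
print(eval_child(removed, second_child, second_row))  # (2, 2)
```

With distinct IDs the second child's row is read as `(1, 2)`, matching the expected result. With duplicate IDs both output columns resolve to the same child column, which reproduces the `(2, 2)` row of the incorrect result under this model's assumptions.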

### Does this PR introduce _any_ user-facing change?
Yes, it fixes a correctness issue.

### How was this patch tested?
Added new unit tests.

Closes #37496 from peter-toth/SPARK-39887-keep-attributes-of-unions-first-child-3.1.

Authored-by: Peter Toth <ptoth@cloudera.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent f2453e8
[SPARK-39887][SQL][3.1] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique