https://github.com/apache/spark
Revision 18fc8e8e023868f6e7fab3422c5ce57e690d7834 authored by ulysses-you on 09 September 2022, 21:43:19 UTC, committed by Dongjoon Hyun on 09 September 2022, 21:43:19 UTC
### What changes were proposed in this pull request?

Backport of https://github.com/apache/spark/pull/37706 to branch-3.3.

Skip optimizing the root user-specified repartition in `PropagateEmptyRelation`, so that it is no longer removed when its child is an empty relation.
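The idea can be modeled on a toy plan tree. This is a hypothetical sketch, not Spark code: `Plan`, `Relation`, `Project`, `Repartition`, and `PropagateEmptySketch` are simplified stand-ins invented for illustration, not Catalyst's classes. The real rule covers many more operators, and in this toy only the literal top node counts as the root. The sketch collapses unary nodes over a known-empty child, except a repartition at the root when `keepRootRepartition` is set.

```scala
// Hypothetical, heavily simplified plan tree for illustration only.
// These are NOT Spark Catalyst classes.
sealed trait Plan
final case class Relation(rows: Seq[Int]) extends Plan
final case class Project(child: Plan) extends Plan
final case class Repartition(numPartitions: Int, child: Plan) extends Plan

object PropagateEmptySketch {
  private val emptyRelation: Plan = Relation(Seq.empty)

  // Only a leaf relation with no rows is treated as statically empty here.
  private def isEmpty(p: Plan): Boolean = p match {
    case Relation(rows) => rows.isEmpty
    case _              => false
  }

  // Rewrites children first (bottom-up), then collapses a unary node whose
  // rewritten child is empty. `isRoot` is true only for the top-level node.
  private def rewrite(p: Plan, isRoot: Boolean, keepRootRepartition: Boolean): Plan =
    p match {
      case Project(child) =>
        val c = rewrite(child, isRoot = false, keepRootRepartition)
        if (isEmpty(c)) emptyRelation else Project(c)
      case Repartition(n, child) =>
        val c = rewrite(child, isRoot = false, keepRootRepartition)
        // The behavior modeled here: a repartition at the root survives.
        if (isEmpty(c) && !(isRoot && keepRootRepartition)) emptyRelation
        else Repartition(n, c)
      case leaf: Relation => leaf
    }

  def apply(plan: Plan, keepRootRepartition: Boolean): Plan =
    rewrite(plan, isRoot = true, keepRootRepartition)
}

val plan = Repartition(1, Relation(Seq.empty))
// Old behavior: the whole plan collapses to the empty relation.
println(PropagateEmptySketch(plan, keepRootRepartition = false))
// New behavior: the root Repartition node is kept above the empty relation.
println(PropagateEmptySketch(plan, keepRootRepartition = true))
```

A repartition nested below the root still collapses in this toy, which mirrors the "root" qualifier in the description.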

### Why are the changes needed?

Spark should preserve the final user-specified repartition, because it determines the number of partitions of the final output.

For example:

```scala
spark.sql("select * from values(1) where 1 < rand()").repartition(1)

// before:
// == Optimized Logical Plan ==
// LocalTableScan <empty>, [col1#0]

// after:
// == Optimized Logical Plan ==
// Repartition 1, true
// +- LocalRelation <empty>, [col1#0]
```
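The commit title is about the resulting partition count. One way to observe it is the spark-shell style sketch below. It assumes a local Spark session built against a version that includes this change. AQE is switched off via `spark.sql.adaptive.enabled` because this change targets the non-AQE path. The application name is an arbitrary placeholder.

```scala
import org.apache.spark.sql.SparkSession

// Local session with adaptive query execution disabled, since this change
// covers the non-AQE optimizer path.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("repartition-empty-relation-check") // arbitrary placeholder name
  .config("spark.sql.adaptive.enabled", "false")
  .getOrCreate()

// `rand()` returns values in [0, 1), so `1 < rand()` never holds and the
// relation becomes empty during optimization, as the plans above show.
val df = spark.sql("select * from values(1) where 1 < rand()").repartition(1)

// Prints the parsed, analyzed, optimized and physical plans.
df.explain(true)

// The user asked for 1 partition; with the root repartition preserved,
// the result is expected to honor that request.
val numPartitions = df.rdd.getNumPartitions
println(s"partitions: $numPartitions")

spark.stop()
```

Without the change, the plan collapses to an empty local scan, so the partition count need not match the requested `N`.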

### Does this PR introduce _any_ user-facing change?

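Yes. The optimized plan may change: a root user-specified repartition over an empty relation is now kept instead of being removed.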

### How was this patch tested?

Added a test.

Closes #37730 from ulysses-you/empty-3.3.

Authored-by: ulysses-you <ulyssesyou18@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent 4f69c98
[SPARK-39915][SQL][3.3] Dataset.repartition(N) may not create N partitions Non-AQE part