https://github.com/apache/spark
Revision b3e31107277cea5e441eb3708535e740712027a6 authored by Bruce Robbins on 27 March 2022, 00:31:49 UTC, committed by Hyukjin Kwon on 27 March 2022, 00:32:55 UTC
Backport of #35837.

When building the project list from an aggregate sequence in `ExtractGenerator`, convert the aggregate sequence to an `IndexedSeq` before performing the flatMap operation.

This query fails with a `NullPointerException`:
```
val df = Seq(1, 2, 3).toDF("v")
df.select(Stream(explode(array(min($"v"), max($"v"))), sum($"v")): _*).collect
```
If you change `Stream` to `Seq`, then it succeeds.

`ExtractGenerator` uses a flatMap operation over `aggList` for two purposes:

- To produce a new aggregate list.
- To update `projectExprs` (which is initialized as an array of nulls).

When `aggList` is a `Stream`, the flatMap operation evaluates lazily, so all entries in `projectExprs` after the first will still be null when the rule completes.

Changing `aggList` to an `IndexedSeq` forces the flatMap to evaluate eagerly.
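
The effect can be reproduced outside Spark. The following is a minimal sketch, not the actual `ExtractGenerator` code: `LazyFlatMapSketch`, `fill`, and `slots` are hypothetical names chosen for illustration, and it assumes the Scala 2.12 collections that Spark 3.2 builds against (where `Stream` is the lazy sequence type).

```scala
object LazyFlatMapSketch {
  // Hypothetical helper: fills `slots` as a side effect of flatMap,
  // loosely mirroring how the rule fills `projectExprs`.
  def fill(input: Seq[String]): Array[String] = {
    val slots = new Array[String](input.length) // starts as all nulls
    // The flatMap result is deliberately never traversed, so a lazy
    // input only runs the function for its first element.
    input.zipWithIndex.flatMap { case (s, i) =>
      slots(i) = s.toUpperCase
      Seq(s)
    }
    slots
  }

  def main(args: Array[String]): Unit = {
    // Lazy input: only the first slot is written.
    println(fill(Stream("a", "b", "c")).mkString(", "))
    // Eager input: converting to IndexedSeq forces every element to be visited.
    println(fill(Stream("a", "b", "c").toIndexedSeq).mkString(", "))
  }
}
```

With a `Stream`, only the head of the flatMap result is computed, so the slots after the first stay null. Converting to an `IndexedSeq` first makes the side effects run for every element.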

User-facing change: No

Testing: New unit test

Closes #35851 from bersprockets/generator_aggregate_issue_32.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 7842621ff50001e1cde8e2e6a2fc48c2cdcaf3d4)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 789ec13
[SPARK-38528][SQL][3.2] Eagerly iterate over aggregate sequence when building project list in `ExtractGenerator`