https://github.com/apache/spark
Revision 373a627e99666ff047d41c8797a21deee84e23b9 authored by Bruce Robbins on 25 January 2019, 03:41:31 UTC, committed by Dongjoon Hyun on 25 January 2019, 03:41:31 UTC
## What changes were proposed in this pull request?

Backport of #22789 and #23617 to branch-2.3.

When a user passes a Stream to groupBy, the field ```output``` ends up being a Stream, so ```CodegenSupport.consume``` generates ```inputVars``` lazily. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, by the time the deferred map operation actually executes, they no longer are: the closure used by the map operation ends up referring to the partially created ```inputVars```, and a StackOverflowError results.
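The laziness described above can be reproduced outside Spark. The following is a minimal sketch, not Spark code: `Ctx`, `mode`, and `render` are hypothetical stand-ins for the mutable codegen state, and the sketch only shows the map closure observing state that changed after `map` was called, not the StackOverflowError itself. It assumes Scala's `Stream` (deprecated in Scala 2.13 in favor of `LazyList`), whose `map` evaluates the head immediately and defers the tail.

```scala
object LazyStreamDemo {
  // Hypothetical stand-in for mutable codegen state; not Spark's real context class.
  final class Ctx { var mode: String = "row" }

  def render(output: Seq[String]): List[String] = {
    val ctx = new Ctx

    // If `output` is a Stream, only the head is mapped here; the tail is deferred.
    val inputVars = output.zipWithIndex.map { case (name, i) => s"${ctx.mode}:$name$i" }

    // The state changes before the deferred part of the map has run.
    ctx.mode = "vars"

    // Forcing the sequence now runs the closure for the remaining elements.
    inputVars.toList
  }

  def main(args: Array[String]): Unit = {
    println(render(List("a", "b", "c")))   // strict: every element sees mode "row"
    println(render(Stream("a", "b", "c"))) // lazy: only the head sees mode "row"
  }
}
```

With a strict `List`, every element is computed while `mode` is still `"row"`. With a `Stream`, only the first element is, which mirrors the mismatch between when `map` is called and when it executes.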

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.
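One way to create such a sequence eagerly is to force it before any state changes. The sketch below illustrates that idea under stated assumptions and is not a copy of the patch: `Ctx`, `mode`, and `render` are hypothetical names, and `Stream.force` is used here simply as a standard-library call that evaluates every element of a `Stream`.

```scala
object EagerStreamDemo {
  // Hypothetical stand-in for mutable codegen state; not Spark's real context class.
  final class Ctx { var mode: String = "row" }

  def render(output: Seq[String]): List[String] = {
    val ctx = new Ctx

    val candidate: Seq[String] =
      output.zipWithIndex.map { case (name, i) => s"${ctx.mode}:$name$i" }

    // Evaluate all elements now, while the state is still what the closure expects.
    val inputVars: Seq[String] = candidate match {
      case stream: Stream[String] => stream.force
      case other                  => other
    }

    // Later state changes can no longer affect the already computed elements.
    ctx.mode = "vars"
    inputVars.toList
  }

  def main(args: Array[String]): Unit = {
    println(render(Stream("a", "b", "c")))
  }
}
```

Matching on `Stream` leaves strict collections untouched, so only lazily built sequences pay the cost of being forced.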

## How was this patch tested?

SQL unit tests
new test
python tests

Closes #23642 from bersprockets/SPARK-26680_branch23.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent ded902c
[SPARK-26680][SPARK-25767][SQL][BACKPORT-2.3] Eagerly create inputVars while conditions are appropriate