Revision - 373a627 - [SPARK-26680][SPARK-25767][SQL][BACKPORT-2.3] Eagerly create [...] - origin: https://github.com/apache/spark

https://github.com/apache/spark

18 August 2024, 22:37:34 UTC

Revision 373a627e99666ff047d41c8797a21deee84e23b9 authored by Bruce Robbins on 25 January 2019, 03:41:31 UTC, committed by Dongjoon Hyun on 25 January 2019, 03:41:31 UTC

[SPARK-26680][SPARK-25767][SQL][BACKPORT-2.3] Eagerly create inputVars while conditions are appropriate

## What changes were proposed in this pull request?

Backport of #22789 and #23617 to branch-2.3.

When a user passes a Stream to groupBy, ```CodegenSupport.consume``` ends up lazily generating ```inputVars``` from a Stream, since the field ```output``` will be a Stream. At the time ```output.zipWithIndex.map``` is called, conditions are correct. However, because mapping over a Stream defers evaluation of its tail, most of the map operation actually executes later, when conditions are no longer appropriate. The closure used by the map operation ends up using a reference to the partially created ```inputVars```. As a result, a StackOverflowError occurs.

This PR ensures that ```inputVars``` is eagerly created while conditions are appropriate. It seems this was also an issue with the code path for creating ```inputVars``` from ```outputVars``` (SPARK-25767). I simply extended the solution for that code path to encompass both code paths.
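
The laziness described above can be reproduced outside Spark. The following is a minimal sketch, not the code from this patch: `Ctx`, `tag`, `forceIfStream`, and `run` are hypothetical names, and the mutable `ready` flag merely stands in for "conditions are appropriate". It shows the stale-state half of the problem (not the StackOverflowError itself), and assumes that forcing the Stream with `force` is an acceptable way to make the mapping eager.

```scala
object StreamLaziness {
  // Hypothetical stand-in for mutable codegen state that is only
  // valid ("appropriate") for a short window of time.
  final class Ctx { var ready: Boolean = false }

  // Records, for each element, whether ctx was ready when the element
  // was actually mapped. If xs is a Stream, only the head is mapped now;
  // the tail is mapped whenever it is first traversed.
  def tag(ctx: Ctx, xs: Seq[Int]): Seq[(Int, Boolean)] =
    xs.map(x => (x, ctx.ready))

  // Forces full evaluation when the sequence is a Stream; other
  // collections are already strict and are returned unchanged.
  def forceIfStream[A](xs: Seq[A]): Seq[A] = xs match {
    case s: Stream[A] => s.force
    case other        => other
  }

  def run(eager: Boolean): List[(Int, Boolean)] = {
    val ctx = new Ctx
    ctx.ready = true                       // conditions are appropriate
    val mapped = tag(ctx, Stream(1, 2, 3))
    val result = if (eager) forceIfStream(mapped) else mapped
    ctx.ready = false                      // conditions no longer appropriate
    result.toList                          // any deferred mapping happens here
  }

  def main(args: Array[String]): Unit = {
    println(run(eager = false)) // List((1,true), (2,false), (3,false))
    println(run(eager = true))  // List((1,true), (2,true), (3,true))
  }
}
```

In the lazy run only the head sees the valid state; forcing the Stream before the state changes makes every element see it.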

## How was this patch tested?

SQL unit tests
new test
python tests

Closes #23642 from bersprockets/SPARK-26680_branch23.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

1 parent ded902c

File	Mode	Size
.github
R
assembly
bin
build
common
conf
core
data
dev
docs
examples
external
graphx
hadoop-cloud
launcher
licenses
mllib
mllib-local
project
python
repl
resource-managers
sbin
sql
streaming
tools
.gitattributes	-rw-r--r--	40 bytes
.gitignore	-rw-r--r--	1.3 KB
.travis.yml	-rw-r--r--	1.7 KB
CONTRIBUTING.md	-rw-r--r--	995 bytes
LICENSE	-rw-r--r--	17.6 KB
NOTICE	-rw-r--r--	25.7 KB
README.md	-rw-r--r--	3.7 KB
appveyor.yml	-rw-r--r--	2.3 KB
pom.xml	-rw-r--r--	99.3 KB
scalastyle-config.xml	-rw-r--r--	17.2 KB
