https://github.com/apache/spark
Revision 205766ee1e92d8fec1731c2d7f9a37e46c957e05 authored by Bruce Robbins on 21 June 2022, 23:31:56 UTC, committed by Hyukjin Kwon on 21 June 2022, 23:31:56 UTC
### What changes were proposed in this pull request?

Backport of #36903

Change `Inline.eval` to return a row of null values rather than a null row in the case of a null input struct.

### Why are the changes needed?

Consider the following query:
```
set spark.sql.codegen.wholeStage=false;
select inline(array(named_struct('a', 1, 'b', 2), null));
```
This query fails with a `NullPointerException`:
```
22/06/16 15:10:06 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$11(GenerateExec.scala:122)
```
(In Spark 3.1.x, you don't need to set `spark.sql.codegen.wholeStage` to false to reproduce the error, since Spark 3.1.x has no codegen path for `Inline`).

This query fails regardless of the setting of `spark.sql.codegen.wholeStage`:
```
val dfWide = (Seq((1))
  .toDF("col0")
  .selectExpr(Seq.tabulate(99)(x => s"$x as col${x + 1}"): _*))

val df = (dfWide
  .selectExpr("*", "array(named_struct('a', 1, 'b', 2), null) as struct_array"))

df.selectExpr("*", "inline(struct_array)").collect
```
It fails with
```
22/06/16 15:18:55 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)/ 1]
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.JoinedRow.isNullAt(JoinedRow.scala:80)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_8$(Unknown Source)
```
When `Inline.eval` returns a null row in the collection, GenerateExec gets a NullPointerException either when joining the null row with required child output, or projecting the null row.

This PR avoids producing the null row and produces a row of null values instead:
```
spark-sql> set spark.sql.codegen.wholeStage=false;
spark.sql.codegen.wholeStage	false
Time taken: 3.095 seconds, Fetched 1 row(s)
spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null));
1	2
NULL	NULL
Time taken: 1.214 seconds, Fetched 2 row(s)
spark-sql>
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test.

Closes #36949 from bersprockets/inline_eval_null_struct_issue_31.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 14bc8ec
History
Tip revision: 205766ee1e92d8fec1731c2d7f9a37e46c957e05 authored by Bruce Robbins on 21 June 2022, 23:31:56 UTC
[SPARK-39496][SQL][3.1] Handle null struct in Inline.eval
Tip revision: 205766e

README.md

back to top