Revision 497d17f38a13fa1ac883c1f628b53b85c8a35085 authored by Bruce Robbins on 18 June 2022, 00:25:11 UTC, committed by Hyukjin Kwon on 18 June 2022, 00:27:07 UTC
Change `Inline.eval` to return a row of null values rather than a null row in the case of a null input struct.

Consider the following query:
```
set spark.sql.codegen.wholeStage=false;
select inline(array(named_struct('a', 1, 'b', 2), null));
```
This query fails with a `NullPointerException`:
```
22/06/16 15:10:06 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.GenerateExec.$anonfun$doExecute$11(GenerateExec.scala:122)
```
(In Spark 3.1.3, you don't need to set `spark.sql.codegen.wholeStage` to false to reproduce the error, since Spark 3.1.3 has no codegen path for `Inline`).

This query fails regardless of the setting of `spark.sql.codegen.wholeStage`:
```
val dfWide = (Seq((1))
  .toDF("col0")
  .selectExpr(Seq.tabulate(99)(x => s"$x as col${x + 1}"): _*))

val df = (dfWide
  .selectExpr("*", "array(named_struct('a', 1, 'b', 2), null) as struct_array"))

df.selectExpr("*", "inline(struct_array)").collect
```
It fails with
```
22/06/16 15:18:55 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)/ 1]
java.lang.NullPointerException
	at org.apache.spark.sql.catalyst.expressions.JoinedRow.isNullAt(JoinedRow.scala:80)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.writeFields_0_8$(Unknown Source)
```
When `Inline.eval` returns a null row in the collection, GenerateExec gets a NullPointerException either when joining the null row with required child output, or projecting the null row.

This PR avoids producing the null row and produces a row of null values instead:
```
spark-sql> set spark.sql.codegen.wholeStage=false;
spark.sql.codegen.wholeStage	false
Time taken: 3.095 seconds, Fetched 1 row(s)
spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null));
1	2
NULL	NULL
Time taken: 1.214 seconds, Fetched 2 row(s)
spark-sql>
```

No.

New unit test.

Closes #36903 from bersprockets/inline_eval_null_struct_issue.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit c4d5390dd032d17a40ad50e38f0ed7bd9bbd4698)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 07edae9
History

README.md

back to top