https://github.com/apache/spark
Revision 0e2758c9955c2ae102e37e0b49aa9446bbe6fecf authored by Max Gekk on 14 July 2022, 14:45:39 UTC, committed by Max Gekk on 14 July 2022, 14:45:39 UTC
### What changes were proposed in this pull request?
In the PR, I propose to catch `PatternSyntaxException` while compiling the regexp pattern in `regexp_extract`, `regexp_extract_all` and `regexp_instr`, and to replace it with a Spark exception with the error class `INVALID_PARAMETER_VALUE`. As a result, Spark SQL will output the error in the form:
```
org.apache.spark.SparkRuntimeException
[INVALID_PARAMETER_VALUE] The value of parameter(s) 'regexp' in `regexp_instr` is invalid: ) ?
```
instead of (on Spark 3.3.0):
```java
java.lang.NullPointerException: null
```
Also, I propose to set `lastRegex` only after the regexp pattern has been compiled successfully.
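A minimal Java sketch of the two changes described above, under the assumption of a simplified stand-in for Spark's regexp expressions. `RegexMatcherCache`, `InvalidParameterValueException` and this `getLastMatcher` signature are hypothetical; the real code is Scala (`RegExpExtractBase` in `regexpExpressions.scala`) and throws `SparkRuntimeException` with the error class `INVALID_PARAMETER_VALUE`. The message text only imitates the output shown above, and treating its trailing detail as the offending pattern is an assumption.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for Spark's SparkRuntimeException with the
// INVALID_PARAMETER_VALUE error class.
class InvalidParameterValueException extends RuntimeException {
    InvalidParameterValueException(String funcName, String regexp, Throwable cause) {
        super("[INVALID_PARAMETER_VALUE] The value of parameter(s) 'regexp' in `"
                + funcName + "` is invalid: " + regexp, cause);
    }
}

// Hypothetical cache modeled on the lastRegex/pattern fields of the regexp expressions.
class RegexMatcherCache {
    private final String funcName;
    private String lastRegex;  // last regexp that compiled successfully
    private Pattern pattern;   // compiled form of lastRegex

    RegexMatcherCache(String funcName) {
        this.funcName = funcName;
    }

    Matcher getLastMatcher(String s, String regexp) {
        if (!Objects.equals(regexp, lastRegex)) {
            Pattern compiled;
            try {
                compiled = Pattern.compile(regexp);
            } catch (PatternSyntaxException e) {
                // Replace the low-level exception with a user-facing error.
                throw new InvalidParameterValueException(funcName, regexp, e);
            }
            // Update the cache only after compilation succeeded, so a failed
            // compile never leaves lastRegex pointing at a missing pattern.
            pattern = compiled;
            lastRegex = regexp;
        }
        return pattern.matcher(s);
    }
}
```

With this ordering, calling `getLastMatcher` twice with the invalid pattern `(?l)` reports the same `INVALID_PARAMETER_VALUE`-style error both times, because `lastRegex` is never set to a pattern that failed to compile.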

This is a backport of https://github.com/apache/spark/pull/37171.

### Why are the changes needed?
The changes fix the NPE that the following query triggers on Spark 3.3.0:
```sql
spark-sql> SELECT regexp_extract('1a 2b 14m', '(?l)');
22/07/12 19:07:21 ERROR SparkSQLDriver: Failed in [SELECT regexp_extract('1a 2b 14m', '(?l)')]
java.lang.NullPointerException: null
	at org.apache.spark.sql.catalyst.expressions.RegExpExtractBase.getLastMatcher(regexpExpressions.scala:768) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
```
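The following Java sketch illustrates, under simplifying assumptions, how assigning the cached regexp before compiling it can turn a syntax error into an NPE: if `Pattern.compile` throws, `lastRegex` already equals the bad pattern, so a later call on the same instance skips compilation and dereferences a `null` pattern. The class `BuggyMatcherCache` is hypothetical, and the assumption that Spark evaluates the same expression instance again after the first failure is not something this commit message states.

```java
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the pre-fix ordering: the cache key is updated
// before the pattern is compiled.
class BuggyMatcherCache {
    private String lastRegex;
    private Pattern pattern;

    Matcher getLastMatcher(String s, String regexp) {
        if (!Objects.equals(regexp, lastRegex)) {
            lastRegex = regexp;                  // updated too early
            pattern = Pattern.compile(regexp);   // may throw PatternSyntaxException
        }
        // After a failed compile, a second call with the same regexp skips the
        // branch above and dereferences the still-null pattern.
        return pattern.matcher(s);
    }
}
```

In this model, the first call with `(?l)` fails with `PatternSyntaxException` (Java rejects `l` as an unknown inline modifier), and a second call with the same pattern fails with `NullPointerException`.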
Reporting a meaningful error instead of an NPE should improve the user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?
No. In regular cases the behavior is unchanged; only for invalid regexp patterns will users observe different exceptions (error messages) after the changes.

### How was this patch tested?
By running the new tests:
```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z regexp-functions.sql"
$ build/sbt "test:testOnly *.RegexpExpressionsSuite"
$ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite"
```

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 5b96bd5cf8f44eee7a16cd027d37dec552ed5a6a)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

Closes #37181 from MaxGekk/pattern-syntax-exception-3.3.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent 2fe1601
[SPARK-39758][SQL][3.3] Fix NPE from the regexp functions on invalid patterns