https://github.com/apache/spark
Revision 0e2758c9955c2ae102e37e0b49aa9446bbe6fecf authored by Max Gekk on 14 July 2022, 14:45:39 UTC, committed by Max Gekk on 14 July 2022, 14:45:39 UTC
### What changes were proposed in this pull request?

In the PR, I propose to catch `PatternSyntaxException` while compiling the regexp pattern in `regexp_extract`, `regexp_extract_all` and `regexp_instr`, and to replace it with Spark's exception w/ the error class `INVALID_PARAMETER_VALUE`. In this way, Spark SQL will output the error in the form:

```sql
org.apache.spark.SparkRuntimeException
[INVALID_PARAMETER_VALUE] The value of parameter(s) 'regexp' in `regexp_instr` is invalid: ) ?
```

instead of (on Spark 3.3.0):

```java
java.lang.NullPointerException: null
```

Also I propose to set `lastRegex` only after the compilation of the regexp pattern completes successfully.

This is a backport of https://github.com/apache/spark/pull/37171.

### Why are the changes needed?

The changes fix the NPE demonstrated by the following code on Spark 3.3.0:

```sql
spark-sql> SELECT regexp_extract('1a 2b 14m', '(?l)');
22/07/12 19:07:21 ERROR SparkSQLDriver: Failed in [SELECT regexp_extract('1a 2b 14m', '(?l)')]
java.lang.NullPointerException: null
	at org.apache.spark.sql.catalyst.expressions.RegExpExtractBase.getLastMatcher(regexpExpressions.scala:768) ~[spark-catalyst_2.12-3.3.0.jar:3.3.0]
```

This should improve user experience with Spark SQL.

### Does this PR introduce _any_ user-facing change?

No. In regular cases, the behavior is the same, but users will observe different exceptions (error messages) after the changes.

### How was this patch tested?

By running new tests:

```
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z regexp-functions.sql"
$ build/sbt "test:testOnly *.RegexpExpressionsSuite"
$ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite"
```

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 5b96bd5cf8f44eee7a16cd027d37dec552ed5a6a)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

Closes #37181 from MaxGekk/pattern-syntax-exception-3.3.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com>
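
The two ideas in the description above (wrap `PatternSyntaxException` in a descriptive error, and update the cached regex only after a successful compile) can be illustrated with a minimal Java sketch. This is not the Spark code: the class `RegexCacheSketch`, the exception `InvalidParameterValue`, the method `matcherFor`, and the message text are all hypothetical names chosen for illustration. The sketch assumes the NPE could arise because the cached regex string was recorded before compilation, so that a later call with the same invalid regex skipped compilation and used a pattern that was never assigned.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexCacheSketch {
    // Hypothetical stand-in for a Spark error with class INVALID_PARAMETER_VALUE.
    public static class InvalidParameterValue extends RuntimeException {
        public InvalidParameterValue(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private String lastRegex; // regex string of the cached pattern, or null
    private Pattern pattern;  // compiled form of lastRegex, or null

    public Matcher matcherFor(String funcName, String regex, String input) {
        if (!regex.equals(lastRegex)) {
            try {
                // Compile first; the cache is left untouched if this throws.
                Pattern compiled = Pattern.compile(regex);
                pattern = compiled;
                lastRegex = regex; // recorded only after a successful compile
            } catch (PatternSyntaxException e) {
                throw new InvalidParameterValue(
                    "[INVALID_PARAMETER_VALUE] invalid 'regexp' in " + funcName
                        + ": " + regex, e);
            }
        }
        return pattern.matcher(input);
    }

    public static void main(String[] args) {
        RegexCacheSketch cache = new RegexCacheSketch();
        try {
            cache.matcherFor("regexp_extract", "(?l)", "1a 2b 14m");
        } catch (InvalidParameterValue e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the cache is updated only on success, calling `matcherFor` repeatedly with the same invalid pattern raises the descriptive error every time, and a previously cached valid pattern stays usable.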
1 parent 2fe1601
Tip revision: 0e2758c9955c2ae102e37e0b49aa9446bbe6fecf authored by Max Gekk on 14 July 2022, 14:45:39 UTC
[SPARK-39758][SQL][3.3] Fix NPE from the regexp functions on invalid patterns
File | Mode | Size |
---|---|---|
.github | ||
.idea | ||
R | ||
assembly | ||
bin | ||
binder | ||
build | ||
common | ||
conf | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
external | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.asf.yaml | -rw-r--r-- | 1.1 KB |
.gitattributes | -rw-r--r-- | 130 bytes |
.gitignore | -rw-r--r-- | 2.0 KB |
CONTRIBUTING.md | -rw-r--r-- | 997 bytes |
LICENSE | -rw-r--r-- | 13.1 KB |
LICENSE-binary | -rw-r--r-- | 22.4 KB |
NOTICE | -rw-r--r-- | 2.0 KB |
NOTICE-binary | -rw-r--r-- | 56.5 KB |
README.md | -rw-r--r-- | 4.4 KB |
appveyor.yml | -rw-r--r-- | 2.7 KB |
pom.xml | -rw-r--r-- | 137.4 KB |
scalastyle-config.xml | -rw-r--r-- | 22.0 KB |