https://github.com/apache/spark
Revision 63f20c526bed8346fe3399aff6c0b2f7a78b441e authored by Gengliang Wang on 13 May 2022, 05:50:27 UTC, committed by Gengliang Wang on 13 May 2022, 05:50:40 UTC
### What changes were proposed in this pull request?

Currently, in most cases, the project https://issues.apache.org/jira/browse/SPARK-38615 is able to show where runtime errors happen within the original query. However, after trying it in production, I found that the following queries don't show where the divide-by-zero error happens:

```
create table aggTest(i int, j int, k int, d date) using parquet
insert into aggTest values(1, 2, 0, date'2022-01-01')
select sum(j)/sum(k),percentile(i, 0.9) from aggTest group by d
```

With the `percentile` function in the query, the plan can't execute with whole-stage codegen. Thus the child plan of `Project` is serialized to executors for execution, from ProjectExec:

```
protected override def doExecute(): RDD[InternalRow] = {
  child.execute().mapPartitionsWithIndexInternal { (index, iter) =>
    val project = UnsafeProjection.create(projectList, child.output)
    project.initialize(index)
    iter.map(project)
  }
}
```

Note that `TreeNode.origin` is not serialized to executors since `TreeNode` doesn't extend the trait `Serializable`, which results in an empty query context on errors. For more details, please read https://issues.apache.org/jira/browse/SPARK-39140

A naive fix is to make `TreeNode` extend the trait `Serializable`. However, this can cause a performance regression if the query text is long (every `TreeNode` carries it for serialization).

A better fix is to introduce a new trait `SupportQueryContext` and materialize the truncated query context for special expressions. This PR targets binary arithmetic expressions only. I will create follow-ups for the remaining expressions that support runtime error query context.

### Why are the changes needed?

Improve the error context framework and make sure it works when whole-stage codegen is not available.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes #36525 from gengliangwang/serializeContext.

Lead-authored-by: Gengliang Wang <gengliang@apache.org>
Co-authored-by: Gengliang Wang <ltnwgl@gmail.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(cherry picked from commit e336567c8a9704b500efecd276abaf5bd3988679)
Signed-off-by: Gengliang Wang <gengliang@apache.org>
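
The serialization behavior described in the commit message can be reproduced outside Spark. The following is a minimal sketch, not Spark's actual code: `Origin`, `CurrentOrigin`, `Node`, `Divide`, and `DivideWithContext` are hypothetical stand-ins, and the "query context" here is only the matching SQL fragment. It relies on a standard Java serialization rule: fields declared in a non-`Serializable` superclass are not written, and that superclass's no-arg constructor runs again on deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

object QueryContextDemo {
  // Hypothetical stand-in for the position of a node inside the query text.
  case class Origin(sqlText: Option[String] = None, start: Int = 0, stop: Int = 0) {
    def context: String = sqlText.map(_.substring(start, stop + 1)).getOrElse("")
  }

  // Hypothetical holder of the origin that is current while a node is built.
  object CurrentOrigin { var value: Origin = Origin() }

  // Deliberately NOT Serializable: `origin` is skipped on write, and this
  // constructor runs again on read, picking up whatever origin is current then.
  abstract class Node { val origin: Origin = CurrentOrigin.value }

  // Reads the origin lazily, so the context is lost after a round trip.
  class Divide extends Node with Serializable {
    def errorContext: String = origin.context
  }

  // Materializes the truncated context eagerly into a field of the
  // serializable subclass, so it travels with the object.
  class DivideWithContext extends Node with Serializable {
    val queryContext: String = origin.context
    def errorContext: String = queryContext
  }

  // Serialize and deserialize with plain Java serialization.
  def roundTrip[T <: AnyRef](value: T): T = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  def run(): (String, String) = {
    val sql = "select sum(j)/sum(k),percentile(i, 0.9) from aggTest group by d"
    val fragment = "sum(j)/sum(k)"
    val start = sql.indexOf(fragment)

    // "Driver" side: the parser has set the current origin.
    CurrentOrigin.value = Origin(Some(sql), start, start + fragment.length - 1)
    val plain = new Divide
    val withContext = new DivideWithContext

    // "Executor" side: no origin is available when objects are deserialized.
    CurrentOrigin.value = Origin()
    (roundTrip(plain).errorContext, roundTrip(withContext).errorContext)
  }
}
```

Calling `QueryContextDemo.run()` returns `("", "sum(j)/sum(k)")`: the lazily read origin comes back empty, while the eagerly materialized fragment survives. Only the short fragment is serialized, not the whole query text, which is the trade-off the commit message argues for.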
1 parent 3cc47a1
Tip revision: 63f20c526bed8346fe3399aff6c0b2f7a78b441e authored by Gengliang Wang on 13 May 2022, 05:50:27 UTC
[SPARK-39166][SQL] Provide runtime error query context for binary arithmetic when WSCG is off
File | Mode | Size |
---|---|---|
.github | ||
.idea | ||
R | ||
assembly | ||
bin | ||
binder | ||
build | ||
common | ||
conf | ||
core | ||
data | ||
dev | ||
docs | ||
examples | ||
external | ||
graphx | ||
hadoop-cloud | ||
launcher | ||
licenses | ||
licenses-binary | ||
mllib | ||
mllib-local | ||
project | ||
python | ||
repl | ||
resource-managers | ||
sbin | ||
sql | ||
streaming | ||
tools | ||
.asf.yaml | -rw-r--r-- | 1.1 KB |
.gitattributes | -rw-r--r-- | 130 bytes |
.gitignore | -rw-r--r-- | 2.0 KB |
CONTRIBUTING.md | -rw-r--r-- | 997 bytes |
LICENSE | -rw-r--r-- | 13.1 KB |
LICENSE-binary | -rw-r--r-- | 22.4 KB |
NOTICE | -rw-r--r-- | 2.0 KB |
NOTICE-binary | -rw-r--r-- | 56.5 KB |
README.md | -rw-r--r-- | 4.4 KB |
appveyor.yml | -rw-r--r-- | 2.7 KB |
pom.xml | -rw-r--r-- | 137.2 KB |
scalastyle-config.xml | -rw-r--r-- | 22.0 KB |