Revision 9760c8ab6028bf525ea482d9e944bc8e731e0715 authored by tianhanhu on 06 October 2021, 10:06:09 UTC, committed by Hyukjin Kwon on 06 October 2021, 10:06:22 UTC
### What changes were proposed in this pull request?
Migrating a Spark application from 2.4.x to 3.1.x and finding a difference in the exception chaining behavior. In a case of parsing a malformed CSV, where the root cause exception should be Caused by: java.lang.RuntimeException: Malformed CSV record, only the top level exception is kept, and all lower level exceptions and root cause are lost. Thus, when we call ExceptionUtils.getRootCause on the exception, we still get itself.
The reason for the difference is that RuntimeException is wrapped in BadRecordException, which has unserializable fields. When we try to serialize the exception from tasks and deserialize from scheduler, the exception is lost.
This PR makes unserializable fields of BadRecordException transient, so the rest of the exception could be serialized and deserialized properly.

### Why are the changes needed?
Make BadRecordException serializable

### Does this PR introduce _any_ user-facing change?
User could get root cause of BadRecordException

### How was this patch tested?
Unit testing

Closes #34167 from tianhanhu/master.

Authored-by: tianhanhu <adrianhu96@gmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit aed977c4682b6f378a26050ffab51b9b2075cae4)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
1 parent 88f4809
Raw File
.asf.yaml
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
---
github:
  description: "Apache Spark - A unified analytics engine for large-scale data processing"
  homepage: https://spark.apache.org/
  labels:
    - python
    - scala
    - r
    - java
    - big-data
    - jdbc
    - sql
    - spark
back to top