https://github.com/apache/spark
Revision eaceb4052ee359e5eaa2dd308ed2c9f9f71b0170 authored by Adam Binford on 26 April 2021, 06:39:56 UTC, committed by Liang-Chi Hsieh on 26 April 2021, 06:40:10 UTC
### What changes were proposed in this pull request?

Modifies the `UpdateFields` optimizer to fix correctness issues with certain nested and chained `withField` operations. Examples that reproduce the issue are in the new unit tests and in the JIRA issue.

### Why are the changes needed?

Certain `withField` patterns can cause exceptions or even incorrect results. This appears to be a result of the additional `UpdateFields` optimization added in https://github.com/apache/spark/pull/29812. That optimization traverses `fieldOps` in reverse order to keep only the last operation per field, but doing so can change the order of nested structs, which leads to mismatches between the schema and the actual data. This change updates the optimization to maintain the initial ordering of nested structs so that it matches the generated schema.
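
The ordering problem described above can be illustrated with a small, self-contained model. This is a sketch and not Spark's actual optimizer code: field operations are modeled as plain `(name, value)` tuples, and the function names `apply_ops`, `dedupe_reverse`, and `dedupe_keep_first_position` are hypothetical. It assumes that a `withField`-style operation replaces a field in place when the name already exists and appends it otherwise, so the schema follows first-occurrence order.

```python
from typing import Any, List, Tuple

FieldOp = Tuple[str, Any]  # hypothetical stand-in for a WithField(name, value) operation


def apply_ops(ops: List[FieldOp]) -> List[FieldOp]:
    """Reference semantics: apply every op in order to an initially empty struct.

    An existing field is replaced in place; a new field is appended.
    """
    struct = {}  # dicts preserve insertion order
    for name, value in ops:
        struct[name] = value
    return list(struct.items())


def dedupe_reverse(ops: List[FieldOp]) -> List[FieldOp]:
    """Model of the problematic approach: walk the ops in reverse, keep the
    first op seen per name (the last one written), then reverse the result.

    Fields end up ordered by their LAST occurrence.
    """
    seen = set()
    kept = []
    for name, value in reversed(ops):
        if name not in seen:
            seen.add(name)
            kept.append((name, value))
    kept.reverse()
    return kept


def dedupe_keep_first_position(ops: List[FieldOp]) -> List[FieldOp]:
    """Model of an order-preserving approach: keep the last value per name,
    but at the position where the name FIRST appeared.
    """
    latest = {}
    for name, value in ops:
        latest[name] = value  # overwriting keeps the original insertion position
    return list(latest.items())


if __name__ == "__main__":
    ops = [("a", 1), ("b", 2), ("a", 3)]
    print("reference:       ", apply_ops(ops))
    print("reverse dedupe:  ", dedupe_reverse(ops))
    print("order-preserving:", dedupe_keep_first_position(ops))
```

In this model, the chain `a, b, a` yields the field order `a, b` under the reference semantics, while the reverse traversal yields `b, a`. If the schema is derived from the former and the data from the latter, the two disagree, which is the kind of mismatch this change addresses.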

### Does this PR introduce _any_ user-facing change?

It fixes exceptions and incorrect results for valid uses in the latest Spark release.

### How was this patch tested?

Added new unit tests for these edge cases.

Closes #32338 from Kimahriman/bug/optimize-with-fields.

Authored-by: Adam Binford <adamq43@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(cherry picked from commit 74afc68e2172cb0dc3567e12a8a2c304bb7ea138)
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
1 parent 6595db2
[SPARK-35213][SQL] Keep the correct ordering of nested structs in chained withField operations