Revision - bfa0d13 - [SPARK-49083][CONNECT] Allow from_xml and from_json to natively [...]

Revision bfa0d13f3f6b4b662ad0f355a8db00dd1244a698 authored by Herman van Hovell on 06 August 2024, 01:54:09 UTC, committed by Hyukjin Kwon on 06 August 2024, 01:54:09 UTC

[SPARK-49083][CONNECT] Allow from_xml and from_json to natively work with json schemas

### What changes were proposed in this pull request?
We allow the `JsonToStructs` and `XmlToStructs` expressions to use a JSON schema.
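For readers unfamiliar with the two schema string formats, the sketch below contrasts a JSON-encoded struct schema with a DDL string. It uses only the Python standard library and is not Spark's implementation. The helper names `struct_schema_json` and `classify_schema_string` are hypothetical, and the classifier is deliberately simplistic: it only recognizes JSON objects that carry a `type` key.

```python
import json


def struct_schema_json(fields):
    """Build a JSON schema string in the shape Spark uses for struct types.

    `fields` is a list of (name, type_name) pairs; every field is marked
    nullable with empty metadata (an assumption made for this sketch).
    """
    return json.dumps({
        "type": "struct",
        "fields": [
            {"name": name, "type": type_name, "nullable": True, "metadata": {}}
            for name, type_name in fields
        ],
    })


def classify_schema_string(schema):
    """Hypothetical helper: report whether a schema string looks like JSON or DDL.

    Tries to parse the string as JSON first and treats anything that does
    not parse to an object with a "type" key as DDL.
    """
    try:
        parsed = json.loads(schema)
    except json.JSONDecodeError:
        return "ddl"
    if isinstance(parsed, dict) and "type" in parsed:
        return "json"
    return "ddl"


json_schema = struct_schema_json([("id", "integer"), ("name", "string")])
ddl_schema = "id INT, name STRING"

print(classify_schema_string(json_schema))
print(classify_schema_string(ddl_schema))
```

Both strings describe the same two-column struct; the change described here concerns the expressions handling the first form directly.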

### Why are the changes needed?
A couple of reasons:
- We want to use a reference to the `from_json` and `from_xml` methods in the Column API in order to make unification of the Classic and Connect Scala clients possible.
- Reduce the amount of duplication between the Function API and the SparkConnectPlanner.
- Make the DataFrame and SQL APIs behave the same.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47573 from hvanhovell/SPARK-49083.

Authored-by: Herman van Hovell <herman@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

1 parent da5912a


.asf.yaml

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# https://cwiki.apache.org/confluence/display/INFRA/git+-+.asf.yaml+features
---
github:
  description: "Apache Spark - A unified analytics engine for large-scale data processing"
  homepage: https://spark.apache.org/
  labels:
    - python
    - scala
    - r
    - java
    - big-data
    - jdbc
    - sql
    - spark
  enabled_merge_buttons:
    merge: false
    squash: true
    rebase: true

notifications:
  pullrequests: reviews@spark.apache.org
  issues: reviews@spark.apache.org
  commits: commits@spark.apache.org
  jira_options: link label
