Revision bf3ad7e94afb5416d995dc22344566899ee7c4b0 authored by Richard Chen on 01 August 2024, 00:48:19 UTC, committed by Wenchen Fan on 01 August 2024, 00:48:19 UTC
### What changes were proposed in this pull request?

Currently, the `VARIANT` `columnType` does not override the `actualSize` method, so every value is reported at the default size of 2 KB regardless of its real size. We should define `actualSize` so that a cached variant column can be written to the byte buffer correctly.

As a result, if the average per-variant size is greater than 2 KB and the total column size is greater than 128 KB (the default initial buffer size), an exception is (incorrectly) thrown.
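
To illustrate the failure condition, here is a minimal toy model in Python. It is not Spark's code: `ToyColumnBuffer`, `cache_column`, and the doubling growth policy are hypothetical, and only the 2 KB and 128 KB figures come from the description above. The sketch assumes the buffer reserves space based on the reported size and then writes the real bytes.

```python
DEFAULT_SIZE = 2 * 1024        # fallback per-value size (2 KB, from the description)
INITIAL_BUFFER = 128 * 1024    # default initial buffer size (128 KB, from the description)


class ToyColumnBuffer:
    """Hypothetical stand-in for a cached column's byte buffer."""

    def __init__(self, size_of):
        self.size_of = size_of            # function: value -> bytes to reserve
        self.capacity = INITIAL_BUFFER
        self.used = 0

    def append(self, value: bytes) -> None:
        needed = self.size_of(value)
        # Assumed policy: grow only as far as the *reported* size requires.
        while self.capacity - self.used < needed:
            self.capacity *= 2
        # The real payload may be larger than what was reserved.
        if self.capacity - self.used < len(value):
            raise OverflowError("value does not fit in the reserved space")
        self.used += len(value)


def cache_column(values, size_of):
    """Append every value and return the number of bytes written."""
    buf = ToyColumnBuffer(size_of)
    for v in values:
        buf.append(v)
    return buf.used
```

In this model, fifty 3 KB values (150 KB in total) overflow when every value is reported as `DEFAULT_SIZE`, but are written without error when the real length is reported (`size_of=len`). Ten 3 KB values succeed either way, because the column never outgrows the initial buffer.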

### Why are the changes needed?

To fix caching of larger variants (via `df.cache()`), such as the ones included in the unit tests.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Added unit test coverage.

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #47559 from richardc-db/fix_variant_cache.

Authored-by: Richard Chen <r.chen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 06ed91a