Revision db036537ab5593b2520742b3b1a1028bb0fcc7fa authored by Vladimir Golubev on 29 July 2024, 14:07:02 UTC, committed by Wenchen Fan on 29 July 2024, 14:07:02 UTC
### What changes were proposed in this pull request?

Use `HashSet`/`HashMap` lookups instead of linear searches over `Seq`s. For tables with thousands of partitions, this significantly improves performance.

### Why are the changes needed?

To avoid the O(n*m) passes in `PreprocessTableCreation`.
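
For illustration only, here is a minimal sketch of the general pattern, not the actual diff. The object and method names (`PartitionLookupSketch`, `missingLinear`, `missingHashed`, `indexByName`) are hypothetical, and the sketch compares column names as exact strings, whereas the real rule may resolve names differently (for example, case-insensitively).

```scala
import scala.collection.mutable

// Hypothetical names for illustration; not taken from the Spark source.
object PartitionLookupSketch {

  // Before: every partition column triggers a scan of the whole schema.
  // With n schema columns and m partition columns this is O(n*m).
  def missingLinear(schemaCols: Seq[String], partitionCols: Seq[String]): Seq[String] =
    partitionCols.filterNot(p => schemaCols.contains(p))

  // After: build a hash set once in O(n); each lookup is then expected O(1),
  // so the whole check is O(n + m).
  def missingHashed(schemaCols: Seq[String], partitionCols: Seq[String]): Seq[String] = {
    val known = mutable.HashSet.empty[String]
    known ++= schemaCols
    partitionCols.filterNot(known.contains)
  }

  // Map variant: when the position of a column is needed, index names once
  // instead of calling indexOf repeatedly. The first occurrence wins.
  def indexByName(schemaCols: Seq[String]): mutable.HashMap[String, Int] = {
    val index = mutable.HashMap.empty[String, Int]
    schemaCols.zipWithIndex.foreach { case (name, i) => index.getOrElseUpdate(name, i) }
    index
  }
}
```

Both `missing*` methods return the same result; only the cost differs, which matters once the inputs reach the thousands.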

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47484 from vladimirg-db/vladimirg-db/get-rid-of-linear-searches-preprocess-table-creation.

Authored-by: Vladimir Golubev <vladimir.golubev@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent efc6a75