[SPARK-49006] Implement purging for OperatorStateMetadataV2 and StateSchemaV3 files
What changes were proposed in this pull request?
Currently, a new OperatorStateMetadataV2 file and a new StateSchemaV3 file are written on every query run, and none are ever removed. This PR implements purging of these files so that we keep only minLogEntriesToMaintain files per query.
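As a rough illustration of the retention rule described above, here is a minimal sketch, not the code in this PR. It assumes each file can be associated with an integer batch ID and that the newest entries are the ones retained; the function name `batches_to_purge` and the parameter `min_entries_to_maintain` are hypothetical stand-ins for illustration only.

```python
from typing import Iterable, List


def batches_to_purge(batch_ids: Iterable[int], min_entries_to_maintain: int) -> List[int]:
    """Return the batch IDs whose files could be deleted, oldest first.

    Hypothetical helper: keeps the newest `min_entries_to_maintain` batch IDs
    and reports every older one as purgeable.
    """
    if min_entries_to_maintain < 1:
        raise ValueError("min_entries_to_maintain must be at least 1")
    # De-duplicate and sort so "newest" means "largest batch ID" (an assumption).
    ordered = sorted(set(batch_ids))
    # Index of the first entry we keep; never negative when few files exist.
    keep_from = max(len(ordered) - min_entries_to_maintain, 0)
    return ordered[:keep_from]
```

With files for batches 0 through 4 and a retention of 2, `batches_to_purge([0, 1, 2, 3, 4], 2)` returns `[0, 1, 2]`; when there are fewer files than the retention count, nothing is purged.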
Why are the changes needed?
Without purging, these files accumulate indefinitely across query runs. Purging bounds the number of state files we keep.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit tests.
Was this patch authored or co-authored using generative AI tooling?
No