[SPARK-48694][CORE] Manage memory used by external cache
What changes were proposed in this pull request?
This PR proposes counting the memory used by an external cache as storage memory and including it in the overall spill logic.
Why are the changes needed?
We use Spark together with a third-party file source cache, which is an independent library with its own internal logic for cache entry creation, eviction, and removal. Currently we allocate dedicated memory for this cache, but that memory can't be shared with Spark execution/storage memory. Memory would be used more efficiently if the cache were counted in the UnifiedMemoryManager and included in the memory spill logic.
We also need memory management for a native RDD cache implementation. The existing interfaces in MemoryStore are generally bound to Spark's SerializerManager and BlockEvictionHandler, so they are not easy to extend for such a customized RDD cache.
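To illustrate the idea of sharing one pool rather than reserving dedicated memory, here is a toy model in Python. It is not the code in this PR and not Spark's UnifiedMemoryManager: the class `ToyUnifiedPool`, its methods, and the `evict_external` callback are hypothetical names, and the eviction policy is deliberately simplified.

```python
from dataclasses import dataclass


@dataclass
class ToyUnifiedPool:
    """Toy shared pool; a hypothetical stand-in, not Spark's UnifiedMemoryManager."""
    total: int               # bytes shared by execution and storage
    execution_used: int = 0
    storage_used: int = 0    # Spark-managed storage blocks
    external_used: int = 0   # external cache, counted alongside storage

    def free(self) -> int:
        return (self.total - self.execution_used
                - self.storage_used - self.external_used)

    def acquire_external(self, size: int) -> bool:
        """Grant memory to the external cache only if the shared pool has room."""
        if size <= self.free():
            self.external_used += size
            return True
        return False

    def release_external(self, size: int) -> None:
        self.external_used -= min(size, self.external_used)

    def acquire_execution(self, size: int, evict_external) -> int:
        """Grant execution memory, asking the external cache to evict if short.

        evict_external(n) is a hypothetical callback: it asks the cache to free
        up to n bytes and returns the number of bytes it actually freed.
        """
        shortfall = size - self.free()
        if shortfall > 0 and self.external_used > 0:
            freed = evict_external(min(shortfall, self.external_used))
            self.release_external(freed)
        granted = max(0, min(size, self.free()))
        self.execution_used += granted
        return granted


pool = ToyUnifiedPool(total=100)
pool.acquire_external(60)                       # cache takes 60 of 100 bytes
granted = pool.acquire_execution(70, lambda n: n)  # cache frees what is asked
print(granted, pool.external_used)
```

In this sketch the external cache gives back only the shortfall, so it keeps the rest of its entries while execution gets the memory it asked for. With a dedicated allocation, the same request could not borrow from the cache's reservation.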
Does this PR introduce any user-facing change?
Yes. It introduces some configurations for memory management: spark.memory.external.cache.enabled, spark.memory.external.storageFraction and spark.memory.storage.preferEvictExtCache.
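As a rough sketch of how these settings might be passed to a job, the snippet below renders them as `spark-submit --conf key=value` arguments. The three keys are the ones named in this PR; the values and their types (boolean, fraction, boolean) are assumptions for illustration, not documented defaults, and `to_submit_args` is a hypothetical helper.

```python
# Keys come from this PR; the values are placeholders, not documented defaults.
external_cache_conf = {
    "spark.memory.external.cache.enabled": "true",
    "spark.memory.external.storageFraction": "0.2",
    "spark.memory.storage.preferEvictExtCache": "true",
}


def to_submit_args(conf):
    """Render a settings dict as spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(external_cache_conf)))
```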
How was this patch tested?
Tested with a TPC-DS workload using an external file source cache.
Was this patch authored or co-authored using generative AI tooling?
No