[SPARK-39930][SQL] Introduce Cache Hints
What changes were proposed in this pull request?
This ticket intends to add query hints for cache behaviors, users can perform actions like the use/skip/cache/uncache, etc on the cached results by the following hints.
-
RESULT_CACHE: spark will use cache by default, this hint keeps this behavior but will cache the child plan if uncached yet.
-
NO_RESULT_CACHE:with this hint, we can skip the default behavior to bypass the cached data if it is cached
-
RESULT_UNCACHE: with this hint, we can remove the cached data.
RESULT_CACHE and NO_RESULT_CACHE are borrowed from oracle, FYI,https://docs.oracle.com/database/121/TGDBA/tune_result_cache.htm#TGDBA647., while RESULT_UNCACHE's added accordingly via the naming rule. Similar ideas can also be found in MySQL, FYI https://dev.mysql.com/doc/refman/5.7/en/query-cache-in-select.html
Maybe we can add more cache hints like RESULT_RECACHE to refresh the cached data, or maybe some cache behaviors that relate to metadata.
Why are the changes needed?
As a supplement for CACHE/UNCACHE commands, pure SQL users can operate the cache in their queries directly without extra definitions. Also, cache skipping is supported by hints.
Does this PR introduce any user-facing change?
yes, this PR brings some new hints
How was this patch tested?
-
new tests for ut
-
test dist by hand
cc @cloud-fan @MaxGekk @HyukjinKwon @wangyum, PTAL when you have time.
I like the idea.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
To close the loop: CACHE TABLE abc AS SELECT id from range(0,1) should be sufficient. If it fails with view already exists, we can either rerun it with a different name, or drop the temp view first.
If we want to combine multiple SQL statements, SQL code block with control flow ability seems a better feature to build.
For a multi-tenant scenario like Thrift Server, if we explicitly cache/uncache some plans or relations, it will affect other's request