snowpark-python icon indicating copy to clipboard operation
snowpark-python copied to clipboard

SNOW-643388: Are CTEs supported?

Open KenRoytman opened this issue 3 years ago • 1 comments

What is the current behavior?

Using the same dataframe in a self-join results in generated SQL that has multiple copies of the same sub-query.

What is the desired behavior?

It seems beneficial to have such dataframe operations use CTEs when generating the SQL.

How would this improve snowflake-snowpark-python?

More efficient query execution.

References, Other Background

I see that the dataframe api offers the cache_result() function. However, that makes use of temp tables, and requires the end-user to explicitly call cache_result() (rather than the library figuring out when to use CTEs).

KenRoytman avatar Aug 11 '22 16:08 KenRoytman

Hi Ken, thanks for the feedback. Today extracting common subqueries into CTE is not supported. This is challenging to get right and requires some designs/thoughts, but it is definitely a nice optimization.

sfc-gh-sfan avatar Aug 12 '22 16:08 sfc-gh-sfan

Thanks for the reply. Is a simpler first step to have an API that would allow the user to specify which .select() calls should be treated as CTEs?

KenRoytman avatar Dec 01 '22 15:12 KenRoytman

Unfortunately I don't think we have that yet. It might be useful to look into cache_result if certain operations need to be executed multiple times.

sfc-gh-sfan avatar Dec 01 '22 18:12 sfc-gh-sfan

CTE is supported now by automatically converting duplicate subqueries to CTEs. You can enable this feature by setting session.cte_optimization_enabled = True.

sfc-gh-jdu avatar Jun 20 '24 21:06 sfc-gh-jdu