SNOW-630893: Flatten client generated SQL with column dependency
Please answer these questions before submitting your pull requests. Thanks!
-
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
The SQL generated by snowpark-python contains deeply nested subqueries.
-
Fill out the following pre-review checklist:
- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] I am adding new logging messages
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
-
Please describe how your code solves the related issue.
This PR begins the work to flatten the subqueries in some cases.
If a DataFrame is created by session.table() or session.sql(), subsequent calls to df.select, df.with_columns/df.with_column, df.drop, df.sort, df.filter, and df.limit may produce flattened SQL when certain rules are met.
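A naive SQL builder wraps every DataFrame call in a new `SELECT * FROM (...)`, which is how the deep nesting arises. The toy sketch below folds filter, sort, and limit calls into a single SELECT and falls back to nesting only when merging would change the result, for example a filter applied after a limit. `Query` and its methods are hypothetical illustrations, not Snowpark's API or the exact rules implemented in this PR; conditions and sort keys are plain SQL strings.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical one-level SELECT used only to illustrate clause merging."""
    source: str                      # table name or parenthesized subquery
    where: Optional[str] = None      # combined filter condition
    order_by: Tuple[str, ...] = ()   # sort keys, highest precedence first
    row_limit: Optional[int] = None

    def sql(self) -> str:
        parts = [f"SELECT * FROM {self.source}"]
        if self.where is not None:
            parts.append(f"WHERE {self.where}")
        if self.order_by:
            parts.append("ORDER BY " + ", ".join(self.order_by))
        if self.row_limit is not None:
            parts.append(f"LIMIT {self.row_limit}")
        return " ".join(parts)

    def _wrap(self) -> "Query":
        # Fallback when merging is unsafe: nest the current query as a subquery.
        return Query(f"({self.sql()})")

    def filter(self, cond: str) -> "Query":
        # WHERE is evaluated before LIMIT, so a filter applied after a limit
        # cannot be merged into the same SELECT.
        if self.row_limit is not None:
            return self._wrap().filter(cond)
        new = cond if self.where is None else f"({self.where}) AND ({cond})"
        return replace(self, where=new)

    def sort(self, key: str) -> "Query":
        # Same reasoning as filter: ORDER BY must not move ahead of a LIMIT.
        if self.row_limit is not None:
            return self._wrap().sort(key)
        # The newest key takes precedence; earlier keys become tie-breakers.
        return replace(self, order_by=(key,) + self.order_by)

    def limit(self, n: int) -> "Query":
        # Consecutive limits collapse to the smaller row count.
        new = n if self.row_limit is None else min(self.row_limit, n)
        return replace(self, row_limit=new)
```

For example, `Query("T").filter('"A" > 1').sort('"B"').limit(10).sql()` yields a single `SELECT ... WHERE ... ORDER BY ... LIMIT 10`, while `Query("T").limit(10).filter('"A" > 1')` keeps the limit inside a subquery.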
To flatten the generated SQL, the SQL simplifier compares the column expressions of a query with those of its subquery and tracks the dependencies among columns on the client side.
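Column-dependency tracking can be illustrated with a toy projection simplifier. In this sketch (hypothetical `Col`, `Select`, `can_flatten`, and `simplify`, not the classes added in this PR), an outer SELECT is merged into its subquery only when every column it reads is passed through unchanged by that subquery; if it reads a column the subquery computed or renamed, the nesting is kept. This rule is deliberately more conservative than inlining the inner expressions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Union


@dataclass(frozen=True)
class Col:
    name: str              # output column name, e.g. '"C"'
    expr: str              # SQL expression text, e.g. '"A" + "B"'
    deps: FrozenSet[str]   # input column names that expr reads

    def is_passthrough(self) -> bool:
        # Forwards an input column of the same name without changing it.
        return self.expr == self.name and self.deps == frozenset({self.name})

    def render(self) -> str:
        return self.expr if self.expr == self.name else f"{self.expr} AS {self.name}"


def ref(name: str) -> Col:
    """Hypothetical helper: a pass-through reference to an existing column."""
    return Col(name, name, frozenset({name}))


@dataclass
class Select:
    cols: List[Col]
    source: Union[str, "Select"]   # a table name or a nested Select


def can_flatten(outer: Select) -> bool:
    """Merge only if the outer projection reads no derived inner column."""
    inner = outer.source
    if not isinstance(inner, Select):
        return False
    derived = {c.name for c in inner.cols if not c.is_passthrough()}
    return all(not (c.deps & derived) for c in outer.cols)


def simplify(node):
    """Recursively flatten nested projections, innermost first."""
    if isinstance(node, str):
        return node
    node = Select(node.cols, simplify(node.source))
    while can_flatten(node):
        # Everything the outer query reads passes straight through the
        # inner query, so read it from the inner query's own source.
        node = Select(node.cols, node.source.source)
    return node


def to_sql(node) -> str:
    if isinstance(node, str):
        return node
    cols = ", ".join(c.render() for c in node.cols)
    src = node.source if isinstance(node.source, str) else f"({to_sql(node.source)})"
    return f"SELECT {cols} FROM {src}"
```

Selecting `"A" + "B"` over `SELECT "A", "B" FROM T` flattens to one level, whereas selecting `"B" + 1` over a subquery that defines `"B"` as `"A" * 2` stays nested because `"B"` is a derived column.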
This feature can be turned off by setting:
```python
from snowflake.snowpark import context
context._USE_SQL_SIMPLIFIER = False
```
Given that this PR has more than 1k lines of changes and touches many different places, could you provide guidance on how to review the code, or give a brief intro? Otherwise, try to break it down into smaller pieces.
This is a good point. I'll provide guidance. It's hard to break it down into smaller PRs.