SNOW-630893: Flatten client generated SQL with column dependency

Open sfc-gh-yixie opened this issue 3 years ago • 1 comments

Please answer these questions before submitting your pull requests. Thanks!

What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

The SQL generated by snowpark-python contains deeply nested queries.

Fill out the following pre-review checklist:
- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] I am adding new logging messages
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
Please describe how your code solves the related issue.

This PR begins the work of flattening subqueries in some cases. If a DataFrame is created by session.table() or session.sql(), subsequent calls to df.select, df.with_columns/with_column, df.drop, df.sort, df.filter, and df.limit may produce flattened SQL if certain rules are met.

To flatten the generated SQL, the client considers how column expressions change between a query and its subquery, as well as the dependencies among columns.
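To make the column-dependency idea concrete, here is a toy sketch in plain Python. It is not the code in this PR: `select_over`, its dict-based column representation, and the single rule it applies (keep the subquery when a new expression depends on a column the subquery derived) are all assumptions made for illustration, and the identifier matching is deliberately naive.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical helper names; nothing here is part of the Snowpark API.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def _deps(expr: str, columns: Iterable[str]) -> Set[str]:
    """Column names from `columns` that appear as identifiers in `expr` (naive scan)."""
    known = set(columns)
    return {tok for tok in _IDENT.findall(expr) if tok in known}


def _render(cols: Dict[str, str]) -> str:
    """Render a {output_name: expression} mapping as a SELECT list."""
    return ", ".join(n if e == n else f"{e} AS {n}" for n, e in cols.items())


def select_over(inner: Dict[str, str], outer: Dict[str, str], table: str) -> str:
    """Build SQL for `outer` selected on top of `inner` selected from `table`.

    Assumed toy rule: flatten unless an outer expression computes something
    from a column that the inner query derived (changed) itself.
    """
    derived = {n for n, e in inner.items() if e != n}
    merged: Dict[str, str] = {}
    for name, expr in outer.items():
        if expr in inner:
            # Plain reference to an inner column: inline the inner expression.
            merged[name] = inner[expr]
        elif _deps(expr, inner) & derived:
            # Depends on a derived column: keep the nested form.
            return (
                f"SELECT {_render(outer)} "
                f"FROM (SELECT {_render(inner)} FROM {table})"
            )
        else:
            # Depends only on unchanged columns: safe to evaluate on the table.
            merged[name] = expr
    return f"SELECT {_render(merged)} FROM {table}"


inner = {"a": "a", "b": "a + 1"}
flat_passthrough = select_over(inner, {"a": "a", "b": "b"}, "t")
flat_new_column = select_over(inner, {"d": "a * 2"}, "t")
nested = select_over(inner, {"c": "b * 2"}, "t")
```

In this sketch the first two calls collapse to a single `SELECT ... FROM t`, while the third keeps the subquery because `c` is computed from `b`, which the subquery itself defined as `a + 1`.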

This feature can be turned off by setting:

    from snowflake.snowpark import context
    context._USE_SQL_SIMPLIFIER = False

Aug 15 '22 16:08 sfc-gh-yixie

Given that this PR has more than 1k lines of changes and touches many different places, could you provide guidance on how to review the code, or give a brief intro? Otherwise, try to break it down into smaller pieces.

This is a good point. I'll provide guidance. It's hard to break it down into smaller PRs.

Aug 24 '22 17:08 sfc-gh-yixie