SNOW-633549: `fetch_pandas_all` fails when executing inserts, creates, etc.
TL;DR: Any non-select query (inserts, create table, etc.) causes `fetch_pandas_all` to raise `NotSupportedError`.
- What version of Python are you using?
Python 3.9.12
- What operating system and processor architecture are you using?
Linux-5.4.181-99.354.amzn2.x86_64-x86_64-with-glibc2.28
- What are the component versions in the environment (`pip freeze`)?
[omitting]
- What did you do?
```python
cur.fetch_pandas_all()
```
Raises:
```
NotSupportedError                         Traceback (most recent call last)
<ipython-input-29-64b7d8357e61> in <cell line: 4>()
      2 cur = conn.cursor()
      3 cur.execute("create temp table test_db.public.foo (bar int)")
----> 4 cur.fetch_pandas_all()

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/snowflake/connector/cursor.py in fetch_pandas_all(self, **kwargs)
    973         self._prefetch_hook()
    974         if self._query_result_format != "arrow":
--> 975             raise NotSupportedError
    976         self._log_telemetry_job_data(
    977             TelemetryField.PANDAS_FETCH_ALL, TelemetryData.TRUE

NotSupportedError: Unknown error
```
- What did you expect to see?
I think standard practice here would be to return a DataFrame with the number of updated rows or similar, rather than raising an error.
We found this to be true for all DDL operations (ALTER, CREATE, DROP, etc.), which has to do with the `result_format` returned.
...
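In the meantime, a caller-side guard can paper over the error; a minimal sketch (the helper name is mine, not connector API), which falls back to `fetchall()` whenever Arrow results are unavailable, as with DDL/DML statements whose results come back in JSON format:

```python
import pandas as pd

def fetch_df(cur):
    """Return the cursor's result as a DataFrame, even for non-SELECT
    statements where fetch_pandas_all raises (NotSupportedError in the
    Snowflake connector)."""
    try:
        return cur.fetch_pandas_all()
    except Exception:
        rows = cur.fetchall()
        # cur.description carries column names as the first tuple element.
        cols = [d[0] for d in cur.description] if cur.description else None
        return pd.DataFrame(rows, columns=cols)
```

The broad `except` is deliberate in this sketch; in real code you would catch `snowflake.connector.errors.NotSupportedError` specifically.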
Our workaround was to create a `DictCursor` and use `fetchall()`:

```python
import pandas as pd
import snowflake.connector as sc

cursor = sc.connect(...).cursor(sc.DictCursor)
try:
    result = cursor.execute(query_string).fetchall()
    df = pd.DataFrame(result)
finally:
    cursor.close()
```
Currently, only a subset of SELECT statements supports serving Arrow results, which is a requirement of our current implementation for serving you pandas or Arrow results.
However, there's a way to work around this limitation in your case:
```python
cur.execute("insert into users values (...)")
cur.execute(f"select * from table(result_scan('{cur.sfqid}'))")
print(cur.fetch_pandas_all())
```
This way the SELECT statement transforms the results into Arrow for you.
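If you need this pattern in several places, it folds into a small helper; a minimal sketch (the function name is mine, not connector API) that runs any statement and then re-selects its result through RESULT_SCAN using the cursor's last query ID:

```python
def fetch_pandas_via_result_scan(cur, sql):
    """Execute any statement, then re-select its result via RESULT_SCAN
    so the connector can serve it as Arrow and hand back a DataFrame."""
    cur.execute(sql)
    # cur.sfqid is the Snowflake query ID of the statement just executed.
    cur.execute(f"select * from table(result_scan('{cur.sfqid}'))")
    return cur.fetch_pandas_all()
```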
As a prerequisite, please read https://github.com/snowflakedb/snowflake-connector-python/issues/986#issuecomment-1213496635
Another option would be to allow JSON results to be cast into `pandas.DataFrame` and `pyarrow.Table` on the connector's side. This is exactly what @ethanbates-ez is suggesting in his comment. This would allow `fetch_pandas_all`, `fetch_pandas_batches`, `fetch_arrow_all`, and `fetch_arrow_batches` to work with all queries.
My problem with this solution is that these four fetch functions are currently guaranteed to be faster than fetching JSON results, and this change would jeopardize that guarantee for convenience's sake.
I'd be okay introducing it as part of the connector, but only executing that code path if the user explicitly opts in, knowing full well not to expect the same performance as when Arrow results are available.
Would something like this API work for you?
```python
with snowflake.connector.connect(...) as conn:
    with conn.cursor() as cur:
        cur.execute("create temp table asd(id number);")
        df = cur.fetch_pandas_all(slow_ok=True)
```
To clean up and re-prioritize bugs and feature requests, we are closing all issues older than 6 months as of March 1, 2023. If there are any issues or feature requests that you would like us to address, please re-create them. For urgent issues, opening a support case via Snowflake Community is the fastest way to get a response.