SNOW-633549: `fetch_pandas_all` fails when executing inserts, creates, etc.
TL;DR: Any non-select query (inserts, create table, etc.) causes `fetch_pandas_all` to raise `NotSupportedError`.
- What version of Python are you using?
Python 3.9.12
- What operating system and processor architecture are you using?
Linux-5.4.181-99.354.amzn2.x86_64-x86_64-with-glibc2.28
- What are the component versions in the environment (`pip freeze`)?
[omitting]
- What did you do?
```python
cur.fetch_pandas_all()
```
Raises:
```
NotSupportedError                         Traceback (most recent call last)
<ipython-input-29-64b7d8357e61> in <cell line: 4>()
      2 cur = conn.cursor()
      3 cur.execute("create temp table test_db.public.foo (bar int)")
----> 4 cur.fetch_pandas_all()

~/.cache/pypoetry/virtualenvs/python-kernel-OtKFaj5M-py3.9/lib/python3.9/site-packages/snowflake/connector/cursor.py in fetch_pandas_all(self, **kwargs)
    973         self._prefetch_hook()
    974         if self._query_result_format != "arrow":
--> 975             raise NotSupportedError
    976         self._log_telemetry_job_data(
    977             TelemetryField.PANDAS_FETCH_ALL, TelemetryData.TRUE

NotSupportedError: Unknown error
```
- What did you expect to see?
I think standard practice here would be to return a DataFrame with the number of updated rows or similar, rather than raising an error.
We found this to be true for all DDL operations (ALTER, CREATE, DROP, etc.), which has to do with the `result_format` returned.
...
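In the meantime, a caller-side guard can paper over the error; a minimal sketch (the helper name is mine, not connector API), which falls back to `fetchall()` whenever Arrow results are unavailable, as with DDL/DML statements whose results come back in JSON format:

```python
import pandas as pd

def fetch_df(cur):
    """Return the cursor's result as a DataFrame, even for non-SELECT
    statements where fetch_pandas_all raises (NotSupportedError in the
    Snowflake connector)."""
    try:
        return cur.fetch_pandas_all()
    except Exception:
        rows = cur.fetchall()
        # cur.description carries column names as the first tuple element.
        cols = [d[0] for d in cur.description] if cur.description else None
        return pd.DataFrame(rows, columns=cols)
```

The broad `except` is deliberate in this sketch; in real code you would catch `snowflake.connector.errors.NotSupportedError` specifically.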
Our workaround was to create a `DictCursor` and use `fetchall()`:

```python
import pandas as pd
import snowflake.connector as sc

cursor = sc.connect(...).cursor(sc.DictCursor)
try:
    result = cursor.execute(query_string).fetchall()
    df = pd.DataFrame(result)
finally:
    cursor.close()
```
Currently, only a subset of SELECT statements supports serving Arrow results, which is a requirement of our current implementation for serving you pandas or Arrow results.
However, there's a way to work around this limitation in your case:
```python
cur.execute("insert into users values (...)")
cur.execute(f"select * from table(result_scan('{cur.sfqid}'))")
print(cur.fetch_pandas_all())
```
This way the SELECT statement transforms the results into Arrow for you.
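If you need this pattern in several places, it folds into a small helper; a minimal sketch (the function name is mine, not connector API) that runs any statement and then re-selects its result through RESULT_SCAN using the cursor's last query ID:

```python
def fetch_pandas_via_result_scan(cur, sql):
    """Execute any statement, then re-select its result via RESULT_SCAN
    so the connector can serve it as Arrow and hand back a DataFrame."""
    cur.execute(sql)
    # cur.sfqid is the Snowflake query ID of the statement just executed.
    cur.execute(f"select * from table(result_scan('{cur.sfqid}'))")
    return cur.fetch_pandas_all()
```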
As a prerequisite, please read https://github.com/snowflakedb/snowflake-connector-python/issues/986#issuecomment-1213496635
Another option would be to allow JSON results to be cast into `pandas.DataFrame` and `pyarrow.Table` on the connector's side. This is exactly what @ethanbates-ez is suggesting in his comment. This would allow `fetch_pandas_all`, `fetch_pandas_batches`, `fetch_arrow_all`, and `fetch_arrow_batches` to work with all queries.
My problem with this solution is that these four fetch functions are currently guaranteed to be faster than fetching JSON results, and this change would jeopardize that guarantee for convenience's sake.
I'd be okay introducing it as part of the connector, but only executing that code path if the user explicitly opts in, knowing full well not to expect the same performance as when Arrow results are available.
Would something like this API work for you?
```python
with snowflake.connector.connect(...) as conn:
    with conn.cursor() as cur:
        cur.execute("create temp table asd(id number);")
        df = cur.fetch_pandas_all(slow_ok=True)
```
To clean up and re-prioritize bugs and feature requests, we are closing all issues older than 6 months as of March 1, 2023. If there are any issues or feature requests that you would like us to address, please re-create them. For urgent issues, opening a support case via Snowflake Community is the fastest way to get a response.