SNOW-1622029: Table.update() raises TypeError if table contains any VariantType columns
Please answer these questions before submitting your issue. Thanks!
- What version of Python are you using?
Python 3.9.6 (default, Feb 3 2024, 15:58:27) [Clang 15.0.0 (clang-1500.3.9.4)]
- What are the Snowpark Python and pandas versions in the environment?
pandas==2.2.2 snowflake-snowpark-python==1.20.0
- What did you do?
I am updating a Table row in my tests. I can reproduce the error with the example code from https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/latest/snowpark/api/snowflake.snowpark.Table.update after adding one extra variant column. Updating any column, even one that is not the VariantType column, raises a TypeError:
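from snowflake.snowpark import Session

# Local testing mode: no connection to Snowflake is needed.
session = Session.builder.config("local_testing", True).create()
target_df = session.create_dataframe([(1, 1, {}), (1, 2, {}), (2, 1, {}), (2, 2, {}), (3, 1, {}), (3, 2, {})], schema=["a", "b", "c"])
target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
t = session.table("my_table")
# Only column "b" is updated; the variant column "c" is not touched.
t.update({"b": 0}, t["a"] == 1)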
Here is the stacktrace:
venv/lib/python3.9/site-packages/snowflake/snowpark/table.py:470: in update
result = new_df._internal_collect_with_tag(
venv/lib/python3.9/site-packages/snowflake/snowpark/_internal/telemetry.py:150: in wrap
result = func(*args, **kwargs)
venv/lib/python3.9/site-packages/snowflake/snowpark/dataframe.py:644: in _internal_collect_with_tag_no_telemetry
return self._session._conn.execute(
venv/lib/python3.9/site-packages/snowflake/snowpark/mock/_connection.py:559: in execute
res = execute_mock_plan(plan, plan.expr_to_alias)
venv/lib/python3.9/site-packages/snowflake/snowpark/mock/_plan.py:1166: in execute_mock_plan
matched_count = intermediate[target.columns].value_counts(dropna=False)[
venv/lib/python3.9/site-packages/pandas/core/frame.py:7509: in value_counts
counts = self.groupby(subset, dropna=dropna, observed=False)._grouper.size()
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:705: in size
ids, _, ngroups = self.group_info
properties.pyx:36: in pandas._libs.properties.CachedProperty.__get__
???
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:745: in group_info
comp_ids, obs_group_ids = self._get_compressed_codes()
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:764: in _get_compressed_codes
group_index = get_group_index(self.codes, self.shape, sort=True, xnull=True)
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:690: in codes
return [ping.codes for ping in self.groupings]
venv/lib/python3.9/site-packages/pandas/core/groupby/ops.py:690: in <listcomp>
return [ping.codes for ping in self.groupings]
venv/lib/python3.9/site-packages/pandas/core/groupby/grouper.py:691: in codes
return self._codes_and_uniques[0]
properties.pyx:36: in pandas._libs.properties.CachedProperty.__get__
???
venv/lib/python3.9/site-packages/pandas/core/groupby/grouper.py:835: in _codes_and_uniques
codes, uniques = algorithms.factorize( # type: ignore[assignment]
venv/lib/python3.9/site-packages/pandas/core/algorithms.py:795: in factorize
codes, uniques = factorize_array(
venv/lib/python3.9/site-packages/pandas/core/algorithms.py:595: in factorize_array
uniques, codes = table.factorize(
pandas/_libs/hashtable_class_helper.pxi:7281: in pandas._libs.hashtable.PyObjectHashTable.factorize
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E TypeError: unhashable type: 'dict'
pandas/_libs/hashtable_class_helper.pxi:7195: TypeError
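The trace ends in a hash-based count of whole rows (`value_counts` over `target.columns`), so the variant column is hashed even though only `b` is being updated. Here is a minimal standard-library sketch of that failure mode. It is an illustration only: `count_rows` and `count_rows_safely` are hypothetical helpers, and serializing dicts to JSON is one possible way to make rows hashable, not necessarily what Snowpark does internally.

```python
import json
from collections import Counter

# Rows shaped like the reproduction: two integers plus an empty VARIANT (a dict).
rows = [(1, 1, {}), (1, 2, {}), (2, 1, {})]


def count_rows(rows):
    """Count identical rows by hashing each row tuple (fails on dict values)."""
    return Counter(rows)


def count_rows_safely(rows):
    """Count identical rows after turning dict values into JSON strings."""
    def freeze(value):
        # Assumption: only plain dicts need freezing in this toy example.
        return json.dumps(value, sort_keys=True) if isinstance(value, dict) else value

    return Counter(tuple(freeze(v) for v in row) for row in rows)


try:
    count_rows(rows)
    error_message = None
except TypeError as exc:
    # Hashing a tuple hashes its elements, and dicts are unhashable.
    error_message = str(exc)

print(error_message)
print(count_rows_safely(rows))
```

The first call fails for the same reason as the pandas call in the trace: every column of the row takes part in the hash, regardless of which column the update targets.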
- What did you expect to see?
The in-memory table should have been updated without raising a TypeError.
Per the documentation: https://docs.snowflake.com/en/developer-guide/snowpark/python/testing-locally#limitations
For Table.merge and Table.update, the session parameters ERROR_ON_NONDETERMINISTIC_UPDATE and ERROR_ON_NONDETERMINISTIC_MERGE must be set to False. This means that for multi-joins, one of the matched rows is updated.
Adding these params has no effect:
statement_params = {"ERROR_ON_NONDETERMINISTIC_UPDATE": False, "ERROR_ON_NONDETERMINISTIC_MERGE": False}
t.update({"b": 0}, t["a"] == 1, statement_params=statement_params)
E TypeError: unhashable type: 'dict'
Hello @djfletcher ,
Thanks for raising the issue. Yes, the problem occurs with local testing when updating the table; it works fine with a regular session. We will work on eliminating it.
session = Session.builder.config("local_testing", True).create()
target_df = session.create_dataframe([(1, 1, {}),(1, 2, {}),(2, 1, {}),(2, 2, {}),(3, 1, {}),(3, 2, {})], schema=["a", "b", "c"])
target_df.write.save_as_table("my_table", mode="overwrite", table_type="temporary")
t = session.table("my_table")
t.show()
t.update({"b": 0}, t["a"] == 1)
t.show()
Output and Error:
|"A" |"B" |"C" |
|1 |1 |{} |
|1 |2 |{} |
|2 |1 |{} |
|2 |2 |{} |
|3 |1 |{} |
|3 |2 |{} |
TypeError: unhashable type: 'dict'
Regards, Sujan
This issue is fixed as of v1.25.0.
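For test suites that still run against older releases, a version guard might help. This is a rough sketch that assumes the distribution name `snowflake-snowpark-python` from the report. `parse_version` and `has_variant_update_fix` are hypothetical helpers, and the parsing is deliberately simple: a pre-release such as `1.25.0rc1` is treated like `1.25.0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 25, 0)  # release named in this thread as containing the fix


def parse_version(text):
    """Turn '1.20.0' into (1, 20, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_variant_update_fix(installed=None):
    """Return True if the given (or installed) Snowpark version is at least FIXED_IN."""
    if installed is None:
        try:
            installed = version("snowflake-snowpark-python")
        except PackageNotFoundError:
            # Package not installed: nothing to test against.
            return False
    return parse_version(installed) >= FIXED_IN


print(has_variant_update_fix("1.20.0"))
print(has_variant_update_fix("1.25.0"))
```

A test could call `has_variant_update_fix()` with no argument and skip the `Table.update` case on variant tables when it returns `False`.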