SNOW-1617523: ColumnEmulator does not support aliasing column names
What is the current behavior?
I am patching the function call_builtin to generate a uuid string (a primary key id) for each row:
# path/to/snowpark_job.py
from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_builtin
def snowpark_job(session: Session, table_name: str):
    df = session.table(table_name)
    df = df.with_column("id", call_builtin("UUID_STRING"))
    df.show()
# path/to/test.py
from unittest import mock
from uuid import uuid4
from snowflake.snowpark import Session
from snowflake.snowpark.functions import call_builtin
from snowflake.snowpark.mock import ColumnEmulator, ColumnType
from snowflake.snowpark.mock import patch as snowpark_patch
from snowflake.snowpark.types import StringType
from path.to.snowpark_job import snowpark_job
@snowpark_patch(call_builtin)
def patch_call_builtin(function_name: str, *args, **kwargs) -> ColumnEmulator:
    if function_name == "UUID_STRING":
        ret_column = ColumnEmulator(data=[str(uuid4()) for _ in range(1000)])
        ret_column.sf_type = ColumnType(StringType(), True)
        return ret_column
    else:
        raise NotImplementedError(
            f"If you want to use the builtin function '{function_name}' then you will need to add a case here to patch it"
        )

@mock.patch(
    "path.to.snowpark_job.call_builtin",
    new=patch_call_builtin,
)
def test_snowpark_job():
    session = Session.builder.config("local_testing", True).create()
    snowpark_job(session, "test_table")
It raises AttributeError: 'ColumnEmulator' object has no attribute 'as_'. I have also tried .alias() and .name() instead of with_column and each raises a similar error.
What is the desired behavior?
ColumnEmulator class should support aliasing column names.
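To make the request concrete, here is a minimal sketch of the behavior I have in mind. AliasableColumn is a hypothetical stand-in written in plain Python, not Snowpark's ColumnEmulator and not a proposed implementation; the field names are my own assumptions. The only point is that as_(), alias() and name() would all return a renamed column, the way they do on a regular Snowpark Column.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional


@dataclass(frozen=True)
class AliasableColumn:
    """Hypothetical stand-in for a mocked column; not part of Snowpark."""

    data: List[Any]
    column_name: Optional[str] = None

    def alias(self, new_name: str) -> "AliasableColumn":
        # Return a renamed copy and leave the original column untouched.
        return replace(self, column_name=new_name)

    # Expose the other two spellings as synonyms of alias().
    as_ = alias
    name = alias


col = AliasableColumn(data=["a", "b"])
renamed = col.as_("id")
print(renamed.column_name)  # id
```

With something equivalent on ColumnEmulator, df.with_column("id", ...) would presumably no longer fail on the missing as_ attribute.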
Also, somewhat separately, patch_call_builtin() receives no arguments other than function_name, so I don't know how many rows to generate UUIDs for. This is what I see when I put a debugger inside patch_call_builtin():
function_name = 'UUID_STRING'
args = ()
kwargs = {}
My workaround was to simply generate more values than needed (using range() with a number larger than the row count of my test dataset), but I'm not sure that is going to work.
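An alternative to over-generating that I am considering is to close over the row count in the test, since the test itself knows how many rows it loads. The sketch below is plain Python: make_uuid_patch and row_count are hypothetical names of mine, and the returned list would still need to be wrapped in ColumnEmulator (with sf_type set) as in my test above. I have not verified that local testing accepts a column built this way.

```python
from typing import Callable, List
from uuid import uuid4


def make_uuid_patch(row_count: int) -> Callable[..., List[str]]:
    """Build a call_builtin replacement that knows how many rows to produce."""

    def patched_call_builtin(function_name: str, *args, **kwargs) -> List[str]:
        if function_name == "UUID_STRING":
            # One fresh UUID string per row of the test table.
            return [str(uuid4()) for _ in range(row_count)]
        raise NotImplementedError(
            f"No patch defined for builtin function '{function_name}'"
        )

    return patched_call_builtin


# The test knows its own fixture size, so it can pass it in explicitly.
patch_for_three_rows = make_uuid_patch(row_count=3)
ids = patch_for_three_rows("UUID_STRING")
print(len(ids))  # 3
```

This keeps the generated data the same length as the test table, but it only works because the row count is hard-coded per test; it does not solve the underlying problem that the patched function is not told the row count.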
If this is not an existing feature in snowflake-snowpark-python, how would this impact/improve non-local testing mode?
It is extremely common to rename columns during data transformations, especially when using builtin functions. If builtin functions are supposed to be supported in Snowpark local testing then aliasing those column names should also be supported.