SNOW-1619160: Support for patching table functions

Open • djfletcher opened this issue 1 year ago • 3 comments

What is the current behavior?

Table functions, including built-in Snowpark ones like flatten, raise an exception in local tests: NotImplementedError: [Local Testing] table_function.TableFunctionJoin is not supported. When I try to patch flatten, the patched function is passed a normal Column rather than a ColumnEmulator wrapping the underlying series of rows. The job under test:

# path/to/snowpark_job.py
from snowflake.snowpark import Session
from snowflake.snowpark.functions import flatten


def snowpark_job(session: Session):
    df = session.create_dataframe([[[1, 2, 3], [4, 5], []]], schema=["lists"])
    flattened_df = df.select(flatten(df.lists))
    flattened_df.show()

Nor can I patch the builtin table function. As far as I can tell, patching functions using snowflake.snowpark.mock does not support returning 0, 1, or many rows per input row. But more specifically, when I put a debugger inside patch_flatten() it is being passed a normal Column and not a ColumnEmulator so I can't interact with the underlying series of rows.

# path/to/test.py
from unittest import mock
from uuid import uuid4

from snowflake.snowpark import Session
from snowflake.snowpark.functions import flatten
from snowflake.snowpark.mock import ColumnEmulator, ColumnType
from snowflake.snowpark.mock import patch as snowpark_patch
from snowflake.snowpark.types import IntegerType

from path.to.snowpark_job import snowpark_job


@snowpark_patch(flatten)
def patch_flatten(column: ColumnEmulator, *args, **kwargs) -> ColumnEmulator:
    ret_data = [integer for row in column for integer in row]
    ret_column = ColumnEmulator(data=ret_data)
    ret_column.sf_type = ColumnType(IntegerType(), True)
    return ret_column


@mock.patch(
    "path.to.snowpark_job.flatten",
    new=patch_flatten,
)
def test_snowpark_job():
    session = Session.builder.config("local_testing", True).create()
    snowpark_job(session)
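
To make the row-count problem concrete, here is a minimal sketch in plain Python (no Snowpark involved). The helper names `scalar_style` and `table_style` are hypothetical and exist only for this illustration. A scalar-style patch returns exactly one value per input row, while a flatten-like table function returns zero, one, or many rows per input row, so its output cannot be lined up one-to-one with the input column.

```python
from typing import Any, List, Sequence, Tuple


def scalar_style(column: Sequence[Sequence[Any]]) -> List[int]:
    # Scalar-style patch: exactly one output value per input row.
    return [len(cell) for cell in column]


def table_style(column: Sequence[Sequence[Any]]) -> List[Tuple[int, Any]]:
    # Table-function style: zero, one, or many output rows per input row.
    # Each output keeps the index of the input row it came from.
    return [(i, value) for i, cell in enumerate(column) for value in cell]


lists = [[1, 2, 3], [4, 5], []]
print(scalar_style(lists))  # [3, 2, 0]
print(table_style(lists))   # [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5)]
```

The empty list contributes no output rows at all, which is the case a one-value-per-row patch has no way to express.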

What is the desired behavior?

Ideally builtin table functions like flatten have test implementations, but more generally it might be more practical to support patching table functions.

If this is not an existing feature in snowflake-snowpark-python, how would this impact/improve non-local testing mode?

Table functions are a fairly common use case when transforming semi-structured data into structured data, so supporting them would make the library more robust.

djfletcher • Aug 08 '24 21:08

Any update on this?

flopetegui • Dec 05 '24 16:12

Hi, the lack of flatten support blocks me from working with a local session. Any update on it?

rafaels2 • Feb 03 '25 13:02

Sharing a workaround for this problem: call flatten through a thin wrapper, and replace that wrapper with a mock in tests.

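def my_flatten(df, *args, **kwargs):
    # Thin wrapper around DataFrame.flatten so tests have something to patch.
    return df.flatten(*args, **kwargs)


def get_mock_flatten(session):
    def mock_flatten(df, col, *args, **kwargs):
        # Your mock goes here.
        # You can use `df.to_local_iterator` and `session.create_dataframe` for example.
        raise NotImplementedError("replace with your mock implementation")

    return mock_flatten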
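
One possible way to fill in that mock, as a rough sketch rather than a faithful FLATTEN emulation. It assumes the array column is addressed by position (`col_index` is a hypothetical parameter), that array cells may come back either as Python lists or as JSON strings, and it returns only a single `value` column instead of the full set of columns the real table function produces. The helper `explode_values` is made up for this example.

```python
import json
from typing import Any, Callable, List


def explode_values(cells: List[Any]) -> List[List[Any]]:
    """Turn each array cell into zero, one, or many single-value rows."""
    out: List[List[Any]] = []
    for cell in cells:
        # Assumption: array cells may arrive as JSON strings; decode them if so.
        items = json.loads(cell) if isinstance(cell, str) else cell
        # None and empty arrays contribute no rows.
        for item in items or []:
            out.append([item])
    return out


def get_mock_flatten(session) -> Callable:
    def mock_flatten(df, col_index: int = 0, *args, **kwargs):
        # Pull the rows to the client, flatten them in plain Python,
        # then build a new DataFrame from the flattened values.
        cells = [row[col_index] for row in df.to_local_iterator()]
        return session.create_dataframe(explode_values(cells), schema=["value"])

    return mock_flatten
```

In a test, `unittest.mock.patch` can swap the `my_flatten` wrapper for `get_mock_flatten(session)`, in the same way the test in the issue description patches `path.to.snowpark_job.flatten`.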

rafaels2 • Feb 20 '25 15:02