snowpark-python icon indicating copy to clipboard operation
snowpark-python copied to clipboard

SNOW-701482: Inconsistency with numeric datatypes in Snowpark table/DataFrame Schema

Open brian-kalinowski-sonos opened this issue 3 years ago • 1 comments

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    • Python 3.8.13
  2. What operating system and processor architecture are you using?

    • macOS-10.16-x86_64-i386-64bit
  3. What are the component versions in the environment (pip freeze)?

    • snowflake-connector-python==2.7.12
    • snowflake-snowpark-python==1.0.0
  4. What did you do?

tbl_name = 'NUMERIC_TABLE'

create_table_query = f'''
create or replace TABLE {tbl_name} (
    NUMBER_COL NUMBER(38,0),
    INT_COL INT,
    DOUBLE_COL DOUBLE,
    FLOAT_COL FLOAT,
    LONG_COL BIGINT,
    SHORT_COL SMALLINT,
    BYTE_COL TINYINT
)
'''

client.session.sql(create_table_query).show()

tbl = client.session.table('NUMERIC_TABLE')

print('Snowpark Schema:')

for f in tbl.schema.fields:
    print(f)

"""
---------------------------------------------
|"status"                                   |
---------------------------------------------
|Table NUMERIC_TABLE successfully created.  |
---------------------------------------------

Snowpark Schema:
StructField('NUMBER_COL', LongType(), nullable=True)
StructField('INT_COL', LongType(), nullable=True)
StructField('DOUBLE_COL', DoubleType(), nullable=True)
StructField('FLOAT_COL', DoubleType(), nullable=True)
StructField('LONG_COL', LongType(), nullable=True)
StructField('SHORT_COL', LongType(), nullable=True)
StructField('BYTE_COL', LongType(), nullable=True)
"""
  1. What did you expect to see?

Creating a table with various numeric types seems to return only a LongType() for most columns/types when viewing the Snowpark table/dataframe schema property.

Seems to be some inconsistencies with generating the Snowpark schema vs. what datatypes the table is created with.

Running a DESCRIBE TABLE NUMERIC_TABLE; returns:

name	              type
NUMBER_COL     NUMBER(38,0)
INT_COL	              NUMBER(38,0)
DOUBLE_COL	      FLOAT
FLOAT_COL	      FLOAT
LONG_COL	      NUMBER(38,0)
SHORT_COL	      NUMBER(38,0)
BYTE_COL	      NUMBER(38,0)

The above is more expected when using the Synonymous numeric type names, but seems pretty far off when viewing the Snowpark schema fields on a table/dataframe.

  1. Can you set logging to DEBUG and collect the logs?
2022-11-29 14:20:36,803 - Snowpark Session information: 
"version" : 1.0.0,
"python.version" : 3.8.13,
"python.connector.version" : 2.7.12,
"python.connector.session.id" : 28283155476234,
"os.name" : Darwin

2022-11-29 14:20:37,129 - Execute query [queryID: 01a8a45c-0506-3265-0000-19b9b8c44872] 
create or replace TABLE NUMERIC_TABLE (
    NUMBER_COL NUMBER(38,0),
    INT_COL INT,
    DOUBLE_COL DOUBLE,
    FLOAT_COL FLOAT,
    LONG_COL BIGINT,
    SHORT_COL SMALLINT,
    BYTE_COL TINYINT
)

brian-kalinowski-sonos avatar Nov 29 '22 22:11 brian-kalinowski-sonos

@brian-kalinowski-sonos In Snowflake, int and long are the same as number(38, 0). float is the same as double. Refer to https://docs.snowflake.com/en/sql-reference/data-types-numeric

sfc-gh-yixie avatar May 11 '23 05:05 sfc-gh-yixie