datacontract-cli icon indicating copy to clipboard operation
datacontract-cli copied to clipboard

Snowflake data type issues with datacontract test

Open dwelden opened this issue 11 months ago • 7 comments

Data types generated by datacontract import are not matching the detected data types in datacontract test.

> datacontract import --format sql --source Sample_Employees.sql --dialect snowflake --output Sample_Employees.datacontract.yaml
Written result to Sample_Employees.datacontract.yaml

> datacontract test Sample_Employees.datacontract.yaml
...
╭─────────┬─────────────────────────────────────┬────────────────────────────────┬─────────────────────────────────────╮
│ Result  │ Check                               │ Field                          │ Details                             │
├─────────┼─────────────────────────────────────┼────────────────────────────────┼─────────────────────────────────────┤
│         │ Check that field DEPARTMENT has a   │ SAMPLE_EMPLOYEES.DEPARTMENT    │                                     │
│         │ max length of 16777216              │                                │                                     │
│         │ Check that field EMPLOYEE_NAME has  │ SAMPLE_EMPLOYEES.EMPLOYEE_NAME │                                     │
│         │ a max length of 16777216            │                                │                                     │
│         │ Check that field JOB has a max      │ SAMPLE_EMPLOYEES.JOB           │                                     │
│         │ length of 16777216                  │                                │                                     │
│         │ Check that field PHONE_DECIMAL has  │ SAMPLE_EMPLOYEES.PHONE_DECIMAL │                                     │
│         │ a max length of 16777216            │                                │                                     │
│ failed  │ Check that field DEPARTMENT has     │ SAMPLE_EMPLOYEES.DEPARTMENT    │ Type Mismatch, Expected Type:       │
│         │ type VARCHAR(16777216)              │                                │ VARCHAR(16777216); Actual Type:     │
│         │                                     │                                │ TEXT                                │
│ failed  │ Check that field EMPLOYEE_ID has    │ SAMPLE_EMPLOYEES.EMPLOYEE_ID   │ Type Mismatch, Expected Type:       │
│         │ type DECIMAL(38, 0)                 │                                │ DECIMAL(38, 0); Actual Type: NUMBER │
│ failed  │ Check that field EMPLOYEE_NAME has  │ SAMPLE_EMPLOYEES.EMPLOYEE_NAME │ Type Mismatch, Expected Type:       │
│         │ type VARCHAR(16777216)              │                                │ VARCHAR(16777216); Actual Type:     │
│         │                                     │                                │ TEXT                                │
│ failed  │ Check that field JOB has type       │ SAMPLE_EMPLOYEES.JOB           │ Type Mismatch, Expected Type:       │
│         │ VARCHAR(16777216)                   │                                │ VARCHAR(16777216); Actual Type:     │
│         │                                     │                                │ TEXT                                │
│ failed  │ Check that field MANAGER_ID has     │ SAMPLE_EMPLOYEES.MANAGER_ID    │ Type Mismatch, Expected Type:       │
│         │ type DECIMAL(38, 0)                 │                                │ DECIMAL(38, 0); Actual Type: NUMBER │

...

Sample_Employees.sql.txt Sample_Employees.datacontract.yaml.txt datacontract.test.error.log

dwelden avatar Feb 25 '25 20:02 dwelden

Can you post a screenshot of this table structure in Snowflake?

jochenchrist avatar Mar 01 '25 11:03 jochenchrist

Image

dwelden avatar Mar 01 '25 17:03 dwelden

In your YAML, there is a field "PHONE_DECIMAL" instead of "PHONE_NUMBER" (probably caused by search and replace)

jochenchrist avatar Mar 02 '25 11:03 jochenchrist

For the fields, I am not quite sure, why Snowflake reports different types. The import seems fine, please consider as a workaround to configure and overwrite the reported "snowflakeTypes" manually in your data contract YAML.

jochenchrist avatar Mar 02 '25 11:03 jochenchrist

Yes I did not catch that until after posting this issue

On Mar 2, 2025, at 05:23, jochenchrist @.> wrote: In your YAML, there is a field "PHONE_DECIMAL" instead of "PHONE_NUMBER" (probably caused by search and replace)—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.>

jochenchrist left a comment (datacontract/datacontract-cli#662) In your YAML, there is a field "PHONE_DECIMAL" instead of "PHONE_NUMBER" (probably caused by search and replace)

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>

dwelden avatar Mar 02 '25 12:03 dwelden

Soda supported data types for Snowflake has the following table:

Category Data type
text CHAR, VARCHAR, CHARACTER, STRING, TEXT
number NUMBER, INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT, FLOAT, FLOAT4, FLOAT8, DOUBLE, DOUBLE PRECISION, REAL
time DATE, DATETIME, TIME, TIMESTAMP, TIMESTAMPT_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ

Looking at the output it appears that Soda is returning the Category of the VARCHAR columns instead of the Data Type. In turn, datacontract test sees a mismatch between what Soda returns and what the YAML file contains.

For the numeric columns, datacontract import is changing NUMBER(38,0) to DECIMAL(38,0) in the YAML, but Soda is returning NUMBER without precision and scale which is flagged as a mismatch.

dwelden avatar Mar 03 '25 17:03 dwelden

As per your advice, I edited the snowflakeType entries to change VARCHAR(16777216) to TEXT and DECIMAL(38,0) to NUMBER(38,0). This resolved the mismatches on the varchar columns, but we still have the precision and scale issue with the numeric columns:

1) EMPLOYEE_ID Check that field EMPLOYEE_ID has type NUMBER(38, 0): Type Mismatch, Expected Type: NUMBER(38, 0); Actual
Type: NUMBER
2) SALARY Check that field SALARY has type NUMBER(38, 0): Type Mismatch, Expected Type: NUMBER(38, 0); Actual Type:
NUMBER
3) MANAGER_ID Check that field MANAGER_ID has type NUMBER(38, 0): Type Mismatch, Expected Type: NUMBER(38, 0); Actual
Type: NUMBER

dwelden avatar Mar 03 '25 17:03 dwelden