Snowflake data type issues with datacontract test
Data types generated by datacontract import are not matching the detected data types in datacontract test.
> datacontract import --format sql --source Sample_Employees.sql --dialect snowflake --output Sample_Employees.datacontract.yaml
Written result to Sample_Employees.datacontract.yaml
> datacontract test Sample_Employees.datacontract.yaml
...
╭─────────┬─────────────────────────────────────┬────────────────────────────────┬─────────────────────────────────────╮
│ Result │ Check │ Field │ Details │
├─────────┼─────────────────────────────────────┼────────────────────────────────┼─────────────────────────────────────┤
│ │ Check that field DEPARTMENT has a │ SAMPLE_EMPLOYEES.DEPARTMENT │ │
│ │ max length of 16777216 │ │ │
│ │ Check that field EMPLOYEE_NAME has │ SAMPLE_EMPLOYEES.EMPLOYEE_NAME │ │
│ │ a max length of 16777216 │ │ │
│ │ Check that field JOB has a max │ SAMPLE_EMPLOYEES.JOB │ │
│ │ length of 16777216 │ │ │
│ │ Check that field PHONE_DECIMAL has │ SAMPLE_EMPLOYEES.PHONE_DECIMAL │ │
│ │ a max length of 16777216 │ │ │
│ failed │ Check that field DEPARTMENT has │ SAMPLE_EMPLOYEES.DEPARTMENT │ Type Mismatch, Expected Type: │
│ │ type VARCHAR(16777216) │ │ VARCHAR(16777216); Actual Type: │
│ │ │ │ TEXT │
│ failed │ Check that field EMPLOYEE_ID has │ SAMPLE_EMPLOYEES.EMPLOYEE_ID │ Type Mismatch, Expected Type: │
│ │ type DECIMAL(38, 0) │ │ DECIMAL(38, 0); Actual Type: NUMBER │
│ failed │ Check that field EMPLOYEE_NAME has │ SAMPLE_EMPLOYEES.EMPLOYEE_NAME │ Type Mismatch, Expected Type: │
│ │ type VARCHAR(16777216) │ │ VARCHAR(16777216); Actual Type: │
│ │ │ │ TEXT │
│ failed │ Check that field JOB has type │ SAMPLE_EMPLOYEES.JOB │ Type Mismatch, Expected Type: │
│ │ VARCHAR(16777216) │ │ VARCHAR(16777216); Actual Type: │
│ │ │ │ TEXT │
│ failed │ Check that field MANAGER_ID has │ SAMPLE_EMPLOYEES.MANAGER_ID │ Type Mismatch, Expected Type: │
│ │ type DECIMAL(38, 0) │ │ DECIMAL(38, 0); Actual Type: NUMBER │
...
Sample_Employees.sql.txt Sample_Employees.datacontract.yaml.txt datacontract.test.error.log
Can you post a screenshot of this table structure in Snowflake?
In your YAML, there is a field "PHONE_DECIMAL" instead of "PHONE_NUMBER" (probably caused by search and replace)
For the fields, I am not quite sure, why Snowflake reports different types. The import seems fine, please consider as a workaround to configure and overwrite the reported "snowflakeTypes" manually in your data contract YAML.
Yes I did not catch that until after posting this issue
On Mar 2, 2025, at 05:23, jochenchrist @.> wrote: In your YAML, there is a field "PHONE_DECIMAL" instead of "PHONE_NUMBER" (probably caused by search and replace)—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.>
jochenchrist left a comment (datacontract/datacontract-cli#662) In your YAML, there is a field "PHONE_DECIMAL" instead of "PHONE_NUMBER" (probably caused by search and replace)
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: @.***>
Soda supported data types for Snowflake has the following table:
| Category | Data type |
|---|---|
| text | CHAR, VARCHAR, CHARACTER, STRING, TEXT |
| number | NUMBER, INT, INTEGER, BIGINT, SMALLINT, TINYINT, BYTEINT, FLOAT, FLOAT4, FLOAT8, DOUBLE, DOUBLE PRECISION, REAL |
| time | DATE, DATETIME, TIME, TIMESTAMP, TIMESTAMPT_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ |
Looking at the output it appears that Soda is returning the Category of the VARCHAR columns instead of the Data Type. In turn, datacontract test sees a mismatch between what Soda returns and what the YAML file contains.
For the numeric columns, datacontract import is changing NUMBER(38,0) to DECIMAL(38,0) in the YAML, but Soda is returning NUMBER without precision and scale which is flagged as a mismatch.
As per your advice, I edited the snowflakeType entries to change VARCHAR(16777216) to TEXT and DECIMAL(38,0) to NUMBER(38,0). This resolved the mismatches on the varchar columns, but we still have the precision and scale issue with the numeric columns:
1) EMPLOYEE_ID Check that field EMPLOYEE_ID has type NUMBER(38, 0): Type Mismatch, Expected Type: NUMBER(38, 0); Actual
Type: NUMBER
2) SALARY Check that field SALARY has type NUMBER(38, 0): Type Mismatch, Expected Type: NUMBER(38, 0); Actual Type:
NUMBER
3) MANAGER_ID Check that field MANAGER_ID has type NUMBER(38, 0): Type Mismatch, Expected Type: NUMBER(38, 0); Actual
Type: NUMBER