Validate that dbt-teradata works with dbt-expectations package
Describe the feature
dbt users frequently rely on https://github.com/calogica/dbt-expectations. We need to ensure that dbt-teradata works well with this dbt package.
Describe alternatives you've considered
Don't depend on existing dbt libraries and always create your own.
Additional context
This came up in a discussion with a large Telco customer who is using dbt with Teradata.
Who will this benefit?
Users who want to leverage https://github.com/calogica/dbt-expectations in their dbt projects.
Are you interested in contributing this feature?
Yes.
Has this issue been looked at?
Hi @sambloom92 , we have not investigated it yet. Right now, we prioritize catching up with dbt releases. If you have tried it and have feedback it would be very helpful.
Ok I have been writing some tests using dbt-teradata and dbt-expectations for a work project, so I will post all the issues I find and the code I've used to fix them.
These are the tests I've tried so far (I won't try them all as there are lots, but these are the ones I've needed so far):
- expect_column_values_to_be_between
- expect_column_values_to_be_of_type
- expect_column_values_to_be_of_type_list
- expect_column_values_lengths_to_equal
- expect_values_to_match_regex
The main fix that I needed to implement to make most of these work was to create a teradata__truth_expression macro, as the default__truth_expression macro relies on a boolean type existing, which of course teradata does not have, so I've wrapped it in a CASE WHEN and cast it to a BYTEINT type (luckily the truth expression uses adapter.dispatch, so it's a very simple fix):
{% macro teradata__truth_expression(expression) %}
case when {{ expression }} then CAST(1 AS BYTEINT) else CAST(0 AS BYTEINT) end as expression
{% endmacro %}
This then requires that each test that calls dbt_expectations.expression_is_true must be changed so that it uses the optional test_condition argument to use the BYTEINT value instead of letting it default to '= true':
{{ dbt_expectations.expression_is_true(model,
expression=expression_min_max,
group_by_columns=group_by_columns,
row_condition=row_condition,
test_condition="= 1")
}}
Unfortunately there are a lot of tests which use dbt_expectations.expression_is_true, but most of them don't use adapter.dispatch, so it's not possible to just write a teradata__ prefixed implementation and call it a day- you have to completely override them by re-writing the test locally.
Ideally all of the tests in dbt_expectations would be implemented such that the test calls a macro of the same name, and this macro then uses adapter.dispatch, with a default__ prefixed implementation. Then you wouldn't have to override anything, you could just write a teradata__ prefixed implementation, without having to re-write any existing code, and adapter.dispatch will call the correct version for the given adapter.
I may suggest this as an enhancement to the dbt-expectations package, as it won't change anything functionally, but should make it easier for other dbt adapaters to maintain compatibility. Here is an example of what I mean:
{% test expect_column_values_to_match_regex(model, column_name,
regex,
row_condition=None,
is_raw=False,
flags=""
) %}
{% set expression %}
{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} > 0
{% endset %}
{{ dbt_expectations.expression_is_true(model,
expression=expression,
group_by_columns=None,
row_condition=row_condition
)
}}
{% endtest %}
would ideally be refactored to:
{% test expect_column_values_to_match_regex(model, column_name,
regex,
row_condition=None,
is_raw=False,
flags=""
) %}
{{ expect_column_values_to_match_regex(model, column_name, regex, row_condition, is_raw,flags) }}
{% endtest %}
{% macro expect_column_values_to_match_regex(model, column_name,
regex,
row_condition=None,
is_raw=False,
flags=""
) %}
{{adapter.dispatch('expect_column_values_to_match_regex', 'dbt_expectations')(model, column_name, regex, row_condition, is_raw,flags)}}
{% endmacro %}
{% macro default__expect_column_values_to_match_regex(model, column_name,
regex,
row_condition=None,
is_raw=False,
flags=""
) %}
{% set expression %}
{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} > 0
{% endset %}
{{ dbt_expectations.expression_is_true(model,
expression=expression,
group_by_columns=None,
row_condition=row_condition,
)
}}
{% endmacro %}
So then all you'd need to do to make it teradata compatible is provide this macro in the dbt-teradata adapater:
{% macro teradata__expect_column_values_to_match_regex(model, column_name,
regex,
row_condition=None,
is_raw=False,
flags=""
) %}
{% set expression %}
{{ dbt_expectations.regexp_instr(column_name, regex, is_raw=is_raw, flags=flags) }} > 0
{% endset %}
{{ dbt_expectations.expression_is_true(model,
expression=expression,
group_by_columns=None,
row_condition=row_condition,
test_condition="= 1"
)
}}
{% endmacro %}
@adamtwo do you think this change would be a reasonable request for the dbt-expectations maintainers? Or is there an easier way that I'm missing which would not require them to change anything on their side?
Finally, expect_column_values_to_be_in_type_list also has an incompatibility issue to do with teradata requiring SELECT statements in UNIONs to select from a table. My fix for this was to just add an empty 'dummy' table and select from this. There was also an issue with trying to cast things as TEXT, so I just changed it to VARCHAR(50) (hopefully that should be long enough for all the data types).
Here is the default implementation:
{% macro default__expect_column_values_to_be_in_type_list(model, column_name, column_type_list) %}
{%- if execute -%}
{%- set column_name = column_name | upper -%}
{%- set columns_in_relation = adapter.get_columns_in_relation(model) -%}
{%- set column_type_list = column_type_list| map("upper") | list -%}
with relation_columns as (
{% for column in columns_in_relation %}
select
cast('{{ escape_single_quotes(column.name | upper) }}' as {{ dbt.type_string() }}) as relation_column,
cast('{{ column.dtype | upper }}' as {{ dbt.type_string() }}) as relation_column_type
{% if not loop.last %}union all{% endif %}
{% endfor %}
),
test_data as (
select
*
from
relation_columns
where
relation_column = '{{ column_name }}'
and
relation_column_type not in ('{{ column_type_list | join("', '") }}')
)
select *
from test_data
{%- endif -%}
{% endmacro %}
And the teradata implementation:
{% macro teradata__expect_column_values_to_be_in_type_list(model, column_name, column_type_list) %}
{%- if execute -%}
{%- set column_name = column_name | upper -%}
{%- set columns_in_relation = adapter.get_columns_in_relation(model) -%}
{%- set column_type_list = column_type_list| map("upper") | list -%}
with dummy_table as (select null as dummy_column),
relation_columns as (
{% for column in columns_in_relation %}
select
'{{ escape_single_quotes(column.name | upper) }}' as relation_column,
CAST('{{ column.dtype | upper }}' AS VARCHAR(50)) as relation_column_type
from dummy_table
{% if not loop.last %}union all{% endif %}
{% endfor %}
),
test_data as (
select
*
from
relation_columns
where
relation_column = '{{ column_name }}'
and
relation_column_type not in ('{{ column_type_list | join("', '") }}')
)
select *
from test_data
{%- endif -%}
{% endmacro %}
TLDR;
The main issues i've discovered are:
- No boolean type in teradata, so anywhere with
expression = trueneeds to be adapted to use a byteint type instead - Most tests don't use
adapter.dispatchso making them teradata compatible requires completely overriding them, rather than just providing ateradata__prefixed implementation of the specific part that causes the incompatibility. I am suggesting dbt-expectations change this on their side, to make it easier for adapaters to maintain compatibility - teradata requires SELECT statements in UNIONs to select from a table, so add a dummy table.
- teradata has no TEXT type