Collisions in dbt_scd_id while calculating snapshots
Describe the bug
The snapshot calculation relies on the Teradata HASHROW function. The dbt_scd_id is generated for each row based on the provided unique_key and the current timestamp. However, the HASHROW function produces a 4-byte hash, which is highly prone to collisions. For instance, the values d3dadd49420542fb49ffbf6a77349b45 and 34f325fe5a4216f27357328b61c9eccb both produce the same hash 02-27-E3-B4. Similarly, the numbers 162181727 and 880145039 generate the same hash 2E-5B-FE-DD. In a source with 36 million numbers, we have over 180 thousand duplicate dbt_scd_id.
These collisions cause the snapshot update to fail with the error: [Error 7547] Target row updated by multiple source rows.
Steps To Reproduce
Create a source with the provided values as IDs and then try to create a snapshot of them.
Expected behavior
Calculating the snapshot without errors.
Screenshots and log output
The output of dbt --version:
Core:
- installed: 1.7.11
- latest: 1.8.0 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- teradata: 1.7.2 - Up to date!
The operating system you're using: Windows 11
The output of python --version:
Python 3.11.3