dataform-scd icon indicating copy to clipboard operation
dataform-scd copied to clipboard

SCD package does not handle deleted records (missing SCD close-out for deletions)

Open traychomilevdriven opened this issue 3 months ago • 0 comments

The current implementation of the Dataform SCD package only captures new or updated records based on last_updated_time. However, it does not handle deleted records, causing previously existing rows to remain open-ended (with scd_valid_to = NULL) even when the source data no longer contains them.

Steps to Reproduce

Load source data into a staging table (e.g. stg_table_1).

Run the SCD update process — records are inserted correctly.

Delete one of the records in the source table.

Re-run the SCD process.

Expected Behavior

Deleted records should be marked as inactive in the target SCD table by setting scd_valid_to (or similar) to the current timestamp.

Actual Behavior

Deleted records remain active indefinitely in the SCD table. No change is detected since the deletion does not produce a last_updated_time update.

Technical Details

The generated SQL includes logic like:

select * from ...stg_table_1 where last_updated_time > ( select max(last_updated_time) from ...dim_table_1_scd_updates )

This query filters only by updated timestamps, which excludes deleted records that no longer exist in the staging dataset.

Suggested Improvement

Add an optional configuration (e.g. track_deletions: true) that performs a LEFT JOIN between the existing SCD table and the source table to identify and close out deleted records:

select scd.uniq_key, current_timestamp() as scd_valid_to from scd left join source on scd.uniq_key = source.uniq_key where source.uniq_key is null and scd.scd_valid_to is null;

traychomilevdriven avatar Nov 06 '25 05:11 traychomilevdriven