data-engineer-handbook icon indicating copy to clipboard operation
data-engineer-handbook copied to clipboard

Simplified `UNNEST` with `STRUCT` de-structuring

Open pilosoposerio opened this issue 1 year ago • 3 comments

The following CTE block:

https://github.com/DataExpert-io/data-engineer-handbook/blob/954ac02af1edcceaf56c0e784ec88c3904e73550/bootcamp/materials/1-dimensional-data-modeling/lecture-lab/incremental_scd_query.sql#L67-L75

can be simplified to (using CROSS JOIN to the UNNESTed array):

unnested_changed_records AS ( 
  
     SELECT c.player_name, 
           r.*
            FROM changed_records c, UNNEST(c.records) r
     ), 

provided that you remove the UNNEST call in changed_records.records (i.e., just leave records as an ARRAY):

https://github.com/DataExpert-io/data-engineer-handbook/blob/954ac02af1edcceaf56c0e784ec88c3904e73550/bootcamp/materials/1-dimensional-data-modeling/lecture-lab/incremental_scd_query.sql#L46-L60

This is especially helpful when your scd_type has a lot more fields and you need to de-structure them all.

pilosoposerio avatar Nov 17 '24 17:11 pilosoposerio

Looks good to merge, but I’d recommend adding a check for when the records array is empty—just to avoid any issues with UNNEST.

BUBLET avatar Nov 17 '24 20:11 BUBLET

is this easier to read than using the deconstructor?

(unnest(ARRAY[
               (ls.scoring_class, ls.is_active, ls.start_season, ls.end_season)::scd_type,
               (ts.scoring_class, ts.is_active, ts.current_season, ts.current_season)::scd_type
           ])).*

redpandaxl avatar Nov 19 '24 05:11 redpandaxl

CROSS JOIN and UNNEST is actually easier, especially if there're many fields There's no need to explicity specify each field during deconstruction and thus no risk of manual errors

BUBLET avatar Nov 19 '24 07:11 BUBLET

Code is just for demonstration. Suggest approach can be used in real scenarios.

ry-v1 avatar Dec 14 '24 18:12 ry-v1