[SIP] Proposal for aggregate dimensions
Please make sure you are familiar with the SIP process documented here. The SIP will be numbered by a committer upon acceptance.
[SIP] Proposal for aggregate dimensions
Motivation
Adding columns to a report that are functionality dependent on an already added dimension should come at a low cost.
For example:
'GROUP BY person_id'
If we then want to add columns to the report "first_name,last_name,age" we'd have the choice of adding it to the dimension or as a metric.
Simply adding it to the dimension incurs penalty of putting in the GROUP BY. This is an expensive operation as well as a potential error (what if one of the rows actually had the wrong age and we ended up creating a new person)
Simply adding it as a metric is a bit awkward but is the best fit for my use case now. I have dozens of these columns and perfer to just wrap them in an ANY aggregate function.
Proposed Change
I propose we add a toggle to set a dimension as an "aggregate dimension" Instead of adding this column to the group by the user can just pick the aggregate it would like (min,max,any, first, etc). By default we would not have to duplicate this as a new metric column but could keep the same name. I think this would also work with the newer Drill features in a better way than the metrics.
Rejected Alternatives
Currently adding these strings as a metric in an ANY aggregate works best for me. This awkward and confusing to the end user to see dimensional type data as well as a lot of duplicated column naming "any(first_name) as first_name"