Inconsistent join behaviour in spatial units

Open jc-harrison opened this issue 3 years ago • 1 comments

GeomSpatialUnit uses a left join from the cells table to the mapping/geom table, so it includes rows for all cells in the cells table, regardless of whether they map to a location in the geom table. PolygonSpatialUnit, by contrast, uses an inner join if no mapping table is specified, and a left join if a mapping table is specified.

This means that a SubscriberLocations query (or any other query that uses a JoinToLocation) may or may not include CDR events at known-but-unmapped cells (cells that are in the cells table but have no location in the spatial unit), depending on the type of spatial unit. We should make this behaviour consistent.
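To illustrate the difference, here is a minimal sketch using an in-memory SQLite database. It is not FlowKit's actual schema or the SQL that the spatial unit classes generate: the table names (`cells`, `geom_mapping`, `events`), the column names and the `located_events` helper are all hypothetical, chosen only to mimic a spatial unit joined to events in the way JoinToLocation does.

```python
import sqlite3

# Hypothetical, simplified tables (NOT FlowKit's real schema):
#   cells        - all known cell IDs
#   geom_mapping - cell ID -> region, for the cells that have a mapping
#   events       - CDR events, each recorded at some cell ID
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cells (cell_id TEXT PRIMARY KEY);
    CREATE TABLE geom_mapping (cell_id TEXT, region TEXT);
    CREATE TABLE events (subscriber TEXT, cell_id TEXT);

    -- c3 is known but unmapped; c9 does not appear in cells at all
    INSERT INTO cells VALUES ('c1'), ('c2'), ('c3');
    INSERT INTO geom_mapping VALUES ('c1', 'region_a'), ('c2', 'region_b');
    INSERT INTO events VALUES ('s1', 'c1'), ('s2', 'c3'), ('s3', 'c9');
    """
)


def located_events(join_type):
    """Join events to a spatial-unit-like subquery built with the given join type."""
    if join_type not in ("LEFT", "INNER"):
        raise ValueError("join_type must be 'LEFT' or 'INNER'")
    query = f"""
        SELECT e.subscriber, loc.region
        FROM events e
        INNER JOIN (
            SELECT c.cell_id, m.region
            FROM cells c
            {join_type} JOIN geom_mapping m ON c.cell_id = m.cell_id
        ) loc ON e.cell_id = loc.cell_id
        ORDER BY e.subscriber
    """
    return conn.execute(query).fetchall()


left_rows = located_events("LEFT")    # [('s1', 'region_a'), ('s2', None)]
inner_rows = located_events("INNER")  # [('s1', 'region_a')]
print(left_rows)
print(inner_rows)
```

In both cases the event at the unknown cell `c9` is dropped. With the left join, the event at the known-but-unmapped cell `c3` is kept with a null region; with the inner join it is dropped as well.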

I think it would make most sense to always use an inner join: cell IDs in the cells table that don't map to any location for the specified spatial unit should be treated the same as unknown cell IDs.

For example, if we have defined a fixed mapping from cell IDs to admin3 regions (e.g. via a cell clustering), we might want to run location queries on all CDR events that map to admin3 regions using this pre-defined mapping. If new cells are added to the cells table after the mapping is defined, and the mapping is not updated, I don't think we would usually want to start including events at the new cells (mapped to a null location). These are effectively no different from events at unknown cell IDs, which are always excluded in JoinToLocation.

jc-harrison, Aug 23 '22 16:08

See also https://github.com/Flowminder/FlowKit/issues/4246#issuecomment-1023185866

jc-harrison, Aug 23 '22 16:08