Record first/last active date for cell IDs
Over time, new cells will be added to a mobile network and old cells removed. This means that the content of the infrastructure.cells table needs to be periodically updated to remain current.
Updating cell locations has a number of impacts:
- Cell locations may be provided for cells that were first active many months ago, meaning that previously unlocatable CDR events from arbitrarily far back may become locatable.
- Often, we may want to define cell clusters (or other cell groupings) based on only the cells that were active during a certain period. This information is not always readily available in infrastructure.cells (while we would hope to receive locations for all new cells that become active, we may not know when those cells were added, or that a previously-active cell has been removed from the network).
- Sometimes, it can be useful to identify short-lived cell IDs (those that only appear in the CDR over a short period) to help with quality-checking cell locations information.
For all of these reasons (and probably more), it would be useful to keep an up-to-date record of all cell IDs seen in the CDR (regardless of whether they correspond to known cells in infrastructure.cells), along with the dates on which each was first and last active. We could keep this information in an events.cell_ids table, and update it on CDR ingestion via a new task in FlowETL. I'd hope this update wouldn't be too slow, but even if it is, I don't think it needs to block making the new CDR available for analysis.
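As a rough illustration of the bookkeeping involved, here is a minimal Python sketch. It is not the proposed implementation: the real update would presumably be a database operation run by the FlowETL task, and every name below (`CellActivity`, `update_cell_activity`, `cells_active_during`, `short_lived_cells`) is hypothetical. A plain dict stands in for the events.cell_ids table, and events are assumed to arrive as `(cell_id, date)` pairs.

```python
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical in-memory stand-in for the proposed events.cell_ids table:
# cell_id -> (first_active_date, last_active_date)
CellActivity = Dict[str, Tuple[date, date]]


def update_cell_activity(
    activity: CellActivity,
    events: Iterable[Tuple[Optional[str], date]],
) -> CellActivity:
    """Widen each cell ID's first/last active dates using newly ingested events."""
    for cell_id, event_date in events:
        if cell_id is None:
            continue  # an event without a cell ID has nothing to record
        if cell_id in activity:
            first, last = activity[cell_id]
            # Late-arriving data may move the first date back as well as
            # moving the last date forward.
            activity[cell_id] = (min(first, event_date), max(last, event_date))
        else:
            activity[cell_id] = (event_date, event_date)
    return activity


def cells_active_during(activity: CellActivity, start: date, end: date) -> List[str]:
    """Cell IDs whose [first, last] active range overlaps [start, end]."""
    return sorted(
        cell_id
        for cell_id, (first, last) in activity.items()
        if first <= end and last >= start
    )


def short_lived_cells(activity: CellActivity, max_days: int) -> List[str]:
    """Cell IDs seen over a span of at most max_days days (inclusive of both ends)."""
    return sorted(
        cell_id
        for cell_id, (first, last) in activity.items()
        if (last - first).days + 1 <= max_days
    )
```

Because the update only ever widens the stored range, it gives the same result whatever order the days of CDR are ingested in, which is part of why it shouldn't need to block anything else. Note that `cells_active_during` treats a cell as active for the whole of its first-to-last range, so it cannot detect gaps in activity between those two dates.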