[new]: `is_table_row_number_anomalous(fully_qualified_table_name)`
Check the idea has not already been suggested
- [X] I could not find my idea in existing issues
Edit the title above with self-explanatory function name and argument names
- [X] The function name and the argument names I entered in the title above seems self explanatory to me.
BigFunction Description as it would appear in the documentation
Get table row count for the last 7 days (using time travelling feature of BigQuery) and check that current row count is not unusual compared to yesterday row count and regarding the evolution of the 7 latest days
Examples of (arguments, expected output) as they would appear in the documentation
one_public_table_name --> false
I can't understand clearly. Does that mean comparing today's row count with yesterday's and today's row count with the day before yesterday and so on up to 7 days latest days?
For example - if today is 28 April 2023
then the comparison will be
| Today | previous | is_equal |
|---|---|---|
| 28-April-2023 | 27-April-2023 | true |
| 28-April-2023 | 26-April-2023 | true |
| 28-April-2023 | 25-April-2023 | true |
| 28-April-2023 | 24-April-2023 | false |
| 28-April-2023 | 23-April-2023 | true |
| 28-April-2023 | 22-April-2023 | true |
| 28-April-2023 | 21-April-2023 | true |
Is this need to be done or just have to compare it with 7 days old table? and output as true or false.
I was thinking on:
- getting table row count for today using INFORMATION_SCHEMA
- getting table row count for latest 7 days using INFORMATION_SCHEMA and time travel
- calling prophet bigfunction to forecast the row count of today given the row counts of the latest 7 days
- return an anomaly score given the distance from the prediction and the real value. The score computation must be clever.
We can also avoid prophet and use a business rule.