versatile-data-kit

False positive notifications on service start

Open tpalashki opened this issue 4 years ago • 0 comments

Describe the bug
A false-positive notification is sometimes sent when the service starts or restarts.

Steps To Reproduce
Note: this problem does not always reproduce.

  • Restart the service
  • Observe that notifications are sent for jobs that executed in the past.

Expected behavior
No false-positive notifications are sent on service restart.

Additional context
This bug stems from the following logic in the service: on restart, we iterate over all Kubernetes jobs to sync the internal data job execution state with whatever happened while the service was not running. During this process we also update the datajob_termination_status metrics. The last k8s job that we sync for a particular data job is not necessarily the last execution that actually happened. Since we don't keep track of executions chronologically, we can expose an execution that happened in the past and send a notification as a result.
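The failure mode can be illustrated with a minimal sketch. This is not the service's actual code: the `K8sJob` class, the `sync_last_seen_wins` function, the job name, and the status strings are all hypothetical stand-ins. It only assumes that the sync overwrites the stored state with whichever job it happens to visit last.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class K8sJob:
    """Hypothetical stand-in for one Kubernetes job of a data job."""
    data_job: str
    execution_id: str
    end_time: datetime
    status: str


def sync_last_seen_wins(jobs):
    """Mimic a sync that ignores chronology: the last job visited wins."""
    exposed = {}
    for job in jobs:
        # Overwrites unconditionally, even if `job` is older than what is stored.
        exposed[job.data_job] = job
    return exposed


# The listing order is not guaranteed to be chronological (assumption).
jobs = [
    K8sJob("example-job", "exec-2", datetime(2021, 11, 3, 10, 0), "success"),
    K8sJob("example-job", "exec-1", datetime(2021, 11, 2, 10, 0), "platform_error"),
]

stale = sync_last_seen_wins(jobs)["example-job"]
print(stale.execution_id, stale.status)  # exec-1 platform_error
```

Here the older, failed execution ends up as the exposed state although a newer successful one exists, which is the condition under which a false-positive notification would be sent.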

Suggested solution
Deprecate the last_termination_status and execution_id kept in the data_job table and base the metrics on the data_job_execution table. This way, when we receive information about an older execution, it won't affect the exposed metrics, as they will always be based on the last entry in the data_job_execution table.
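One way to picture the suggested approach is the sketch below, which uses an in-memory SQLite database. The column names (`id`, `job_name`, `status`, `end_time`), the function `latest_termination_status`, and the status strings are assumptions made for illustration; the real data_job_execution schema may differ.

```python
import sqlite3


def latest_termination_status(conn, job_name):
    """Return (execution id, status) of the most recently ended execution."""
    return conn.execute(
        "SELECT id, status FROM data_job_execution "
        "WHERE job_name = ? ORDER BY end_time DESC LIMIT 1",
        (job_name,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
# Hypothetical schema; end_time is stored as ISO 8601 text, which sorts chronologically.
conn.execute(
    "CREATE TABLE data_job_execution "
    "(id TEXT PRIMARY KEY, job_name TEXT, status TEXT, end_time TEXT)"
)
# Rows arrive out of order, as they might during a sync after restart.
conn.executemany(
    "INSERT INTO data_job_execution VALUES (?, ?, ?, ?)",
    [
        ("exec-2", "example-job", "success", "2021-11-03T10:00:00"),
        ("exec-1", "example-job", "platform_error", "2021-11-02T10:00:00"),
    ],
)

print(latest_termination_status(conn, "example-job"))  # ('exec-2', 'success')
```

Because the status is derived by ordering on the execution's end time, learning about an older execution after a restart adds a row but does not change which execution the metric reflects.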

tpalashki, Nov 03 '21 11:11