Error during VisibilityDeleteExecution
We are getting an extreme number of error logs from the Temporal server:
{"level":"error","ts":"2024-12-15T16:21:36.296Z","msg":"Operation failed with an error.","error":"context deadline exceeded","logging-call-at":"visiblity_manager_metrics.go:264","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).updateErrorMetric\n\t/home/builder/temporal/common/persistence/visibility/visiblity_manager_metrics.go:264\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).DeleteWorkflowExecution\n\t/home/builder/temporal/common/persistence/visibility/visiblity_manager_metrics.go:128\ngo.temporal.io/server/service/history.(*visibilityQueueTaskExecutor).processDeleteExecution\n\t/home/builder/temporal/service/history/visibility_queue_task_executor.go:494\ngo.temporal.io/server/service/history.(*visibilityQueueTaskExecutor).Execute\n\t/home/builder/temporal/service/history/visibility_queue_task_executor.go:122\ngo.temporal.io/server/service/history/queues.(*executableImpl).Execute\n\t/home/builder/temporal/service/history/queues/executable.go:236\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:223\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:211"}
{"level":"error","ts":"2024-12-15T16:21:36.304Z","msg":"Fail to process task","shard-id":1,"address":"127.0.0.1:7234","component":"visibility-queue-processor","wf-namespace-id":"064f58ee-d88c-4c7c-8b81-77b93c315829","wf-id":"*","wf-run-id":"f4dd4001-fdbd-44d7-aaf1-9c401226e546","queue-task-id":23085605,"queue-task-visibility-timestamp":"2024-12-14T13:07:44.404Z","queue-task-type":"VisibilityDeleteExecution","queue-task":{"NamespaceID":"064f58ee-d88c-4c7c-8b81-77b93c315829","WorkflowID":"*","RunID":"f4dd4001-fdbd-44d7-aaf1-9c401226e546","VisibilityTimestamp":"2024-12-14T13:07:44.404345212Z","TaskID":23085605,"Version":0,"CloseExecutionVisibilityTaskID":9663191,"StartTime":null,"CloseTime":null},"wf-history-event-id":0,"error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"lazy_logger.go:68","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/builder/temporal/common/log/zap_logger.go:156\ngo.temporal.io/server/common/log.(*lazyLogger).Error\n\t/home/builder/temporal/common/log/lazy_logger.go:68\ngo.temporal.io/server/service/history/queues.(*executableImpl).HandleErr\n\t/home/builder/temporal/service/history/queues/executable.go:347\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:224\ngo.temporal.io/server/common/backoff.ThrottleRetry.func1\n\t/home/builder/temporal/common/backoff/retry.go:119\ngo.temporal.io/server/common/backoff.ThrottleRetryContext\n\t/home/builder/temporal/common/backoff/retry.go:145\ngo.temporal.io/server/common/backoff.ThrottleRetry\n\t/home/builder/temporal/common/backoff/retry.go:120\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/builder/temporal/common/tasks/fifo_scheduler.go:211"}
Any idea how to investigate and/or recover from this?
Expected Behavior
No errors; visibility records updated correctly and removed after the retention period.
Actual Behavior
We are getting an extreme number of errors, and we can see past executions listed in the Temporal UI well after the retention period. Workflows seem to be running and finishing normally; we can see them in the Temporal UI.
Steps to Reproduce the Problem
Not sure. We did nothing special; it was working fine. We changed the MySQL password, the Temporal service ran into an access-denied error and restarted, and these logs have been flooding ever since.
Specifications
- Version: 1.22.4
After upgrading to the latest version the issue is not fixed, but we now get a new error:
{"level":"error","ts":"2024-12-16T20:48:10.526Z","msg":"Operation failed with an error.","error":"unable to delete custom search attributes: context deadline exceeded","logging-call-at":"/home/runner/work/docker-builds/docker-builds/temporal/common/persistence/visibility/visiblity_manager_metrics.go:195","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/log/zap_logger.go:155\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).updateErrorMetric\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/persistence/visibility/visiblity_manager_metrics.go:195\ngo.temporal.io/server/common/persistence/visibility.(*visibilityManagerMetrics).DeleteWorkflowExecution\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/persistence/visibility/visiblity_manager_metrics.go:129\ngo.temporal.io/server/service/history.(*visibilityQueueTaskExecutor).processDeleteExecution\n\t/home/runner/work/docke^Coff/retry.go:64\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:233\ngo.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask\n\t/home/runner/work/docker-builds/docker-builds/temporal/common/tasks/fifo_scheduler.go:211"}
The number of logs emitted is considerably lower, but there are 170k rows in the visibility tasks table and 64k in executions_visibility (the retention period is one day, so this is far more than we should have).
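As a rough way to confirm that rows are outliving retention, the closed executions older than the retention window can be counted directly in the visibility database. The sketch below builds such a query; the table and column names (`executions_visibility`, `close_time`) are assumptions based on Temporal's default MySQL visibility schema and should be verified against the deployed schema before running anything.

```python
# Sketch: build a COUNT query for closed executions older than the retention
# window. "executions_visibility" and "close_time" are assumed names from
# Temporal's default MySQL visibility schema -- verify against your schema.
def stale_visibility_count_query(retention_days: int) -> str:
    return (
        "SELECT COUNT(*) FROM executions_visibility "
        "WHERE close_time IS NOT NULL "
        f"AND close_time < NOW() - INTERVAL {int(retention_days)} DAY"
    )

# Example: retention of one day, as in this deployment.
print(stale_visibility_count_query(1))
```

With a one-day retention, a count anywhere near the 64k rows reported above would confirm that deletion is not keeping up.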
I have the same issue as well.
Server version: 1.25.1
I am also facing a similar issue.
Workflow records remain in the executions_visibility table even after the retention period configured for the namespace, which is degrading database performance and impacting all Temporal functionality.
Is there any way to cleanly delete the records in the visibility store that have passed the retention period?
Temporal Server Version: 1.26.2
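Purely as an illustration of what a manual cleanup of rows past retention might look like, here is a sketch of a batched DELETE against the default MySQL visibility schema. The table and column names are assumptions (same caveat as above), and deleting rows by hand bypasses Temporal's own visibility-task deletion path, so this should only be considered after taking a backup and testing against a copy of the database.

```python
# Sketch of a batched manual cleanup statement for rows past retention.
# "executions_visibility" and "close_time" are assumed names from the default
# MySQL visibility schema. Deleting in small batches (MySQL supports LIMIT on
# DELETE) avoids long locks on a large table; rerun until zero rows affected.
def stale_visibility_delete_query(retention_days: int, batch_size: int = 1000) -> str:
    return (
        "DELETE FROM executions_visibility "
        "WHERE close_time IS NOT NULL "
        f"AND close_time < NOW() - INTERVAL {int(retention_days)} DAY "
        f"LIMIT {int(batch_size)}"
    )

print(stale_visibility_delete_query(1))
```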