bug: Tasks stuck in queue and duplicated indefinitely after nightly server restart

Open lpkobamn opened this issue 1 year ago • 6 comments

Provide environment information

System:
  OS: Linux 6.5 Ubuntu 22.04.4 LTS 22.04.4 LTS (Jammy Jellyfish)
  CPU: (4) x64 unknown
  Memory: 14.89 GB / 19.34 GB
  Container: Yes
  Shell: 5.1.16 - /bin/bash
Binaries:
  Node: 22.6.0 - ~/.nvm/versions/node/v22.6.0/bin/node
  npm: 10.8.2 - ~/.nvm/versions/node/v22.6.0/bin/npm
  bun: 1.1.22 - ~/.bun/bin/bun

Describe the bug

I'm running a self-hosted Trigger.dev, following the setup instructions [here](https://trigger.dev/docs/open-source-self-hosting) and deploying using [triggerdotdev/docker](https://github.com/triggerdotdev/docker).

The issue arises every night after the server restarts:

  1. A random task gets stuck in the queued state and does not execute.
  2. The same task repeatedly appears in the queue, leading to thousands of duplicates over time (see the attached screenshots).
  3. Manual cancellation of all queued tasks is the only way to allow the task to start properly again. However, canceling 3,800+ tasks manually is time-consuming and impractical.

I've already tried the following steps without success:

  • Running ./stop.sh, ./update.sh, and ./start.sh.
  • Ensuring I'm on the latest version of the self-hosted stack.

Steps to Reproduce:

  1. Run Trigger.dev self-hosted.
  2. Restart the server while a task is executing.
  3. Observe tasks getting stuck in the queued state and duplicated indefinitely.

Expected Behavior:

  • The task should either resume or fail cleanly after restart.
  • Queued tasks should not duplicate endlessly.

Screenshots:

  • Tasks Dashboard: Showing 3800+ queued tasks.
  • Task Runs List: Evidence of duplication and stalled executions.

Environment Details:

  • Trigger.dev version: Latest (as of 16 December 2024)
  • Deployment method: Self-hosted via Docker ([triggerdotdev/docker](https://github.com/triggerdotdev/docker))
  • Container environment: LXC on Proxmox

Additional Information: Please advise where I should look to troubleshoot this issue further:

  1. Could this be related to database locking or an issue with worker recovery after restart?
  2. Are there configurations or logs I should check to identify the root cause?
  3. Is there a way to bulk cancel thousands of queued tasks efficiently?
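
On question 3, one possible approach is a small script that fetches the IDs of the queued runs and cancels them with a bounded number of requests in flight. The sketch below is only an illustration under stated assumptions: `RunsClient`, `listQueuedRunIds`, and `cancelRun` are hypothetical names, not Trigger.dev APIs, and would need to be wired to whatever run-management calls the installed SDK or API actually provides.

```typescript
// Hypothetical interface: these two functions are NOT Trigger.dev APIs.
// Wire them to whatever your setup offers for listing and cancelling runs.
interface RunsClient {
  listQueuedRunIds(): Promise<string[]>;
  cancelRun(runId: string): Promise<void>;
}

interface BulkCancelResult {
  cancelled: string[];
  failed: { runId: string; error: unknown }[];
}

// Cancel every queued run, keeping at most `concurrency` requests in flight.
async function bulkCancelQueued(
  client: RunsClient,
  concurrency = 10,
): Promise<BulkCancelResult> {
  const result: BulkCancelResult = { cancelled: [], failed: [] };
  const runIds = await client.listQueuedRunIds();
  let nextIndex = 0;

  // Each worker takes the next unclaimed run ID until none are left.
  async function worker(): Promise<void> {
    while (true) {
      const runId = runIds[nextIndex++];
      if (runId === undefined) return;
      try {
        await client.cancelRun(runId);
        result.cancelled.push(runId);
      } catch (error) {
        // Record the failure and keep going so one bad run does not stop the batch.
        result.failed.push({ runId, error });
      }
    }
  }

  const workerCount = Math.max(1, concurrency);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return result;
}
```

Collecting failures instead of throwing means a single run that cannot be cancelled does not abort the rest of the batch; the `failed` list can be retried or inspected afterwards.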

Any guidance on fixing this or preventing task duplication would be greatly appreciated.


Thank you!
Attachments: (Screenshots included)

  1. Tasks Dashboard view.
  2. Task Runs list view.

Let me know if you need more details!

Reproduction repo

https://github.com/triggerdotdev/docker

To reproduce

  1. Run Trigger.dev self-hosted.
  2. Restart the server while a task is executing.
  3. Observe tasks getting stuck in the queued state and duplicated indefinitely.

Additional information

Screenshots: 2024-12-16_11-56-04, 2024-12-16_11-58-54

lpkobamn • Dec 16 '24 09:12

My problem is that the task stays in the queued state for a very long time (2+ minutes), even though it's the only task running, so it doesn't make sense for it to take that long.

yassineatik • Feb 06 '25 01:02

The problem remained even after updating to the latest version. Every time, I have to delete the stack and its volumes, restart the stack, and rebuild the functions. I don't understand where to ask for help; there has been no feedback from the developer.

lpkobamn • Feb 06 '25 05:02

@lpkobamn does that mean you lost all users, settings, previous jobs, alerts, and integrations? I'm not sure how to proceed. I have the same issue after changing the Docker registry from Docker Hub to GitHub's registry.

unckleg • Feb 06 '25 12:02

You can reach them via Discord: https://trigger.dev/docs/community

yassineatik • Feb 06 '25 12:02

My guess is you need to log in to the Docker registry again. There's some discussion about this in the community, including some scripts you can use.

matt-aitken • Feb 06 '25 13:02

> My guess is you need to login the Docker registry again. There's some discussion about this in the community including some scripts you can use.

Your message has nothing to do with the problem. The registry has absolutely nothing to do with it. The problem is the hanging tasks, and it is only resolved by manually canceling all the hung ones or completely reinstalling the stack.

lpkobamn • Mar 09 '25 09:03