Self-hosted runs not completing
Provide environment information
System:
  OS: Linux 6.8 Ubuntu 24.04.1 LTS (Noble Numbat)
  CPU: (8) x64 Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
  Memory: 14.35 GB / 31.29 GB
  Container: Yes
  Shell: 5.2.21 - /bin/bash
Binaries:
  Node: 18.19.1 - /usr/bin/node
  npm: 9.2.0 - /usr/bin/npm
Describe the bug
Some tasks get stuck indefinitely in the waiting state for no apparent reason. It seems random: the same task won't always get stuck.
Reproduction repo
No idea
To reproduce
No idea how to reproduce it reliably, but maybe I am using something wrong. Here's my code:
import { prisma } from "@dimension/core-lib/src/prisma"
import { Shop, ShopCookie } from "@dimension/database-main"
import { tiktokMessages } from "@dimension/tiktok-messages"
import { transformCookie } from "@dimension/tiktok-support-messages"
import { logger, schedules, task } from "@trigger.dev/sdk/v3"
import { handleError, handleMaxDuration, maxDurationPending } from "../lib/error"
import { getActiveShops, wrapCronJob } from "../lib/utils"

const isEnabled = true
const maxDurationWarning = 1000 * 60 * 20 // 20 minutes
const name = "Process shops messages campaigns"
const cron = "0 * * * *" // Every hour

export const processShopMessagesCampaigns = task({
  id: "process-shop-messages-campaigns",
  run: async ({ shop }: { shop: Shop & { cookie: ShopCookie | null } }) => {
    /* Some DB operations */
  },
})

export const processShopsMessagesCampaigns = schedules.task({
  id: "process-shops-messages-campaigns",
  cron: isEnabled ? cron : undefined,
  run: async () => {
    const now = new Date()
    const main = async () => {
      const shops = await getActiveShops(true)
      if (!shops.length) {
        logger.log("No shops found")
        return
      }
      await processShopMessagesCampaigns.batchTriggerAndWait(
        shops.map((shop) => ({
          payload: { shop },
          options: {
            tags: [shop.slug],
          },
        }))
      )
    }
    const checkDuration = maxDurationPending(name, maxDurationWarning)
    await main()
  },
})
Additional information
No response
This is the only problem I am encountering, but it is very problematic: since I enforce a policy of non-overlapping crons, a stuck run blocks the whole process for subsequent cron runs.
This is a self-hosted deployment
For me, the worker container can't connect to the coordinator over websocket, and because of this the run gets stuck. @rharkor could you please share your compose file?
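For anyone else debugging this path, the settings that come up later in this thread are roughly the following (my summary, not an official checklist; values are the ones that appear in the compose files below):

COORDINATOR_HOST: 127.0.0.1   # appears on the docker-provider service in the compose files below
COORDINATOR_PORT: 9020
DOCKER_NETWORK: trigger       # the Docker network the dynamically created worker containers join;
                              # the resolution at the end of this thread comes down to this one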
services:
  trigger:
    image: ghcr.io/triggerdotdev/trigger.dev:v3
    environment:
      REMIX_APP_PORT: 3000
      NODE_ENV: production
      RUNTIME_PLATFORM: docker-compose
      V3_ENABLED: true
      TRIGGER_TELEMETRY_DISABLED: 1
      INTERNAL_OTEL_TRACE_DISABLED: 1
      INTERNAL_OTEL_TRACE_LOGGING_ENABLED: 0
      POSTGRES_USER: $POSTGRES_USER
      POSTGRES_PASSWORD: $POSTGRES_PASSWORD
      POSTGRES_DB: $POSTGRES_DB
      MAGIC_LINK_SECRET: $MAGIC_LINK_SECRET
      SESSION_SECRET: $SESSION_SECRET
      ENCRYPTION_KEY: $ENCRYPTION_KEY
      PROVIDER_SECRET: $PROVIDER_SECRET
      COORDINATOR_SECRET: $COORDINATOR_SECRET
      DATABASE_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgresql:5432/$POSTGRES_DB?sslmode=disable'
      DIRECT_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgresql:5432/$POSTGRES_DB?sslmode=disable'
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_TLS_DISABLED: true
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      WHITELISTED_EMAILS: ''
      ADMIN_EMAILS: $ADMIN_EMAILS
      DEFAULT_ORG_EXECUTION_CONCURRENCY_LIMIT: 300
      DEFAULT_ENV_EXECUTION_CONCURRENCY_LIMIT: 100
      DEPLOY_REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      DEPLOY_REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      EMAIL_TRANSPORT: $EMAIL_TRANSPORT
      FROM_EMAIL: $FROM_EMAIL
      REPLY_TO_EMAIL: $REPLY_TO_EMAIL
      SMTP_HOST: $SMTP_HOST
      SMTP_PORT: $SMTP_PORT
      SMTP_SECURE: $SMTP_SECURE
      SMTP_USER: $SMTP_USER
      SMTP_PASSWORD: $SMTP_PASSWORD
      LOGIN_ORIGIN: ${SERVICE_FQDN_TRIGGER}
      APP_ORIGIN: ${SERVICE_FQDN_TRIGGER}
      DEV_OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER/otel'
      ELECTRIC_ORIGIN: 'http://electric:3000'
    networks:
      - trigger
    depends_on:
      postgresql:
        condition: service_healthy
      redis:
        condition: service_healthy
      electric:
        condition: service_healthy
    healthcheck:
      test: "timeout 10s bash -c ':> /dev/tcp/127.0.0.1/3000' || exit 1"
      interval: 10s
      timeout: 5s
      retries: 5
  docker-provider:
    image: ghcr.io/triggerdotdev/provider/docker:v3
    platform: linux/amd64
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - trigger
    depends_on:
      trigger:
        condition: service_healthy
    environment:
      HTTP_SERVER_PORT: 9020
      PLATFORM_HOST: trigger
      PLATFORM_WS_PORT: 3000
      PLATFORM_SECRET: $PROVIDER_SECRET
      SECURE_CONNECTION: false
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      DOCKER_NETWORK: trigger
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      FORCE_CHECKPOINT_SIMULATION: 0
      ENFORCE_MACHINE_PRESETS: true
      OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER/otel'
    healthcheck:
      test:
        - CMD
        - node
        - '-e'
        - "require('http').get('http://127.0.0.1:9020/health', (r) => {if (r.statusCode !== 200) process.exit(1); else process.exit(0); }).on('error', () => process.exit(1))"
      interval: 5s
  coordinator:
    image: ghcr.io/triggerdotdev/coordinator:v3
    platform: linux/amd64
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - trigger
    ports:
      - '127.0.0.1:9020:9020'
    depends_on:
      trigger:
        condition: service_healthy
    environment:
      HTTP_SERVER_PORT: 9020
      PLATFORM_HOST: trigger
      PLATFORM_WS_PORT: 3000
      PLATFORM_SECRET: $PROVIDER_SECRET
      SECURE_CONNECTION: false
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      FORCE_CHECKPOINT_SIMULATION: 0
      OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER/otel'
    healthcheck:
      test:
        - CMD
        - node
        - '-e'
        - "require('http').get('http://127.0.0.1:9020/health', (r) => {if (r.statusCode !== 200) process.exit(1); else process.exit(0); }).on('error', () => process.exit(1))"
      interval: 5s
  electric:
    image: electricsql/electric:latest
    environment:
      DATABASE_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgresql:5432/$POSTGRES_DB?sslmode=disable'
    networks:
      - trigger
    depends_on:
      postgresql:
        condition: service_healthy
    healthcheck:
      test: 'curl --fail http://127.0.0.1:3000/v1/health || exit 1'
      interval: 10s
      retries: 5
      start_period: 10s
      timeout: 10s
  redis:
    image: redis:7
    networks:
      - trigger
    healthcheck:
      test:
        - CMD-SHELL
        - 'redis-cli ping | grep PONG'
      interval: 1s
      timeout: 3s
      retries: 5
    volumes:
      - redis-data:/data
  postgresql:
    image: postgres:16-alpine
    volumes:
      - postgresql-data:/var/lib/postgresql/data/
    networks:
      - trigger
    environment:
      POSTGRES_USER: $POSTGRES_USER
      POSTGRES_PASSWORD: $POSTGRES_PASSWORD
      POSTGRES_DB: $POSTGRES_DB
    command:
      - -c
      - wal_level=logical
    healthcheck:
      test:
        - CMD-SHELL
        - 'pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}'
      interval: 5s
      timeout: 20s
      retries: 10
volumes:
  postgresql-data:
  redis-data:
networks:
  trigger:
    name: trigger
    external: true
This is a self-hosted deployment. Sorry, forgot to mention it in the title 🙏
> because of this run gets stuck

I am using the app and worker separately, so I don't really know which one you want. I should also mention that everything works fine 99.5% of the time.
I am running them on the same server, but it would be great to see any working configuration. I've been pulling my hair out for the last 3 days :)
Okay so this is my config:
docker-compose.webapp.yml
x-env: &webapp-env
  LOGIN_ORIGIN: https://${TRIGGER_DOMAIN:?Please set this in your .env file}
  APP_ORIGIN: https://${TRIGGER_DOMAIN}
  DEV_OTEL_EXPORTER_OTLP_ENDPOINT: https://${TRIGGER_DOMAIN}/otel
  ELECTRIC_ORIGIN: http://electric:3000
volumes:
  postgres-data:
  redis-data:
networks:
  default:
services:
  webapp:
    image: ghcr.io/triggerdotdev/trigger.dev:${TRIGGER_IMAGE_TAG:-v3}
    restart: ${RESTART_POLICY:-unless-stopped}
    env_file:
      - .env
    environment:
      <<: *webapp-env
    ports:
      - ${WEBAPP_PUBLISH_IP:-127.0.0.1}:3040:3030
    depends_on:
      - postgres
      - redis
    networks:
      - default
  postgres:
    image: postgres:${POSTGRES_IMAGE_TAG:-16}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - postgres-data:/var/lib/postgresql/data/
    env_file:
      - .env
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:5433:5432
    command:
      - -c
      - wal_level=logical
  redis:
    image: redis:${REDIS_IMAGE_TAG:-7}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - redis-data:/data
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:6389:6379
  electric:
    image: electricsql/electric:${ELECTRIC_IMAGE_TAG:-latest}
    restart: ${RESTART_POLICY:-unless-stopped}
    environment:
      DATABASE_URL: $DATABASE_URL
    networks:
      - default
    depends_on:
      - postgres
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:3061:3000
docker-compose.worker.yml
x-env: &worker-env
  PLATFORM_HOST: ${TRIGGER_DOMAIN:?Please set this in your .env file}
  PLATFORM_WS_PORT: 443
  SECURE_CONNECTION: "true"
  OTEL_EXPORTER_OTLP_ENDPOINT: https://${TRIGGER_DOMAIN}/otel
networks:
  default:
services:
  docker-provider:
    image: ghcr.io/triggerdotdev/provider/docker:${TRIGGER_IMAGE_TAG:-v3}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:9021:9020
    env_file:
      - .env
    environment:
      <<: *worker-env
      PLATFORM_SECRET: $PROVIDER_SECRET
  coordinator:
    image: ghcr.io/triggerdotdev/coordinator:${TRIGGER_IMAGE_TAG:-v3}
    restart: ${RESTART_POLICY:-unless-stopped}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    networks:
      - default
    ports:
      - ${DOCKER_PUBLISH_IP:-127.0.0.1}:9020:9020
    env_file:
      - .env
    environment:
      <<: *worker-env
      PLATFORM_SECRET: $COORDINATOR_SECRET
@murshudov did you manage to solve this? I'm also having the same issue.
@eth0izzle Yes, it is working quite well for us. I forget all the steps required, but I can share our updated compose file if needed.
@eth0izzle Also our deployment is on Coolify
@murshudov yes, I'm also using Coolify, which I think is where the issue is, but I'm tearing my hair out. Your compose file would be much appreciated!
@eth0izzle I will try my best to explain.
docker-compose.yml
services:
  trigger:
    image: ghcr.io/triggerdotdev/trigger.dev:v3
    environment:
      SERVICE_FQDN_TRIGGER_3030:
      PORT: 3030
      REMIX_APP_PORT: 3030
      NODE_ENV: production
      RUNTIME_PLATFORM: docker-compose
      V3_ENABLED: true
      TRIGGER_TELEMETRY_DISABLED: 1
      INTERNAL_OTEL_TRACE_DISABLED: 1
      INTERNAL_OTEL_TRACE_LOGGING_ENABLED: 0
      POSTGRES_USER: $POSTGRES_USER
      POSTGRES_PASSWORD: $POSTGRES_PASSWORD
      POSTGRES_DB: $POSTGRES_DB
      MAGIC_LINK_SECRET: $MAGIC_LINK_SECRET
      SESSION_SECRET: $SESSION_SECRET
      ENCRYPTION_KEY: $ENCRYPTION_KEY
      PROVIDER_SECRET: $PROVIDER_SECRET
      COORDINATOR_SECRET: $COORDINATOR_SECRET
      DATABASE_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgres:5432/$POSTGRES_DB?sslmode=disable'
      DIRECT_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgres:5432/$POSTGRES_DB?sslmode=disable'
      REDIS_HOST: redis
      REDIS_PORT: 6379
      REDIS_TLS_DISABLED: true
      COORDINATOR_HOST: 127.0.0.1
      COORDINATOR_PORT: 9020
      WHITELISTED_EMAILS: ''
      ADMIN_EMAILS: $ADMIN_EMAILS
      DEFAULT_ORG_EXECUTION_CONCURRENCY_LIMIT: 300
      DEFAULT_ENV_EXECUTION_CONCURRENCY_LIMIT: 100
      DEPLOY_REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      DEPLOY_REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      EMAIL_TRANSPORT: $EMAIL_TRANSPORT
      FROM_EMAIL: $FROM_EMAIL
      REPLY_TO_EMAIL: $REPLY_TO_EMAIL
      SMTP_HOST: $SMTP_HOST
      SMTP_PORT: $SMTP_PORT
      SMTP_SECURE: $SMTP_SECURE
      SMTP_USER: $SMTP_USER
      SMTP_PASSWORD: $SMTP_PASSWORD
      LOGIN_ORIGIN: ${SERVICE_FQDN_TRIGGER}
      APP_ORIGIN: ${SERVICE_FQDN_TRIGGER}
      DEV_OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER_3030/otel'
      ELECTRIC_ORIGIN: 'http://electric:3000'
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      electric:
        condition: service_healthy
    healthcheck:
      test: "timeout 10s bash -c ':> /dev/tcp/127.0.0.1/3030' || exit 1"
      interval: 10s
      timeout: 5s
      retries: 5
  docker-provider:
    image: ghcr.io/triggerdotdev/provider/docker:v3
    platform: linux/amd64
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    depends_on:
      trigger:
        condition: service_healthy
    environment:
      HTTP_SERVER_PORT: 9020
      PLATFORM_HOST: trigger
      PLATFORM_WS_PORT: 3030
      PLATFORM_SECRET: $PROVIDER_SECRET
      SECURE_CONNECTION: false
      COORDINATOR_HOST: coordinator
      COORDINATOR_PORT: 9020
      FORCE_CHECKPOINT_SIMULATION: 0
      DOCKER_NETWORK: $DOCKER_NETWORK
      ENFORCE_MACHINE_PRESETS: true
      OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER_3030/otel'
    healthcheck:
      test:
        - CMD
        - node
        - '-e'
        - "require('http').get('http://127.0.0.1:9020/health', (r) => {if (r.statusCode !== 200) process.exit(1); else process.exit(0); }).on('error', () => process.exit(1))"
      interval: 5s
  coordinator:
    image: ghcr.io/triggerdotdev/coordinator:v3
    platform: linux/amd64
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    user: root
    depends_on:
      trigger:
        condition: service_healthy
    environment:
      HTTP_SERVER_PORT: 9020
      PLATFORM_HOST: trigger
      PLATFORM_WS_PORT: 3030
      PLATFORM_SECRET: $COORDINATOR_SECRET
      SECURE_CONNECTION: false
      REGISTRY_HOST: $DEPLOY_REGISTRY_HOST
      REGISTRY_NAMESPACE: $DEPLOY_REGISTRY_NAMESPACE
      FORCE_CHECKPOINT_SIMULATION: 0
      OTEL_EXPORTER_OTLP_ENDPOINT: '$SERVICE_FQDN_TRIGGER_3030/otel'
    healthcheck:
      test:
        - CMD
        - node
        - '-e'
        - "require('http').get('http://127.0.0.1:9020/health', (r) => {if (r.statusCode !== 200) process.exit(1); else process.exit(0); }).on('error', () => process.exit(1))"
      interval: 5s
  electric:
    image: electricsql/electric:latest
    environment:
      DATABASE_URL: 'postgres://$POSTGRES_USER:$POSTGRES_PASSWORD@postgres:5432/$POSTGRES_DB?sslmode=disable'
      ELECTRIC_INSECURE: true
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: 'curl --fail http://127.0.0.1:3000/v1/health || exit 1'
      interval: 10s
      retries: 5
      start_period: 10s
      timeout: 10s
  redis:
    image: redis:7
    healthcheck:
      test:
        - CMD-SHELL
        - 'redis-cli ping | grep PONG'
      interval: 1s
      timeout: 3s
      retries: 5
    volumes:
      - redis-data:/data
  postgres:
    image: postgres:16-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data/
    environment:
      POSTGRES_USER: $POSTGRES_USER
      POSTGRES_PASSWORD: $POSTGRES_PASSWORD
      POSTGRES_DB: $POSTGRES_DB
    command:
      - -c
      - wal_level=logical
    healthcheck:
      test:
        - CMD-SHELL
        - 'pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}'
      interval: 5s
      timeout: 20s
      retries: 10
volumes:
  postgres-data:
  redis-data:
Env vars for Coolify:
SERVICE_FQDN_TRIGGER=https://trigger-dev.example.com
SERVICE_FQDN_TRIGGER_3030=https://trigger-dev.example.com
DEPLOY_REGISTRY_HOST=docker.io
DEPLOY_REGISTRY_NAMESPACE=your docker hub user name
DOCKER_NETWORK=rkscwg0c048gc4wcs8og48o4 # This is the network name created by Coolify. You'll find it at the end of your Trigger service URL: /service/rkscwg0c048gc4wcs8og48o4
POSTGRES_DB=trigger
POSTGRES_PASSWORD=your postgres password
POSTGRES_USER=trigger
[email protected]
EMAIL_TRANSPORT=smtp
[email protected]
[email protected]
SMTP_HOST=email-smtp.eu-west-2.amazonaws.com
SMTP_PORT=465
SMTP_SECURE=true
SMTP_USER=AAAAZLNAJ5X6YVUHHPPP
SMTP_PASSWORD=your smtp password
# Lengths are important!!! 32 and 64 chars
MAGIC_LINK_SECRET=cccce85a1f7b9dbbbbeeeefcb234aaaa
SESSION_SECRET=cccce85a1f7b9dbbbbeeeefcb234aaaa
ENCRYPTION_KEY=cccce85a1f7b9dbbbbeeeefcb234aaaa
COORDINATOR_SECRET=oG2iIa0pkqNIu2E7Dr0hLNa7i3OnXxBbUbawz3ZoG2iIa0pkqNIu2E7Dr0hLNa7i
PROVIDER_SECRET=oG2iIa0pkqNIu2E7Dr0hLNa7i3OnXxBbUbawz3ZoG2iIa0pkqNIu2E7Dr0hLNa7i
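As an aside (not from this thread), one way to generate secrets of the lengths called out above:
# openssl rand -hex 16   -> 32 characters (MAGIC_LINK_SECRET, SESSION_SECRET, ENCRYPTION_KEY)
# openssl rand -hex 32   -> 64 characters (COORDINATOR_SECRET, PROVIDER_SECRET)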
These are working files for us in production. It is a single server setup.
So the issue was the following:
docker-provider creates worker containers for your tasks. But those containers get connected to the host network by default, and the coordinator can't reach them, because the coordinator is inside the network created by Coolify.
By specifying DOCKER_NETWORK on the docker-provider, the worker containers are attached to the network created by Coolify instead, so everything on that network can talk to everything else.
I may be missing some details, but everything is created inside the Coolify network by default, except the dynamically created worker containers. With this change introduced by @Mortalife, these workers also get connected to the Coolify network and everyone is happy.
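Concretely, the relevant excerpt from the docker-provider service in the compose file above is just this (values as shown earlier; DOCKER_NETWORK is the Coolify-created network name from the env vars):

docker-provider:
  environment:
    COORDINATOR_HOST: coordinator
    COORDINATOR_PORT: 9020
    DOCKER_NETWORK: $DOCKER_NETWORK   # e.g. rkscwg0c048gc4wcs8og48o4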
Hope this helps.
I'm going to close this as we're going to release v4 self-hosting in the next couple of weeks. Link to the excellent (unofficial) coolify self-hosting docs: https://github.com/Mortalife/trigger-dev-coolify