Health check failure for next js app

Open willredington opened this issue 2 years ago • 14 comments

The problem: health checks are failing every time

What I think the issue is:

  • the service isn't being reached by whatever performs the health check (the ECS logs show only the startup message, "ready in x milliseconds", and no incoming requests)

Things I've tried:

  • deploying a basic nginx container on port 80 (works)
  • removing the HOSTNAME specification in the dockerfile (does not work)

I've got a fairly standard Next.js app that just serves some static content. I've confirmed I can build and run the Docker image locally, and requests to / on port 8080 return a 200 response. There is also middleware that logs every request (including the health check); here is a snapshot of it running locally:

image

Here is the Dockerfile for my Next.js app for reference (although I've confirmed it builds and runs locally):

# https://github.com/vercel/next.js/blob/canary/examples/with-docker/Dockerfile
FROM node:18-alpine AS base

# Install dependencies only when needed
FROM base AS deps
# Check https://github.com/nodejs/docker-node/tree/b4117f9333da4138b03a546ec926ef50a31506c3#nodealpine to understand why libc6-compat might be needed.
RUN apk add --no-cache libc6-compat
WORKDIR /app

# Install dependencies based on the preferred package manager
COPY package.json yarn.lock* package-lock.json* pnpm-lock.yaml* ./
RUN \
  if [ -f yarn.lock ]; then yarn --frozen-lockfile; \
  elif [ -f package-lock.json ]; then npm ci; \
  elif [ -f pnpm-lock.yaml ]; then yarn global add pnpm && pnpm i --frozen-lockfile; \
  else echo "Lockfile not found." && exit 1; \
  fi


# Rebuild the source code only when needed
FROM base AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .

# Next.js collects completely anonymous telemetry data about general usage.
# Learn more here: https://nextjs.org/telemetry
# The following line disables telemetry during the build.
ENV NEXT_TELEMETRY_DISABLED 1

RUN npm run build

# Production image, copy all the files and run next
FROM base AS runner
WORKDIR /app

ENV NODE_ENV production
# The following line disables telemetry during runtime.
ENV NEXT_TELEMETRY_DISABLED 1

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

COPY --from=builder /app/public ./public

# Set the correct permission for prerender cache
RUN mkdir .next
RUN chown nextjs:nodejs .next

# Automatically leverage output traces to reduce image size
# https://nextjs.org/docs/advanced-features/output-file-tracing
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs

EXPOSE 8080

ENV PORT 8080
# Bind to all interfaces (0.0.0.0), not just localhost, so the server is reachable from outside the container
ENV HOSTNAME "0.0.0.0"

CMD ["node", "server.js"]

And here is my manifest file for my service I want to deploy:

# The manifest for the "demo-service" service.
# Read the full specification for the "Load Balanced Web Service" type at:
#  https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/

# Your service name will be used in naming your resources like log groups, ECS services, etc.
name: demo-service
type: Load Balanced Web Service

# Distribute traffic to your service.
http:
  # Requests to this path will be forwarded to your service.
  # To match all requests you can use the "/" path.
  path: "/"
  # You can specify a custom health check path. The default is "/".
  healthcheck:
    path: "/"
    port: 8080
    interval: 180s
    timeout: 120s

# Configuration for your containers and service.
image:
  # Docker build arguments. For additional overrides: https://aws.github.io/copilot-cli/docs/manifest/lb-web-service/#image-build
  build: Dockerfile
  # Port exposed through your container to route traffic to it.
  port: 8080

cpu: 256 # Number of CPU units for the task.
memory: 512 # Amount of memory in MiB used by the task.
count: 1 # Number of tasks that should be running in your service.
exec: true # Enable running commands in your container.
network:
  connect: true # Enable Service Connect for intra-environment traffic between services.

The health checks fail every time; any advice would be greatly appreciated. One more thing: the service starts up in less than a second, so timing doesn't appear to be the issue.

willredington avatar Oct 13 '23 20:10 willredington

I've tried deploying a basic nginx image, and it works fine, but anything not on port 80 seems to fail.

willredington avatar Oct 13 '23 20:10 willredington

Hello @willredington. Can you clarify "The health checks fail each time"? Did you get an error during deployment? Or where did you find the error?

iamhopaul123 avatar Oct 13 '23 20:10 iamhopaul123

@iamhopaul123 here is the health check error I am getting image

willredington avatar Oct 13 '23 21:10 willredington

Hello @willredington. The manifest looks good to me and I can't find anything in the dockerfile either. Is it possible that 8080 is not handled properly in your app code?

iamhopaul123 avatar Oct 13 '23 21:10 iamhopaul123

@willredington I'm seeing pretty much the same thing. I was able to deploy Next.js version 13.4.12 without this issue, though.

Tjhayhay avatar Nov 08 '23 18:11 Tjhayhay

Were you able to solve this issue? I'm facing the same issue but using port 3000.

BrandonEscamilla avatar Nov 10 '23 16:11 BrandonEscamilla

Unfortunately no. I ended up using the AWS CDK to deploy to ECS, which worked fine.

willredington avatar Nov 11 '23 16:11 willredington

I ended up opting for the Request-Driven Web Service instead of the load balancer, and it worked flawlessly. This information may be helpful for anyone encountering a similar issue.

BrandonEscamilla avatar Nov 13 '23 16:11 BrandonEscamilla
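
For readers who want to try the same route, a Request-Driven Web Service manifest might look roughly like the sketch below. The comment above does not include a manifest, so the service name, the port (3000, taken from the earlier comment), and the `cpu`/`memory` values are assumptions for illustration. This service type runs on AWS App Runner rather than on ECS behind a load balancer, so it sidesteps the ALB health check instead of fixing it.

```yaml
# Hypothetical sketch, not taken from the thread: a minimal
# Request-Driven Web Service manifest for a Next.js container.
name: demo-service # assumed name, reused from the original post
type: Request-Driven Web Service

http:
  healthcheck:
    path: "/" # path polled by the health check

image:
  build: Dockerfile
  port: 3000 # assumed: the port the Next.js server listens on

cpu: 1024 # assumed sizing; adjust to your workload
memory: 2048
```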

I just spent many hours troubleshooting an issue with identical symptoms to yours. Disabling service connect allowed my NextJS to receive traffic so it could pass health checks and serve traffic through the ALB.

network:
  connect: false # Disable Service Connect for intra-environment traffic between services.

rubiconjosh avatar Dec 09 '23 13:12 rubiconjosh

I have never used Service Connect. Should we set our hostname in NextJS to the service discovery name?

I have spent too much time on this problem and do not need Service Connect so I am not going to try to find a configuration that will work.

rubiconjosh avatar Dec 09 '23 13:12 rubiconjosh

Hello @rubiconjosh. Sorry about the confusion.

Should we set our hostname in NextJS to the service discovery name?

To give you an example, the Service Connect name will just be the service name. However, since service discovery is always enabled, you should be able to use the service discovery endpoint as well.

iamhopaul123 avatar Dec 11 '23 22:12 iamhopaul123

I just spent many hours troubleshooting an issue with identical symptoms to yours. Disabling service connect allowed my NextJS to receive traffic so it could pass health checks and serve traffic through the ALB.

network:
  connect: false # Disable Service Connect for intra-environment traffic between services.

I was facing the same issue. In my case I upgraded NextJS and my deployment broke. Removing network: connect: true made it work. Not sure what is going on but it's a huge headache

acrinklaw avatar Jan 02 '24 22:01 acrinklaw
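
Pulling the workaround from the comments above together, here is a sketch of the manifest from the original post with Service Connect turned off. Only the `network.connect` value differs from that manifest. The thread does not establish why Service Connect interferes with the health check, so treat this as a reported workaround, not a root-cause fix.

```yaml
# Sketch: manifest from the original post with the reported workaround applied.
name: demo-service
type: Load Balanced Web Service

http:
  path: "/"
  healthcheck:
    path: "/"
    port: 8080
    interval: 180s
    timeout: 120s

image:
  build: Dockerfile
  port: 8080

cpu: 256
memory: 512
count: 1
exec: true
network:
  connect: false # Service Connect disabled; the only change from the original manifest.
```

According to the comment above, removing the `network: connect: true` setting altogether had the same effect.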

glad you found a way to fix it 🎉

KollaAdithya avatar Jan 03 '24 00:01 KollaAdithya

This issue is stale because it has been open 60 days with no response activity. Remove the stale label, add a comment, or this will be closed in 14 days.

github-actions[bot] avatar Mar 06 '24 00:03 github-actions[bot]

This issue is closed due to inactivity. Feel free to reopen the issue if you have any further questions!

github-actions[bot] avatar Mar 21 '24 00:03 github-actions[bot]