
Suggestion: maintenance mode

Open rfunduk opened this issue 3 years ago • 12 comments

Something that is occasionally handy (when running a long/app-breaking migration, upgrading the database, making some fundamental change to configuration, or in the event of a security incident perhaps) is the ability to quickly lock down the application with a 503 and a 'be back soon' page.

I used to do this by reconfiguring nginx/Apache to serve a local file (e.g. via something like https://github.com/capistrano/maintenance), but more recently I've been configuring an AWS ALB to respond with a static blob (e.g. using Terraform variables).

Doing it via the load balancer is probably not a very portable solution (Digital Ocean doesn't appear to support it)... maybe traefik entrypoints could be used for this.

rfunduk avatar Mar 27 '23 12:03 rfunduk

Would love to see this explored. We currently do something like this, but it's Ruby specific:

if ENV["MAINTENANCE"]
  require "pathname"
  MAINTENANCE_HTML = Pathname.new(File.dirname(__FILE__)).join("public/maintenance.html").read

  run lambda { |env| [ 200, { "Content-Type" => "text/html" }, [ MAINTENANCE_HTML ] ] }
else
  ... 
end

And then set MAINTENANCE as an ENV in deploy. But I could totally see injecting this as a pre-baked image of some kind. I just don't want to pay a very high complexity price for it. So it's gotta be approximately as simple as what's above.

dhh avatar Mar 28 '23 12:03 dhh

Ah, you do this in config.ru, and just bypass the whole Rails app to serve the maintenance page? That seems pretty straightforward (and clever!). Personally I think I'd be happy with that.

I agree that it seems like it might take some ugliness to bake this into mrsk with some kind of sidecar container or traefik hacks. Maybe I'll attempt it, just to compare. It would be cool to have a built-in solution.

rfunduk avatar Mar 28 '23 12:03 rfunduk

I think actually a basic solution here could be to have a mrsk-owned image that just does what ENV["MAINTENANCE"] does. Injects a Rack app that runs that lambda, with a way to overwrite the maintenance.html screen, and swaps that container in instead of the app one when you go into maintenance mode. Might be pretty simple. Do investigate if you like!

dhh avatar Mar 28 '23 13:03 dhh

I don't think a mrsk-owned image is really necessary since this basically covers what's needed:

docker run --rm --name maintenance -v $PWD/index.html:/usr/share/nginx/html/index.html -p 3000:80 nginx:mainline-alpine-slim

This image is ridiculously tiny!

Haven't got much further than this so far. I fiddled with sort of hijacking the app:boot process but it got gross really quick. Maybe mrsk maintenance on or similar would be preferable, like a first-class concept... honestly your config.ru strategy seems like enough, and no doubt could be done similarly with stacks other than Ruby.

rfunduk avatar Mar 29 '23 13:03 rfunduk

@rfunduk Would be happy to see that explored too! We can totally wrap that up as a concept under mrsk maintenance. Please do explore it! Need to make sure it catches all verbs and all paths, so maybe you need a bit of config injection too, but shouldn't be a far reach.

dhh avatar Mar 29 '23 14:03 dhh

As a comparison, I was able to use mrsk accessories to configure a maintenance service.

Using a traefik label to give the maintenance router a higher routing priority, I'm able to lock the app down with a 503 and a maintenance page by running mrsk accessory boot maintenance.

accessories:
  maintenance:
    image: nginx:mainline-alpine-slim
    roles:
      - web
    port: 3000:80
    files:
      - config/nginx.conf:/etc/nginx/nginx.conf
      - public/maintenance.html:/usr/share/nginx/html/index.html
    labels:
      traefik.http.routers.maintenance.rule: PathPrefix(`/`)
      traefik.http.routers.maintenance.priority: 99

Jberczel avatar May 17 '23 01:05 Jberczel

@dhh @djmb the example provided by @Jberczel looks good. Is this something we want to include in MRSK? I think it would be useful.

igor-alexandrov avatar Jul 20 '23 12:07 igor-alexandrov

You have to make /up respond with 200 during the maintenance too.
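A minimal sketch of how the config.ru approach from earlier in the thread could keep the health check green, assuming the health check path is /up. The method name `maintenance_app` and its parameters are hypothetical, and the page still answers with 200 as in the original snippet.

```ruby
# Hypothetical helper: builds a Rack-compatible app (any object responding to #call).
# Needs no gems, because a Rack app is just a callable that takes the env hash.
def maintenance_app(page_html, health_path: "/up")
  lambda do |env|
    if env["PATH_INFO"] == health_path
      # Answer the deploy health check even though the real app is not loaded.
      [200, { "content-type" => "text/plain" }, ["OK"]]
    else
      # Every other path and verb gets the static page.
      # Lowercase header names are accepted by both Rack 2 and Rack 3.
      [200, { "content-type" => "text/html" }, [page_html]]
    end
  end
end

# In config.ru this might be wired up as (assumption):
#   run maintenance_app(File.read("public/maintenance.html")) if ENV["MAINTENANCE"]
```

Because the lambda never inspects REQUEST_METHOD, it answers all verbs the same way.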

In reply to:

> @dhh Curious how you combine your approach to maintenance pages with the /up health check. Since you're not loading the Rails app, the health check would fail?

dhh avatar Jan 11 '24 15:01 dhh

We can add a location block to nginx.conf for the health check:

# used for kamal healthcheck
location /up {
    return 200 "OK";
}

I'm currently using the turnout gem, but it gets a bit tricky when running the app on multiple servers. The example provided by @Jberczel looks great.

acidtib avatar Jan 11 '24 20:01 acidtib

Took a crack at it, and expanded on @Jberczel's example.

It's a Kamal accessory: you can supply custom HTML and set the app's health check path (Caddy replies with 200 on that path).

accessories:
  maintenance:
    image: ghcr.io/acidtib/kamal-maintenance:latest
    roles:
      - web
    port: "3000:80"
    env:
      clear:
        HEALTHCHECK_PATH: /healthz
        LOGS_ENABLED: true
    labels:
      traefik.http.routers.maintenance.rule: PathPrefix(`/`)
      traefik.http.routers.maintenance.priority: 99
    files:
      - public/maintenance.html:/usr/share/caddy/index.html

https://github.com/acidtib/kamal-maintenance

acidtib avatar Jan 12 '24 08:01 acidtib

For anyone considering using a 200 response, keep in mind that any other requests like incoming 'payment received' webhooks, API POST requests, etc will be considered successful and won't be retried. This is probably not what you want.

I suggest only returning 200 for the maintenance page and 503 for everything else.

marckohlbrugge avatar Jan 22 '24 13:01 marckohlbrugge

Even for the maintenance page I would use 503 so robots don't index the page.
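The status-code advice in the last two comments could be sketched as a Rack app like the one below. This is an illustration under assumptions, not anything Kamal ships: the name `strict_maintenance_app`, the /up default, and the 600-second Retry-After value are all hypothetical.

```ruby
# Hypothetical helper: 503 for every verb and path, 200 only for the health check.
def strict_maintenance_app(page_html, health_path: "/up", retry_after: 600)
  lambda do |env|
    if env["PATH_INFO"] == health_path
      # The health check must stay green so the proxy keeps routing to this container.
      [200, { "content-type" => "text/plain" }, ["OK"]]
    else
      # 503 tells webhook senders and API clients to retry, and tells crawlers
      # not to index the page. Retry-After is a standard HTTP header; the
      # value here is an arbitrary assumption.
      headers = { "content-type" => "text/html", "retry-after" => retry_after.to_s }
      [503, headers, [page_html]]
    end
  end
end

app = strict_maintenance_app("<h1>Be back soon</h1>")
status, = app.call({ "REQUEST_METHOD" => "POST", "PATH_INFO" => "/webhooks/payments" })
puts status # => 503
```

With this shape, a 'payment received' webhook that arrives during maintenance gets a 503, so a sender that retries on failure would deliver it again later.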

n-studio avatar May 09 '24 20:05 n-studio