snekomatic icon indicating copy to clipboard operation
snekomatic copied to clipboard

Graceful shutdown / SIGTERM

Open njsmith opened this issue 6 years ago • 9 comments

When Heroku decides to restart our app (e.g. for an upgrade, or just automatically once a day), it sends SIGTERM and then waits 30 seconds for it to exit.

Right now, we don't have any SIGTERM handling at all, so

This creates some race conditions: for example, suppose that a PR is merged into this repo by a new contributor. This kicks off two processes:

  • snekomatic sees that it's a new contributor's first PR, and tries to invite them and post a comment
  • heroku sees that there's a new commit in snekomatic, and tries to reboot snekomatic to pick up the changes

In practice we kinda get away with it, because it takes heroku a few seconds to notice the new commit and then build a new container image, so snekomatic wins the race. But if snekomatic gets fancier and starts doing more, this will become more of a problem.

It's easy to have a task listen for SIGTERM. But after it gets it, what should it do? I guess we want to do a graceful shutdown of hypercorn (i.e. finish processing current requests, but stop accepting new ones). I don't think hypercorn has any support for this? And Trio doesn't make it particularly easy yet either (cf https://github.com/python-trio/trio/issues/147).

Oh, no, I'm wrong: hypercorn has a config option shutdown_event that it uses for doing graceful shutdown when running in multi-process mode. Maybe we can use this when from API mode too? Is it just as simple as passing shutdown_event=some_trio_event_object as part of our config? CC @pgjones

njsmith avatar Aug 20 '19 21:08 njsmith

When Hypercorn receives a SIGTERM it should trigger a graceful shutdown, whereby it stops accepting connections and allows the remaining ones to complete. It then sends a lifespan.shutdown ASGI event and waits for the application to reply with a lifespan.shutdown.complete event which Quart converts the lifespan messages into shutdown (and startup) handling.

In summary you should be able to do,

@app.after_serving
async def shutdown():
    await stuff_to_finish 

Note as well that Quart-Trio introduces a nursery on the app which I hope allows you to create background tasks, and then await them in the after serving.

@app.route("/job")
async def trigger():
    app.nursery.start_soon(something)
    return 201

@app.after_serving
async def shutdown():
    await app.nursery  # I've forgotten the syntax 

pgjones avatar Aug 21 '19 08:08 pgjones

@pgjones I'm using hypercorn.trio.serve

njsmith avatar Aug 21 '19 09:08 njsmith

Ah, I see.

I'm thinking the hypercorn.trio.serve function needs another argument of the type shutdown: Optional[Awaitable[None]] = None which would allow usage like,

trigger_shutdown = trio.Event()
nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait))
...
trigger_shutdown.set()  # Trigger graceful shutdown of serve

What do you think?

pgjones avatar Aug 22 '19 09:08 pgjones

I think you mean Optional[Callable[[], Awaitable[None]]] or similar, but otherwise sure that would work.

njsmith avatar Aug 22 '19 09:08 njsmith

Sorry there was a typo, it should have been

nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait()))

So that you could pass any awaitable (rather than a callable). Do you think it should be a Callable?

pgjones avatar Aug 22 '19 15:08 pgjones

Trio's normal convention is that we pretend awaitable objects don't exist, only async functions. The major advantages are that it simplifies teaching, and that we can easily use pylint/mypy to catch missing awaits. But that means we'd need to disable those checks to use something like shutdown=trigger_shutdown.wait().

tl;dr: yeah I think a callable would be better

njsmith avatar Aug 22 '19 18:08 njsmith

Sounds sensible, I've added in https://gitlab.com/pgjones/hypercorn/commit/33a6f1a3847382921ec3d37143145d1bf06ee263. I'll release soon, so please say if it doesn't do the job.

pgjones avatar Aug 23 '19 20:08 pgjones

I haven't tried using it, but the diff looks fine :-)

Re: the commit message, just in case one of us is confused: AFAIK there's no problem with regular cancellation on trio, it's just not graceful :-)

njsmith avatar Aug 23 '19 20:08 njsmith

Sorry about the commit message, it isn't very clear (I can't quite remember the reasoning now either).

I've released 0.8.0 now, if you'd like to use the shutdown trigger.

pgjones avatar Aug 26 '19 12:08 pgjones