Graceful shutdown / SIGTERM
When Heroku decides to restart our app (e.g. for an upgrade, or just automatically once a day), it sends SIGTERM and then waits 30 seconds for it to exit.
Right now, we don't have any SIGTERM handling at all, so
This creates some race conditions: for example, suppose that a PR is merged into this repo by a new contributor. This kicks off two processes:
- snekomatic sees that it's a new contributor's first PR, and tries to invite them and post a comment
- heroku sees that there's a new commit in snekomatic, and tries to reboot snekomatic to pick up the changes
In practice we kinda get away with it, because it takes heroku a few seconds to notice the new commit and then build a new container image, so snekomatic wins the race. But if snekomatic gets fancier and starts doing more, this will become more of a problem.
It's easy to have a task listen for SIGTERM. But after it gets it, what should it do? I guess we want to do a graceful shutdown of hypercorn (i.e. finish processing current requests, but stop accepting new ones). I don't think hypercorn has any support for this? And Trio doesn't make it particularly easy yet either (cf https://github.com/python-trio/trio/issues/147).
Oh, no, I'm wrong: hypercorn has a config option shutdown_event that it uses for doing graceful shutdown when running in multi-process mode. Maybe we can use this when from API mode too? Is it just as simple as passing shutdown_event=some_trio_event_object as part of our config? CC @pgjones
When Hypercorn receives a SIGTERM it should trigger a graceful shutdown, whereby it stops accepting connections and allows the remaining ones to complete. It then sends a lifespan.shutdown ASGI event and waits for the application to reply with a lifespan.shutdown.complete event which Quart converts the lifespan messages into shutdown (and startup) handling.
In summary you should be able to do,
@app.after_serving
async def shutdown():
await stuff_to_finish
Note as well that Quart-Trio introduces a nursery on the app which I hope allows you to create background tasks, and then await them in the after serving.
@app.route("/job")
async def trigger():
app.nursery.start_soon(something)
return 201
@app.after_serving
async def shutdown():
await app.nursery # I've forgotten the syntax
@pgjones I'm using hypercorn.trio.serve
Ah, I see.
I'm thinking the hypercorn.trio.serve function needs another argument of the type shutdown: Optional[Awaitable[None]] = None which would allow usage like,
trigger_shutdown = trio.Event()
nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait))
...
trigger_shutdown.set() # Trigger graceful shutdown of serve
What do you think?
I think you mean Optional[Callable[[], Awaitable[None]]] or similar, but otherwise sure that would work.
Sorry there was a typo, it should have been
nursery.start_soon(partial(serve, app, config, shutdown=trigger_shutdown.wait()))
So that you could pass any awaitable (rather than a callable). Do you think it should be a Callable?
Trio's normal convention is that we pretend awaitable objects don't exist, only async functions. The major advantages are that it simplifies teaching, and that we can easily use pylint/mypy to catch missing awaits. But that means we'd need to disable those checks to use something like shutdown=trigger_shutdown.wait().
tl;dr: yeah I think a callable would be better
Sounds sensible, I've added in https://gitlab.com/pgjones/hypercorn/commit/33a6f1a3847382921ec3d37143145d1bf06ee263. I'll release soon, so please say if it doesn't do the job.
I haven't tried using it, but the diff looks fine :-)
Re: the commit message, just in case one of us is confused: AFAIK there's no problem with regular cancellation on trio, it's just not graceful :-)
Sorry about the commit message, it isn't very clear (I can't quite remember the reasoning now either).
I've released 0.8.0 now, if you'd like to use the shutdown trigger.