Run predictions off main thread to avoid blocking health check
Fixes https://github.com/replicate/cog/issues/1719
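Defining the prediction endpoints with `async def` makes FastAPI run them directly on the event loop (the main thread), per the FastAPI docs. That is a problem because a long-running prediction then blocks the server from responding to the health check endpoint. Converting them to plain `def` makes FastAPI run them in a threadpool instead, so health checks keep responding while a prediction runs, which fixes the problem I described in the issue above.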
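
To illustrate the effect, here is a minimal stdlib-only sketch. It is not cog's code and does not use FastAPI. `blocking_predict`, `predict_on_loop`, `predict_in_thread`, and the 0.3 second duration are hypothetical stand-ins. It measures how long other work on the event loop (standing in for a health check) has to wait when a blocking call runs on the loop versus in a worker thread. It assumes Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import time

PREDICTION_SECONDS = 0.3  # hypothetical prediction duration


def blocking_predict() -> str:
    # Stand-in for a long-running, synchronous prediction.
    time.sleep(PREDICTION_SECONDS)
    return "done"


async def predict_on_loop() -> str:
    # Analogous to an `async def` endpoint that calls blocking code directly:
    # the event loop cannot do anything else until this returns.
    return blocking_predict()


async def predict_in_thread() -> str:
    # Analogous to a plain `def` endpoint run in a threadpool:
    # the blocking call happens off the event loop.
    return await asyncio.to_thread(blocking_predict)


async def health_check_delay(predict) -> float:
    """Start a prediction, then measure how long the loop takes to get back to us."""
    start = time.monotonic()
    task = asyncio.create_task(predict())
    await asyncio.sleep(0)  # yield once so the prediction task starts running
    delay = time.monotonic() - start  # time until the "health check" could run
    await task
    return delay


def measure(predict) -> float:
    return asyncio.run(health_check_delay(predict))


if __name__ == "__main__":
    print(f"on event loop: {measure(predict_on_loop):.3f}s")
    print(f"in thread:     {measure(predict_in_thread):.3f}s")
```

In the first case the delay is roughly the full prediction time. In the second it should be close to zero, because the loop is free while the worker thread sleeps.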
One side effect, which may or may not be desirable depending on your perspective: a prediction request sent to an instance that is already running a prediction now fails with status code 409 and a “currently running a prediction” message, instead of effectively being queued by uvicorn. I think this is generally desirable, since retrying the request could succeed (e.g. when multiple instances are available behind a load balancer).
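
For callers that relied on the old queueing behavior, a client-side retry on 409 is one option. The sketch below is only an illustration: `send_with_retry` and its parameters are hypothetical and not part of cog, and `send` stands for whatever function the client uses to issue the request and return `(status_code, body)`.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (status_code, body)


def send_with_retry(
    send: Callable[[], Response],
    attempts: int = 5,
    backoff_seconds: float = 0.5,
) -> Response:
    """Call `send` until it returns something other than 409, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = send()
        if status != 409:
            return status, body
        if attempt < attempts - 1:
            # Instance is busy; wait a little longer each time before retrying.
            time.sleep(backoff_seconds * (attempt + 1))
    # Every attempt hit a busy instance; return the last 409 to the caller.
    return status, body
```

Behind a load balancer, each retry may land on a different instance, so a short backoff is usually enough.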