
"Already running a prediction" When hitting multiple requests

Open isahillohiya03 opened this issue 2 years ago • 10 comments

This is not exactly an issue, so here's the situation:

I am running a cog container locally and want to process multiple requests at once. However, when I send 100 requests at once, it returns output for about 20 and responds with "Already running a prediction" for the rest, even though my system utilisation is very low. How can I run predictions in parallel?

I am using an image-similarity model with ViT, and it uses the GPU.
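To reproduce this, here is a small client sketch (the function names are illustrative, not part of cog). It assumes cog's HTTP server is listening at `http://localhost:5000/predictions` and that rejected requests come back with an HTTP error status such as 409 Conflict; adjust if your version behaves differently.

```python
# Illustrative client: fire many requests at a locally running cog
# server concurrently and collect the HTTP status of each response.
import json
import urllib.request
import urllib.error
from concurrent.futures import ThreadPoolExecutor

def post_prediction(url, payload):
    """POST one prediction; return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"input": payload}).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # e.g. 409 when the server is already running a prediction
        return e.code

def fire_concurrent(url, payloads, workers=20):
    """Send all payloads in parallel; return statuses in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: post_prediction(url, p), payloads))
```

Counting how many entries in the returned list are non-200 shows how many requests were rejected rather than served.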

isahillohiya03 avatar Oct 10 '23 06:10 isahillohiya03

Ditto, trying to figure this out as well, using the latest beta version. I've tried setting the thread count:

docker run -d -p 5000:5000 <container> python -m cog.server.http --threads=8

But still keep hitting the "Already running a prediction" error.

jasongrishkoff avatar Nov 03 '23 10:11 jasongrishkoff

I thought it was only because of the GPU that we can make one prediction at a time, but it's the same for CPUs as well.

isahillohiya03 avatar Nov 06 '23 09:11 isahillohiya03

There's a new version https://github.com/replicate/cog/releases/tag/v0.9.0-beta9 that has support for async predictor functions. That might help?

cc @technillogue

zeke avatar Nov 06 '23 22:11 zeke

We hope to roll out concurrent predictions in the coming months, but 0.9.0b9 only allows async def predict, not concurrent predictions.

The --threads argument controls how many HTTP requests can be served concurrently, but right now, unless I'm mistaken, Predictor.predict can still only run one prediction at a time.

Even if that weren't the case, it's very hard to use torch and get true GPU concurrency without ultimately implementing something like batching or microbatching. For now, if you can implement batching yourself, that's best.
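As an illustration of the microbatching idea, here is a minimal, framework-agnostic sketch (the `MicroBatcher` class and all its parameters are hypothetical, not part of cog): incoming items are queued, and a worker collects up to `max_batch_size` of them (or waits at most `max_wait_s`) before handing the whole batch to a single `process_batch` call, which in a real predictor would be the model's batched forward pass.

```python
# Hypothetical microbatching sketch: amortize per-call overhead by
# grouping concurrent requests into one batched call.
import queue
import threading
import time

class MicroBatcher:
    def __init__(self, process_batch, max_batch_size=8, max_wait_s=0.05):
        self._process_batch = process_batch  # callable: list -> list
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def predict(self, item, timeout=5.0):
        """Blocking helper: enqueue one item and wait for its result."""
        done = threading.Event()
        holder = {}
        self._queue.put((item, done, holder))
        done.wait(timeout)
        return holder["result"]

    def _run(self):
        while True:
            # Block for the first item, then drain more until the batch
            # is full or the wait budget is spent.
            batch = [self._queue.get()]
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            items = [entry[0] for entry in batch]
            results = self._process_batch(items)  # one batched call
            for (_, done, holder), result in zip(batch, results):
                holder["result"] = result
                done.set()
```

With a GPU model, `process_batch` would stack the inputs into one tensor and run a single forward pass, so N concurrent callers cost roughly one model invocation instead of N.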

technillogue avatar Nov 07 '23 19:11 technillogue

Okay, thanks for the update. In my case what I've done is set up ~5 docker containers on separate ports, and then used nginx to load balance between them. This allows me to have up to 5 ongoing predictions at any given time.
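A minimal sketch of that setup, assuming the five cog containers are mapped to host ports 5001-5005 (the ports and the upstream name are illustrative):

```nginx
# Illustrative nginx config: balance across five cog containers.
upstream cog_backends {
    # least_conn sends each request to the backend with the fewest
    # active connections, which suits one-prediction-at-a-time servers.
    least_conn;
    server 127.0.0.1:5001;
    server 127.0.0.1:5002;
    server 127.0.0.1:5003;
    server 127.0.0.1:5004;
    server 127.0.0.1:5005;
}

server {
    listen 5000;
    location / {
        proxy_pass http://cog_backends;
    }
}
```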

jasongrishkoff avatar Nov 07 '23 19:11 jasongrishkoff

@technillogue +1 to concurrent predictions

vinch00 avatar Dec 30 '23 22:12 vinch00

+1 to concurrent predictions!

tripathiarpan20 avatar May 01 '24 15:05 tripathiarpan20