Running BLOOM
What kind of machine is required just to run inference on the 176B model? https://huggingface.co/bigscience/bloom
Right now, 8×80GB A100 or 16×40GB A100 [GPUs]. With the "accelerate" library you have offloading though, so as long as you have enough RAM, or even just disk, for ~300GB you're good to go (but slower).
Source: https://www.infoq.com/news/2022/07/bigscience-bloom-nlp-ai/
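For the multi-GPU case, loading with `transformers` + `accelerate` looks roughly like this (a minimal sketch, not tested here; `device_map="auto"` is what triggers the automatic placement across visible GPUs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" asks accelerate to shard the weights across all
# visible GPUs (e.g. 8x80GB or 16x40GB A100s) automatically.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    torch_dtype=torch.bfloat16,  # BLOOM's native dtype, ~2 bytes/param
)

inputs = tokenizer("BLOOM is", return_tensors="pt").to(0)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0]))
```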
According to this post, you can run it on consumer hardware at about 3 minutes/token.
According to this post, even on pretty good GPU hardware it can still take 90 seconds/token. It seems like you need a really high-end system to run it quickly.
For inference only, what are the minimum requirements for RAM and GPU memory?
About 350 GB of GPU RAM (~200 GB if you quantise to int8).
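That figure falls straight out of the parameter count; a quick back-of-the-envelope check:

```python
# 176B parameters, 2 bytes each in fp16/bf16, 1 byte each in int8.
params = 176_000_000_000
for dtype, bytes_per_param in [("fp16", 2), ("int8", 1)]:
    print(f"{dtype}: ~{params * bytes_per_param / 1e9:.0f} GB for weights alone")
# fp16: ~352 GB for weights alone
# int8: ~176 GB for weights alone
# Activations, KV cache, and framework overhead account for the rest.
```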
> About 350 GB of GPU RAM (~200 GB if you quantise to int8).

For inference only?
Yep, you need to get all those parameters into GPU RAM to run inference. Like I mentioned, you can use the accelerate framework to do "swapping" from CPU RAM to GPU RAM, which lets you do it with much less GPU RAM at a ridiculous speed penalty.
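Concretely, that swapping setup is just a per-device memory cap at load time; a sketch assuming recent transformers/accelerate versions (the cap values are illustrative, not recommendations):

```python
from transformers import AutoModelForCausalLM

# max_memory caps what each device may hold; weights over the GPU cap stay
# in CPU RAM (or spill to offload_folder on disk) and are swapped onto the
# GPU layer by layer during the forward pass, hence the speed penalty.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom",
    device_map="auto",
    torch_dtype="auto",
    max_memory={0: "20GiB", "cpu": "300GiB"},  # tune to your machine
    offload_folder="bloom-offload",
)
```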