koonseng


I have the same problem. I'm running this on an AWS g3.4xlarge instance with 128GB of memory.

```
python3 inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B
Loading togethercomputer/Pythia-Chat-Base-7B to cuda:0...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████| 2/2 [00:09
```

OK, solved it. The problem was that the g3.4xlarge instance has only 8GB per GPU, which is clearly not enough for a 7B-parameter model. I re-ran this on a g5.2xlarge and the problem disappeared.
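A rough back-of-the-envelope check (my own sketch, not from the thread) shows why 8GB falls short: in half precision, each parameter takes 2 bytes, so the weights of a 7B-parameter model alone need roughly 13GB of GPU memory, before counting activations or the KV cache.

```python
def model_weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate GPU memory (GiB) needed just to hold model weights.

    bytes_per_param=2 assumes fp16/bf16 loading; use 4 for fp32.
    This ignores activations, optimizer state, and KV cache, so it is
    a lower bound on what the GPU actually needs.
    """
    return num_params * bytes_per_param / 1024**3

# Pythia-Chat-Base-7B has ~7 billion parameters.
weights_gb = model_weight_memory_gb(7e9)
print(f"fp16 weights alone: ~{weights_gb:.1f} GiB")  # well over the 8 GiB on a g3.4xlarge GPU
```

This is consistent with the fix: the A10G GPU on a g5.2xlarge has 24GB of memory, which comfortably fits the fp16 weights plus inference overhead.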