Results: 13 comments by Logan Lebanoff

Has anyone tested loading 65B with `accelerate` across multiple GPUs?

See here: https://github.com/facebookresearch/llama/issues/84#issuecomment-1456285764

I got it working by following the instructions in this repo: https://github.com/zsc/llama_infer. It uses huggingface's `transformers` and `accelerate` to load the model. Since it no longer needs torchrun, you can...
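
For reference, here is a minimal sketch of that `transformers` + `accelerate` loading pattern. The checkpoint path, dtype, and prompt are placeholder assumptions, not taken from that repo; the key part is `device_map="auto"`, which lets `accelerate` shard the weights across all visible GPUs without a torchrun launcher.

```python
# Minimal sketch, assuming a LLaMA checkpoint already converted to the
# Hugging Face format at "path/to/llama-65b-hf" (placeholder path).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-65b-hf"  # placeholder: your converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)

# device_map="auto" hands layer placement to accelerate, which shards the
# model across every visible GPU (spilling to CPU if it still doesn't fit),
# so no torchrun launcher is needed.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```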

I ran into this issue as well with torch==2.0. Uninstalling it and reinstalling torch==1.13.1 seemed to fix the issue.

The error went away for me when running on GPU.

CUDA 11.7. Also, I used conda to install pytorch with CUDA (`conda install pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia`).

```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright...
```

> > Is this feature in the Chat UI product roadmap?
>
> Yes! We're working on something similar right now 😄

Excited about the plugin support! Any update on...

Here's what fixed it for me https://github.com/huggingface/chat-ui/issues/1169#issuecomment-2173309506

Any progress on this? I'm also interested in hooking up retrieval to the UI.

Here's what fixed the `Controller is already closed` issue for me. Maybe it will work for you too, though I was not using DeepInfra: https://github.com/huggingface/chat-ui/issues/1169#issuecomment-2173309506