Logan Lebanoff
Has anyone tested loading the 65B model with `accelerate` across multiple GPUs?
See here: https://github.com/facebookresearch/llama/issues/84#issuecomment-1456285764
I got it working following the instructions in this repo https://github.com/zsc/llama_infer. It uses Hugging Face's `transformers` and `accelerate` to load the model. Since it no longer needs torchrun, you can...
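For reference, here is a minimal sketch of what loading through `transformers` + `accelerate` looks like, assuming the 65B weights have already been converted to the Hugging Face format (the checkpoint path below is just a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to a LLaMA 65B checkpoint already converted to HF format
model_path = "path/to/llama-65b-hf"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# device_map="auto" lets accelerate shard the weights across all visible GPUs
# (spilling to CPU RAM if they don't fit); fp16 halves the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a plain Python script rather than a `torchrun` launch, you can run it directly with `python` and let `accelerate` handle device placement.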
I ran into this issue as well with torch==2.0. Uninstalling it and re-installing torch==1.13.1 seemed to fix the issue.
The error went away for me on GPU
CUDA 11.7. Also, I used conda to install PyTorch with CUDA (`conda install pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia`).
```
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright...
```
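As a quick sanity check after the downgrade (a hypothetical verification snippet, not from the original thread), you can confirm the versions PyTorch actually sees:

```python
import torch

# Expect 1.13.1 and 11.7 if the downgrade and CUDA install worked
print(torch.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```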
> > Is this feature in the Chat UI product roadmap?
>
> Yes! We're working on something similar right now 😄

Excited about the plugin support! Any update on...
Here's what fixed it for me https://github.com/huggingface/chat-ui/issues/1169#issuecomment-2173309506
Any progress on this? I'm also interested in hooking up retrieval to the UI.
Here's what fixed the `Controller is already closed` issue for me. Maybe it will work for you too, though I was not using DeepInfra. https://github.com/huggingface/chat-ui/issues/1169#issuecomment-2173309506