How to run the 70B Llama model
Hi. I am trying to run the LLaMA 2 70B model and I have 2 GPUs. Does run_localGPT.py automatically detect both GPUs, or do I need to change the code somewhere to make it use them?
I also have a high-end GPU and want to run this in a production environment; however, it runs only on CPU and system memory, with 0% GPU utilization. I would appreciate guidance on how to make it run on the GPU and speed up the process.
Yeah, I am seeing the same thing. I tried running it from the terminal with this command: python run_localGPT.py --device_type cuda
Output: it does run on CUDA.
My actual question is that not all the GPUs in my server are being used: by default the process falls back to CPU, and with the command above it only runs on 1 GPU.
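For reference, a minimal way to check how many GPUs PyTorch can actually see, independent of localGPT, is the sketch below (nothing here is localGPT-specific):

```python
import torch

# Quick check of what PyTorch can see. If this prints fewer GPUs than
# nvidia-smi shows, look at the driver/CUDA setup or the
# CUDA_VISIBLE_DEVICES environment variable before changing run_localGPT.py.
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```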
Right now I am getting the following error when I run the command:
(localGPT) viswanath:~/localGPT$ python run_localGPT.py --device_type cuda
/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
2023-12-17 01:52:31,506 - INFO - run_localGPT.py:221 - Running on: cuda
2023-12-17 01:52:31,506 - INFO - run_localGPT.py:222 - Display Source Documents set to: False
2023-12-17 01:52:31,506 - INFO - run_localGPT.py:223 - Use history set to: False
2023-12-17 01:52:31,834 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2023-12-17 01:52:32,947 - INFO - posthog.py:16 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-12-17 01:52:33,043 - INFO - run_localGPT.py:56 - Loading Model: TheBloke/Llama-2-70b-Chat-GGUF, on: cuda
2023-12-17 01:52:33,043 - INFO - run_localGPT.py:57 - This action can take a few minutes!
2023-12-17 01:52:33,043 - INFO - load_models.py:38 - Using Llamacpp for GGUF/GGML quantized models
Traceback (most recent call last):
  File "/nlsasfs/home/localGPT/run_localGPT.py", line 258, in <module>
    main()
  File "/nlsasfs/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/nlsasfs/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/nlsasfs/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/nlsasfs/home/mcq/viswanath/localGPT/run_localGPT.py", line 229, in main
    qa = retrieval_qa_pipline(device_type, use_history, promptTemplate_type="llama")
  File "/nlsasfs/home/mcq/viswanath/localGPT/run_localGPT.py", line 144, in retrieval_qa_pipline
    qa = RetrievalQA.from_chain_type(
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 100, in from_chain_type
    combine_documents_chain = load_qa_chain(
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 249, in load_qa_chain
    return loader_mapping[chain_type](
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 73, in _load_stuff_chain
    llm_chain = LLMChain(
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
  none is not an allowed value (type=type_error.none.not_allowed)
How can I fix this issue? Please help me out.
Hi, what did you change in run_localGPT.py or ingest.py to get it to work with the 70B model? Many thanks.
> none is not an allowed value (type=type_error.none.not_allowed)
First of all, the fix for your error is to rebuild llama-cpp-python with CUDA support. On Linux, run:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir
If you are using a Windows machine, use these commands instead:
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python==0.1.83 --no-cache-dir
If that does not work, uninstall the package first with pip uninstall llama-cpp-python and then install it again with the same CMAKE_ARGS command as above.
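After reinstalling, one way to confirm the cuBLAS build actually offloads work to the GPU is to load your GGUF file directly with llama-cpp-python and watch the verbose startup log. This is only a sketch; the model path below is a placeholder for whatever GGUF file you downloaded:

```python
from llama_cpp import Llama

# Placeholder path: point this at the GGUF file you actually downloaded.
MODEL_PATH = "models/llama-2-70b-chat.Q4_K_M.gguf"

# n_gpu_layers is the number of transformer layers offloaded to the GPU.
# With a working cuBLAS build the verbose startup log reports offloaded
# layers; with a CPU-only build nothing is offloaded.
llm = Llama(model_path=MODEL_PATH, n_gpu_layers=50, n_ctx=4096, verbose=True)
out = llm("Say hello in one short sentence.", max_tokens=32)
print(out["choices"][0]["text"])
```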
Second observation: this code only uses the GPU to ingest the documents; the rest of the pipeline runs on CPU and system memory. I am also looking for the changes needed to use the GPU there and improve performance.
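From what I can tell, for GGUF models the answering step goes through LangChain's LlamaCpp wrapper, so the GPU is only used if n_gpu_layers is passed when that wrapper is built in load_models.py. The exact code in your copy may differ; this is roughly the shape it needs to take, with a placeholder model path:

```python
from langchain.llms import LlamaCpp

# Placeholder path: the quantized GGUF file localGPT downloaded.
MODEL_PATH = "models/llama-2-70b-chat.Q4_K_M.gguf"

llm = LlamaCpp(
    model_path=MODEL_PATH,
    n_ctx=4096,        # context window
    max_tokens=4096,   # generation limit
    n_batch=512,       # prompt batch size
    n_gpu_layers=50,   # layers offloaded to the GPU; 0 keeps everything on CPU
)
```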
> Hi, what did you change in run_localGPT.py or ingest.py to get it to work with the 70B model?
You have to select the desired model in [constants.py], which starts from code line #91 onwards.
Hope this helps. Note: the 70B model will only load if you have sufficient GPU memory in the system.
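For reference, the change in constants.py comes down to pointing the model ID and basename at the 70B GGUF repo and file. The basename below is only an example; it depends on which quantization you actually download:

```python
# constants.py (model selection) - example only; choose the quantization
# that fits in your GPU memory.
MODEL_ID = "TheBloke/Llama-2-70b-Chat-GGUF"
MODEL_BASENAME = "llama-2-70b-chat.Q4_K_M.gguf"
```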
> (localGPT) viswanath:~/localGPT$ python run_localGPT.py --device_type cuda
> /home/miniconda3/envs/localGPT/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL:
@VISWANATH78 Read the error snippet you posted above.
The solution to your problem is:
Either update your NVIDIA graphics driver, or downgrade your CUDA version so it is compatible with the driver you already have.
I suggest updating the graphics driver, because downgrading CUDA may break other dependencies. You can confirm the mismatch with the quick check below.
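A minimal diagnostic, assuming a standard PyTorch install, is to compare the CUDA version your PyTorch build expects with what the driver supports:

```python
import torch

# torch.version.cuda is the CUDA toolkit this PyTorch build was compiled
# against; the warning above says the installed driver only supports
# CUDA 11.0 (reported as "version 11000"). Update the driver, or install
# a PyTorch build made for an older CUDA from https://pytorch.org/.
print("PyTorch built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
```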
Thank you, Nitkarsh Chourasia