
How to run the 70B Llama model

Open VISWANATH78 opened this issue 2 years ago • 6 comments

Hi, I am looking to run the LLaMA 2 70B model. I have 2 GPUs and need to run it on them. Does run_localGPT.py auto-detect my 2 GPUs, or do I need to change the code somewhere to detect the GPUs?
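(For a quick check of what the script can detect, a minimal sketch using standard PyTorch calls:)

```python
import torch

# Report the CUDA devices PyTorch can see in this environment. If it
# prints fewer GPUs than are installed, check the NVIDIA driver and
# the CUDA_VISIBLE_DEVICES environment variable.
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```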

VISWANATH78 avatar Dec 09 '23 06:12 VISWANATH78

I also have a high-end GPU and wish to run this in a production environment; however, it is working only on CPU and memory, and not a single percent of the GPU is being used. I am looking for your kind guidance on how to make it work on the GPU and make the process faster.

TechInnovate01 avatar Dec 15 '23 04:12 TechInnovate01

Yeah, I am thinking the same. I tried to run it from the terminal with the command: python run_localGPT.py --device_type cuda

Output: it does run on CUDA.

My actual question is: I am not able to detect all the GPUs in my server. By default it goes with the CPU process, and if I give the above command it runs on only 1 GPU.
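(A hedged aside: a process only sees the GPUs listed in CUDA_VISIBLE_DEVICES, so it is worth confirming both cards are exposed before launching, e.g. with CUDA_VISIBLE_DEVICES=0,1 python run_localGPT.py --device_type cuda, or with this minimal Python sketch:)

```python
import os

# Hypothetical pre-launch check: expose both GPUs to this process.
# The variable must be set before CUDA is initialised (i.e. before
# the first torch.cuda call) to have any effect.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0,1")

import torch

print(torch.cuda.device_count())  # expect 2 if both cards are visible
```

Even with both cards visible, whether the model is actually split across them depends on the underlying model loader rather than on the --device_type flag alone.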

I am getting this error right now when I execute the script:

(localGPT) viswanath:~/localGPT$ python run_localGPT.py --device_type cuda
/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org/ to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
2023-12-17 01:52:31,506 - INFO - run_localGPT.py:221 - Running on: cuda
2023-12-17 01:52:31,506 - INFO - run_localGPT.py:222 - Display Source Documents set to: False
2023-12-17 01:52:31,506 - INFO - run_localGPT.py:223 - Use history set to: False
2023-12-17 01:52:31,834 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2023-12-17 01:52:32,947 - INFO - posthog.py:16 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-12-17 01:52:33,043 - INFO - run_localGPT.py:56 - Loading Model: TheBloke/Llama-2-70b-Chat-GGUF, on: cuda
2023-12-17 01:52:33,043 - INFO - run_localGPT.py:57 - This action can take a few minutes!
2023-12-17 01:52:33,043 - INFO - load_models.py:38 - Using Llamacpp for GGUF/GGML quantized models
Traceback (most recent call last):
  File "/nlsasfs/home/localGPT/run_localGPT.py", line 258, in <module>
    main()
  File "/nlsasfs/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/nlsasfs/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/nlsasfs/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/nlsasfs/home/mcq/viswanath/localGPT/run_localGPT.py", line 229, in main
    qa = retrieval_qa_pipline(device_type, use_history, promptTemplate_type="llama")
  File "/nlsasfs/home/mcq/viswanath/localGPT/run_localGPT.py", line 144, in retrieval_qa_pipline
    qa = RetrievalQA.from_chain_type(
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.py", line 100, in from_chain_type
    combine_documents_chain = load_qa_chain(
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 249, in load_qa_chain
    return loader_mapping[chain_type](
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/chains/question_answering/__init__.py", line 73, in _load_stuff_chain
    llm_chain = LLMChain(
  File "/nlsasfs/home/mcq/viswanath/miniconda3/envs/localGPT/lib/python3.10/site-packages/langchain/load/serializable.py", line 74, in __init__
    super().__init__(**kwargs)
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for LLMChain
llm
  none is not an allowed value (type=type_error.none.not_allowed)

How do I fix this issue? Please help me out.

VISWANATH78 avatar Dec 17 '23 06:12 VISWANATH78

Hi, what did you change in run_localGPT.py or ingest.py to get it to work with the 70B model? Many thanks.

none is not an allowed value (type=type_error.none.not_allowed)

First of all, the solution for your error: please run the command below.

CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.83 --no-cache-dir

If you are using a Windows machine, then use these commands instead:

set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python==0.1.83 --no-cache-dir

If the above does not work, then try uninstalling llama-cpp-python with pip uninstall llama-cpp-python and installing it once again with pip install llama-cpp-python==0.1.83 --no-cache-dir.
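To check that the rebuilt wheel actually offloads to the GPU, a minimal sketch; the model path is a placeholder for whatever GGUF file you downloaded, and n_gpu_layers should be tuned to your VRAM:

```python
from llama_cpp import Llama

# If the wheel was compiled with cuBLAS, loading with n_gpu_layers > 0
# should print CUDA offload information at startup instead of running
# entirely on the CPU.
llm = Llama(
    model_path="models/llama-2-70b-chat.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=40,  # layers to offload; tune this to your VRAM
)
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```

Newer llama-cpp-python versions also expose a tensor_split parameter for spreading the offloaded layers across two cards.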

My second observation about this code is that it only uses the GPU to ingest the documents; the rest of the entire process runs on CPU and memory only. I am also looking for the changes that will help utilize the GPU to improve performance.

TechInnovate01 avatar Dec 18 '23 04:12 TechInnovate01

Hi, what did you change in run_localGPT.py or ingest.py to get it to work with the 70B model? Many thanks.

You have to select the desired model in constants.py; the model definitions start from line 91 onwards.

Hope this helps. Note: the 70B model will only load if you have sufficient GPU memory in the system.
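For reference, a sketch of what the selection looks like; the variable names follow constants.py, and the GGUF filename here is just an assumed example from TheBloke's repo:

```python
# constants.py (sketch) -- select the 70B chat model by pointing
# MODEL_ID at the Hugging Face repo and MODEL_BASENAME at the
# quantized GGUF file inside it (filename assumed for illustration).
MODEL_ID = "TheBloke/Llama-2-70b-Chat-GGUF"
MODEL_BASENAME = "llama-2-70b-chat.Q4_K_M.gguf"
```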

TechInnovate01 avatar Dec 18 '23 04:12 TechInnovate01

(localGPT) viswanath:~/localGPT$ python run_localGPT.py --device_type cuda
/home/miniconda3/envs/localGPT/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11000). Please update your GPU driver by downloading and installing a new version from the URL:

@VISWANATH78 Try reading the above snippet of the error you posted.

THE SOLUTION TO YOUR PROBLEM IS:

Either update your NVIDIA graphics driver, or downgrade your CUDA version to be compatible with the outdated graphics driver.

I would suggest updating the graphics driver, the reason being that downgrading CUDA may break other dependencies.
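A quick way to see the mismatch from Python (standard torch attributes; nvidia-smi shows the driver side):

```python
import torch

# torch.version.cuda is the CUDA toolkit PyTorch was compiled against;
# the installed NVIDIA driver must support at least that toolkit
# version, otherwise CUDA init fails with the "driver is too old"
# warning shown above.
print("PyTorch built against CUDA:", torch.version.cuda)
print("Driver usable:", torch.cuda.is_available())
```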

Thank you, Nitkarsh Chourasia

NitkarshChourasia avatar Apr 06 '24 19:04 NitkarshChourasia