Hi everyone,
I am currently trying to use localGPT for a project and I've run into a problem.
Basically I have two setups:
- my home setup: i5 8600K, 32 GB DDR4 and an RTX 2080
- my work setup: i7 8700K, 128 GB DDR4 and an NVIDIA A2
localGPT was installed the same way on both setups. When I run ingest.py I get no error whatsoever; it is when I run the main program that I hit problems.
Everything works perfectly on my home setup, but on my work setup I run into this error: torch.cuda.OutOfMemoryError, even though I have more VRAM on the A2. Also, I didn't change the model; I use the default one, which is "TheBloke/vicuna-7B-1.1-HF".
Do you guys know what's wrong?
Here is the full error:
Traceback (most recent call last):
File "C:\Users\Ali_I\Documents\LocalGPT\localGPT\run_localGPT.py", line 235, in
main()
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1130, in call
return self.main(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "C:\Users\Ali_I\Documents\LocalGPT\localGPT\run_localGPT.py", line 213, in main
res = qa(query)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 140, in call
raise e
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 134, in call
self._call(inputs, run_manager=run_manager)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\retrieval_qa\base.py", line 120, in _call
answer = self.combine_documents_chain.run(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 239, in run
return self(kwargs, callbacks=callbacks)[self.output_keys[0]]
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 140, in call
raise e
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 134, in call
self._call(inputs, run_manager=run_manager)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\combine_documents\base.py", line 84, in _call
output, extra_return_dict = self.combine_docs(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\combine_documents\stuff.py", line 87, in combine_docs
return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\llm.py", line 213, in predict
return self(kwargs, callbacks=callbacks)[self.output_key]
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 140, in call
raise e
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\base.py", line 134, in call
self._call(inputs, run_manager=run_manager)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\llm.py", line 69, in _call
response = self.generate([inputs], run_manager=run_manager)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\chains\llm.py", line 79, in generate
return self.llm.generate_prompt(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\llms\base.py", line 134, in generate_prompt
return self.generate(prompt_strings, stop=stop, callbacks=callbacks)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\llms\base.py", line 191, in generate
raise e
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\llms\base.py", line 185, in generate
self._generate(prompts, stop=stop, run_manager=run_manager)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\llms\base.py", line 436, in _generate
self._call(prompt, stop=stop, run_manager=run_manager)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\langchain\llms\huggingface_pipeline.py", line 168, in _call
response = self.pipeline(prompt)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\pipelines\text_generation.py", line 201, in call
return super().call(text_inputs, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\pipelines\base.py", line 1120, in call
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\pipelines\base.py", line 1127, in run_single
model_outputs = self.forward(model_inputs, **forward_params)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\pipelines\base.py", line 1026, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\pipelines\text_generation.py", line 263, in _forward
generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1522, in generate
return self.greedy_search(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 2339, in greedy_search
outputs = self(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 688, in forward
outputs = self.model(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 578, in forward
layer_outputs = decoder_layer(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 292, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\Ali_I\AppData\Local\Programs\Python\Python310\lib\site-packages\transformers\models\llama\modeling_llama.py", line 212, in forward
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 138.00 MiB (GPU 0; 14.84 GiB total capacity; 13.94 GiB already allocated; 77.19 MiB free; 13.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
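For what it's worth, the last line of the error suggests trying max_split_size_mb. If I understand it right, that is an environment variable that has to be set before torch initializes CUDA, roughly like this (just a sketch based on the error message, I have not verified that it actually helps on the A2):

```python
# Sketch: set the allocator option the error message mentions, before any CUDA work happens.
# 128 MiB is only an example value; the variable must be set before torch touches the GPU.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the variable so the allocator picks it up

print(torch.cuda.get_device_name(0))  # sanity check that the GPU is visible
```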
@mingyuwanggithub The documents are all loaded, then split into chunks, then the embeddings are generated, all without using the GPU.
The VRAM usage seems to come from DuckDB, which probably uses the GPU to compute the distances between the different vectors. To test it I took around 700 MB of PDF files, which produced around 320 KB of actual text. With a brand new DB it used around 7.7 GB of VRAM to process the text from the documents, and properly released the VRAM after the DB was set to None.
But when I ran it again with the exact same documents, the VRAM requirement increased to 9.2 GB and the database contained 35k embeddings when loaded. The third time it used 7.9 GB of VRAM and the DB loaded 71k embeddings, and on the fourth round it used 8.8 GB of VRAM and the DB loaded 107k embeddings.
So I did not manage to reproduce the CUDA out-of-memory error that I experienced earlier. At that time I had hit a few crashes due to PDF documents the parser did not like, and there were moments where I had manually killed the task; maybe it did not like that. I do not know.
In conclusion, it apparently adds the new docs to the database without checking whether they are already there, which increases the number of embeddings in the DB but does not really increase the VRAM requirements or the time to process the documents. However, it seems to be extremely memory hungry: 7.9 GB of VRAM for 320 KB of text is a lot.
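To pin down which step is actually holding the memory, something like this could be dropped between the stages of ingest.py (just a sketch; the commented stages are placeholders for whatever the script actually does at those points):

```python
# Sketch: print CUDA memory usage around each ingestion stage.
# memory_allocated() counts live tensors; memory_reserved() counts what the caching
# allocator holds on to, which is closer to what nvidia-smi reports.
import torch

def report(stage: str) -> None:
    alloc = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"{stage}: allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

report("before loading documents")
# ... load and split the documents here (placeholder) ...
report("after splitting")
# ... generate embeddings / build the Chroma DB here (placeholder) ...
report("after building the DB")

torch.cuda.empty_cache()  # hand cached blocks back to the driver
report("after empty_cache()")
```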
It might be good to stage the text inputs to the DB in batches, so it stays within the bounds of the available VRAM?
It might also be good to check whether a doc has already been ingested and avoid ingesting it again? Something along the lines of the sketch below is what I have in mind.
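For both points, the rough idea would be to skip sources that are already in the DB and add the rest in small batches (only a sketch; it assumes the chunks carry a "source" entry in their metadata and that the Chroma wrapper exposes get() like the underlying chromadb collection does, which may not match how ingest.py actually builds the DB):

```python
# Sketch: ingest only documents whose source is not already in the DB, in small batches.
# Assumes langchain-style Documents with a "source" key in metadata, and that
# Chroma.get() forwards to the underlying chromadb collection.
from langchain.schema import Document
from langchain.vectorstores import Chroma

def add_new_documents(db: Chroma, chunks: list[Document], batch_size: int = 256) -> int:
    existing = db.get(include=["metadatas"])
    known_sources = {m.get("source") for m in existing["metadatas"] if m}
    new_chunks = [c for c in chunks if c.metadata.get("source") not in known_sources]
    for i in range(0, len(new_chunks), batch_size):
        # smaller batches keep the peak VRAM of the embedding step bounded
        db.add_documents(new_chunks[i : i + batch_size])
    return len(new_chunks)
```

This would be called from ingest.py after the splitting step, instead of building the whole DB from all of the chunks in one go.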