llama-cpp-python

When using LlamaCppEmbeddings with a gguf model, an error is reported

chengjia604 opened this issue 2 years ago · 10 comments

I used the latest version of the module, and while embedding documents into Chroma with a gguf model, a critical error occurred:

from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma

llamaem = LlamaCppEmbeddings(model_path="D:\models\llama-2-7b-chat.Q4_K_M.gguf")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=llamaem)

error:

File "d:/project/python/document-GPT/test.py", line 71, in <module>
    vectorstore = Chroma.from_documents(documents=all_splits, embedding=llamaem)  # embed
File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 771, in from_documents
    return cls.from_texts(
File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 729, in from_texts
    chroma_collection.add_texts(
File "D:\python\lib\site-packages\langchain\vectorstores\chroma.py", line 275, in add_texts
    embeddings = self._embedding_function.embed_documents(texts)
File "D:\python\lib\site-packages\langchain\embeddings\llamacpp.py", line 113, in embed_documents
    embeddings = [self.client.embed(text) for text in texts]
File "D:\python\lib\site-packages\langchain\embeddings\llamacpp.py", line 113, in <listcomp>
    embeddings = [self.client.embed(text) for text in texts]
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1292, in embed
    return list(map(float, self.create_embedding(input)["data"][0]["embedding"]))
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1256, in create_embedding
    self.eval(tokens)
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 1030, in eval
    self._ctx.decode(self._batch)
File "D:\python\lib\site-packages\llama_cpp\llama.py", line 471, in decode
    raise RuntimeError(f"llama_decode returned {return_code}")
RuntimeError: llama_decode returned 1

chengjia604 · Nov 15 '23

I am also running into this; there are other related issues as well, some pointing to a memory leak. I hope someone can look into this.

bsridatta · Nov 18 '23

I can confirm this. I am using vicuna-7b-16k gguf Q5_K_M, and there is still free GPU VRAM, so it seems to be triggered by long input combined with embeddings.

lithces · Nov 21 '23

Sorry to get to this so late, I'll take a look!

abetlen · Nov 21 '23

What type of split are you performing on the document? I was getting the same error, but changing the splitter to RecursiveCharacterTextSplitter solved the issue.
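A minimal sketch of that change (the chunk sizes are illustrative, and docs stands for whatever the document loader returned):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Keep each chunk well under the embedding model's n_ctx so a single
# embedding call fits in the context window; the sizes here are illustrative.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
all_splits = splitter.split_documents(docs)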

shashankshekhardehradun · Nov 22 '23

This is how I was trying it, and it failed:

from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
# With no text_splitter argument, load_and_split falls back to a default
# RecursiveCharacterTextSplitter (with its default chunk size).
pages = loader.load_and_split()

bsridatta · Nov 22 '23

Still getting the same error. I'm using CharacterTextSplitter.

essam-tobgi-dev · Dec 03 '23

Can you try changing it to RecursiveCharacterTextSplitter?

shashankshekhardehradun · Dec 07 '23

I ran into the same problem with the embeddings, but I fixed it by changing the param n_ctx=4096. It seems the long text causes the problem:

llama = Llama(model_path='./llama-2-7b.Q4_K_M.gguf', embedding=True, n_ctx=4096, n_gpu_layers=30)
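For the LangChain path used earlier in this thread, the equivalent would be passing the same parameters to LlamaCppEmbeddings — a sketch, assuming the wrapper forwards its n_ctx and n_gpu_layers fields to llama-cpp-python:

from langchain.embeddings import LlamaCppEmbeddings

# A larger n_ctx lets longer text chunks fit in the context window;
# n_gpu_layers offloads layers to the GPU and is optional.
llamaem = LlamaCppEmbeddings(
    model_path="./llama-2-7b.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=30,
)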

GluttonousCat · Dec 20 '23

I think Chromadb doesn't support the LlamaCppEmbeddings integration from LangChain. Check out the embedding integrations it does support at the link below. Apparently, we need to create a custom EmbeddingFunction class (also shown at the link) to use unsupported embedding APIs.

https://docs.trychroma.com/embeddings#custom-embedding-functions
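Following that page, such a wrapper might look like this — a minimal sketch, assuming the __call__ signature shown in the chromadb docs and llama-cpp-python's Llama.embed; the class name and n_ctx value are illustrative:

from chromadb import Documents, EmbeddingFunction, Embeddings
from llama_cpp import Llama

class LlamaCppEmbeddingFunction(EmbeddingFunction):
    def __init__(self, model_path: str):
        # embedding=True is required for the model to produce embeddings.
        self._model = Llama(model_path=model_path, embedding=True, n_ctx=4096)

    def __call__(self, input: Documents) -> Embeddings:
        # Embed each document with llama.cpp and return one vector per text.
        return [self._model.embed(text) for text in input]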

deekshith-rj · Dec 21 '23

The link provided by @deekshith-rj now returns a 404; I think the new page is https://docs.trychroma.com/guides/embeddings#custom-embedding-functions

enzolutions · Jun 07 '24