localGPT
Can you suggest a lightweight model for a CPU-only system? The current model takes a very long time to run on CPU.
I will be adding GGML support for quantized CPU models soon.
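(In the meantime, quantized GGML models can already be run on CPU through the llama-cpp-python bindings and LangChain's LlamaCpp wrapper. A minimal sketch, assuming you have downloaded a quantized GGML file locally, e.g. one of TheBloke's releases; the model path and filename below are placeholders:)

import os
from langchain.llms import LlamaCpp

# Hypothetical path to a locally downloaded 4-bit quantized GGML model file.
MODEL_PATH = "./models/ggml-vicuna-7b-1.1-q4_0.bin"

llm = LlamaCpp(
    model_path=MODEL_PATH,
    n_ctx=2048,                # context window size
    temperature=0.7,
    n_threads=os.cpu_count(),  # use all available CPU cores
)

print(llm("Summarize what a vector store is in one sentence."))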
Thank you, LeafmanZ.
I tried to load TheBloke/vicuna-7B-1.1-HF on a CPU with 64 GB of RAM, but it still crashed in the end.
With a CPU, one may use model_name = "hkunlp/instructor-large" for the embeddings and store them in a vector store (the ingest.py part of this repo; see the sketch after the code below), but use OpenAI for the query, e.g.:
import os
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

os.environ['OPENAI_API_KEY'] = 'sk-...'

# Reload the Chroma vector store persisted by ingest.py
# ("DB" is this repo's persist directory).
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)

llm = OpenAI()
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=db.as_retriever(),
    memory=memory,
)
user_question = '....'
response = conversation_chain({'question': user_question})
print(user_question, response['answer'])
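For completeness, here is a simplified sketch of the ingest side that produces the db used above (loosely adapted from this repo's ingest.py; the source document path, chunking parameters, and persist directory here are assumptions):

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Load and chunk a document (the path is a placeholder).
docs = TextLoader("SOURCE_DOCUMENTS/example.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)

# Embed locally on CPU and persist the vector store for later querying.
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")
db.persist()

The appeal of this split is that only the final question/answer step touches the OpenAI API; the embedding and retrieval work stays local and runs acceptably on CPU.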