
Can you suggest a lightweight model for a CPU-only system? The current model is taking very long on CPU.

Open rahulb7230 opened this issue 2 years ago • 3 comments

rahulb7230 avatar Jun 12 '23 14:06 rahulb7230

I will be adding GGML support for quantized CPU models soon.

LeafmanZ avatar Jun 13 '23 01:06 LeafmanZ
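For context on why quantization helps on CPU: weights are stored as low-bit integers plus a scale factor, cutting memory and bandwidth. Below is a minimal, illustrative sketch of symmetric 8-bit quantization in pure Python; GGML's real formats are block-wise 4/8-bit with per-block scales, so this is a simplification, not GGML's actual layout.

```python
# Illustrative sketch of symmetric 8-bit quantization (NOT the real GGML format).

def quantize_q8(weights):
    """Map floats to int8 values in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

weights = [0.12, -0.53, 0.98, -1.27, 0.001]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
# Worst-case rounding error is half the scale step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each int8 weight takes 1 byte instead of 4 (fp32), which is the main reason a quantized 7B model becomes feasible on commodity CPU RAM.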

> I will be adding GGML support for quantized CPU models soon.

Thank you, LeafmanZ.

rahulb7230 avatar Jun 13 '23 05:06 rahulb7230

I tried to load TheBloke/vicuna-7B-1.1-HF with CPU/64GB, it still crashed in the end.

On CPU only, one may use model_name = "hkunlp/instructor-large" for the embeddings and store them in a vectorstore (the ingest.py part of this repo), but use OpenAI for the query, e.g.:

import os
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

os.environ['OPENAI_API_KEY'] = 'sk-...'
llm = OpenAI()
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    # retriever=vectorstore.as_retriever(),
    retriever=db.as_retriever(),  # db is the vectorstore built by ingest.py
    memory=memory
)

user_question = '....'
response = conversation_chain({'question': user_question})
print(user_question, response['answer'])

ffreemt avatar Jun 13 '23 13:06 ffreemt
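To make the retriever step above less opaque: db.as_retriever() is essentially nearest-neighbor search over embedding vectors. Here is a toy in-memory store with cosine similarity, purely illustrative (not the actual Chroma or langchain implementation); the example texts and 2-D embeddings are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ToyVectorStore:
    """In-memory list of (text, embedding) pairs with top-k retrieval."""
    def __init__(self):
        self.docs = []

    def add(self, text, embedding):
        self.docs.append((text, embedding))

    def retrieve(self, query_embedding, k=1):
        # Rank stored docs by similarity to the query embedding.
        ranked = sorted(self.docs,
                        key=lambda d: cosine(d[1], query_embedding),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.add("doc about cats", [1.0, 0.0])
store.add("doc about dogs", [0.0, 1.0])
top = store.retrieve([0.9, 0.1], k=1)
```

In the real pipeline, ingest.py computes the embeddings with instructor-large and persists them; the conversational chain then embeds each question, retrieves the top matches, and passes them to the LLM as context.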