
Can you suggest a lightweight model for a CPU-only system? The current model is taking very long on CPU.

Open rahulb7230 opened this issue 2 years ago • 3 comments

rahulb7230 avatar Jun 12 '23 14:06 rahulb7230

I will be adding GGML support for quantized CPU models soon.

LeafmanZ avatar Jun 13 '23 01:06 LeafmanZ
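For context on why quantization helps on CPU: weights are stored as low-bit integers plus a scale factor, cutting memory and bandwidth. Below is a minimal, illustrative sketch of symmetric 8-bit quantization in pure Python; GGML's real formats are block-wise 4/8-bit with per-block scales, so this is a simplification, not GGML's actual layout.

```python
# Illustrative sketch of symmetric 8-bit quantization (NOT the real GGML format).

def quantize_q8(weights):
    """Map floats to int8 values in [-127, 127] with one shared scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

weights = [0.12, -0.53, 0.98, -1.27, 0.001]
q, scale = quantize_q8(weights)
restored = dequantize_q8(q, scale)
# Worst-case rounding error is half the scale step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each int8 weight takes 1 byte instead of 4 (fp32), which is the main reason a quantized 7B model becomes feasible on commodity CPU RAM.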

> I will be adding GGML support for quantized CPU models soon.

Thank you, LeafmanZ.

rahulb7230 avatar Jun 13 '23 05:06 rahulb7230

I tried to load TheBloke/vicuna-7B-1.1-HF with CPU/64GB, it still crashed in the end.

On CPU only, one may use model_name = "hkunlp/instructor-large" for the embeddings and store them in a vectorstore (the ingest.py part of this repo), but use OpenAI for the query, e.g.:

import os
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

os.environ['OPENAI_API_KEY'] = 'sk-...'
llm = OpenAI()
memory = ConversationBufferMemory(memory_key='chat_history', return_messages=True)
conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    # retriever=vectorstore.as_retriever(),
    retriever=db.as_retriever(),  # db is the vectorstore built by ingest.py
    memory=memory
)

user_question = '....'
response = conversation_chain({'question': user_question})
print(user_question, response['answer'])

ffreemt avatar Jun 13 '23 13:06 ffreemt
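To make the retriever step above less opaque: db.as_retriever() is essentially nearest-neighbor search over embedding vectors. Here is a toy in-memory store with cosine similarity, purely illustrative (not the actual Chroma or langchain implementation); the example texts and 2-D embeddings are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ToyVectorStore:
    """In-memory list of (text, embedding) pairs with top-k retrieval."""
    def __init__(self):
        self.docs = []

    def add(self, text, embedding):
        self.docs.append((text, embedding))

    def retrieve(self, query_embedding, k=1):
        # Rank stored docs by similarity to the query embedding.
        ranked = sorted(self.docs,
                        key=lambda d: cosine(d[1], query_embedding),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.add("doc about cats", [1.0, 0.0])
store.add("doc about dogs", [0.0, 1.0])
top = store.retrieve([0.9, 0.1], k=1)
```

In the real pipeline, ingest.py computes the embeddings with instructor-large and persists them; the conversational chain then embeds each question, retrieves the top matches, and passes them to the LLM as context.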