> retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
> memory = VectorStoreRetrieverMemory(retriever=retriever)
> LLMChain(llm=llm, prompt=prompt, verbose=True, memory=memory)

Thank you! That solved it!
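For anyone landing here later, here's a fuller, self-contained version of that fix. This is only a sketch: it assumes the classic pre-LCEL `langchain` imports and an `OPENAI_API_KEY` in the environment, and `FAISS` / `OpenAIEmbeddings` are stand-ins for whatever vectorstore and embeddings your setup actually uses.

```python
# Minimal sketch of vectorstore-backed memory for an LLMChain.
# FAISS/OpenAIEmbeddings are illustrative stand-ins; requires OPENAI_API_KEY.
from langchain.chains import LLMChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["seed memory"], embeddings)

# k=1: only the single most relevant past exchange is injected into the prompt.
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)  # default memory_key is "history"

prompt = PromptTemplate(
    input_variables=["history", "input"],
    template="Relevant context:\n{history}\n\nHuman: {input}\nAI:",
)
chain = LLMChain(llm=OpenAI(), prompt=prompt, verbose=True, memory=memory)
```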
I am having the same problem with the Python SDK. If we managed to disable the content filter, would it be any more unsafe than using OpenAI directly instead of Azure, or the...
Hi! Any progress on this issue?
> > It is over twice as slow as CPU only but it worked.
> >
> > I agree that the Whisper now running on the GPU on the M1 Max...
@pudepiedj I'm running the test using the web interface that comes with the server (it's at the web root of the API). I just open 10 Chrome tabs...
https://github.com/ggerganov/llama.cpp/assets/1157672/ff6bdc9f-036a-4500-a8d5-dc5b32323488

@pudepiedj I added a video that demonstrates the problem. The server's output is also visible in the video, in case you can spot what goes wrong...
@pudepiedj I've tried a model 1/10th the size, but the problem is the same: slots after the first 6 always have to wait. I have tried both on a local machine Apple M3...
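For anyone who wants to reproduce this without juggling browser tabs, here's a rough sketch that fires 10 concurrent requests at a local llama.cpp server and prints each request's latency. The port, prompt, and `n_predict` value are illustrative; it assumes the server was started with parallel slots enabled (e.g. `-np 10`).

```python
# Hypothetical reproduction script: fire N concurrent requests at a local
# llama.cpp server's /completion endpoint and print per-request latency.
import concurrent.futures
import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8080/completion"  # default llama.cpp server port
N = 10  # should match the server's -np (number of parallel slots)

def ask(i: int) -> str:
    payload = json.dumps({"prompt": f"Request {i}: say hello.", "n_predict": 64}).encode()
    req = urllib.request.Request(SERVER, data=payload,
                                 headers={"Content-Type": "application/json"})
    t0 = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return f"request {i} finished after {time.time() - t0:.1f}s"

with concurrent.futures.ThreadPoolExecutor(max_workers=N) as pool:
    for line in pool.map(ask, range(N)):
        print(line)  # with the reported bug, requests 7..10 show much higher latency
```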
I also have this problem with whisper-large-v3
I can't get it to work with inswapper_128.onnx; it's still looking for inswapper_128_fp16.onnx. How do I make it switch?

```
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Users/admin/miniconda3/envs/deeplivecam/lib/python3.10/tkinter/__init__.py",...
```
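Not a fix for the app's model lookup, but as a sanity check you can confirm the full-precision model file itself loads by opening it with insightface directly. This is a hedged sketch: `MODEL_PATH` is wherever you saved `inswapper_128.onnx`, and the app may resolve its model path differently.

```python
# Hypothetical sanity check: load inswapper_128.onnx directly with insightface,
# bypassing the app's model lookup, to confirm the file itself is usable.
import insightface

MODEL_PATH = "models/inswapper_128.onnx"  # adjust to where you saved the model

swapper = insightface.model_zoo.get_model(
    MODEL_PATH,
    providers=["CPUExecutionProvider"],  # or CoreMLExecutionProvider on macOS
)
print(type(swapper))  # expect an INSwapper instance if the model loaded
```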
I discovered that there is some acceleration happening on the ANE (NPU) when using Core ML (instead of the GPU). I used the asitop utility to measure it; the ANE is utilized at 25%...