vashat

Results: 10 comments by vashat

> retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
> memory = VectorStoreRetrieverMemory(retriever=retriever)
> LLMChain(llm=llm, prompt=prompt, verbose=True, memory=memory)

Thank you! That solved it!

I am having the same problem with the Python SDK. If we managed to disable the content filter, would it be more unsafe than using OpenAI directly instead of Azure or the...

> > It is over twice as slow as CPU only but it worked.
>
> I agree that the Whisper now running on the GPU on the M1 Max...

@pudepiedj I'm running the test using the web interface that comes with the server (it is at the web root of the API). I just open 10 Chrome tabs...

https://github.com/ggerganov/llama.cpp/assets/1157672/ff6bdc9f-036a-4500-a8d5-dc5b32323488

@pudepiedj I added a video that demonstrates the problem. The server's output is also visible in the video, in case you can spot what goes wrong...

@pudepiedj I've tried a model 1/10th the size, but it's still the same problem: slots after the first 6 always have to wait. I have tried it both on a local machine, Apple M3...
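The queuing behaviour described above can be sketched with a toy simulation (pure Python, not llama.cpp code; the 6-slot limit, the 10 concurrent requests, and the timings are assumptions for illustration only):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

SLOTS = 6                        # assumed: server appears to serve only 6 slots at once
slot_sem = threading.Semaphore(SLOTS)
active = 0                       # requests currently holding a slot
peak = 0                         # highest concurrency observed
lock = threading.Lock()

def handle_request(i):
    """Simulate one completion request competing for a decode slot."""
    global active, peak
    with slot_sem:               # requests beyond the first 6 block here
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)         # stand-in for token generation
        with lock:
            active -= 1
    return i

# Fire 10 concurrent requests, as in the 10-Chrome-tabs test above.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(handle_request, range(10)))

print(f"requests served: {len(results)}, peak concurrency: {peak}")
```

All 10 requests complete, but concurrency never exceeds the slot count, which matches the symptom: tabs 7-10 only start producing output once an earlier slot frees up.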

I also have this problem with whisper-large-v3

I can't get it to work with inswapper_128.onnx; it's still looking for inswapper_128_fp16.onnx. How do I make it switch?

```
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Users/admin/miniconda3/envs/deeplivecam/lib/python3.10/tkinter/__init__.py",...
```

I discovered that there is some acceleration being done by the ANE (NPU) when using coreml (instead of the GPU). I used the asitop utility to measure it; the ANE is utilized at 25%...