> retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
> memory = VectorStoreRetrieverMemory(retriever=retriever)
> LLMChain(llm=llm, prompt=prompt, verbose=True, memory=memory)

Thank you! That solved it!
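For anyone landing here later, here's a fuller, self-contained version of that fix. This is only a sketch: it assumes the classic pre-LCEL `langchain` imports and an `OPENAI_API_KEY` in the environment, and `FAISS` / `OpenAIEmbeddings` are stand-ins for whatever vectorstore and embeddings your setup actually uses.

```python
# Minimal sketch of vectorstore-backed memory for an LLMChain.
# FAISS/OpenAIEmbeddings are illustrative stand-ins; requires OPENAI_API_KEY.
from langchain.chains import LLMChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["seed memory"], embeddings)

# k=1: only the single most relevant past exchange is injected into the prompt.
retriever = vectorstore.as_retriever(search_kwargs=dict(k=1))
memory = VectorStoreRetrieverMemory(retriever=retriever)  # default memory_key is "history"

prompt = PromptTemplate(
    input_variables=["history", "input"],
    template="Relevant context:\n{history}\n\nHuman: {input}\nAI:",
)
chain = LLMChain(llm=OpenAI(), prompt=prompt, verbose=True, memory=memory)
```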
I am having the same problem with the Python SDK. If we managed to disable the content filter, would it be any more unsafe than using OpenAI directly instead of Azure, or the...
Hi! Any progress on this issue?
> > It is over twice as slow as CPU only but it worked.
> >
> > I agree that the Whisper now running on the GPU on the M1 Max...
@pudepiedj I'm running the test using the web interface that comes with the server (it's at the web root of the API). I just open 10 Chrome tabs...
https://github.com/ggerganov/llama.cpp/assets/1157672/ff6bdc9f-036a-4500-a8d5-dc5b32323488

@pudepiedj I added a video that demonstrates the problem. The server's output is also visible in the video, in case you can spot what goes wrong...
@pudepiedj I've tried a model 1/10th the size, but the problem is the same: slots after the first 6 always have to wait. I have tried both on a local machine Apple M3...
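For anyone who wants to reproduce this without juggling browser tabs, here's a rough sketch that fires 10 concurrent requests at a local llama.cpp server and prints each request's latency. The port, prompt, and `n_predict` value are illustrative; it assumes the server was started with parallel slots enabled (e.g. `-np 10`).

```python
# Hypothetical reproduction script: fire N concurrent requests at a local
# llama.cpp server's /completion endpoint and print per-request latency.
import concurrent.futures
import json
import time
import urllib.request

SERVER = "http://127.0.0.1:8080/completion"  # default llama.cpp server port
N = 10  # should match the server's -np (number of parallel slots)

def ask(i: int) -> str:
    payload = json.dumps({"prompt": f"Request {i}: say hello.", "n_predict": 64}).encode()
    req = urllib.request.Request(SERVER, data=payload,
                                 headers={"Content-Type": "application/json"})
    t0 = time.time()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return f"request {i} finished after {time.time() - t0:.1f}s"

with concurrent.futures.ThreadPoolExecutor(max_workers=N) as pool:
    for line in pool.map(ask, range(N)):
        print(line)  # with the reported bug, requests 7..10 show much higher latency
```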
I also have this problem with whisper-large-v3
I can't get it to work with inswapper_128.onnx; it's still looking for inswapper_128_fp16.onnx. How do I make it switch?

```
Exception in Tkinter callback
Traceback (most recent call last):
  File "/Users/admin/miniconda3/envs/deeplivecam/lib/python3.10/tkinter/__init__.py",...
```
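Not a fix for the app's model lookup, but as a sanity check you can confirm the full-precision model file itself loads by opening it with insightface directly. This is a hedged sketch: `MODEL_PATH` is wherever you saved `inswapper_128.onnx`, and the app may resolve its model path differently.

```python
# Hypothetical sanity check: load inswapper_128.onnx directly with insightface,
# bypassing the app's model lookup, to confirm the file itself is usable.
import insightface

MODEL_PATH = "models/inswapper_128.onnx"  # adjust to where you saved the model

swapper = insightface.model_zoo.get_model(
    MODEL_PATH,
    providers=["CPUExecutionProvider"],  # or CoreMLExecutionProvider on macOS
)
print(type(swapper))  # expect an INSwapper instance if the model loaded
```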
I discovered that there is some acceleration happening on the ANE (NPU) when using Core ML (instead of the GPU). I used the asitop utility to measure it; the ANE is utilized at 25%...