Raj Hammeer Singh Hada
Add this before or after your `<application>` tag in AndroidManifest.xml and you're good to go.
Model directly works 👍

**GGUF link** - https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Phi-3-mini-4k-instruct-q4.gguf

**Command** - `main -m Phi-3-mini-4k-instruct-q4.gguf -p "\nYou are a helpful AI assistant.\n\nHow to explain Internet for a medieval knight?\n"`
@mirek190 You mean the model doesn't stop generating? Yeah, I faced it too. PR #6851 handles it.
Closing this since PR: https://github.com/ggerganov/llama.cpp/pull/6857 was merged into master with support for Phi-3 4K context length.
Status: Phi-3 4K models are supported in master after the https://github.com/ggerganov/llama.cpp/pull/6857 merge. Phi-3 128K models aren't supported yet (as of 24th Apr 2024).
I have a T4 installed and I'm still facing the same issue on an AWS compute machine.

**PyTorch version** - `2.0.1+cu117`

```
Cuda support: False : 0 devices
Traceback (most recent call...
```
If you're sure that you're using GPU(s), then try updating the NVIDIA drivers to an appropriate version (on an Ubuntu distro, anything >= 450 is good enough). Try running `torch.cuda.device_count()` to...
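A quick sanity check along these lines (a minimal sketch, assuming PyTorch is installed) is to ask PyTorch directly whether it can see any CUDA devices before digging into driver versions:

```python
import torch

# Whether this PyTorch build can reach a working CUDA runtime at all
print("CUDA available:", torch.cuda.is_available())

# How many GPUs PyTorch can see (0 usually points at a driver or
# CPU-only-wheel problem rather than missing hardware)
print("Device count:", torch.cuda.device_count())
```

If `device_count()` is 0 while `nvidia-smi` shows the T4, the installed wheel is often a CPU-only build or the driver is too old for the bundled CUDA runtime.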
I can put up a Docker image for the TTS server, would that suffice? I would probably want to publish different Docker images for different TTS services. Would this be a...
https://github.com/KoljaB/RealtimeTTS/pull/136 @PylotLight Here you go, please review from your side as well.
Does the FastAPI server handle concurrency? If I make 2 requests at the same time, it gives all the chunks to the 2nd request.
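That symptom usually means all clients are draining one shared chunk queue, so whichever request reads first wins. A minimal sketch of the usual fix (hypothetical `ChunkBroadcaster` name, plain asyncio, not RealtimeTTS's actual internals) gives each request its own queue and fans every chunk out to all of them:

```python
import asyncio


class ChunkBroadcaster:
    """Fan out produced chunks so each subscriber gets its own copy."""

    def __init__(self):
        self._subscribers = set()

    def subscribe(self) -> asyncio.Queue:
        # Each request gets a private queue instead of sharing one.
        q = asyncio.Queue()
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.discard(q)

    async def publish(self, chunk: bytes) -> None:
        # Deliver the chunk to every active subscriber, not just one.
        for q in self._subscribers:
            await q.put(chunk)


async def demo():
    b = ChunkBroadcaster()
    q1, q2 = b.subscribe(), b.subscribe()
    await b.publish(b"chunk-0")
    # Both concurrent "requests" now see the same chunk.
    return await q1.get(), await q2.get()


print(asyncio.run(demo()))
```

In a FastAPI handler each `StreamingResponse` generator would `subscribe()` on entry and `unsubscribe()` in a `finally:` block, so two simultaneous requests no longer steal chunks from each other.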