Raj Hammeer Singh Hada

20 comments by Raj Hammeer Singh Hada

```
```

Add this after or before your `application` tag in `AndroidManifest.xml` and you're good to go.

![image](https://github.com/ggerganov/llama.cpp/assets/29945363/072a0fe2-9de8-4c63-9af8-5b71cde27556)

Model directly works 👍

**GGUF link** - https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/blob/main/Phi-3-mini-4k-instruct-q4.gguf

**Command** - `main -m Phi-3-mini-4k-instruct-q4.gguf -p "\nYou are a helpful AI assistant.\n\nHow to explain Internet for a medieval knight?\n"`
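For reference, the same GGUF can also be driven from Python via the llama-cpp-python bindings. A minimal sketch (the bindings, context size, and stop token here are my assumptions, not part of the original command):

```python
# Minimal sketch: running the same Phi-3 GGUF through llama-cpp-python.
# The stop token and context size are assumptions, not from the original comment.
from llama_cpp import Llama

llm = Llama(model_path="Phi-3-mini-4k-instruct-q4.gguf", n_ctx=4096)

out = llm(
    "You are a helpful AI assistant.\n\n"
    "How to explain Internet for a medieval knight?\n",
    max_tokens=256,
    stop=["<|end|>"],  # Phi-3's end-of-turn token
)
print(out["choices"][0]["text"])
```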

@mirek190 You mean the model doesn't stop generating? Yeah, I faced it too. This PR: #6851 handles it.

Closing this since PR: https://github.com/ggerganov/llama.cpp/pull/6857 was merged into master with support for Phi-3 4K context length.

Status:
- Phi-3 4K models are supported in master after the https://github.com/ggerganov/llama.cpp/pull/6857 merge.
- Phi-3 128K models aren't supported yet (as of 24 Apr 2024).

I have a T4 installed and am still facing the same issue on an AWS compute machine.

**PyTorch version** - `2.0.1+cu117`

```
Cuda support: False : 0 devices
Traceback (most recent call...
```

If you have confirmed that you're using GPU(s), then try updating the NVIDIA drivers to an appropriate version (on an Ubuntu distro, anything >= 450 is good enough). Try running `torch.cuda.device_count()` to...
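A quick sanity check along those lines (a sketch using the standard `torch.cuda` API; nothing here is specific to the original issue):

```python
# Quick CUDA sanity check with PyTorch.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    # Name of the currently selected GPU, e.g. "Tesla T4"
    print("Device name:", torch.cuda.get_device_name(torch.cuda.current_device()))
```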

I can put up a Docker image for the TTS server, would that suffice? I would probably want to put up different Docker images for different TTS services. Would this be a...

https://github.com/KoljaB/RealtimeTTS/pull/136 @PylotLight Here you go, please review from your side as well.

Does the FastAPI server handle concurrency? If I make 2 requests at the same time, it gives all the chunks to the 2nd request.
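A minimal sketch of what can cause this (hypothetical, not the actual server code): if the streaming endpoint pulls chunks from one queue shared across requests, two concurrent requests race on it and the later one can drain everything; giving each request its own queue avoids the cross-talk.

```python
# Hypothetical sketch of the concurrency bug and a fix; not the actual
# RealtimeTTS server code.
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

# BAD: a single module-level queue shared by every request means chunks
# produced for request A can be consumed by request B's stream:
# shared_queue: asyncio.Queue = asyncio.Queue()

@app.get("/tts")
async def tts(text: str):
    queue: asyncio.Queue = asyncio.Queue()  # per-request queue, no cross-talk

    async def synthesize() -> None:
        # Stand-in for the real TTS engine producing audio chunks.
        for word in text.split():
            await queue.put(word.encode() + b" ")
            await asyncio.sleep(0.01)
        await queue.put(None)  # sentinel: end of stream

    async def stream():
        task = asyncio.create_task(synthesize())
        while (chunk := await queue.get()) is not None:
            yield chunk
        await task

    return StreamingResponse(stream(), media_type="application/octet-stream")
```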