llama-api
An OpenAI-like LLaMA inference API
Hi, this was working quite well on CPU for me, but after I gave the tool access to the paths for libcublas, it compiled and now can't start or load...
When I run a model on my GPU, my CPU and RAM usage are insanely high
Hi! I have a strange suggestion :) Add a proxy object that forwards requests to OpenAI when openai_replacement_models specifies openai_proxy (or something like it). For example: openai_replacement_models =...
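A minimal sketch of the routing the suggestion describes, assuming a mapping named `openai_replacement_models` like the one quoted above; that name, the `openai_proxy` sentinel, and `route_request` are illustrative assumptions, not the project's actual config keys or API:

```python
# Hypothetical mapping in the spirit of the suggestion: model names that
# should be forwarded to OpenAI are marked with an "openai_proxy" sentinel,
# while others are served by the local LLaMA backend under an alias.
openai_replacement_models = {
    "gpt-3.5-turbo": "openai_proxy",   # forward these requests to OpenAI
    "gpt-4": "local-llama-13b",        # serve locally under an alias
}

def route_request(model: str) -> str:
    """Return 'openai' when the requested model is mapped to the proxy,
    otherwise 'local' (including for unmapped model names)."""
    if openai_replacement_models.get(model) == "openai_proxy":
        return "openai"
    return "local"
```

The actual implementation would also need to copy the request body through to the OpenAI endpoint and stream the response back, but the lookup above is the core dispatch decision.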
Hello, I appreciate this API, but I am struggling to use the embedding part with langchain, is there any support regarding how to (if possible) use the embedding with langchain?...
Hello can someone guide me to run this nice API in CPU mode only
Please add support for exllamav2
Support the [min_p sampler](https://old.reddit.com/r/LocalLLaMA/comments/17vonjo/your_settings_are_probably_hurting_your_model_why/), which is implemented in ExLlamaV2.
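For reference, min_p filtering keeps only tokens whose probability is at least `min_p` times the probability of the most likely token. A pure-Python sketch of that rule (real implementations operate on logit tensors inside the sampler, not on Python lists):

```python
import math

def min_p_filter(logits, min_p=0.05):
    """Min-p filtering sketch: compute softmax probabilities, then mask
    (set to -inf) every token whose probability falls below
    min_p * max(probabilities). Surviving logits are returned unchanged."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    return [logit if p >= threshold else float("-inf")
            for logit, p in zip(logits, probs)]
```

With `min_p=0.05`, a token needs at least 5% of the top token's probability to survive, which adapts the cutoff to how peaked the distribution is.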
For example, openchat 3.5 wants this prompt template format: `GPT4 User: {prompt}GPT4 Assistant:` I tried a few things and managed to crash the server, so I am stuck. Can anyone...
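One way to apply a template like the one quoted in that issue is to render the chat messages into a single string before sending them to the completion endpoint. The sketch below uses only the format as quoted; the model card should be checked for the exact template (including any end-of-turn tokens) before relying on it, and `build_openchat_prompt` is an illustrative helper, not part of this API:

```python
def build_openchat_prompt(messages):
    """Render OpenAI-style chat messages into the template quoted above:
    'GPT4 User: {prompt}GPT4 Assistant:'. Turns are concatenated with no
    separator, exactly as the issue shows, and the prompt ends with an
    open 'GPT4 Assistant:' turn for the model to complete."""
    parts = []
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"GPT4 User: {msg['content']}")
        elif msg["role"] == "assistant":
            parts.append(f"GPT4 Assistant: {msg['content']}")
    parts.append("GPT4 Assistant:")  # open turn for the model's reply
    return "".join(parts)
```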
Could there be some new format of gguf that we need to update the code for or something?
It's not clear from the documentation how to split VRAM over multiple GPUs with exllama.
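Upstream exllama expresses a multi-GPU split as a comma-separated list of per-device VRAM budgets in GiB (e.g. "16,24" for two GPUs, in device order); whether and how llama-api exposes that option is not documented, so the helper below is an assumption that only illustrates the string format:

```python
def parse_gpu_split(spec: str):
    """Parse an exllama-style GPU split string such as '16,24' (GiB of
    VRAM to allocate on each GPU, in device order) into a list of floats.
    This mirrors exllama's gpu-split convention; the option name and how
    it is passed through llama-api are assumptions, not confirmed API."""
    return [float(part) for part in spec.split(",") if part.strip()]
```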