chatllm.cpp

Pure C++ implementation of several models for real-time chatting on your computer (CPU)

17 chatllm.cpp issues, sorted by recently updated

```shell
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3080'
________ __ __ __ __ ___ (百川) / ____/ /_ ____ _/ /_/ / / / /...
```

gpu

With 686 tokens, a single run takes more than 6 seconds on a 96-core machine. Here is the profiling data for the compute graph: [bge-reranker-dump.txt](https://github.com/user-attachments/files/15910956/bge-reranker-dump.txt). Any advice for better performance?

Hello, does this support GLM-4V? Are there plans to support it?

enhancement

GGML is essentially no longer supported; all models moved to GGUF as the standard about a year ago. Are there any plans to support it here? I'm wondering...

gguf

Following the new code on GitHub, compilation succeeds, thanks. Then, following the 'Tutorial on RAG', I used the code from GitHub to generate 'fruits.dat', and in the next...

The `model_downloader.py` script doesn't list the recently supported Phi 3.5 MoE. I'd also like to know whether it's OK to use the v0.3 release from Jul 6 as-is to run...