chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU)
```shell
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3080'
[ChatLLM.cpp ASCII-art banner (百川)] ...
```
With 686 tokens, a single run takes more than 6 seconds on a 96-core machine. Here is the profiling data for the compute graph: [bge-reranker-dump.txt](https://github.com/user-attachments/files/15910956/bge-reranker-dump.txt) Any advice for better performance?
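As a quick sanity check alongside the graph dump, wall-clock timing with `std::chrono` around the call being profiled can confirm where the 6 seconds go. A minimal sketch; `rerank_score()` is a hypothetical stand-in for the reranker forward pass, not a chatllm.cpp API:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the call under test (e.g. one reranker
// forward pass over a query/passage pair).
static float rerank_score() { return 0.0f; }

int main() {
    const int warmup = 2, runs = 10;
    // Warm-up runs exclude one-time allocation / weight-loading cost.
    for (int i = 0; i < warmup; i++) rerank_score();

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; i++) rerank_score();
    auto t1 = std::chrono::steady_clock::now();

    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / runs;
    std::printf("avg per run: %.2f ms\n", ms);
    return 0;
}
```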
GGML is effectively no longer supported, and all models moved to GGUF as the standard format about a year ago. Are there any plans to support GGUF here? I'm wondering...
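One concrete difference between the two formats: a GGUF file begins with the four ASCII bytes `GGUF`, so distinguishing it from a legacy GGML file takes only a header check. A minimal sketch (no chatllm.cpp APIs involved; the usage line is illustrative):

```cpp
#include <cstdio>
#include <cstring>
#include <fstream>

// Returns true if the file starts with the GGUF magic bytes "GGUF".
static bool is_gguf(const char *path) {
    std::ifstream f(path, std::ios::binary);
    char magic[4] = {};
    f.read(magic, 4);
    return f.gcount() == 4 && std::memcmp(magic, "GGUF", 4) == 0;
}

int main(int argc, char **argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <model-file>\n", argv[0]);
        return 1;
    }
    std::printf("%s: %s\n", argv[1],
                is_gguf(argv[1]) ? "GGUF" : "not GGUF (legacy GGML?)");
    return 0;
}
```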
I followed the new code on GitHub and it compiles successfully, thanks. Then, following the 'Tutorial on RAG', I used the code from GitHub to generate 'fruits.dat', and in the next...
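For reference, the retrieval step that a store like 'fruits.dat' feeds into boils down to scoring a query embedding against every stored vector and keeping the best match. The sketch below uses toy in-memory vectors and cosine similarity; it does not reflect the actual on-disk format of 'fruits.dat':

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Cosine similarity between two equal-length embedding vectors.
static float cosine(const std::vector<float> &a, const std::vector<float> &b) {
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (size_t i = 0; i < a.size(); i++) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-8f);
}

int main() {
    // Toy 3-d embeddings standing in for vectors loaded from a store.
    std::vector<std::vector<float>> store = {
        {0.9f, 0.1f, 0.0f},  // doc 0
        {0.1f, 0.8f, 0.2f},  // doc 1
    };
    std::vector<float> query = {0.8f, 0.2f, 0.1f};

    int best = -1;
    float best_sim = -1.f;
    for (size_t i = 0; i < store.size(); i++) {
        float s = cosine(query, store[i]);
        if (s > best_sim) { best_sim = s; best = (int)i; }
    }
    std::printf("best match: doc %d (sim = %.3f)\n", best, best_sim);
    return 0;
}
```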
The `model_downloader.py` script doesn't list the recently supported Phi-3.5 MoE. I'd also like to know whether it's OK to use the v0.3 release from Jul 6 as-is to run...