Jeximo
> 96 cores 192 threads ... the peek inferencing speed tops at around 60 threads

This sounds normal. The CPU may be over-saturated: [token generation performance tips readme](https://github.com/ggerganov/llama.cpp/blob/f87f7b898651339fe173ddf016ca826163e899d8/docs/token_generation_performance_tips.md#verifying-that-the-cpu-is-not-oversaturated)
> how to keep the output response of API is the same as the output of main command.

It's unclear what settings you used. The [README shows](https://github.com/ggerganov/llama.cpp/blob/4e9a7f7f7fb6acbddd1462909c8d696e38edbfcc/examples/server/README.md) `seed` and `temperature` for the API...
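A rough sketch of what I mean (model path and prompt are placeholders, and flag names can drift between llama.cpp versions, so double-check against your build):

```sh
# CLI run with sampling pinned down: greedy temperature and a fixed seed.
./main -m ./models/ggml-model-q4_0.gguf -p "Hello" -n 64 --temp 0 --seed 42

# Same settings sent to the server's /completion endpoint (default port 8080).
curl http://localhost:8080/completion -d '{
  "prompt": "Hello",
  "n_predict": 64,
  "temperature": 0,
  "seed": 42
}'
```

If the seed, temperature, and the rest of the sampling settings match on both sides, the outputs should match too.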
> n_threads = 112 / 224

Test `--threads N`. I don't know what's optimal for your system; usually it's best to start at 1, then see if token generation speed...
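Something like this (placeholder model path and prompt; the timing line comes from `llama_print_timings` on stderr in the builds I've used):

```sh
# Sweep thread counts and compare tokens/second from the eval timing line.
for t in 1 2 4 8 16 32 64; do
  echo "=== --threads $t ==="
  ./main -m ./models/ggml-model-q4_0.gguf -p "Hello" -n 64 --threads "$t" 2>&1 | grep "eval time"
done
```

Whichever thread count gives the best eval speed is usually near your physical core count, not the logical thread count.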
There's a PR to implement this: https://github.com/ggerganov/llama.cpp/pull/6741
> Hi,
>
> I am using this model ggml-model-q4_0.gguf and ggml-model-f32.gguf

Unclear, but this doesn't seem to be the focus of your question.

> My issues is...
> Why does this commit alter output?

@Azirine To figure out the difference, please show the steps for how you decided https://github.com/ggerganov/llama.cpp/commit/c47cf414efafb8f60596edc7edb5a2d68065e992 lowered output quality.
> CPU only beat GPU output hands down. ...
> GPU 75% / CPU 25% -> Always seems to yield higher quality output.
> GPU 50% / CPU 50% -> Even better...
> LMSYS Chatbot Arena

@Azirine See I didn't say "_LMSYS_". Please do not read things I didn't say, **that'd be great**.

> alters the model's outputs even with identical prompts,...
> n_gpu_layer= -1,

This isn't a thing; you've set your GPU to use no layers. Increase the number.
> Even when I try with 30 it still the same issue

https://github.com/ggerganov/llama.cpp/blob/4e9a7f7f7fb6acbddd1462909c8d696e38edbfcc/examples/main/README.md?plain=1#L318

The original post has a typo in the parameter; the correct form is `--n-gpu-layers N`. Did you use `--n-gpu-layers 30`, and...
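For reference, something along these lines (placeholder model path and prompt; `-ngl` is the short form on the builds I've used):

```sh
# Correct spelling of the flag: --n-gpu-layers (not n_gpu_layer).
./main -m ./models/ggml-model-q4_0.gguf -p "Hello" -n 64 --n-gpu-layers 30
```

The startup log should report how many layers were actually offloaded to the GPU, so you can confirm the flag took effect.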