Cannot find my GPU (AMD RX 6800 XT) with ROCm
Describe the bug
Ollama cannot find my GPU, an AMD Radeon RX 6800 XT. I have set HSA_OVERRIDE_GFX_VERSION to 10.3.0.
I am running Fedora, and with kernel 6.x.x there is no /sys/module/amdgpu/version, so no driver version number can be read. My GPU device ID is GPU-2ca29a1675e0e0f6, and I have set that as well (see ROCR_VISIBLE_DEVICES in the second log below).
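If it helps with reproducing: one way to inject these overrides into the Flatpak sandbox is flatpak override. This is only a sketch with the values that appear in the server-config log below; my setup may apply them through a different mechanism.

# sketch: pass the ROCm overrides to the sandboxed app (values taken from the logs below)
flatpak override --user --env=HSA_OVERRIDE_GFX_VERSION=10.3.0 com.jeffser.Alpaca
flatpak override --user --env=ROCR_VISIBLE_DEVICES=GPU-2ca29a1675e0e0f6 com.jeffser.Alpaca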
Log
Couldn't find '/home/magnusolsen/.ollama/id_ed25519'. Generating new private key. Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBYrjb/N69UTc88BJSO+HT+B6UbYL3jGukWVTnxc9Uvn
time=2025-09-29T21:15:58.005+02:00 level=INFO source=routes.go:1466 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:1 HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/magnusolsen/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://127.0.0.1:11435 http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:1 http_proxy: https_proxy: no_proxy:]"
time=2025-09-29T21:15:58.005+02:00 level=INFO source=images.go:518 msg="total blobs: 5"
time=2025-09-29T21:15:58.005+02:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0"
time=2025-09-29T21:15:58.006+02:00 level=INFO source=routes.go:1519 msg="Listening on 127.0.0.1:11435 (version 0.12.0)"
time=2025-09-29T21:15:58.006+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-29T21:15:58.016+02:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/download/linux-drivers.html" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-09-29T21:15:58.016+02:00 level=INFO source=amd_linux.go:305 msg="filtering out device per user request" id=GPU-2ca29a1675e0e0f6 index=0 visible_devices=[1]
time=2025-09-29T21:15:58.016+02:00 level=INFO source=amd_linux.go:406 msg="no compatible amdgpu devices detected"
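Note the "filtering out device per user request" line above: with HIP_VISIBLE_DEVICES=1 and ROCR_VISIBLE_DEVICES=1, the 6800 XT, which Ollama enumerates as index 0 (id GPU-2ca29a1675e0e0f6), gets excluded, so this first run falls back to CPU. A sketch of values that avoid the filter (assuming a Flatpak version recent enough to support --unset-env):

# sketch: clear the index-based filter, or point it at the device Ollama actually sees
flatpak override --user --unset-env=HIP_VISIBLE_DEVICES com.jeffser.Alpaca
flatpak override --user --env=ROCR_VISIBLE_DEVICES=GPU-2ca29a1675e0e0f6 com.jeffser.Alpaca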
rocminfo
ROCk module is loaded
HSA System Attributes
Runtime Version: 1.1
Runtime Ext Version: 1.6
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
Agent 1
Name: AMD Ryzen 5 3400G with Radeon Vega Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 5 3400G with Radeon Vega Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3700
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 49222032(0x2ef1190) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 49222032(0x2ef1190) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 49222032(0x2ef1190) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 4
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 49222032(0x2ef1190) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1030
Uuid: GPU-2ca29a1675e0e0f6
Marketing Name: AMD Radeon RX 6800 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
L3: 131072(0x20000) KB
Chip ID: 29631(0x73bf)
ASIC Revision: 1(0x1)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2575
BDFID: 768
Internal Node ID: 1
Compute Unit: 72
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 131
SDMA engine uCode:: 85
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
time=2025-09-29T21:15:58.016+02:00 level=INFO source=gpu.go:396 msg="no compatible GPUs were discovered"
time=2025-09-29T21:15:58.016+02:00 level=INFO source=types.go:131 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="46.9 GiB" available="33.6 GiB"
[GIN] 2025/09/29 - 21:15:58 | 200 | 405.192µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/09/29 - 21:15:58 | 200 | 75.582331ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/09/29 - 21:16:04 | 200 | 424.799µs | 127.0.0.1 | GET "/api/tags"
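As far as I can tell, the "amdgpu version file missing" warning is expected with Fedora's in-tree driver (that file seems to come only with AMD's packaged driver) and is not fatal by itself. A host-side sanity check of the module and device nodes, as a sketch; the gfx_target_version property is my assumption about how recent kernels expose the target in the KFD topology:

# sketch: confirm the in-tree amdgpu/KFD stack on the Fedora host
lsmod | grep amdgpu
ls -l /dev/kfd /dev/dri/renderD*
grep -H gfx_target_version /sys/class/kfd/kfd/topology/nodes/*/properties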
Please include the output of Alpaca. For this, you'll need to run Alpaca from the terminal and then try to reproduce the error you want to report.
flatpak run com.jeffser.Alpaca
flatpak run com.jeffser.Alpaca
F: Not sharing "/dev/kfd" with sandbox: File "/dev/kfd" has unsupported type 0o20000
F: Not sharing "/dev/dri" with sandbox: Path "/dev" is reserved by Flatpak
INFO [main.py | main] Alpaca version: 8.1.1
/app/share/Alpaca/alpaca/widgets/blocks/latex.py:26: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all Axes decorations.
  fig.tight_layout()
INFO [ollama_instances.py | start] Starting Alpaca's Ollama instance...
INFO [ollama_instances.py | start] Started Alpaca's Ollama instance
Couldn't find '/home/magnusolsen/.ollama/id_ed25519'. Generating new private key. Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAII+j32AYrqQ0j0DpN4bPkqYZSPAHO2f3gUUKNaLks2Mv
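Side note on the two F: lines at the start of this run: Flatpak refuses to share /dev/kfd and /dev/dri into the sandbox, and ROCm needs exactly those device nodes. I am not sure whether Alpaca's Ollama plugin is supposed to get them some other way, but granting full device access is a sketch of a workaround, along with a way to inspect the current permissions:

# sketch: let the sandbox see the GPU device nodes, then verify the resulting permissions
flatpak override --user --device=all com.jeffser.Alpaca
flatpak info --show-permissions com.jeffser.Alpaca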
time=2025-09-29T21:25:22.293+02:00 level=INFO source=routes.go:1466 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0 GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES:1 HSA_OVERRIDE_GFX_VERSION:10.3.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/magnusolsen/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://127.0.0.1:11435 http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:GPU-2ca29a1675e0e0f6 http_proxy: https_proxy: no_proxy:]"
time=2025-09-29T21:25:22.293+02:00 level=INFO source=images.go:518 msg="total blobs: 5"
time=2025-09-29T21:25:22.294+02:00 level=INFO source=images.go:525 msg="total unused blobs removed: 0"
time=2025-09-29T21:25:22.294+02:00 level=INFO source=routes.go:1519 msg="Listening on 127.0.0.1:11435 (version 0.12.0)"
time=2025-09-29T21:25:22.294+02:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
INFO [ollama_instances.py | start] client version is 0.12.0
time=2025-09-29T21:25:22.304+02:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/download/linux-drivers.html" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2025-09-29T21:25:22.305+02:00 level=INFO source=amd_linux.go:393 msg="skipping rocm gfx compatibility check" HSA_OVERRIDE_GFX_VERSION=10.3.0
time=2025-09-29T21:25:22.305+02:00 level=INFO source=types.go:131 msg="inference compute" id=GPU-2ca29a1675e0e0f6 library=rocm variant="" compute=gfx1030 driver=0.0 name=1002:73bf total="16.0 GiB" available="14.0 GiB"
time=2025-09-29T21:25:22.305+02:00 level=INFO source=routes.go:1560 msg="entering low vram mode" "total vram"="16.0 GiB" threshold="20.0 GiB"
[GIN] 2025/09/29 - 21:25:22 | 200 | 387.207µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/09/29 - 21:25:22 | 200 | 66.891193ms | 127.0.0.1 | POST "/api/show"
[GIN] 2025/09/29 - 21:26:25 | 200 | 320.733µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/09/29 - 21:26:27 | 200 | 610.427µs | 127.0.0.1 | GET "/api/tags"
[GIN] 2025/09/29 - 21:26:27 | 200 | 61.985534ms | 127.0.0.1 | POST "/api/show"
llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/magnusolsen/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... 
llama_model_loader: - kv 33: general.quantization_version u32 = 2
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q4_K: 169 tensors
llama_model_loader: - type q6_K: 29 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 4.36 GiB (4.91 BPW)
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 7.62 B
print_info: general.name = Qwen2.5 Coder 7B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 152064
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-09-29T21:26:27.709+02:00 level=INFO source=server.go:399 msg="starting runner" cmd="/app/plugins/Ollama/bin/ollama runner --model /home/magnusolsen/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 --port 35345"
time=2025-09-29T21:26:27.710+02:00 level=INFO source=server.go:504 msg="system memory" total="46.9 GiB" free="33.2 GiB" free_swap="7.9 GiB"
time=2025-09-29T21:26:27.710+02:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/home/magnusolsen/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 library=rocm parallel=1 required="6.5 GiB" gpus=1
time=2025-09-29T21:26:27.711+02:00 level=INFO source=server.go:544 msg=offload library=rocm layers.requested=-1 layers.model=29 layers.offload=29 layers.split=[29] memory.available="[14.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="6.5 GiB" memory.required.partial="6.5 GiB" memory.required.kv="896.0 MiB" memory.required.allocations="[6.5 GiB]" memory.weights.total="4.1 GiB" memory.weights.repeating="3.7 GiB" memory.weights.nonrepeating="426.4 MiB" memory.graph.full="942.0 MiB" memory.graph.partial="1.1 GiB"
time=2025-09-29T21:26:27.729+02:00 level=INFO source=runner.go:864 msg="starting go runner"
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
ggml_cuda_init: failed to initialize ROCm: no ROCm-capable device is detected
load_backend: loaded ROCm backend from /app/plugins/Ollama/lib/ollama/libggml-hip.so
load_backend: loaded CPU backend from /app/plugins/Ollama/lib/ollama/libggml-cpu-haswell.so
time=2025-09-29T21:26:27.794+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1
CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 compiler=cgo(gcc) time=2025-09-29T21:26:27.795+02:00 level=INFO source=runner.go:900 msg="Server listening on 127.0.0.1:35345" time=2025-09-29T21:26:27.798+02:00 level=INFO source=runner.go:799 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:16384 KvCacheType: NumThreads:4 GPULayers:29[ID:GPU-2ca29a1675e0e0f6 Layers:29(0..28)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:true}" time=2025-09-29T21:26:27.799+02:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding" time=2025-09-29T21:26:27.799+02:00 level=INFO source=server.go:1285 msg="waiting for server to become available" status="llm server loading model" llama_model_loader: loaded meta data with 34 key-value pairs and 339 tensors from /home/magnusolsen/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-60e05f2100071479f596b964f89f510f057ce397ea22f2833a0cfe029bfc2463 (version GGUF V3 (latest)) llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = qwen2 llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Qwen2.5 Coder 7B Instruct llama_model_loader: - kv 3: general.finetune str = Instruct llama_model_loader: - kv 4: general.basename str = Qwen2.5-Coder llama_model_loader: - kv 5: general.size_label str = 7B llama_model_loader: - kv 6: general.license str = apache-2.0 llama_model_loader: - kv 7: general.license.link str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 8: general.base_model.count u32 = 1 llama_model_loader: - kv 9: general.base_model.0.name str = Qwen2.5 Coder 7B llama_model_loader: - kv 10: general.base_model.0.organization str = Qwen llama_model_loader: - kv 11: general.base_model.0.repo_url str = https://huggingface.co/Qwen/Qwen2.5-C... llama_model_loader: - kv 12: general.tags arr[str,6] = ["code", "codeqwen", "chat", "qwen", ... llama_model_loader: - kv 13: general.languages arr[str,1] = ["en"] llama_model_loader: - kv 14: qwen2.block_count u32 = 28 llama_model_loader: - kv 15: qwen2.context_length u32 = 32768 llama_model_loader: - kv 16: qwen2.embedding_length u32 = 3584 llama_model_loader: - kv 17: qwen2.feed_forward_length u32 = 18944 llama_model_loader: - kv 18: qwen2.attention.head_count u32 = 28 llama_model_loader: - kv 19: qwen2.attention.head_count_kv u32 = 4 llama_model_loader: - kv 20: qwen2.rope.freq_base f32 = 1000000.000000 llama_model_loader: - kv 21: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001 llama_model_loader: - kv 22: general.file_type u32 = 15 llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen2 llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,152064] = ["!", """, "#", "$", "%", "&", "'", ... llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,152064] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 151645 llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 151643 llama_model_loader: - kv 30: tokenizer.ggml.bos_token_id u32 = 151643 llama_model_loader: - kv 31: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 32: tokenizer.chat_template str = {%- if tools %}\n {{- '<|im_start|>... 
llama_model_loader: - kv 33: general.quantization_version u32 = 2 llama_model_loader: - type f32: 141 tensors llama_model_loader: - type q4_K: 169 tensors llama_model_loader: - type q6_K: 29 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 4.36 GiB (4.91 BPW) load: printing all EOG tokens: load: - 151643 ('<|endoftext|>') load: - 151645 ('<|im_end|>') load: - 151662 ('<|fim_pad|>') load: - 151663 ('<|repo_name|>') load: - 151664 ('<|file_sep|>') load: special tokens cache size = 22 load: token to piece cache size = 0.9310 MB print_info: arch = qwen2 print_info: vocab_only = 0 print_info: n_ctx_train = 32768 print_info: n_embd = 3584 print_info: n_layer = 28 print_info: n_head = 28 print_info: n_head_kv = 4 print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = 7 print_info: n_embd_k_gqa = 512 print_info: n_embd_v_gqa = 512 print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-06 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 0.0e+00 print_info: f_attn_scale = 0.0e+00 print_info: n_ff = 18944 print_info: n_expert = 0 print_info: n_expert_used = 0 print_info: causal attn = 1 print_info: pooling type = -1 print_info: rope type = 2 print_info: rope scaling = linear print_info: freq_base_train = 1000000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 32768 print_info: rope_finetuned = unknown print_info: model type = 7B print_info: model params = 7.62 B print_info: general.name = Qwen2.5 Coder 7B Instruct print_info: vocab type = BPE print_info: n_vocab = 152064 print_info: n_merges = 151387 print_info: BOS token = 151643 '<|endoftext|>' print_info: EOS token = 151645 '<|im_end|>' print_info: EOT token = 151645 '<|im_end|>' print_info: PAD token = 151643 '<|endoftext|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 151659 '<|fim_prefix|>' print_info: FIM SUF token = 151661 '<|fim_suffix|>' print_info: FIM MID token = 151660 '<|fim_middle|>' print_info: FIM PAD token = 151662 '<|fim_pad|>' print_info: FIM REP token = 151663 '<|repo_name|>' print_info: FIM SEP token = 151664 '<|file_sep|>' print_info: EOG token = 151643 '<|endoftext|>' print_info: EOG token = 151645 '<|im_end|>' print_info: EOG token = 151662 '<|fim_pad|>' print_info: EOG token = 151663 '<|repo_name|>' print_info: EOG token = 151664 '<|file_sep|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... 
(mmap = true)
load_tensors: CPU_Mapped model buffer size = 4460.45 MiB
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 16384
llama_context: n_ctx_per_seq = 16384
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: kv_unified = false
llama_context: freq_base = 1000000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (16384) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context: CPU output buffer size = 0.59 MiB
llama_kv_cache_unified: CPU KV buffer size = 896.00 MiB
llama_kv_cache_unified: size = 896.00 MiB ( 16384 cells, 28 layers, 1/1 seqs), K (f16): 448.00 MiB, V (f16): 448.00 MiB
llama_context: CPU compute buffer size = 958.01 MiB
llama_context: graph nodes = 1070
llama_context: graph splits = 1
time=2025-09-29T21:26:29.053+02:00 level=INFO source=server.go:1289 msg="llama runner started in 1.34 seconds"
time=2025-09-29T21:26:29.053+02:00 level=INFO source=sched.go:470 msg="loaded runners" count=1
time=2025-09-29T21:26:29.053+02:00 level=INFO source=server.go:1251 msg="waiting for llama runner to start responding"
time=2025-09-29T21:26:29.054+02:00 level=INFO source=server.go:1289 msg="llama runner started in 1.34 second
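So even though discovery succeeds in this second run (library=rocm, compute=gfx1030), the runner still ends up with CPU buffers: ggml_cuda_init reports "no ROCm-capable device is detected" and /opt/amdgpu/share/libdrm/amdgpu.ids is missing inside the sandbox. A quick way to check what the sandboxed runtime can actually reach, as a sketch (assuming sh is available in the Flatpak runtime):

# sketch: inspect GPU device visibility from inside the Alpaca sandbox
flatpak run --command=sh com.jeffser.Alpaca -c 'ls -l /dev/kfd /dev/dri 2>&1; id'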