Misc. bug: OpenCL: Issue with Adreno 610
Name and Version
Version: https://github.com/ggml-org/llama.cpp/commit/295354ea6848a77bdee204ee1c971d9b92ffcca9
Built with
cmake -B build-android -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=D:\Android_Studio_SDK\ndk\28.0.12433566\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-34 -DCMAKE_MAKE_PROGRAM=D:\Android_Studio_SDK\cmake\3.6.4111459\bin\ninja.exe -DBUILD_SHARED_LIBS=OFF -DGGML_OPENCL=ON -DLLAMA_CURL=OFF -DGGML_OPENMP=OFF
Operating systems
Android
Which llama.cpp modules do you know to be affected?
llama-bench, llama-cli, llama-server, llama-quantize
Command line
1|xun:/data/local/tmp $ LD_LIBRARY_PATH=/vendor/lib64 ./llama-bench -m Llama-3.2-1B-Instruct-Q4_0.gguf -ngl 99
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) (OpenCL 2.0 Adreno(TM) 610)'
ggml_opencl: clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_ALL_VERSIONS, 0, nullptr, &param_size) error -30 at D:/ggml/llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:142
D:/ggml/llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:142: GGML_ASSERT(0) failed
0: 0x5912702e4c
1: 0x5912702da4
2: 0x59126ed39c
3: 0x59126eb9bc
4: 0x59126947b8
5: 0x5912692b1c
6: 0x59125df370
7: 0x7dd92c89a0 __libc_init
Aborted
Problem description & steps to reproduce
After cross-compiling the OpenCL headers following the docs and pushing the binaries with adb, I get the above error when trying to run any of the llama binaries. I'm on a Xiaomi Pad SE with Android 14 and an Adreno 610. I thought support for Adreno 6xx had been added with this commit https://github.com/ggml-org/llama.cpp/commit/80f19b41869728eeb6a26569957b92a773a2b2c6, but does it cover the 610, or is that GPU too old? Or am I doing something wrong here? @max-krasnyansky @lhez
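(For reference, OpenCL error -30 is CL_INVALID_VALUE. Here it is returned by clGetDeviceInfo for CL_DEVICE_OPENCL_C_ALL_VERSIONS, a query that was only introduced in OpenCL 3.0, on a device that reports OpenCL 2.0. Below is a minimal sketch of guarding such a query behind the reported device version; illustrative only, not the actual ggml-opencl code, and it assumes a valid cl_device_id.)

#include <CL/cl.h>
#include <stdio.h>

/* Hedged sketch: probe the device's reported OpenCL version before
   issuing OpenCL 3.0-only queries such as CL_DEVICE_OPENCL_C_ALL_VERSIONS. */
static int device_supports_cl30(cl_device_id device) {
    char ver[256] = {0};
    /* CL_DEVICE_VERSION is valid on every OpenCL version and returns a
       string such as "OpenCL 2.0 Adreno(TM) 610". */
    if (clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(ver), ver, NULL) != CL_SUCCESS) {
        return 0;
    }
    unsigned major = 0, minor = 0;
    sscanf(ver, "OpenCL %u.%u", &major, &minor);
    return major >= 3;  /* only then is CL_DEVICE_OPENCL_C_ALL_VERSIONS safe */
}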
@rmatif Unfortunately we did not test on the A610. It looks like your device reports OpenCL 2.0, while the default build target is OpenCL 3.0. Could you try adding -DGGML_OPENCL_TARGET_VERSION=200 to your cmake command, do a clean cmake configuration and build, and see if it helps?
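A clean reconfigure with that flag would look like this (same paths as in the original command; delete the old build-android directory first so the cached target version is not reused):

cmake -B build-android -G "Ninja" -DCMAKE_TOOLCHAIN_FILE=D:\Android_Studio_SDK\ndk\28.0.12433566\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-34 -DCMAKE_MAKE_PROGRAM=D:\Android_Studio_SDK\cmake\3.6.4111459\bin\ninja.exe -DBUILD_SHARED_LIBS=OFF -DGGML_OPENCL=ON -DGGML_OPENCL_TARGET_VERSION=200 -DLLAMA_CURL=OFF -DGGML_OPENMP=OFF
cmake --build build-android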
@lhez Thanks for the hint. With -DGGML_OPENCL_TARGET_VERSION=200 it now initializes, but it crashes after a few moments in the pp (prompt processing) phase.
xun:/data/local/tmp $ LD_LIBRARY_PATH=/vendor/lib64 ./llama-bench -m Llama-3.2-1B-Instruct-Q4_0.gguf -ngl 99
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) (OpenCL 2.0 Adreno(TM) 610)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit #67600309f5 changeid #I34f73d0972 Date: 07/10/23 Mon Local Branch: Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.VENDOR.1.0.11.00.00.766.892 Compiler E031.38.11.14
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 463 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: false
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: false
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels......................................
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: |
D:/ggml/llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3834: GGML_ASSERT(0) failed
0: 0x63753e6b44
1: 0x63753e6a9c
2: 0x63753d99d8
3: 0x63753d1350
4: 0x63753e3c9c
5: 0x63753f37ac
6: 0x63752f84e4
7: 0x63752f8e90
8: 0x63752f9d50
9: 0x63752c8e90
10: 0x63752c4878
11: 0x727e93a9a0 __libc_init
Aborted
@rmatif Could you try running the model using llama-cli and see if you get the same error (you should be able to see the error code)?
@lhez
Here is the output log from llama-cli:
130|xun:/data/local/tmp $ LD_LIBRARY_PATH=/vendor/lib64 ./llama-cli -m Llama-3.2-1B-Instruct-Q4_0.gguf -ngl 99 -v
ggml_opencl: selecting platform: 'QUALCOMM Snapdragon(TM)'
ggml_opencl: selecting device: 'QUALCOMM Adreno(TM) (OpenCL 2.0 Adreno(TM) 610)'
ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit #67600309f5 changeid #I34f73d0972 Date: 07/10/23 Mon Local Branch: Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.VENDOR.1.0.11.00.00.766.892 Compiler E031.38.11.14
ggml_opencl: vector subgroup broadcast support: false
ggml_opencl: device FP16 support: true
ggml_opencl: mem base addr align: 128
ggml_opencl: max mem alloc size: 463 MB
ggml_opencl: SVM coarse grain buffer support: true
ggml_opencl: SVM fine grain buffer support: false
ggml_opencl: SVM fine grain system support: false
ggml_opencl: SVM atomics support: false
ggml_opencl: flattening quantized weights representation as struct of arrays (GGML_OPENCL_SOA_Q)
ggml_opencl: using kernels optimized for Adreno (GGML_OPENCL_USE_ADRENO_KERNELS)
ggml_opencl: loading OpenCL kernels......................................
build: 5191 (295354ea) with for x86_64-w64-windows-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device GPUOpenCL (QUALCOMM Adreno(TM)) - 0 MiB free
llama_model_loader: loaded meta data with 31 key-value pairs and 147 tensors from Llama-3.2-1B-Instruct-Q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 1B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Llama-3.2
llama_model_loader: - kv 5: general.size_label str = 1B
llama_model_loader: - kv 6: general.license str = llama3.2
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 16
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 2048
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: llama.attention.key_length u32 = 64
llama_model_loader: - kv 18: llama.attention.value_length u32 = 64
llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 64
llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 28: tokenizer.chat_template str = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - kv 30: general.file_type u32 = 2
llama_model_loader: - type f32: 34 tensors
llama_model_loader: - type q4_0: 113 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_0
print_info: file size = 663.16 MiB (4.50 BPW)
init_tokenizer: initializing tokenizer for type 2
load: control token: 128254 '<|reserved_special_token_246|>' is not marked as EOG
load: control token: 128252 '<|reserved_special_token_244|>' is not marked as EOG
load: control token: 128251 '<|reserved_special_token_243|>' is not marked as EOG
load: control token: 128250 '<|reserved_special_token_242|>' is not marked as EOG
load: control token: 128248 '<|reserved_special_token_240|>' is not marked as EOG
load: control token: 128247 '<|reserved_special_token_239|>' is not marked as EOG
load: control token: 128245 '<|reserved_special_token_237|>' is not marked as EOG
load: control token: 128244 '<|reserved_special_token_236|>' is not marked as EOG
load: control token: 128243 '<|reserved_special_token_235|>' is not marked as EOG
load: control token: 128240 '<|reserved_special_token_232|>' is not marked as EOG
load: control token: 128238 '<|reserved_special_token_230|>' is not marked as EOG
load: control token: 128235 '<|reserved_special_token_227|>' is not marked as EOG
load: control token: 128234 '<|reserved_special_token_226|>' is not marked as EOG
load: control token: 128229 '<|reserved_special_token_221|>' is not marked as EOG
load: control token: 128227 '<|reserved_special_token_219|>' is not marked as EOG
load: control token: 128226 '<|reserved_special_token_218|>' is not marked as EOG
load: control token: 128224 '<|reserved_special_token_216|>' is not marked as EOG
load: control token: 128223 '<|reserved_special_token_215|>' is not marked as EOG
load: control token: 128221 '<|reserved_special_token_213|>' is not marked as EOG
load: control token: 128219 '<|reserved_special_token_211|>' is not marked as EOG
load: control token: 128218 '<|reserved_special_token_210|>' is not marked as EOG
load: control token: 128217 '<|reserved_special_token_209|>' is not marked as EOG
load: control token: 128216 '<|reserved_special_token_208|>' is not marked as EOG
load: control token: 128215 '<|reserved_special_token_207|>' is not marked as EOG
load: control token: 128213 '<|reserved_special_token_205|>' is not marked as EOG
load: control token: 128211 '<|reserved_special_token_203|>' is not marked as EOG
load: control token: 128210 '<|reserved_special_token_202|>' is not marked as EOG
load: control token: 128209 '<|reserved_special_token_201|>' is not marked as EOG
load: control token: 128208 '<|reserved_special_token_200|>' is not marked as EOG
load: control token: 128207 '<|reserved_special_token_199|>' is not marked as EOG
load: control token: 128204 '<|reserved_special_token_196|>' is not marked as EOG
load: control token: 128202 '<|reserved_special_token_194|>' is not marked as EOG
load: control token: 128197 '<|reserved_special_token_189|>' is not marked as EOG
load: control token: 128195 '<|reserved_special_token_187|>' is not marked as EOG
load: control token: 128194 '<|reserved_special_token_186|>' is not marked as EOG
load: control token: 128191 '<|reserved_special_token_183|>' is not marked as EOG
load: control token: 128190 '<|reserved_special_token_182|>' is not marked as EOG
load: control token: 128188 '<|reserved_special_token_180|>' is not marked as EOG
load: control token: 128187 '<|reserved_special_token_179|>' is not marked as EOG
load: control token: 128185 '<|reserved_special_token_177|>' is not marked as EOG
load: control token: 128184 '<|reserved_special_token_176|>' is not marked as EOG
load: control token: 128183 '<|reserved_special_token_175|>' is not marked as EOG
load: control token: 128178 '<|reserved_special_token_170|>' is not marked as EOG
load: control token: 128177 '<|reserved_special_token_169|>' is not marked as EOG
load: control token: 128176 '<|reserved_special_token_168|>' is not marked as EOG
load: control token: 128175 '<|reserved_special_token_167|>' is not marked as EOG
load: control token: 128174 '<|reserved_special_token_166|>' is not marked as EOG
load: control token: 128173 '<|reserved_special_token_165|>' is not marked as EOG
load: control token: 128172 '<|reserved_special_token_164|>' is not marked as EOG
load: control token: 128169 '<|reserved_special_token_161|>' is not marked as EOG
load: control token: 128167 '<|reserved_special_token_159|>' is not marked as EOG
load: control token: 128166 '<|reserved_special_token_158|>' is not marked as EOG
load: control token: 128160 '<|reserved_special_token_152|>' is not marked as EOG
load: control token: 128159 '<|reserved_special_token_151|>' is not marked as EOG
load: control token: 128157 '<|reserved_special_token_149|>' is not marked as EOG
load: control token: 128156 '<|reserved_special_token_148|>' is not marked as EOG
load: control token: 128154 '<|reserved_special_token_146|>' is not marked as EOG
load: control token: 128152 '<|reserved_special_token_144|>' is not marked as EOG
load: control token: 128151 '<|reserved_special_token_143|>' is not marked as EOG
load: control token: 128150 '<|reserved_special_token_142|>' is not marked as EOG
load: control token: 128147 '<|reserved_special_token_139|>' is not marked as EOG
load: control token: 128144 '<|reserved_special_token_136|>' is not marked as EOG
load: control token: 128142 '<|reserved_special_token_134|>' is not marked as EOG
load: control token: 128141 '<|reserved_special_token_133|>' is not marked as EOG
load: control token: 128140 '<|reserved_special_token_132|>' is not marked as EOG
load: control token: 128133 '<|reserved_special_token_125|>' is not marked as EOG
load: control token: 128130 '<|reserved_special_token_122|>' is not marked as EOG
load: control token: 128128 '<|reserved_special_token_120|>' is not marked as EOG
load: control token: 128127 '<|reserved_special_token_119|>' is not marked as EOG
load: control token: 128126 '<|reserved_special_token_118|>' is not marked as EOG
load: control token: 128125 '<|reserved_special_token_117|>' is not marked as EOG
load: control token: 128124 '<|reserved_special_token_116|>' is not marked as EOG
load: control token: 128123 '<|reserved_special_token_115|>' is not marked as EOG
load: control token: 128122 '<|reserved_special_token_114|>' is not marked as EOG
load: control token: 128121 '<|reserved_special_token_113|>' is not marked as EOG
load: control token: 128120 '<|reserved_special_token_112|>' is not marked as EOG
load: control token: 128119 '<|reserved_special_token_111|>' is not marked as EOG
load: control token: 128116 '<|reserved_special_token_108|>' is not marked as EOG
load: control token: 128115 '<|reserved_special_token_107|>' is not marked as EOG
load: control token: 128114 '<|reserved_special_token_106|>' is not marked as EOG
load: control token: 128113 '<|reserved_special_token_105|>' is not marked as EOG
load: control token: 128111 '<|reserved_special_token_103|>' is not marked as EOG
load: control token: 128110 '<|reserved_special_token_102|>' is not marked as EOG
load: control token: 128107 '<|reserved_special_token_99|>' is not marked as EOG
load: control token: 128106 '<|reserved_special_token_98|>' is not marked as EOG
load: control token: 128105 '<|reserved_special_token_97|>' is not marked as EOG
load: control token: 128104 '<|reserved_special_token_96|>' is not marked as EOG
load: control token: 128103 '<|reserved_special_token_95|>' is not marked as EOG
load: control token: 128100 '<|reserved_special_token_92|>' is not marked as EOG
load: control token: 128097 '<|reserved_special_token_89|>' is not marked as EOG
load: control token: 128096 '<|reserved_special_token_88|>' is not marked as EOG
load: control token: 128094 '<|reserved_special_token_86|>' is not marked as EOG
load: control token: 128093 '<|reserved_special_token_85|>' is not marked as EOG
load: control token: 128090 '<|reserved_special_token_82|>' is not marked as EOG
load: control token: 128089 '<|reserved_special_token_81|>' is not marked as EOG
load: control token: 128087 '<|reserved_special_token_79|>' is not marked as EOG
load: control token: 128085 '<|reserved_special_token_77|>' is not marked as EOG
load: control token: 128080 '<|reserved_special_token_72|>' is not marked as EOG
load: control token: 128077 '<|reserved_special_token_69|>' is not marked as EOG
load: control token: 128076 '<|reserved_special_token_68|>' is not marked as EOG
load: control token: 128073 '<|reserved_special_token_65|>' is not marked as EOG
load: control token: 128070 '<|reserved_special_token_62|>' is not marked as EOG
load: control token: 128069 '<|reserved_special_token_61|>' is not marked as EOG
load: control token: 128067 '<|reserved_special_token_59|>' is not marked as EOG
load: control token: 128064 '<|reserved_special_token_56|>' is not marked as EOG
load: control token: 128062 '<|reserved_special_token_54|>' is not marked as EOG
load: control token: 128061 '<|reserved_special_token_53|>' is not marked as EOG
load: control token: 128060 '<|reserved_special_token_52|>' is not marked as EOG
load: control token: 128054 '<|reserved_special_token_46|>' is not marked as EOG
load: control token: 128045 '<|reserved_special_token_37|>' is not marked as EOG
load: control token: 128044 '<|reserved_special_token_36|>' is not marked as EOG
load: control token: 128043 '<|reserved_special_token_35|>' is not marked as EOG
load: control token: 128042 '<|reserved_special_token_34|>' is not marked as EOG
load: control token: 128038 '<|reserved_special_token_30|>' is not marked as EOG
load: control token: 128037 '<|reserved_special_token_29|>' is not marked as EOG
load: control token: 128035 '<|reserved_special_token_27|>' is not marked as EOG
load: control token: 128034 '<|reserved_special_token_26|>' is not marked as EOG
load: control token: 128033 '<|reserved_special_token_25|>' is not marked as EOG
load: control token: 128032 '<|reserved_special_token_24|>' is not marked as EOG
load: control token: 128030 '<|reserved_special_token_22|>' is not marked as EOG
load: control token: 128029 '<|reserved_special_token_21|>' is not marked as EOG
load: control token: 128028 '<|reserved_special_token_20|>' is not marked as EOG
load: control token: 128026 '<|reserved_special_token_18|>' is not marked as EOG
load: control token: 128025 '<|reserved_special_token_17|>' is not marked as EOG
load: control token: 128024 '<|reserved_special_token_16|>' is not marked as EOG
load: control token: 128022 '<|reserved_special_token_14|>' is not marked as EOG
load: control token: 128020 '<|reserved_special_token_12|>' is not marked as EOG
load: control token: 128017 '<|reserved_special_token_9|>' is not marked as EOG
load: control token: 128016 '<|reserved_special_token_8|>' is not marked as EOG
load: control token: 128015 '<|reserved_special_token_7|>' is not marked as EOG
load: control token: 128014 '<|reserved_special_token_6|>' is not marked as EOG
load: control token: 128013 '<|reserved_special_token_5|>' is not marked as EOG
load: control token: 128011 '<|reserved_special_token_3|>' is not marked as EOG
load: control token: 128010 '<|python_tag|>' is not marked as EOG
load: control token: 128006 '<|start_header_id|>' is not marked as EOG
load: control token: 128003 '<|reserved_special_token_1|>' is not marked as EOG
load: control token: 128002 '<|reserved_special_token_0|>' is not marked as EOG
load: control token: 128000 '<|begin_of_text|>' is not marked as EOG
load: control token: 128041 '<|reserved_special_token_33|>' is not marked as EOG
load: control token: 128063 '<|reserved_special_token_55|>' is not marked as EOG
load: control token: 128046 '<|reserved_special_token_38|>' is not marked as EOG
load: control token: 128007 '<|end_header_id|>' is not marked as EOG
load: control token: 128065 '<|reserved_special_token_57|>' is not marked as EOG
load: control token: 128171 '<|reserved_special_token_163|>' is not marked as EOG
load: control token: 128162 '<|reserved_special_token_154|>' is not marked as EOG
load: control token: 128165 '<|reserved_special_token_157|>' is not marked as EOG
load: control token: 128057 '<|reserved_special_token_49|>' is not marked as EOG
load: control token: 128050 '<|reserved_special_token_42|>' is not marked as EOG
load: control token: 128056 '<|reserved_special_token_48|>' is not marked as EOG
load: control token: 128230 '<|reserved_special_token_222|>' is not marked as EOG
load: control token: 128098 '<|reserved_special_token_90|>' is not marked as EOG
load: control token: 128153 '<|reserved_special_token_145|>' is not marked as EOG
load: control token: 128084 '<|reserved_special_token_76|>' is not marked as EOG
load: control token: 128082 '<|reserved_special_token_74|>' is not marked as EOG
load: control token: 128102 '<|reserved_special_token_94|>' is not marked as EOG
load: control token: 128253 '<|reserved_special_token_245|>' is not marked as EOG
load: control token: 128179 '<|reserved_special_token_171|>' is not marked as EOG
load: control token: 128071 '<|reserved_special_token_63|>' is not marked as EOG
load: control token: 128135 '<|reserved_special_token_127|>' is not marked as EOG
load: control token: 128161 '<|reserved_special_token_153|>' is not marked as EOG
load: control token: 128164 '<|reserved_special_token_156|>' is not marked as EOG
load: control token: 128134 '<|reserved_special_token_126|>' is not marked as EOG
load: control token: 128249 '<|reserved_special_token_241|>' is not marked as EOG
load: control token: 128004 '<|finetune_right_pad_id|>' is not marked as EOG
load: control token: 128036 '<|reserved_special_token_28|>' is not marked as EOG
load: control token: 128148 '<|reserved_special_token_140|>' is not marked as EOG
load: control token: 128181 '<|reserved_special_token_173|>' is not marked as EOG
load: control token: 128222 '<|reserved_special_token_214|>' is not marked as EOG
load: control token: 128075 '<|reserved_special_token_67|>' is not marked as EOG
load: control token: 128241 '<|reserved_special_token_233|>' is not marked as EOG
load: control token: 128051 '<|reserved_special_token_43|>' is not marked as EOG
load: control token: 128068 '<|reserved_special_token_60|>' is not marked as EOG
load: control token: 128149 '<|reserved_special_token_141|>' is not marked as EOG
load: control token: 128201 '<|reserved_special_token_193|>' is not marked as EOG
load: control token: 128058 '<|reserved_special_token_50|>' is not marked as EOG
load: control token: 128146 '<|reserved_special_token_138|>' is not marked as EOG
load: control token: 128143 '<|reserved_special_token_135|>' is not marked as EOG
load: control token: 128023 '<|reserved_special_token_15|>' is not marked as EOG
load: control token: 128039 '<|reserved_special_token_31|>' is not marked as EOG
load: control token: 128132 '<|reserved_special_token_124|>' is not marked as EOG
load: control token: 128101 '<|reserved_special_token_93|>' is not marked as EOG
load: control token: 128212 '<|reserved_special_token_204|>' is not marked as EOG
load: control token: 128189 '<|reserved_special_token_181|>' is not marked as EOG
load: control token: 128225 '<|reserved_special_token_217|>' is not marked as EOG
load: control token: 128129 '<|reserved_special_token_121|>' is not marked as EOG
load: control token: 128005 '<|reserved_special_token_2|>' is not marked as EOG
load: control token: 128078 '<|reserved_special_token_70|>' is not marked as EOG
load: control token: 128163 '<|reserved_special_token_155|>' is not marked as EOG
load: control token: 128072 '<|reserved_special_token_64|>' is not marked as EOG
load: control token: 128112 '<|reserved_special_token_104|>' is not marked as EOG
load: control token: 128186 '<|reserved_special_token_178|>' is not marked as EOG
load: control token: 128095 '<|reserved_special_token_87|>' is not marked as EOG
load: control token: 128109 '<|reserved_special_token_101|>' is not marked as EOG
load: control token: 128099 '<|reserved_special_token_91|>' is not marked as EOG
load: control token: 128138 '<|reserved_special_token_130|>' is not marked as EOG
load: control token: 128193 '<|reserved_special_token_185|>' is not marked as EOG
load: control token: 128199 '<|reserved_special_token_191|>' is not marked as EOG
load: control token: 128048 '<|reserved_special_token_40|>' is not marked as EOG
load: control token: 128088 '<|reserved_special_token_80|>' is not marked as EOG
load: control token: 128192 '<|reserved_special_token_184|>' is not marked as EOG
load: control token: 128136 '<|reserved_special_token_128|>' is not marked as EOG
load: control token: 128092 '<|reserved_special_token_84|>' is not marked as EOG
load: control token: 128158 '<|reserved_special_token_150|>' is not marked as EOG
load: control token: 128001 '<|end_of_text|>' is not marked as EOG
load: control token: 128049 '<|reserved_special_token_41|>' is not marked as EOG
load: control token: 128031 '<|reserved_special_token_23|>' is not marked as EOG
load: control token: 128255 '<|reserved_special_token_247|>' is not marked as EOG
load: control token: 128182 '<|reserved_special_token_174|>' is not marked as EOG
load: control token: 128066 '<|reserved_special_token_58|>' is not marked as EOG
load: control token: 128180 '<|reserved_special_token_172|>' is not marked as EOG
load: control token: 128233 '<|reserved_special_token_225|>' is not marked as EOG
load: control token: 128079 '<|reserved_special_token_71|>' is not marked as EOG
load: control token: 128081 '<|reserved_special_token_73|>' is not marked as EOG
load: control token: 128231 '<|reserved_special_token_223|>' is not marked as EOG
load: control token: 128196 '<|reserved_special_token_188|>' is not marked as EOG
load: control token: 128047 '<|reserved_special_token_39|>' is not marked as EOG
load: control token: 128083 '<|reserved_special_token_75|>' is not marked as EOG
load: control token: 128139 '<|reserved_special_token_131|>' is not marked as EOG
load: control token: 128131 '<|reserved_special_token_123|>' is not marked as EOG
load: control token: 128118 '<|reserved_special_token_110|>' is not marked as EOG
load: control token: 128053 '<|reserved_special_token_45|>' is not marked as EOG
load: control token: 128220 '<|reserved_special_token_212|>' is not marked as EOG
load: control token: 128108 '<|reserved_special_token_100|>' is not marked as EOG
load: control token: 128091 '<|reserved_special_token_83|>' is not marked as EOG
load: control token: 128203 '<|reserved_special_token_195|>' is not marked as EOG
load: control token: 128059 '<|reserved_special_token_51|>' is not marked as EOG
load: control token: 128019 '<|reserved_special_token_11|>' is not marked as EOG
load: control token: 128170 '<|reserved_special_token_162|>' is not marked as EOG
load: control token: 128205 '<|reserved_special_token_197|>' is not marked as EOG
load: control token: 128040 '<|reserved_special_token_32|>' is not marked as EOG
load: control token: 128200 '<|reserved_special_token_192|>' is not marked as EOG
load: control token: 128236 '<|reserved_special_token_228|>' is not marked as EOG
load: control token: 128145 '<|reserved_special_token_137|>' is not marked as EOG
load: control token: 128168 '<|reserved_special_token_160|>' is not marked as EOG
load: control token: 128214 '<|reserved_special_token_206|>' is not marked as EOG
load: control token: 128137 '<|reserved_special_token_129|>' is not marked as EOG
load: control token: 128232 '<|reserved_special_token_224|>' is not marked as EOG
load: control token: 128239 '<|reserved_special_token_231|>' is not marked as EOG
load: control token: 128055 '<|reserved_special_token_47|>' is not marked as EOG
load: control token: 128228 '<|reserved_special_token_220|>' is not marked as EOG
load: control token: 128206 '<|reserved_special_token_198|>' is not marked as EOG
load: control token: 128018 '<|reserved_special_token_10|>' is not marked as EOG
load: control token: 128012 '<|reserved_special_token_4|>' is not marked as EOG
load: control token: 128198 '<|reserved_special_token_190|>' is not marked as EOG
load: control token: 128021 '<|reserved_special_token_13|>' is not marked as EOG
load: control token: 128086 '<|reserved_special_token_78|>' is not marked as EOG
load: control token: 128074 '<|reserved_special_token_66|>' is not marked as EOG
load: control token: 128027 '<|reserved_special_token_19|>' is not marked as EOG
load: control token: 128242 '<|reserved_special_token_234|>' is not marked as EOG
load: control token: 128155 '<|reserved_special_token_147|>' is not marked as EOG
load: control token: 128052 '<|reserved_special_token_44|>' is not marked as EOG
load: control token: 128246 '<|reserved_special_token_238|>' is not marked as EOG
load: control token: 128117 '<|reserved_special_token_109|>' is not marked as EOG
load: control token: 128237 '<|reserved_special_token_229|>' is not marked as EOG
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch = llama
print_info: vocab_only = 0
print_info: n_ctx_train = 131072
print_info: n_embd = 2048
print_info: n_layer = 16
print_info: n_head = 32
print_info: n_head_kv = 8
print_info: n_rot = 64
print_info: n_swa = 0
print_info: n_swa_pattern = 1
print_info: n_embd_head_k = 64
print_info: n_embd_head_v = 64
print_info: n_gqa = 4
print_info: n_embd_k_gqa = 512
print_info: n_embd_v_gqa = 512
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-05
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8192
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = 0
print_info: rope type = 0
print_info: rope scaling = linear
print_info: freq_base_train = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 131072
print_info: rope_finetuned = unknown
print_info: ssm_d_conv = 0
print_info: ssm_d_inner = 0
print_info: ssm_d_state = 0
print_info: ssm_dt_rank = 0
print_info: ssm_dt_b_c_rms = 0
print_info: model type = 1B
print_info: model params = 1.24 B
print_info: general.name = Llama 3.2 1B Instruct
print_info: vocab type = BPE
print_info: n_vocab = 128256
print_info: n_merges = 280147
print_info: BOS token = 128000 '<|begin_of_text|>'
print_info: EOS token = 128009 '<|eot_id|>'
print_info: EOT token = 128009 '<|eot_id|>'
print_info: EOM token = 128008 '<|eom_id|>'
print_info: LF token = 198 'Ċ'
print_info: EOG token = 128008 '<|eom_id|>'
print_info: EOG token = 128009 '<|eot_id|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: layer 0 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 1 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 2 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 3 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 4 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 5 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 6 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 7 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 8 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 9 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 10 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 11 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 12 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 13 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 14 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 15 assigned to device GPUOpenCL, is_swa = 0
load_tensors: layer 16 assigned to device GPUOpenCL, is_swa = 0
load_tensors: tensor 'token_embd.weight' (q4_0) (and 0 others) cannot be used with preferred buffer type CPU_AARCH64, using CPU instead
load_tensors: offloading 16 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 17/17 layers to GPU
load_tensors: CPU_Mapped model buffer size = 140.91 MiB
load_tensors: OpenCL model buffer size = 663.16 MiB
.............................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = 0
llama_context: freq_base = 500000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
set_abort_callback: call
llama_context: CPU output buffer size = 0.49 MiB
llama_context: n_ctx = 4096
llama_context: n_ctx = 4096 (padded)
init: kv_size = 4096, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 16, can_shift = 1
init: layer 0: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 1: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 2: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 3: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 4: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 5: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 6: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 7: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 8: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 9: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 10: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 11: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 12: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 13: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 14: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: layer 15: n_embd_k_gqa = 512, n_embd_v_gqa = 512, dev = GPUOpenCL
init: OpenCL KV buffer size = 128.00 MiB
llama_context: KV self size = 128.00 MiB, K (f16): 64.00 MiB, V (f16): 64.00 MiB
llama_context: enumerating backends
llama_context: backend_ptrs.size() = 2
llama_context: max_nodes = 65536
llama_context: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 0
llama_context: reserving graph for n_tokens = 512, n_seqs = 1
llama_context: reserving graph for n_tokens = 1, n_seqs = 1
llama_context: reserving graph for n_tokens = 512, n_seqs = 1
llama_context: OpenCL compute buffer size = 280.00 MiB
llama_context: CPU compute buffer size = 12.01 MiB
llama_context: graph nodes = 550
llama_context: graph splits = 2
clear_adapter_lora: call
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
set_warmup: value = 1
D:/ggml/llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3834: GGML_ASSERT(0) failed
ggml_opencl: clEnqueueNDRangeKernel(queue, kernel, 3, NULL, global_work_size, local_work_size, 0, NULL, NULL) error -54 at D:/ggml/llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp:3834
EDIT: Just tested with Qwen 2.5 0.5B/1.5B and hit the same issue.
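(For reference, error -54 is CL_INVALID_WORK_GROUP_SIZE, which clEnqueueNDRangeKernel returns when the requested local work size is incompatible with the kernel or device limits. Below is a hedged sketch of the kind of pre-launch check involved; illustrative only, not the actual ggml-opencl code, and it assumes valid kernel and device handles.)

#include <CL/cl.h>

/* Hedged sketch: checks whether a 3-D local work size is within the
   limits that, when exceeded, make clEnqueueNDRangeKernel fail with
   CL_INVALID_WORK_GROUP_SIZE (-54). */
static int local_size_ok(cl_kernel kernel, cl_device_id device,
                         const size_t lws[3]) {
    size_t kernel_wg_max = 0;  /* per-kernel limit (compiler-dependent) */
    size_t device_wg_max = 0;  /* device-wide limit */
    size_t item_max[3]   = {0};
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernel_wg_max), &kernel_wg_max, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                    sizeof(device_wg_max), &device_wg_max, NULL);
    clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_ITEM_SIZES,
                    sizeof(item_max), item_max, NULL);
    const size_t total = lws[0] * lws[1] * lws[2];
    return total <= kernel_wg_max && total <= device_wg_max
        && lws[0] <= item_max[0] && lws[1] <= item_max[1]
        && lws[2] <= item_max[2];
}

On Adreno, the per-kernel CL_KERNEL_WORK_GROUP_SIZE can be considerably smaller than the device-wide maximum (1024 per the clinfo output below), so a work-group size tuned for newer Adreno parts may be rejected on the 610.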
@rmatif Looks like the error happens when launching the matmul kernel. Is it possible for you to run clinfo (https://github.com/Oblomov/clinfo) on this device? Unfortunately we don't have an Adreno 610 device right now.
@lhez Here is the output of clinfo, run from Termux:
~ $ LD_LIBRARY_PATH=/vendor/lib64:$PREFIX/lib clinfo
Number of platforms 1
Platform Name QUALCOMM Snapdragon(TM)
Platform Vendor QUALCOMM
Platform Version OpenCL 3.0 QUALCOMM build: commit #67600309f5 changeid #I34f73d0972 Date: 07/10/23 Mon Local Branch: Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.VENDOR.1.0.11.00.00.766.892
Platform Profile FULL_PROFILE
Platform Extensions
Platform Extensions with Version
Platform Numeric Version 0xc00000 (3.0.0)
Platform Host timer resolution 0ns
Platform Name QUALCOMM Snapdragon(TM)
Number of devices 1
Device Name QUALCOMM Adreno(TM)
Device Vendor QUALCOMM
Device Vendor ID 0x5143
Device Version OpenCL 2.0 Adreno(TM) 610
Driver Version OpenCL 3.0 QUALCOMM build: commit #67600309f5 changeid #I34f73d0972 Date: 07/10/23 Mon Local Branch: Remote Branch: refs/tags/AU_LINUX_ANDROID_LA.VENDOR.1.0.11.00.00.766.892 Compiler E031.38.11.14
Device OpenCL C Version OpenCL C 2.0 Adreno(TM) 610
Device Type GPU
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 1
Max clock frequency 1MHz
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 1024
Preferred work group size multiple (kernel) 64
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 0
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 0 / 0 (n/a)
Half-precision Floating-point support (cl_khr_fp16)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity Yes
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity Yes
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (n/a)
Address bits 64, Little-Endian
Global memory size 1943857152 (1.81GiB)
Error Correction support No
Max memory allocation 485964288 (463.5MiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing No
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Page size (QCOM) 4096 bytes
External memory padding (QCOM) 0 bytes
Preferred alignment for atomics
SVM 128 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 1048576 (1024KiB)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 16384x16384x2048 pixels
Max number of read image args 128
Max number of write image args 64
Max number of read/write image args 64
Max number of pipe args 16
Max active pipe reservations 2048
Max pipe packet size 1024
Local memory type Global
Local memory size 32768 (32KiB)
Max number of constant args 8
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 655376 (640KiB)
Max size 655376 (640KiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_img_egl_image cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_egl_event cl_khr_egl_image cl_khr_fp16 cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_mipmap_image cl_khr_srgb_image_writes cl_khr_subgroups cl_qcom_android_native_buffer_host_ptr cl_qcom_compressed_image cl_qcom_compressed_yuv_image_read cl_qcom_create_buffer_from_image cl_qcom_ext_host_ptr cl_qcom_extract_image_plane cl_qcom_ion_host_ptr cl_qcom_other_image cl_qcom_perf_hint cl_qcom_priority_hint cl_qcom_protected_context cl_qcom_subgroup_shuffle cl_qcom_vector_image_ops
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform
clCreateContext(NULL, ...) [default] No platform
clCreateContext(NULL, ...) [other] Success [P0]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name QUALCOMM Snapdragon(TM)
Device Name QUALCOMM Adreno(TM)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name QUALCOMM Snapdragon(TM)
Device Name QUALCOMM Adreno(TM)
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name QUALCOMM Snapdragon(TM)
Device Name QUALCOMM Adreno(TM)
This issue was closed because it has been inactive for 14 days since being marked as stale.