Results 11 comments of lhez

@sparkleholic - currently Q4_0 is the optimized format, so you will need to pass `--pure` when quantizing the model to Q4_0. Without `--pure`, some layers will be quantized as Q6_K, resulting in...
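For reference, a quantization command of this shape would do it. This is a sketch: the model file names are placeholders, and it assumes the `llama-quantize` tool that ships with llama.cpp is on the path.

```shell
# Quantize to pure Q4_0 so every tensor uses the optimized format;
# --pure disables the default k-quant mixtures (e.g. Q6_K layers).
./llama-quantize --pure model-f16.gguf model-q4_0.gguf Q4_0
```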

Thank you @linehill, it looks good.

@max-krasnyansky ping - I think this PR should be good to merge.

The problem is that some of the kernels use subgroups and need to know the subgroup size, but Nvidia's OpenCL implementation does not support subgroups. I think AMD has subgroups support...

@robquill Could you share the error message you got as well as your environment (which GPU, driver version)?

> I did this on a PowerVR GPU, where the extra releases actually cause a problem. I tested just now on my Intel GPU and llama-bench runs correctly. However, I...

@rmatif Unfortunately we did not test on A610. It looks like the OpenCL driver on your device only supports OpenCL 2.0, while the default build targets OpenCL 3.0. Could you try...

@rmatif Could you try running the model using `llama-cli` and see if you get the same error (you should be able to see the error code)?
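A minimal `llama-cli` run for reproducing this might look like the following; the model path and prompt are placeholders.

```shell
# Short generation with all layers offloaded to the GPU;
# on failure the OpenCL error code should show up in the log.
./llama-cli -m model-q4_0.gguf -ngl 99 -p "Hello" -n 16
```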

@rmatif Looks like the error happens when launching the matmul kernel. Is it possible for you to run `clinfo` (https://github.com/Oblomov/clinfo) on this device? Unfortunately we don't have an Adreno 610...
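To capture the relevant device information, something along these lines is usually enough (assuming `clinfo` is installed; the grep pattern is just a convenience filter):

```shell
# Dump full platform/device info, then pull out the version- and
# subgroup-related lines that matter for the OpenCL backend.
clinfo > clinfo.txt
grep -iE 'version|sub-?group' clinfo.txt
```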

On X Elite (X1-85), master:

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --:...