Results 11 comments of lhez

@sparkleholic - currently Q4_0 is the optimized format, so you will need to pass `--pure` when quantizing the model to Q4_0. Without `--pure`, some layers will be quantized as Q6_K, resulting in...
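For reference, a quantization command of this shape would do it. This is a sketch: the model file names are placeholders, and it assumes the `llama-quantize` tool that ships with llama.cpp is on the path.

```shell
# Quantize to pure Q4_0 so every tensor uses the optimized format;
# --pure disables the default k-quant mixtures (e.g. Q6_K layers).
./llama-quantize --pure model-f16.gguf model-q4_0.gguf Q4_0
```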

Thank you @linehill, it looks good.

@max-krasnyansky ping - I think this PR should be good to merge.

The problem is that some of the kernels use subgroups and need to know the subgroup size, but Nvidia's OpenCL implementation does not support subgroups. I think AMD has subgroups support...

@robquill Could you share the error message you got as well as your environment (which GPU, driver version)?

> I did this on a PowerVR GPU, where the extra releases actually cause a problem. I tested just now on my Intel GPU and llama-bench runs correctly. However, I...

@rmatif Unfortunately we did not test on A610. It looks like the OpenCL driver on your device only supports OpenCL 2.0, while the default build targets OpenCL 3.0. Could you try...

@rmatif Could you try running the model using `llama-cli` and see if you get the same error (you should be able to see the error code)?
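A minimal `llama-cli` run for reproducing this might look like the following; the model path and prompt are placeholders.

```shell
# Short generation with all layers offloaded to the GPU;
# on failure the OpenCL error code should show up in the log.
./llama-cli -m model-q4_0.gguf -ngl 99 -p "Hello" -n 16
```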

@rmatif Looks like the error happens when launching the matmul kernel. Is it possible for you to run `clinfo` (https://github.com/Oblomov/clinfo) on this device? Unfortunately we don't have an Adreno 610...
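To capture the relevant device information, something along these lines is usually enough (assuming `clinfo` is installed; the grep pattern is just a convenience filter):

```shell
# Dump full platform/device info, then pull out the version- and
# subgroup-related lines that matter for the OpenCL backend.
clinfo > clinfo.txt
grep -iE 'version|sub-?group' clinfo.txt
```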

On X Elite (X1-85), master:

| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --:...