Svetlozar Georgiev comments

Results: 13 comments by Svetlozar Georgiev

[nvidia] batch normalization primitive fails correctness check

To summarise, the forward pass of batchnorm calculates means close to the expected value, but there is still a difference of near-zero values. Hence, because the src for the...

generic: fix several gtest issues

The failing CI check is in a file not added by this PR.

generic: fix several gtest issues

make test disable device_cpu enable device_gpu enable thr_cuda enable arch_rtx

rfc: dnncompat compatibility layer

> @zhimingwang36, can you please provide your input on how you handle oneDNN/cuDNN scratchpad and workspace? > > Below are related questions from @mgouicem: > > > Other thing is...

rfc: dnncompat compatibility layer

> @sgeor255 In SYCLomatic, if some workspace and scratchpad memory is needed in cuDNN but not in oneDNN, then SYCLomatic will replace the cuDNN query API call with 0. If some...

sycl : Implemented reorder Q4_K mmvq

I rebased the PR on @Alcpz's latest changes and updated the description with more performance numbers.

sycl : Implemented reorder Q4_K mmvq

@NeoZhangJianyu to answer your questions: > 1. Could you share the GPU type of above test result? I updated the PR description with results from more devices. > 2. Have...

sycl : Implemented reorder Q4_K mmvq

> @sgeor255 Here is a discussion about Q4_K. [#13120 (reply in thread)](https://github.com/ggml-org/llama.cpp/discussions/13120#discussioncomment-12957458) Could you test the model with this PR? If the result is good, could you reply with your test...

sycl : Implemented reorder Q4_K mmvq

This PR is now rebased on master as #12858 was merged.

sycl: cleanup oneDNN related code

> llama.cpp uses the official release of oneAPI (including oneDNN). Even if the PR of oneDNN is merged, oneAPI will include it only after a long time. > > So,...