Svetlozar Georgiev

13 comments by Svetlozar Georgiev

To summarise, the forward pass of batchnorm calculates means that are close to the expected value, but a small, near-zero difference remains. Hence, because the src for the...

The failing CI check is in a file not added by this PR.

make test disable device_cpu enable device_gpu enable thr_cuda enable arch_rtx

> @zhimingwang36, can you please provide your input on how you handle oneDNN/cuDNN scratchpad and workspace?
>
> Below are related questions from @mgouicem:
>
> > Other thing is...

> @sgeor255 In SYCLomatic, if some workspace and scratchpad memory is needed in cuDNN but not in oneDNN, SYCLomatic will replace the cuDNN query API call with 0. If some...

I rebased the PR on @Alcpz's latest changes and updated the description with more performance numbers.

@NeoZhangJianyu to answer your questions:

> 1. Could you share the GPU type of above test result?

I updated the PR description with results from more devices.

> 2. Have...

> @sgeor255 Here is a discussion about Q4_K: [#13120 (reply in thread)](https://github.com/ggml-org/llama.cpp/discussions/13120#discussioncomment-12957458). Could you test the model with this PR? If the result is good, could you reply with your test...

This PR is now rebased on master as #12858 was merged.

> llama.cpp uses the official release of oneAPI (including oneDNN). Even if the oneDNN PR is merged, oneAPI will only include it after a long time.
>
> So,...