Svetlozar Georgiev
To summarise, the forward pass of batchnorm calculates means close to the expected value, but near-zero values still show a difference. Hence, because the src for the...
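To illustrate the kind of near-zero mismatch described above, here is a minimal sketch in plain Python. It is not the PR's test code: the `batch_mean` helper, the input values, and the tolerances are all hypothetical, chosen only to show how a floating-point mean can land close to, but not exactly on, an expected value of zero, and why a purely relative comparison then fails.

```python
import math


def batch_mean(values):
    """Plain floating-point mean, a stand-in for the batchnorm mean step (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


# Hypothetical inputs whose mathematically exact mean is 0.
src = [0.1, 0.2, -0.3]
expected = 0.0
computed = batch_mean(src)

# The absolute difference is tiny, but it is not exactly zero.
abs_diff = abs(computed - expected)

# A relative tolerance alone rejects the result when the expected value is 0 ...
rel_only = math.isclose(computed, expected, rel_tol=1e-6)
# ... while adding an absolute tolerance accepts the near-zero result.
with_abs = math.isclose(computed, expected, rel_tol=1e-6, abs_tol=1e-9)

print(computed, abs_diff, rel_only, with_abs)
```

The design point is only that comparisons around zero usually need an absolute tolerance alongside the relative one; the actual thresholds used by any given test suite are not assumed here.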
The failing CI check is in a file not added by this PR.
make test disable device_cpu enable device_gpu enable thr_cuda enable arch_rtx
> @zhimingwang36, can you please provide your input on how you handle oneDNN/cuDNN scratchpad and workspace?
>
> Below are related questions from @mgouicem:
>
> > Other thing is...
> @sgeor255 In SYCLomatic, if some workspace or scratchpad memory is needed in cuDNN but not in oneDNN, then SYCLomatic will replace the cuDNN query API call with 0. If some...
I rebased the PR on @Alcpz's latest changes and updated the description with more performance numbers.
@NeoZhangJianyu to answer your questions:

> 1. Could you share the GPU type of above test result?

I updated the PR description with results from more devices.

> 2. Have...
> @sgeor255 Here is a discussion about Q4_K. [#13120 (reply in thread)](https://github.com/ggml-org/llama.cpp/discussions/13120#discussioncomment-12957458) Could you test the model by this PR? If result is good, could you reply with your test...
This PR is now rebased on master as #12858 was merged.
> llama.cpp uses the official release of oneAPI (including oneDNN). Even if the oneDNN PR is merged, oneAPI will only include it after a long time.
>
> So,...