FdyCN

Results 11 issues of FdyCN

Hi guys, I used jitify.hpp in my project, and it works just fine. However, I still have a question: according to this: https://github.com/NVIDIA/jitify/blob/b22bf7d1dce113c8f67d195c7e6e497858b6fdb8/jitify.hpp#L3507 it seems there is no way to create program...

I compiled shaderc from source in NDK r22b and followed the official guide below: https://developer.android.com/ndk/guides/graphics/shader-compilers CLI: step 1: `cd ${NDK_ROOT}/sources/third_party/shaderc/` step 2: `ndk-build NDK_PROJECT_PATH=. APP_BUILD_SCRIPT=Android.mk APP_STL:=c++_shared APP_ABI=all libshaderc_combined` After a long time building...

Hi guys, I ran into a problem when including opencl.hpp in my project. When I build an executable for 64-bit Android using the NDK, I get a compile error that happened...

I tried an include in my kernel string, like: "#include ", and I also tried making a header named "JITFP16.cuh" and passing it into the jitify::Program::program() functions. Neither works. So how can I include these...

### Expected behavior I tried to test AutoTVM on a real Hexagon device by using `pytest tests/python/contrib/test_hexagon/test_autotvm.py` (after following all the steps in [this](https://github.com/apache/tvm/blob/main/tests/python/contrib/test_hexagon/README.md)), but I got this error log:...

type: bug
needs-triage

Hello, it seems there is no developer guide for adding a new fusion pass. Could you provide a short guide doc for us?

I really appreciate the work you have done, it's awesome! But I am confused about the meaning of some data in the table below: ![image](https://github.com/philipturner/metal-benchmarks/assets/80800417/e2176d61-c2e0-474d-9e65-b6fa9d49bf1d) 1. Shared BW/Cycle means: shared memory register pass...

![image](https://github.com/philipturner/metal-benchmarks/assets/80800417/2e57d665-1ba8-4731-885c-bf044bab7b50) As the image shows, Apple 7/8 has 16 banks and each bank is 4 B wide, while the warp size (or simdgroup_size) is 32. So when we load 1 float per thread in...
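The bank arithmetic behind this question can be sketched in a few lines of Python, using only the figures quoted above (16 banks, 4 B per bank, 32 threads per simdgroup). The mapping `bank = (byte_address / 4) mod 16` is an assumption borrowed from the usual textbook model of banked shared memory, not something confirmed for Apple GPUs, and the names `bank_of` and `threads_per_bank` are hypothetical helpers for illustration only.

```python
from collections import Counter

# Figures quoted in the question; the address-to-bank mapping is an ASSUMPTION.
NUM_BANKS = 16          # banks of threadgroup (shared) memory
BANK_WIDTH_BYTES = 4    # bytes served by one bank
SIMD_WIDTH = 32         # threads per warp / simdgroup
FLOAT_BYTES = 4         # size of one float


def bank_of(byte_address):
    """Assumed mapping: consecutive 4-byte words rotate through the banks."""
    return (byte_address // BANK_WIDTH_BYTES) % NUM_BANKS


def threads_per_bank(stride_floats=1):
    """Count how many threads of one simdgroup touch each bank when
    thread t loads the float at index t * stride_floats."""
    return Counter(
        bank_of(t * stride_floats * FLOAT_BYTES) for t in range(SIMD_WIDTH)
    )


if __name__ == "__main__":
    for stride in (1, 2, 16):
        counts = threads_per_bank(stride)
        print(f"stride={stride}: {len(counts)} banks used, "
              f"up to {max(counts.values())} threads per bank")
```

Under this assumed model, a unit-stride load of one float per thread puts two threads on every bank. Whether the hardware serializes that as a conflict or services it in two passes by design is exactly what the question is asking, and the sketch does not answer it.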

In your conclusion, MPS performance is worse than llama.cpp CPU performance at the same fp16 precision. Why? Is there any kernel that MPS doesn't support and that falls back to the CPU (so that...

I tried to optimize GEMV using shared memory to speed up I/O. Theoretically speaking, GEMV with SRAM should have better bandwidth, but I got a weird performance result. **Device: M2 Ultra 128GB**...