Sushil Singh

Results 5 comments of Sushil Singh

Same problem for me. Please post if you found a solution.

Setting `trax.fastmath.set_backend('tensorflow-numpy')` seems to help; I can see GPU cycles being used.
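For anyone who wants to try the same workaround, here is a minimal sketch. It assumes `trax` and `tensorflow` are installed; the helper name `use_tf_numpy_backend` is made up for illustration. It only switches the backend and lists the GPUs TensorFlow can see, so it does not prove the training loop actually runs on them.

```python
import importlib.util


def use_tf_numpy_backend():
    """Switch Trax to the TensorFlow-NumPy backend and list visible GPUs.

    Hypothetical helper: the name is not part of Trax.
    """
    # Imported lazily so this file can be loaded without trax installed.
    import tensorflow as tf
    import trax

    # The backend string is the one quoted in the comment above.
    trax.fastmath.set_backend('tensorflow-numpy')

    # An empty list means TensorFlow itself cannot see a GPU.
    return [gpu.name for gpu in tf.config.list_physical_devices('GPU')]


if __name__ == "__main__":
    if importlib.util.find_spec("trax") is None:
        print("trax is not installed; nothing to check")
    else:
        print("GPUs visible to TensorFlow:", use_tf_numpy_backend())
```

If the printed list is empty, the problem is probably in the TensorFlow/CUDA install rather than in the Trax backend choice.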

Looks like the kernel function was not parsed properly when `__cudaRegisterFatBinary` was called, so the client code failed on the kernel launch call.

Debugged it further. It looks like the problem is with `__cudaRegisterFatBinary`: the CUDA samples from NVIDIA are stored as ELF by default and require further processing to extract the CUDA kernel details...
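A quick way to check which kind of image a blob is before trying to parse kernels out of it is to look at the ELF header. This is a rough sketch, not the project's parser: `classify_image` is a made-up helper, it assumes a little-endian ELF header, and it only reads the magic bytes and the `e_machine` field (`EM_CUDA` is 190 in `elf.h`).

```python
import struct

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file
EM_CUDA = 190           # e_machine value for NVIDIA CUDA in elf.h


def classify_image(blob: bytes) -> str:
    """Return 'not-elf', 'cuda-elf', or 'other-elf' for a binary blob.

    Hypothetical helper; assumes a little-endian ELF header.
    """
    # e_machine ends at byte 20, so shorter blobs cannot be classified.
    if len(blob) < 20 or blob[:4] != ELF_MAGIC:
        return "not-elf"
    # e_ident is 16 bytes, e_type is 2 bytes, then e_machine (2 bytes).
    (e_machine,) = struct.unpack_from("<H", blob, 18)
    return "cuda-elf" if e_machine == EM_CUDA else "other-elf"
```

Running it on the data handed to the registration call should at least show whether the "further processing" step is needed for that blob.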

[vectorAdd.build_with_keep.tar.gz](https://github.com/user-attachments/files/19277085/vectorAdd.build_with_keep.tar.gz) These are the artifacts generated when building with `--keep`.