Runxin Zhong comments

Results: 10 comments by Runxin Zhong

About voxel branch

Same question. It seems that the current code is not the final version, since the attention still uses dense rather than sparse computation. Will the final version be open-sourced? @HaochengWan @2020zhangcheng @zhangcheng828 Thanks...

How to use it on a Raspberry Pi?

I ran into the same problem.

[BUG] calling cast_smem_ptr_to_uint(device fn) from make_gmma_desc(host device fn) is not allowed

Any update on this? I get the same error and cannot find the special flag that is supposed to fix it. Thanks for any help!

[BUG] calling cast_smem_ptr_to_uint(device fn) from make_gmma_desc(host device fn) is not allowed

@thakkarV I followed the suggestion and copied the code provided by @lygztq into the example directory to try compiling it, but I still get errors. The commands I ran are: ```bash # copy...

[BUG] calling cast_smem_ptr_to_uint(device fn) from make_gmma_desc(host device fn) is not allowed

I tried adding CUTE_HOST_DEVICE to `cast_smem_ptr_to_uint`, and it now works properly (see PR #2171).

missing tensorrt_bindings for tensorrt 10.0.1

Fixed by installing matching versions of all three packages: `pip install tensorrt_cu12_libs==10.0.1 tensorrt_cu12_bindings==10.0.1 tensorrt==10.0.1 --extra-index-url https://pypi.nvidia.com`
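To confirm that the three pinned packages really ended up at the same version, something like the following can help. It is a minimal sketch using only the standard library's `importlib.metadata`; the package names and the 10.0.1 pin come from the `pip install` command, while the helper name `check_versions` is hypothetical. It reads installed package metadata only and never imports `tensorrt` itself.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names and version taken from the pip command in the comment.
PINNED = ["tensorrt", "tensorrt_cu12_libs", "tensorrt_cu12_bindings"]
EXPECTED = "10.0.1"


def check_versions(packages, expected):
    """Return ({package: installed_version_or_None}, [packages that differ]).

    Hypothetical helper: it only reads distribution metadata.
    """
    found = {}
    mismatched = []
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed at all
        if found[name] != expected:
            mismatched.append(name)
    return found, mismatched


if __name__ == "__main__":
    found, mismatched = check_versions(PINNED, EXPECTED)
    for name, ver in found.items():
        print(f"{name}: {ver}")
    if mismatched:
        print("missing or mismatched:", ", ".join(mismatched))
```

A mismatch reported here (for example, a `tensorrt` wheel at one version and the bindings at another) is exactly the situation the pinned install is meant to avoid.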

missing tensorrt_bindings for tensorrt 10.0.1

> can we re-open this issue?

OK.

[QST] How to stop unroll in cute.copy in cute dsl?

Thanks a lot! `cute.copy_atom_call` seems to be what I want, and I will try it.

[QST][CuTe] How to dump ptxas compiling information in cute dsl?

I found that we can use the driver API `cuFuncGetAttribute` to get `local_size_bytes` for analyzing register spills. The example code with a CuTe DSL kernel is as follows (I don't know whether...
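The `cuFuncGetAttribute` approach can be sketched with NVIDIA's `cuda-python` driver bindings. This is a minimal sketch under stated assumptions, not the original example. It assumes you already hold a loaded `CUfunction` handle for the compiled kernel, because how to obtain that handle from a CuTe DSL compiled object depends on the DSL version and is not shown here. The helper names `query_func_attributes` and `may_spill` are hypothetical. A nonzero local size can also come from local arrays or stack frames, so it is a hint of spilling rather than proof.

```python
# Driver-attribute names from the CUDA driver API (CUfunction_attribute enum).
ATTRIBUTE_NAMES = {
    "num_regs": "CU_FUNC_ATTRIBUTE_NUM_REGS",
    "local_size_bytes": "CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES",
    "shared_size_bytes": "CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES",
}


def query_func_attributes(func):
    """Query per-kernel resource usage for a loaded CUfunction handle `func`."""
    # Imported lazily so the pure helper below works on machines without a GPU.
    try:
        from cuda.bindings import driver  # newer cuda-python layout
    except ImportError:
        from cuda import cuda as driver  # older cuda-python layout

    result = {}
    for key, enum_name in ATTRIBUTE_NAMES.items():
        attr = getattr(driver.CUfunction_attribute, enum_name)
        err, value = driver.cuFuncGetAttribute(attr, func)
        if err != driver.CUresult.CUDA_SUCCESS:
            raise RuntimeError(f"cuFuncGetAttribute({enum_name}) failed: {err}")
        result[key] = value
    return result


def may_spill(attrs):
    """Nonzero local memory per thread is a hint (not proof) of register spilling."""
    return attrs.get("local_size_bytes", 0) > 0
```

Typical use would be `attrs = query_func_attributes(func)` followed by `print(attrs, may_spill(attrs))`, comparing `num_regs` and `local_size_bytes` across tuning variants of the same kernel.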

[QST][CuTe] How to dump ptxas compiling information in cute dsl?

The API changed after CuTe DSL 4.3. The correct way now is as follows (note that the kernel must be compiled with `--keep-cubin`, and `compiled_kernel` is the output...
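Because the kernel is compiled with `--keep-cubin`, another option that does not depend on the DSL's Python API is to run `cuobjdump -res-usage` on the kept `.cubin` and parse the per-function `REG:` and `LOCAL:` fields. This is a swapped-in technique, not the method from the comment. The cubin path is a placeholder, the helper names `parse_res_usage` and `res_usage_for_cubin` are hypothetical, and the exact `cuobjdump` output layout is an assumption that may differ between CUDA versions, so the parser only looks for `Function <name>:` headers and `KEY:VALUE` pairs.

```python
import re
import subprocess

# "Function <name>:" header, optionally followed by fields on the same line (assumed format).
FUNC_RE = re.compile(r"^\s*Function\s+(\S+?):(.*)$")
# "KEY:VALUE" pairs such as "REG:40" or "CONSTANT[0]:360" (assumed format).
FIELD_RE = re.compile(r"([A-Z]+(?:\[\d+\])?):(\d+)")


def parse_res_usage(text):
    """Parse cuobjdump -res-usage output into {function_name: {field: int}}."""
    usage = {}
    current = None
    for line in text.splitlines():
        m = FUNC_RE.match(line)
        if m:
            current = m.group(1)
            usage[current] = {}
            rest = m.group(2)  # any fields printed on the header line itself
        elif current is not None:
            rest = line
        else:
            continue  # skip headers and the "Common:" section before any function
        for key, value in FIELD_RE.findall(rest):
            usage[current][key] = int(value)
    return usage


def res_usage_for_cubin(cubin_path):
    """Run cuobjdump on a kept cubin (path is a placeholder) and parse the result."""
    out = subprocess.run(
        ["cuobjdump", "-res-usage", cubin_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_res_usage(out)
```

A nonzero `LOCAL` entry for a function carries the same spill hint as `local_size_bytes` from `cuFuncGetAttribute`, and `REG` gives the per-thread register count.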