Qingyi Liu

7 comments by Qingyi Liu

> We can add warp tile size to the kernel name so that we can tell if a kernel is using slice-k or not. The downside is that the kernel...

https://github.com/NVIDIA/cutlass/issues/286

> Would you please show me the kernel name of gemm and conv now?

I ran `python3 generator.py`; one of the generated files is `cutlass_tensorop_u8_i8816gemm_u8_256x128x64_4x2x1_2_n32t32_align16.cu`:

```
/* Generated by gemm_operation.py...
```

Filenames in `generated/gemm`:

```
cutlass_simt_cgemm_128x128x8_4x2x1_2_nn_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_2_nt_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_2_tn_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_2_tt_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_cc_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_ch_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_cn_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_ct_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_hc_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_hh_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_hn_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_ht_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_nc_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_nh_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_nn_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_nt_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_tc_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_th_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_tn_align1.cu
cutlass_simt_cgemm_128x128x8_4x2x1_5_tt_align1.cu
cutlass_simt_dgemm_128x128x8_4x2x1_2_nn_align1.cu
cutlass_simt_dgemm_128x128x8_4x2x1_2_nt_align1.cu
cutlass_simt_dgemm_128x128x8_4x2x1_2_tn_align1.cu
cutlass_simt_dgemm_128x128x8_4x2x1_2_tt_align1.cu
cutlass_simt_dgemm_128x128x8_4x2x1_3_nn_align1.cu
cutlass_simt_dgemm_128x128x8_4x2x1_3_nt_align1.cu
...
```
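The GEMM filenames listed here appear to share one underscore-separated pattern, so they can be split into fields for a quick comparison of old and new names. The sketch below is only an illustration: the function name `parse_gemm_filename`, the regular expression, and all group names are hypothetical, and the field layout is inferred solely from the filenames quoted in this comment. In particular, the meaning of the second `AxBxC` triple is not stated here, so the code calls it `second_triple` rather than guessing.

```python
import re
from typing import Dict, Optional

# Assumed layout (inferred from the quoted filenames, not from the generator):
#   cutlass_<opclass>_<kernel>_<MxNxK>_<AxBxC>_<number>_<layout>_align<N>.cu
_GEMM_NAME = re.compile(
    r"^cutlass_(?P<opclass>[a-z]+)_(?P<kernel>\w+?)_"
    r"(?P<tile>\d+x\d+x\d+)_(?P<second_triple>\d+x\d+x\d+)_"
    r"(?P<number>\d+)_(?P<layout>[a-z0-9]+)_align(?P<align>\d+)\.cu$"
)


def parse_gemm_filename(name: str) -> Optional[Dict[str, str]]:
    """Split a generated GEMM filename into its fields, or return None."""
    match = _GEMM_NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for fname in (
        "cutlass_simt_cgemm_128x128x8_4x2x1_2_nn_align1.cu",
        "cutlass_tensorop_u8_i8816gemm_u8_256x128x64_4x2x1_2_n32t32_align16.cu",
    ):
        print(parse_gemm_filename(fname))
```

Names that do not fit the assumed pattern, such as the conv2d filenames, return `None` and would need their own expression.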

> The conv kernel name is still the old one. Is it your intention to keep conv kernel name unchanged?

Oh, my mistake :sweat_smile: I just updated this commit. Please...

Filenames generated for conv2d:

```
cutlass_simt_cf32_cdgrad_analytic_cf32_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_cf32_cdgrad_analytic_cf32_128x128x8_4x2x1_5_nhwc.cu
cutlass_simt_cf32_cdgrad_optimized_cf32_128x128x8_4x2x1_2_nhwc_unity_stride.cu
cutlass_simt_cf32_cdgrad_optimized_cf32_128x128x8_4x2x1_5_nhwc_unity_stride.cu
cutlass_simt_cf32_cfprop_analytic_cf32_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_cf32_cfprop_analytic_cf32_128x128x8_4x2x1_5_nhwc.cu
cutlass_simt_cf32_cfprop_optimized_cf32_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_cf32_cfprop_optimized_cf32_128x128x8_4x2x1_5_nhwc.cu
cutlass_simt_cf32_cwgrad_analytic_cf32_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_cf32_cwgrad_analytic_cf32_128x128x8_4x2x1_5_nhwc.cu
cutlass_simt_cf32_cwgrad_optimized_cf32_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_cf32_cwgrad_optimized_cf32_128x128x8_4x2x1_5_nhwc.cu
cutlass_simt_sdgrad_analytic_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_sdgrad_analytic_256x128x8_4x2x1_5_nhwc.cu
cutlass_simt_sdgrad_optimized_128x128x8_4x2x1_2_nhwc_unity_stride.cu
cutlass_simt_sdgrad_optimized_256x128x8_4x2x1_5_nhwc_unity_stride.cu
cutlass_simt_sfprop_analytic_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_sfprop_analytic_256x128x8_4x2x1_5_nhwc.cu
cutlass_simt_sfprop_optimized_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_sfprop_optimized_256x128x8_4x2x1_5_nhwc.cu
cutlass_simt_swgrad_analytic_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_swgrad_analytic_256x128x8_4x2x1_5_nhwc.cu
cutlass_simt_swgrad_optimized_128x128x8_4x2x1_2_nhwc.cu
cutlass_simt_swgrad_optimized_256x128x8_4x2x1_5_nhwc.cu
cutlass_tensorop_bf16_s16816dgrad_analytic_bf16_256x128x32_4x2x1_3_nhwc.cu
...
```

Hello. According to the log you provided, the error is raised here:

```
def batch(self) -> Iterator[List[Any]]:
    r"""
    Batch method provides a batch indices generator.
    """
    indices = list(self.sample())

    # user might pass the world_size parameter without dist,
    # so dist.is_distributed()...
```