HYZhao

16 comments by HYZhao

I also hit the same error on my Mac (Mojave). Following @sylviawangfr, I changed the variable names and it compiled successfully, but running `./train` then failed with the same error: ![image](https://user-images.githubusercontent.com/37064550/56031015-e0b16600-5d50-11e9-80ca-4e0b01dbc735.png)

I had the same problem. Is there a fix for this randomness yet?

OK, thanks. I had missed this one and will give it a try. Still, I feel the documentation should be accompanied by some code examples.

I see, thank you very much for the quick reply. I am experimenting with it now.

I have a new problem, even though I think I set up the predicates correctly:

```
// Allocate predicate tensors for m and n
auto tApA = make_tensor(make_shape(size(tAsA_copy), size(tAsA_copy)), Stride{});
auto tBpB...
```

Supplementary output from printing `gA`:

thread 0, step 0: ![image](https://github.com/NVIDIA/cutlass/assets/37064550/d00281ba-c516-4171-8392-59af5d766e9d)

thread 0, step 1: ![image](https://github.com/NVIDIA/cutlass/assets/37064550/d552fb76-424d-4d0f-9ec7-7c01300d8c90)

Here is the code I ran:

```
#include "helper.h"
#include
#include
#include
#include
template void gen_rand_data(T *data, int n);...
```

> It is. Our Sm80 mainloop implements predication: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/collective/sm80_mma_multistage.hpp#L504

Is there a unit test or something similar that would let me run this code?

Thank you for your reply; it cleared up part of my confusion, and I will try again. I am indeed working with fp16 now.

> assume you are using fp16 on...

I used stream_k mode for my computation and found that executing `device_gemm()` was actually faster, but the total time taken by these lines of code in this mode was...