HYZhao
I also hit the same error on my Mac (Mojave). After changing the variable names it compiled successfully, but then `./train` failed with the same error, just as @sylviawangfr described.
I had the same problem. What is the solution for fixing this randomness?
OK, thanks. I forgot to read this one, so I'll give it a try. Still, I feel the documentation should be accompanied by some code examples.
I see. Thank you very much for your timely reply; I am experimenting now.
I have a new problem, even though I believe I set up the predicates correctly: ``` // Allocate predicate tensors for m and n auto tApA = make_tensor(make_shape(size(tAsA_copy), size(tAsA_copy)), Stride{}); auto tBpB...
Supplementary: printing gA for thread 0 at step 0:  thread 0 at step 1:  Here is the code I ran: ``` #include "helper.h" #include #include #include #include template void gen_rand_data(T *data, int n);...
> It is. Our Sm80 mainloop implements predication: https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/collective/sm80_mma_multistage.hpp#L504 Is there a unit test or example that would let me run this code?
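For anyone following along, here is how I understand predication in general, written as a minimal host-side C++ sketch. This is not CUTLASS code and not the Sm80 mainloop; `copy_tile_predicated` is a hypothetical helper I made up for illustration. It assumes a row-major matrix and zero-fills any tile element whose predicate is false, so a tile that hangs over the matrix edge never reads out of bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUTLASS API): copy a tile_rows x tile_cols tile
// whose top-left corner is (row0, col0) out of a row-major rows x cols matrix.
// Every element is guarded by a predicate; elements whose predicate is false
// are left at zero instead of being read from memory.
std::vector<float> copy_tile_predicated(const std::vector<float>& src,
                                        std::size_t rows, std::size_t cols,
                                        std::size_t row0, std::size_t col0,
                                        std::size_t tile_rows,
                                        std::size_t tile_cols) {
    assert(src.size() == rows * cols);  // the matrix must match its stated shape
    std::vector<float> tile(tile_rows * tile_cols, 0.0f);
    for (std::size_t i = 0; i < tile_rows; ++i) {
        for (std::size_t j = 0; j < tile_cols; ++j) {
            // Predicate: true only if this tile element lies inside the matrix.
            const bool pred = (row0 + i < rows) && (col0 + j < cols);
            if (pred) {
                tile[i * tile_cols + j] = src[(row0 + i) * cols + (col0 + j)];
            }
        }
    }
    return tile;
}
```

On a GPU the same guard is usually evaluated once into a boolean predicate tensor and then consulted by each thread's copy, but the bounds logic is the same idea.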
Thank you for your reply, it helped clear up part of my confusion. I will try again. I am working with fp16 now. > assume you are using fp16 on...
I used stream_k mode for my calculations and found that executing `device_gemm()` was actually relatively faster, but the total time taken by these lines of code in this mode was...