hydrated(tutor:张银奎)ic*b*rg r*bb*sh issues

7 issues



Compiled version runs too slowly

I have successfully compiled a version of apex on A100 hardware, but running the fmha test took 21 seconds to finish. What could be the cause?

hipcc does not hipify the __nvvm_get_smem_pointer function

In many CUDA-related projects, we can see the following line of code:

```
extern "C" __device__ uint32_t __nvvm_get_smem_pointer(void *ptr);
```

It is used to convert the shared...

Inline assembly of DS_WRITE_B128 fails to compile: "Don't know how to handle indirect register inputs yet for constraint 'v'"

Referring to [shared_ops.h](https://github.com/adityaatluri/gemm-vega64/blob/master/shared_ops.h), I wrote the following inline assembly code:

```
inline __device__ void sts(uint32_t ptr, uint4 val) {
    asm volatile("DS_WRITE_B128 %0, %1;\n" : : "v"(ptr) , "v"(val));
}
```

But...

How can an LDS memory address be converted to an address that can be passed to DS_READ_* and DS_WRITE_* instructions?

Following [this issue](https://github.com/RadeonOpenCompute/hcc/issues/693), I wrote the following code. The HIP file:

```
#include "hip/hip_runtime.h"
#include "hip/hcc_detail/device_library_decls.h"
__global__ void halfVec_v_pk_sts_then_lds( uint16_t * dst , uint32_t * ptr , uint16_t * val...
```

Did not get the w/ image results for the LMTraj-SUP model

I followed the instructions in the **Evaluate LMTraj-SUP** section [here](https://github.com/inhwanbae/lmtrajectory) and ran the following commands:

```
./script/eval_all.sh
./script/eval_all_deterministic.sh
```

but did not get the w/ image results. Is there anything missing?

How should the organization of the PyTorch model weights be understood?

The README of this project provides links to both the TensorFlow and PyTorch versions of the pretrained model weights. But after inspecting both of them, it seems...
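One way to start making sense of a checkpoint's organization is to group its parameter names by module prefix and count the elements under each group. The sketch below is a hypothetical helper, not part of the project: `summarize_state_dict` and the toy names and shapes are made up, and it only assumes that the PyTorch file loads into a mapping from dotted parameter names to tensors (a plain `state_dict`).

```python
import math
from collections import OrderedDict


def summarize_state_dict(state_dict, depth=1):
    """Group parameter names by their first `depth` dot-separated components.

    `state_dict` maps names such as "encoder.0.weight" to tensors. Only the
    shape is used: a tensor's `.shape`, or a plain tuple of ints.
    Returns {prefix: (number_of_tensors, total_element_count)}, in the
    order the prefixes first appear.
    """
    summary = OrderedDict()
    for name, value in state_dict.items():
        prefix = ".".join(name.split(".")[:depth])
        # Works for torch tensors (tuple(t.shape)) and for plain tuples.
        shape = tuple(getattr(value, "shape", value))
        count, total = summary.get(prefix, (0, 0))
        # math.prod(()) == 1, so scalar entries count as one element.
        summary[prefix] = (count + 1, total + math.prod(shape))
    return summary


if __name__ == "__main__":
    # Toy stand-in for a real checkpoint (hypothetical names and shapes).
    toy = {
        "embed.weight": (1000, 64),
        "encoder.0.weight": (64, 64),
        "encoder.0.bias": (64,),
        "encoder.1.weight": (64, 64),
        "head.weight": (10, 64),
    }
    for prefix, (n, numel) in summarize_state_dict(toy).items():
        print(f"{prefix:10s} {n:3d} tensors {numel:10d} elements")
```

With PyTorch installed, the real file could be loaded with `torch.load(path, map_location="cpu")` and passed in; if the loaded object wraps the weights inside another dict, the inner state dict would need to be passed instead. Raising `depth` splits a group such as `encoder` into `encoder.0`, `encoder.1`, and so on, which may make it easier to line the names up against the TensorFlow variables.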

The reverse-engineering code does not reproduce the reported results.

I followed the instructions in Readme.md to configure the environment, but simply running `python gtsrb_visualize_example.py` did not reproduce the results reported in the paper. The produced results, saved as gtsrb_visualize_(mask/pattern/fusion)_label_x.png...
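Assuming this is the Neural Cleanse code (the `gtsrb_visualize_example.py` script name suggests so, but that is an inference), the reverse-engineered trigger for a target label $y_t$ consists of a mask $m$ and a pattern $\Delta$. Following the Neural Cleanse paper, the trigger is stamped onto an input $x$ and the pair $(m, \Delta)$ is found by optimization:

$$
A(x, m, \Delta)_{i,j,c} = (1 - m_{i,j})\, x_{i,j,c} + m_{i,j}\, \Delta_{i,j,c},
\qquad
\min_{m,\,\Delta}\; \ell\big(y_t,\, f(A(x, m, \Delta))\big) + \lambda \cdot \lVert m \rVert_1 \quad \text{for } x \in X
$$

Under that assumption, the `mask` image would show $m$, the `pattern` image $\Delta$, and the `fusion` image presumably the element-wise product $m \odot \Delta$, i.e. the part of the pattern the mask actually lets through.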