Li Li
Make @weihanmines's PR https://github.com/ROCmSoftwarePlatform/FBGEMM/pull/13 upstreamable. @sryap, would you please review the PR and consider converting it to a draft? Thank you.
Credit for the pipelined-HIP implementation belongs to @carlushuang.
**Describe the bug**

The following error appears when running the example from https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-scripts#run:

```
deepspeed --num_gpus 8 bloom-inference-scripts/bloom-ds-inference.py --name microsoft/bloom-deepspeed-inference-fp16
```

```
│ /opt/conda/lib/python3.8/site-packages/deepspeed/inference/engine.py:388 in _load_checkpoint │
│                                                                                              │
│   385...
```
`./tools/profiler/cutlass_profiler --m=16 --n=16 --k=1024 --A=fe5m2:\* --B=fe5m2:\*` works for me just fine, as does any other combination of fp8 types, layouts, etc. I also noticed that your A type is fp16 but...
Fix the v2 kernel `forward_test` for ROCm. One more unit test failure (`test_forward_gpu_uvm_cache_fp16`) is still under investigation.