Prachi Gupta

Results: 7 issues by Prachi Gupta

Fixes #ISSUE_NUMBER cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

module: rocm
open source
topic: not user facing

Fixes the following unit test failures:

```
distributed/tensor/parallel/test_micro_pipeline_tp.py::MicroPipelineTPTest::test_fuse_all_gather_scaled_matmul_A_dims_2_gather_dim_0
distributed/tensor/parallel/test_micro_pipeline_tp.py::MicroPipelineTPTest::test_fuse_scaled_matmul_reduce_scatter_A_dims_2_scatter_dim_0
```

with the error:

```
RuntimeError: torch._scaled_mm is only supported on CUDA devices with compute capability >= 9.0 or 8.9, or ROCm MI300+...
```
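The support matrix in that error message can be sketched as a capability predicate. This is a hedged illustration only: the function name, signature, and the `gfx94x` test for MI300 are assumptions for clarity, not PyTorch's actual implementation.

```cpp
#include <cassert>
#include <string>

// Illustrative predicate mirroring the error message above:
// torch._scaled_mm requires CUDA compute capability >= 9.0 or exactly 8.9,
// or a ROCm MI300+ GPU. Names and the gfx94x check are assumptions.
bool scaled_mm_supported(bool is_rocm, int cc_major, int cc_minor,
                         const std::string& gcn_arch) {
  if (is_rocm) {
    // ROCm: MI300 family and newer (MI300 reports a gfx94x arch string).
    return gcn_arch.rfind("gfx94", 0) == 0;
  }
  // CUDA: compute capability >= 9.0, or exactly 8.9.
  return cc_major >= 9 || (cc_major == 8 && cc_minor == 9);
}
```

Under this reading, the fix is to make the tests recognize MI300-class ROCm devices as supported rather than skipping or failing on them.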

oncall: distributed
module: rocm
open source
ciflow/trunk
topic: not user facing
ciflow/periodic
rocm
rocm priority
ciflow/rocm

## Checklist before submitting

- [x] Did you read the [contributor guide](https://github.com/horovod/horovod/blob/master/CONTRIBUTING.md)?
- [ ] Did you update the docs?
- [ ] Did you write any tests to validate...

wontfix

With the latest IFU into the rocm6.3_internal_testing branch, we pulled in SymmetricMemory code, which is also used for intra-node communication. SymmetricMemory additionally introduces a new memory allocator called CUDASymmetricMemoryAllocator....

On ROCm, hipification converts std::min to ::min, but ::min does not return the right result. This affects the index_add_ operation on large tensors: we end up picking the large values...
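The failure mode can be sketched in plain C++. Here `repro::min` is a hypothetical stand-in for a global `::min` whose best-matching overload takes 32-bit ints (it is not the actual ROCm symbol); passing 64-bit indices through it silently truncates them, while `std::min` preserves the 64-bit type.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a global ::min that only handles `int`.
// Not the real ROCm symbol; used here to illustrate the truncation.
namespace repro {
inline int min(int a, int b) { return a < b ? a : b; }
}

// std::min(int64_t, int64_t) keeps the 64-bit result.
inline int64_t clamp_index_good(int64_t i, int64_t bound) {
  return std::min(i, bound);
}

// An int-only ::min implicitly narrows both arguments to 32 bits first,
// so for indices past 2^31 the "clamped" result is wildly wrong.
inline int64_t clamp_index_bad(int64_t i, int64_t bound) {
  return repro::min(i, bound);  // silent int64_t -> int narrowing
}
```

For example, clamping the index 2^32 + 7 against the bound 2^32 + 100 yields 2^32 + 7 with `std::min`, but the truncated overload wraps both arguments modulo 2^32 and returns 7, which matches the symptom of picking wrong elements in index_add_ on large tensors.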

Tested using this action run: https://github.com/pragupta/pytorch/actions/runs/18321214861/job/52174466415
Example PR: https://github.com/pragupta/pytorch/pull/23

We’ve created a new project to track all ROCm-specific skipped UTs: https://github.com/orgs/pytorch/projects/146/. However, we’re currently unable to assign issues within this project to other PyTorch contributors on our team....