shivadbhavsar

Results 15 comments of shivadbhavsar

You can run the PyTorch SDXL on its own on your system, right? In general, we try to avoid duplicating weights when compiling, but sometimes the compilation steps in migraphx...

Initial work resulted in no perf difference. Rocprofiler results on trimmed unet: 1. using `MIGRAPHX_MLIR_USE_SPECIFIC_OPS=attention` - Cache hits are mostly the same with and without reversal (with some being considerably...

Next steps: understand cache hits with even smaller graphs. 1. Performed a test with a `mul -> dot -> add` program, which is compiled as `mul -> dot_add`, where mlir_dot_add is reverse...
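For context, the `mul -> dot -> add` to `mul -> dot_add` rewrite mentioned above rests on a simple algebraic equivalence: the dot and the trailing add can be collapsed into one fused kernel without changing the result. The following is a hypothetical NumPy sketch of that equivalence, not MIGraphX code; the array names, shapes, and the `dot_add` helper are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical operands for a mul -> dot -> add chain.
x = rng.standard_normal((4, 8))
scale = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 16))
bias = rng.standard_normal((4, 16))

# Unfused: three separate ops (mul, then dot, then add).
unfused = (x * scale) @ w + bias

def dot_add(a, b, c):
    # Stand-in for a fused GEMM + elementwise-add kernel.
    return a @ b + c

# Fused form: mul feeds a single dot_add-style op.
fused = dot_add(x * scale, w, bias)

assert np.allclose(unfused, fused)
```

The fusion is safe precisely because the two forms are numerically equivalent; any perf difference comes from kernel launch and memory-traffic savings, not from changing the math.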

SDXL perf results for reference: Torch-MIGraphX (end to end): before PR: 2850 ms, with PR: 2801 ms. ONNX UNet (4x attn trim): before PR: 5.54 ms, after PR: 5.52 ms...

Even with #3659, the flux model doesn't give a proper output when using `MIGRAPHX_DISABLE_LAYERNORM_FUSION=1`. Need to resolve that before we can remove this.
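For reference, disabling the layernorm fusion presumably falls back to the decomposed sequence of reduce and elementwise ops, which the fused kernel is supposed to match. A minimal NumPy sketch of that reference computation (the shapes and epsilon here are illustrative, not taken from the flux model):

```python
import numpy as np

def layernorm_reference(x, eps=1e-5):
    # Decomposed layernorm: mean/variance reductions plus
    # elementwise normalize, over the last axis.
    mean = x.mean(axis=-1, keepdims=True)
    var = ((x - mean) ** 2).mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
y = layernorm_reference(x)

# Each row of the output should have ~zero mean.
assert np.allclose(y.mean(axis=-1), 0.0, atol=1e-6)
```

Comparing the model's output against a decomposed reference like this is one way to localize whether the bad output comes from the fused kernel or from elsewhere in the graph.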

Agreed. Is that a change in migraphx, or is that something mlir needs to support first?

Small repro:
```
p = migraphx.program()
mm = p.get_main_module()
s1 = migraphx.shape(lens=[4096, 768], type="float_type")
in1 = mm.add_parameter("x", s1)
in1 = mm.add_instruction(migraphx.op("reshape", dims=[2, 2048, 768]), [in1])
in1 = mm.add_instruction(migraphx.op("reshape", dims=[2, -1,...
```

Here's when the issue starts:
```
Pass: fuse_reduce
Pass: dead_code_elimination
x2 = @param:x2 -> float_type, {2, 12, 2048, 2048}, {50331648, 4194304, 2048, 1}
x = @param:x -> float_type, {4096, 768},...
```

Here is some example code used for the experiment. 1. MLIR multi-output fusion (in the fuse_mlir pass) (this needs to be refactored to account for incoming changes: #3569 and #3752...