Rasmus

20 comments by Rasmus

I think the example conversion script is perhaps not very good. One thing that helps a lot is to use the Datasets `.map()` to batch tokenize the dataset. I'm not...

> Hi @rlrs! Could you share the script to transform the weights from HF to dcp? Thanks!

I'm using a modified script based on gpt-fast, will paste it here....

Thanks for asking over there. I didn't try to download the weights from anywhere other than HF, but I would be a bit surprised if there's some simple transformation you...

As discussed in the HF issue, there is indeed a permutation of the weights that causes the two implementations to be equivalent. I don't believe anything needs to be done...

Looking more into this, `HSA_OVERRIDE_GFX_VERSION` does affect what happens. Given that MI250X is on the gfx90a architecture, I tried `HSA_OVERRIDE_GFX_VERSION=9.0.0`, which at least gives a different error: `HSA_OVERRIDE_GFX_VERSION=9.0.0 python benchmark_throughput.py`...

Sorry, I haven't tried this for months, so I don't know if it's fixed. I might have a chance to try it in a few weeks, but not before then.

Interested in getting this merged, willing to help if needed.

This is still an issue using the latest images with Apptainer.

Also voicing my interest in making this work, especially for fp8. I've been investigating the 'missing' kernel, but it's not really missing; `scaled_mm_sm100_fp8` is compiled for the correct sm120 target...

Wow, I didn't realize that it's not at all compatible. I tried

```cpp
if (version_num == 120) {
  cutlass_scaled_mm_sm89(c, a, b, a_scales, b_scales, bias);
  return;
}
```

in `scaled_mm_entry.cu` and...