Haocheng Xi

Results: 12 comments by Haocheng Xi

You can try WSL2, a Linux subsystem that runs on Windows :)

> There is a plan to support them, probably after triton-mlir is merged. As @Jokeren mentioned, we could probably get some slow support working pretty easily, but getting it right...

On my 4090, your code gives CUBLAS: 137.680676, Triton: 205.643462. My Triton is a nightly build from the end of November.
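For context, here is a minimal sketch of the kind of benchmark these numbers likely come from, modeled on Triton's matmul tutorial. The matrix shape and helper are illustrative, not the original script, and only the cuBLAS side is shown (the Triton side would time the tutorial's matmul kernel the same way):

```python
import torch
import triton

def tflops(ms, M, N, K):
    # A GEMM does 2*M*N*K floating-point ops; convert milliseconds to TFLOP/s.
    return 2 * M * N * K * 1e-12 / (ms * 1e-3)

M = N = K = 4096  # assumed shape
a = torch.randn((M, K), device="cuda", dtype=torch.float16)
b = torch.randn((K, N), device="cuda", dtype=torch.float16)

# torch.matmul dispatches to cuBLAS for fp16 GEMM on CUDA.
ms = triton.testing.do_bench(lambda: torch.matmul(a, b))
print(f"cuBLAS: {tflops(ms, M, N, K):.6f} TFLOPS")
```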

I am facing a similar problem: when I set num_gpu=2 and add gradient_accumulation_steps=4 (which keeps the effective batch size at 32), the average over 5 random seeds on CoLA with roberta-large...
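For reference, the effective-batch-size arithmetic implied here (only num_gpu=2, gradient_accumulation_steps=4, and the total of 32 are stated; the per-device batch size of 4 is inferred from them):

```python
per_device_batch = 4   # inferred, not stated
num_gpus = 2           # stated
grad_accum_steps = 4   # stated
effective_batch = per_device_batch * num_gpus * grad_accum_steps
assert effective_batch == 32  # matches the "still 32" claim
```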

If you use `ln -s /.cache /root/.cache` since the space under /root/ is limited, you also need to export HF_HOME accordingly. This solved my problem.
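As an illustration, the same workaround from Python; the cache path used here is an assumption, not the original value:

```python
# Point the Hugging Face cache at a disk with free space before any HF
# library is imported, since cache paths are resolved from HF_HOME.
import os
os.environ["HF_HOME"] = "/data/hf_cache"  # assumed path with free space

# Imports after this point pick up the new cache location.
from huggingface_hub import snapshot_download
snapshot_download("roberta-large")  # downloads under /data/hf_cache
```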

I have exactly the same problem: 1 node works, but 2 nodes fail. I think this is a problem on the huggingface side.

I ran into a similar issue. My model gives:

########## First turn ##########
model                         turn  score
zephyr-7b-dpo-full-self-ref   1     7.79375
zephyr-7b-dpo-full-self       1     7.43750
zephyr-7b-sft-full-self-ref   1     6.63125
zephyr-7b-sft-full-self       1     6.39375
########## Second...

Models ending with '-ref' are the official checkpoints from huggingface, and models ending with '-self' are my own models from reproducing the experiment.

I also ran evaluations on some other datasets: the alignment-handbook/zephyr-7b-dpo-full model still performs worse than HuggingFaceH4/zephyr-7b-beta. ![image](https://github.com/huggingface/alignment-handbook/assets/87399272/b030a1d8-e414-4b8e-95c5-e818229b4bfc)

For scripts/sampling/simple_video_sample.py, VS Code also cannot play the output mp4 file. After downgrading to imageio[ffmpeg]==2.26.1, it works fine.
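A minimal repro sketch of writing an mp4 through imageio's ffmpeg backend (not the original sampling script; the frame data is synthetic), useful for checking whether the files produced under a given imageio pin can be opened by a player:

```python
import imageio
import numpy as np

# One second of random 256x256 RGB frames at 24 fps.
frames = [np.random.randint(0, 255, (256, 256, 3), dtype=np.uint8)
          for _ in range(24)]

# With imageio[ffmpeg]==2.26.1 pinned, the resulting mp4 opened fine
# per the comment above.
with imageio.get_writer("sample.mp4", fps=24) as writer:
    for frame in frames:
        writer.append_data(frame)
```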