Muchen Li

7 comments by Muchen Li

I ran into those too; there seems to be a bug in DDAR.

Happens to me as well; there seems to be no trivial solution, because the package's CUDA dependency is pretty outdated.

I ran into a similar issue with a deepseek-math-7b model on an A100 80GB GPU; I cannot get things to work even with per_device_train_batch_size = 1 and gradient_accumulation_steps = 1. I suspect there...
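For reference, this is roughly the kind of launch configuration I mean; a minimal sketch assuming the standard HuggingFace Trainer, where the model id, output dir, and the memory-saving flags (gradient_checkpointing, bf16) are my own placeholders rather than the exact settings from the original script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Hypothetical minimal setup; the model id and output dir are placeholders,
# not the actual script from this repo.
model_name = "deepseek-ai/deepseek-math-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

args = TrainingArguments(
    output_dir="./out",                 # placeholder
    per_device_train_batch_size=1,      # smallest possible per-device batch
    gradient_accumulation_steps=1,      # no accumulation, to isolate the OOM
    gradient_checkpointing=True,        # trade compute for activation memory
    bf16=True,                          # half-precision forward/backward
    logging_steps=1,
)

# trainer = Trainer(model=model, args=args, train_dataset=..., tokenizer=tokenizer)
# trainer.train()
```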

Hi, thanks so much for the reply! I'm pretty sure all the memory was eaten up by this single process. I did a very detailed verification and saw that the forward process...
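Roughly how I checked where the memory goes; a minimal sketch using PyTorch's built-in memory stats (report_gpu_memory is just a helper name I'm making up here, not part of any library):

```python
import torch

def report_gpu_memory(tag: str) -> None:
    # Print current and peak allocated GPU memory in GiB for the default device.
    alloc = torch.cuda.memory_allocated() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"[{tag}] allocated: {alloc:.2f} GiB, peak: {peak:.2f} GiB")

# Usage around a single training step (model/batch come from your own loop):
# torch.cuda.reset_peak_memory_stats()
# report_gpu_memory("before forward")
# outputs = model(**batch)
# report_gpu_memory("after forward")
# outputs.loss.backward()
# report_gpu_memory("after backward")
```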

Actually, I tried zephyr-7b-sft-full with the original settings on my data; I was able to get training going with a per-device batch size of 8, but not with deepseek some...
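(For context, the effective global batch size is per-device batch × gradient accumulation steps × number of GPUs; a quick sketch, where the GPU count is an assumption about the setup, not something stated above:)

```python
# Effective global batch size; num_gpus is a placeholder, adjust to your setup.
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_gpus = 1
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 8
```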

Thanks for the pointer, I'll take a look.