fpcsong

5 comments by fpcsong

It does not crash directly, but it creates multiple processes on cuda 0.

It is our internal toolkit, and it is adapted to many transformer-based models. The script:

```
deepspeed --num_gpus 8 benchmark.py \
  -it \
  -t_data $TRAINDATA \
  -te \
  -v_data $EVALDATA \
  ...
```

It is our internal toolkit. In short, could you please provide the versions of cuda, torch, deepspeed, flash_attn, xformers, and any other key packages you are using?
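
A minimal sketch of one way to collect those versions in a single report. The helper names `collect_versions` and `cuda_version` are hypothetical, and the distribution names passed in are assumptions (for example, the installed name for flash_attn may differ on a given system). Anything that cannot be found is reported as "not installed" rather than raising.

```python
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version string.

    Hypothetical helper: names that cannot be found are reported as
    "not installed" instead of raising an exception.
    """
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def cuda_version():
    """Return the CUDA version torch was built against, if torch is present."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    # torch.version.cuda is None for CPU-only builds.
    return torch.version.cuda or "CPU-only build"


if __name__ == "__main__":
    # Assumed distribution names; adjust to match the local environment.
    names = ["torch", "deepspeed", "flash_attn", "xformers"]
    for name, version in collect_versions(names).items():
        print(f"{name}: {version}")
    print(f"cuda (as seen by torch): {cuda_version()}")
```

Pasting the printed output into the thread gives everyone the same baseline to compare against.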

I also failed to reproduce "58.55%", and I noticed that CHAN-DST has been withdrawn from ACL 2020, so be it. :joy:

I encounter this bug even after setting the rope base and torch.bfloat16...