fpcsong
It does not crash directly, but it creates multiple processes on cuda 0.
It is our internal toolkit, and it is adapted to many transformer-based models. The script:
```
deepspeed --num_gpus 8 benchmark.py \
    -it \
    -t_data $TRAINDATA \
    -te \
    -v_data $EVALDATA \
    ...
```
It is our internal toolkit. In short, could you please provide your versions of CUDA, torch, deepspeed, flash_attn, xformers, and other key packages?
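Something like the following should be enough to collect them. It is only a minimal sketch: `collect_versions` is a hypothetical helper, and the distribution names in the list are my assumption of how the packages are registered with pip, so adjust them if a package shows up as "not installed" even though it imports fine.

```python
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that pip does not know about are reported as "not installed"
    instead of raising, so the report can always be pasted in full.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed distribution names; change them to match your environment.
    wanted = ["torch", "deepspeed", "flash_attn", "xformers", "transformers"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version}")
```

The CUDA version that torch was built against is not covered by this; if torch is installed, `torch.version.cuda` reports it.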
I also failed to reproduce "58.55%", and I notice that CHAN-DST has been withdrawn from ACL 2020, so be it. :joy:
I encounter this bug even after setting rope base and torch.bfloat16...