BitCalSaul

Results: 32 comments by BitCalSaul

I encountered the same problem when I used DDP with batch size = 2. ``` UserWarning: Grad strides do not match bucket view strides. This may indicate grad was not created...

> Hi, I used `wandb.run.log_code(".")` but it still logs only the driver.py file to local wandb folder even though it logs all the library I am using to cloud. Here...

> > Hi @hailuoS @LvQiangWen, did you find the answer? > > You can try my method. Bro, thanks for your reply. I solved this issue, but I still appreciate it...

Hi @Taimoor-R, I am also interested in developing a model that performs this function, and I am in the process of figuring out how to adjust the model to predict...

I'm using this commit from https://github.com/KimmiShi/DeepSpeed/tree/flops_profiler_attn because I want to get FLOPs for the `@` operation in transformer-based models, which the released version doesn't support.

Hi, may I ask if there is any plan to release the code publicly soon?