Steven Chen

5 comments by Steven Chen

Same here for the LSTM examples. I used the PyTorch profiler to compare performance on CPU and MPS.

> The spike in microsecond-level overhead (CPU time avg) was discussed [here](https://github.com/pytorch/pytorch/issues/82707#issuecomment-1204672455). I think I’ve found a solution to it, but haven’t put it into practice with an RNN. Any...

Same issue. My DeepSpeed config is:

```
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr":...
```
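For anyone reading the `fp16` block above: as far as I understand DeepSpeed's documented behaviour, `"loss_scale": 0` selects dynamic loss scaling, and the starting scale is then `2 ** initial_scale_power`. The sketch below just works that out for the visible values. The helper name `initial_loss_scale` is made up for illustration and is not a DeepSpeed API, and only the `fp16` block is used because the rest of the config is cut off in the comment.

```python
import json

# Only the fp16 block that is visible in the comment; the remainder of the
# config is truncated in the source and is deliberately not reconstructed.
FP16_FRAGMENT = """
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  }
}
"""


def initial_loss_scale(fp16_cfg):
    """Hypothetical helper: the starting loss scale implied by an fp16 block.

    Assumes the documented DeepSpeed convention that loss_scale == 0 means
    dynamic scaling starting at 2 ** initial_scale_power, and that any
    non-zero loss_scale is used as a fixed (static) scale.
    """
    static_scale = fp16_cfg["loss_scale"]
    if static_scale != 0:
        return float(static_scale)
    return float(2 ** fp16_cfg["initial_scale_power"])


fp16 = json.loads(FP16_FRAGMENT)["fp16"]
scale = initial_loss_scale(fp16)
is_dynamic = fp16["loss_scale"] == 0
print(f"dynamic scaling: {is_dynamic}, initial loss scale: {scale}")
```

With these values the run starts at a scale of 65536, and `min_loss_scale` of 1 is the floor it can back off to after overflows. The `"auto"` value for `enabled` looks like the placeholder that the Hugging Face Trainer integration fills in from its own arguments, so it would not be valid if the file were passed to DeepSpeed directly.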