Muggle666

1 issue by Muggle666

I am trying to enable CPU activation offloading when training my custom LLaMA model. However, an error occurs: ![image](https://github.com/microsoft/DeepSpeed/assets/104556055/4a500207-faa4-4414-968f-a9ff36b6f7aa) It seems like some inputs of the attention operation are offloaded...
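For context, CPU activation offloading in DeepSpeed is normally switched on through the `activation_checkpointing` section of the DeepSpeed config. The sketch below builds such a config as a Python dict. It is an assumption about the kind of setup involved, not the reporter's actual config, and `train_batch_size` is a placeholder value. In DeepSpeed, `cpu_checkpointing` only offloads activations that are already partitioned, so `partition_activations` is enabled alongside it. The settings also only apply to layers that the model routes through `deepspeed.checkpointing.checkpoint`.

```python
import json

# Hypothetical DeepSpeed config sketch; train_batch_size is a placeholder,
# not a value taken from the original report.
ds_config = {
    "train_batch_size": 8,
    "activation_checkpointing": {
        # Partition checkpointed activations across model-parallel ranks;
        # cpu_checkpointing has no effect unless this is enabled.
        "partition_activations": True,
        # Offload the partitioned activation checkpoints to CPU memory.
        "cpu_checkpointing": True,
        "contiguous_memory_optimization": False,
        "number_checkpoints": None,
        "synchronize_checkpoint_boundary": False,
        "profile": False,
    },
}

# Serialize to the JSON text that would be passed to DeepSpeed as its config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

When tensors fed to an attention kernel end up on the CPU, as the error above suggests, one thing worth checking is whether the attention call sits inside a checkpointed region whose saved inputs are moved to the CPU.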

bug
training