Song Yinghao
Hi, I am trying the `cifar10_deepspeed.py` example [here](https://www.deepspeed.ai/tutorials/cifar-10/) on a single node (2x3090). When I run the command below:

```bash
deepspeed cifar10_deepspeed.py --deepspeed_config ds_config.json
```

the following error occurs:

```bash
[2023-12-31 16:33:59,636]...
```
Thank you so much for this enlightening and inspiring work. I don't quite understand why **LKC (i.e., Large Kernel Convolution) = DW-Conv + DW-D-Conv + 1x1 Conv.** My current understanding can refer...
Hi, I successfully ran the [`cifar10_deepspeed.py`](https://www.deepspeed.ai/tutorials/cifar-10/) example on a single node (2x NVIDIA 3090). Now I want to run the same program on multiple nodes (2 nodes, each with 2 3090s). I...
I am new to DeepSpeed. I noticed there is no `--hostfile` argument in the training command. : )
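
For context on the `--hostfile` question, here is a minimal sketch of what a two-node launch might look like. It assumes two machines with the hypothetical hostnames `node1` and `node2`, each exposing 2 GPUs and reachable over passwordless SSH; the file name `hostfile` is also just an illustrative choice. The launch line is printed with `echo` so the snippet is safe to run without a cluster.

```bash
# Write a DeepSpeed-style hostfile: one line per node, "slots" = GPUs on that node.
# node1 / node2 are hypothetical hostnames; replace them with your own.
cat > hostfile <<'EOF'
node1 slots=2
node2 slots=2
EOF

# Print the launch command instead of running it; remove "echo" to actually launch.
echo deepspeed --hostfile=hostfile cifar10_deepspeed.py --deepspeed_config ds_config.json
```

The only change from the single-node command is the `--hostfile` argument passed to the `deepspeed` launcher itself, which is why it does not appear among the script's own arguments.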