
[REQUEST] Example of H5-dataloader-based multi-node training on Azure VMs

Open · vishalghor opened this issue 3 years ago · 0 comments

Is your feature request related to a problem? Please describe. DeepSpeed is a library for training large models at high speed, and many DL developers run multi-node training on Azure VMs with their data stored as H5 (HDF5) files. However, there is no clear indication of whether H5 files are supported, or of how to set up DeepSpeed training on H5 data for multi-node runs. There is an explanation of the communication-time speed-ups from multi-node training, but not of this data-loading setup.

Describe the solution you'd like. An example of DeepSpeed training in which the data is stored as H5 files and used for multi-node training on Azure VMs.
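For illustration only (this is not an official DeepSpeed example), here is a minimal sketch of the data-loading part such an example might cover. It assumes h5py, NumPy and PyTorch are installed, and a hypothetical H5 layout with an `inputs` dataset of shape (N, features) and a `labels` dataset of shape (N,). The names `H5Dataset`, `make_loader` and `write_demo_h5` are hypothetical. Each rank opens the file lazily and reads only its own shard of sample indices, which PyTorch's `DistributedSampler` selects.

```python
import h5py
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler


class H5Dataset(Dataset):
    """Map-style dataset over a hypothetical H5 file with `inputs` and `labels`."""

    def __init__(self, path, x_key="inputs", y_key="labels"):
        self.path = path
        self.x_key = x_key
        self.y_key = y_key
        self._file = None  # opened lazily on first access, per process
        # Read only the length up front, then close the file so that no open
        # h5py handle is inherited by forked DataLoader worker processes.
        with h5py.File(path, "r") as f:
            self._len = len(f[x_key])

    def __len__(self):
        return self._len

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        # Read a single row from disk; the file is never loaded whole.
        x = torch.from_numpy(self._file[self.x_key][idx])
        y = torch.as_tensor(self._file[self.y_key][idx])
        return x, y


def make_loader(path, rank, world_size, batch_size=4, epoch=0):
    """Build a loader that yields only this rank's shard of the H5 samples."""
    ds = H5Dataset(path)
    # With explicit num_replicas/rank the sampler needs no process group;
    # in a real job these would come from torch.distributed.
    sampler = DistributedSampler(ds, num_replicas=world_size, rank=rank,
                                 shuffle=True, seed=0)
    sampler.set_epoch(epoch)  # reshuffle differently each epoch
    return DataLoader(ds, batch_size=batch_size, sampler=sampler, num_workers=0)


def write_demo_h5(path, n=10, features=3):
    """Create a small toy H5 file with the hypothetical layout above."""
    with h5py.File(path, "w") as f:
        f.create_dataset("inputs", data=np.random.rand(n, features).astype(np.float32))
        f.create_dataset("labels", data=np.arange(n, dtype=np.int64))
```

In a real multi-node run, `rank` and `world_size` would typically come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()` once the process group is initialized (for example when launching with the `deepspeed` launcher and a hostfile). Alternatively, the `H5Dataset` could be passed as `training_data` to `deepspeed.initialize`, which returns a distributed training dataloader. Whether that path suits a given H5 pipeline is an assumption to verify against the DeepSpeed docs. Opening the file lazily inside `__getitem__` is a common pattern because sharing one h5py handle across forked workers tends to fail.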

Examples like this would help drive wider adoption and make it easier to build efficient training pipelines with DeepSpeed.

vishalghor, Aug 10 '22 21:08