hetseq
Ask a question about your Heterogeneous structure
Hello there! Thanks for your work. I have a small question about your heterogeneous training code. If there is an A100 on one node and a V100 on another node, can hetseq apply a different batch size to each device? For example, a batch size of 16 on the A100 and 8 on the V100. The A100 is faster than the V100, so I want the A100 to do more of the computation. Many thanks!
You definitely can do that. Since you have an A100, a single A100 is probably good enough to train a large language model.
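
One thing to watch when the per-device batch sizes differ: if each worker computes the mean gradient over its own batch, a plain average across workers would count each V100 sample twice as heavily as each A100 sample. Below is a minimal sketch of weighting by batch size instead. It is plain Python for illustration only; the function name and the gradient values are made up, and this is not hetseq's API or a statement of how hetseq handles it.

```python
# Hypothetical illustration; none of these names come from hetseq.

def weighted_average_gradients(grads, batch_sizes):
    """Combine per-worker mean gradients, weighting each by its local batch size."""
    total = sum(batch_sizes)
    dim = len(grads[0])
    return [
        sum(g[i] * b for g, b in zip(grads, batch_sizes)) / total
        for i in range(dim)
    ]

# Per-worker batch sizes from the question: 16 on the A100, 8 on the V100.
batch_sizes = [16, 8]
# Made-up per-worker mean gradients for a two-parameter model.
grads = [[0.3, -0.6], [0.9, 0.0]]

combined = weighted_average_gradients(grads, batch_sizes)
print(combined)
```

With this weighting, the combined gradient matches what a single device would compute on the full batch of 24 samples, so every sample contributes equally regardless of which GPU processed it.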