sunyuanx22
Right now, when Ray runs training and generation, the two phases are separated in time: the GPUs that train the model sit idle during generation, and the GPUs used for generation sit idle during training. This seems inefficient. Would it be possible for the two to cooperate, e.g., share the same GPUs?
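One common pattern for this (a minimal sketch, not any particular framework's actual API; the `Trainer`/`Generator` classes, the bundle count, and the fractional `num_gpus` split are all illustrative assumptions) is to colocate both kinds of actors on the same devices via a Ray placement group, so each GPU can alternate between training and generation instead of idling:

```python
import ray
from ray.util.placement_group import placement_group
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

# One bundle per GPU; the trainer and generator for a given rank are
# scheduled into the same bundle, i.e. onto the same physical device.
pg = placement_group([{"GPU": 1, "CPU": 2}] * 4, strategy="PACK")
ray.get(pg.ready())

@ray.remote(num_gpus=0.5)  # illustrative split: both actors share one GPU
class Trainer:
    def step(self, rollouts):
        ...  # run one training step on this GPU

@ray.remote(num_gpus=0.5)
class Generator:
    def generate(self, prompts):
        ...  # run rollout generation on the same GPU

def colocated(cls, rank):
    # Pin an actor to the bundle (GPU) corresponding to its rank.
    return cls.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=pg, placement_group_bundle_index=rank
        )
    ).remote()

trainers = [colocated(Trainer, i) for i in range(4)]
generators = [colocated(Generator, i) for i in range(4)]
```

With this layout the driver can interleave `generate` and `step` calls on each rank, so neither set of actors holds a GPU while doing nothing; the trade-off is that both models must fit in device memory together, or weights must be offloaded between phases.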
**Your question**
Could you let me know which version I should revert to if I want to use the previous checkpoint storage...
File "/share//code//NeMo/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py", line 245, in convert(local_rank, rank, world_size, args) File "/share//code//NeMo/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py", line 198, in convert model = MegatronGPTModel.load_from_checkpoint(checkpoint_path, hparams_file=args.hparams_file, trainer=trainer) File "/usr/local/lib/python3.10/dist-packages/nemo/collections/nlp/models/nlp_model.py", line 380, in load_from_checkpoint model = ptl_load_state(cls, checkpoint,...
### Required prerequisites

- [x] I have read the documentation.
- [x] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/omnisafe/issues) and [Discussions](https://github.com/PKU-Alignment/omnisafe/discussions) to check that this hasn't already been reported. (+1 or comment...