Shilei Liu

Results: 10 comments of Shilei Liu

These are the evaluation results of the medium and large models. It can be seen that the gap between the NIST/BLEU/DIST scores and the official results is relatively large. ###...

The evaluation code is almost the same as the official one.

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import re
from collections import defaultdict
import argparse
...
```
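For reference, here is a minimal sketch of the metrics being compared, using nltk rather than the official evaluation script (so exact numbers may differ slightly); Dist-n is taken as the usual distinct n-gram ratio, and the sentences are made-up examples:

```python
# Sketch of BLEU / NIST / Dist-n computation with nltk (not the official script).
from collections import Counter
from nltk.translate.bleu_score import corpus_bleu
from nltk.translate.nist_score import corpus_nist

def distinct_n(hypotheses, n):
    """Dist-n: number of unique n-grams divided by total n-grams over all hypotheses."""
    ngram_counts, total = Counter(), 0
    for hyp in hypotheses:
        tokens = hyp.split()
        for i in range(len(tokens) - n + 1):
            ngram_counts[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(ngram_counts) / max(total, 1)

hyp_texts = ["i am fine thank you so much", "see you later at the station"]
ref_texts = ["i am fine thank you very much", "see you tomorrow at the station"]
hypotheses = [h.split() for h in hyp_texts]
references = [[r.split()] for r in ref_texts]  # one reference per hypothesis

print("BLEU-4:", corpus_bleu(references, hypotheses))
print("NIST-4:", corpus_nist(references, hypotheses, n=4))
print("Dist-1:", distinct_n(hyp_texts, 1))
print("Dist-2:", distinct_n(hyp_texts, 2))
```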

Please adapt it to Python 3.

Hello @JThh, after uncommenting the following code, only rank 0 can save the optimizer states. However, due to the ZeRO-3 mechanism, each worker only keeps part of the optimizer states....
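If a single file written by rank 0 is really wanted, the shards would first have to be collected from every worker. A rough sketch of that idea with plain torch.distributed (not a ColossalAI or DeepSpeed API; the function name is hypothetical):

```python
# Rough sketch: gather every rank's optimizer shard before rank 0 writes the file.
# Assumes the process group is already initialized (e.g. via torchrun) with a
# backend that supports gather_object (such as gloo).
import torch
import torch.distributed as dist

def save_full_optimizer_on_rank0(optimizer, path):
    local_shard = optimizer.state_dict()  # only this rank's part under ZeRO-3
    rank = dist.get_rank()
    gathered = [None] * dist.get_world_size() if rank == 0 else None
    dist.gather_object(local_shard, gathered, dst=0)
    if rank == 0:
        # One entry per rank; merging the shards back into a single flat
        # state dict is framework-specific and omitted here.
        torch.save({"optimizer_shards": gathered}, path)
```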

@JThh, this method has the same effect as the previous one.

Yes, users hope to have a demo showing how to save (and load) the states of the model, optimizer, and lr scheduler in hybrid parallel scenarios, so as to...

I suggest creating two unified functions to save and load the above states. The storage format can follow DeepSpeed's example: each worker only stores and loads its corresponding...
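A rough sketch of what such a pair could look like (the function names and file layout are hypothetical, loosely following DeepSpeed's one-file-per-rank convention; this is not an existing ColossalAI API):

```python
# Hypothetical sketch of unified save/load helpers: every worker stores and
# loads only the shard it owns, one checkpoint file per rank.
import os
import torch
import torch.distributed as dist

def save_sharded_states(model, optimizer, lr_scheduler, ckpt_dir):
    rank = dist.get_rank()
    os.makedirs(ckpt_dir, exist_ok=True)
    torch.save(
        {
            "model": model.state_dict(),          # this rank's (possibly sharded) weights
            "optimizer": optimizer.state_dict(),  # this rank's optimizer shard
            "lr_scheduler": lr_scheduler.state_dict(),
        },
        os.path.join(ckpt_dir, f"rank_{rank}_states.pt"),
    )

def load_sharded_states(model, optimizer, lr_scheduler, ckpt_dir):
    rank = dist.get_rank()
    states = torch.load(os.path.join(ckpt_dir, f"rank_{rank}_states.pt"), map_location="cpu")
    model.load_state_dict(states["model"])
    optimizer.load_state_dict(states["optimizer"])
    lr_scheduler.load_state_dict(states["lr_scheduler"])
```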

Hello @Kingsleyandher, I am facing the same issue. Has your problem been solved?