Shilei Liu

Results: 10 comments of Shilei Liu

These are the evaluation results of the medium and large models. It can be seen that the gap between the NIST/BLEU/DIST scores and the official results is relatively large. ###...

The evaluation code is almost the same as the official one.

```python
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import re
from collections import defaultdict
import argparse
...
```
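For reference, here is a minimal sketch of the metrics being compared, using nltk rather than the official evaluation script (so exact numbers may differ slightly); Dist-n is taken as the usual distinct n-gram ratio, and the sentences are made-up examples:

```python
# Sketch of BLEU / NIST / Dist-n computation with nltk (not the official script).
from collections import Counter
from nltk.translate.bleu_score import corpus_bleu
from nltk.translate.nist_score import corpus_nist

def distinct_n(hypotheses, n):
    """Dist-n: number of unique n-grams divided by total n-grams over all hypotheses."""
    ngram_counts, total = Counter(), 0
    for hyp in hypotheses:
        tokens = hyp.split()
        for i in range(len(tokens) - n + 1):
            ngram_counts[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(ngram_counts) / max(total, 1)

hyp_texts = ["i am fine thank you so much", "see you later at the station"]
ref_texts = ["i am fine thank you very much", "see you tomorrow at the station"]
hypotheses = [h.split() for h in hyp_texts]
references = [[r.split()] for r in ref_texts]  # one reference per hypothesis

print("BLEU-4:", corpus_bleu(references, hypotheses))
print("NIST-4:", corpus_nist(references, hypotheses, n=4))
print("Dist-1:", distinct_n(hyp_texts, 1))
print("Dist-2:", distinct_n(hyp_texts, 2))
```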

Please adapt it to Python 3.

Hello @JThh, after uncommenting the following code, only rank 0 can save the optimizer states. However, due to the ZeRO-3 mechanism, each worker only keeps part of the optimizer states....
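If a single file written by rank 0 is really wanted, the shards would first have to be collected from every worker. A rough sketch of that idea with plain torch.distributed (not a ColossalAI or DeepSpeed API; the function name is hypothetical):

```python
# Rough sketch: gather every rank's optimizer shard before rank 0 writes the file.
# Assumes the process group is already initialized (e.g. via torchrun) with a
# backend that supports gather_object (such as gloo).
import torch
import torch.distributed as dist

def save_full_optimizer_on_rank0(optimizer, path):
    local_shard = optimizer.state_dict()  # only this rank's part under ZeRO-3
    rank = dist.get_rank()
    gathered = [None] * dist.get_world_size() if rank == 0 else None
    dist.gather_object(local_shard, gathered, dst=0)
    if rank == 0:
        # One entry per rank; merging the shards back into a single flat
        # state dict is framework-specific and omitted here.
        torch.save({"optimizer_shards": gathered}, path)
```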

@JThh, this method has the same effect as the previous one.

Yes, users hope to have a demo showing how to save (and load) the states of the model, optimizer, and lr scheduler in hybrid parallel scenarios, so as to...

I suggest creating two unified functions to save and load the above states. The storage format can follow DeepSpeed's example: each worker only stores and loads its corresponding...
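A rough sketch of what such a pair could look like (the function names and file layout are hypothetical, loosely following DeepSpeed's one-file-per-rank convention; this is not an existing ColossalAI API):

```python
# Hypothetical sketch of unified save/load helpers: every worker stores and
# loads only the shard it owns, one checkpoint file per rank.
import os
import torch
import torch.distributed as dist

def save_sharded_states(model, optimizer, lr_scheduler, ckpt_dir):
    rank = dist.get_rank()
    os.makedirs(ckpt_dir, exist_ok=True)
    torch.save(
        {
            "model": model.state_dict(),          # this rank's (possibly sharded) weights
            "optimizer": optimizer.state_dict(),  # this rank's optimizer shard
            "lr_scheduler": lr_scheduler.state_dict(),
        },
        os.path.join(ckpt_dir, f"rank_{rank}_states.pt"),
    )

def load_sharded_states(model, optimizer, lr_scheduler, ckpt_dir):
    rank = dist.get_rank()
    states = torch.load(os.path.join(ckpt_dir, f"rank_{rank}_states.pt"), map_location="cpu")
    model.load_state_dict(states["model"])
    optimizer.load_state_dict(states["optimizer"])
    lr_scheduler.load_state_dict(states["lr_scheduler"])
```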

Hello @Kingsleyandher, I am facing the same issue. Has your problem been solved?