
[BART] Documentation references run_pretraining.py that doesn't exist

Open Lauler opened this issue 4 years ago • 2 comments

Related to Model/Framework(s): BART's documentation

Describe the bug: The documentation for your BART example is unclear and ambiguous about whether it supports pre-training or not. In my opinion you should clearly state at the top of the documentation that there is no out-of-the-box support for pretraining.

For example, under the heading "Parameters", the README references arguments for a run_pretraining.py script:

Aside from the options to set hyperparameters, the relevant options to control the behaviour of the run_pretraining.py script are:

However, no run_pretraining.py exists in the BART folder. The reference should likely be to finetune.py instead.

After digging around the BART DeepLearningExamples codebase, I am unable to find any file that preprocesses data and creates pretraining examples according to the BART paper (e.g. masking individual words, masking spans of words, deleting words, etc.). There is a risk that the BART documentation misleads users, since some of your other examples (e.g. BERT) do include files for creating pretraining data.
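For clarity, here is a minimal sketch of the kind of noising I mean, loosely following the corruption schemes described in the BART paper. This is a hypothetical helper, not part of the repo; the mask token name, tokenization, and ratios are assumptions.

```python
import numpy as np

MASK = "<mask>"  # assumption: the real mask token depends on the tokenizer in use


def token_masking(tokens, prob=0.15, rng=None):
    """Replace individual tokens with the mask token."""
    rng = rng or np.random.default_rng()
    return [MASK if rng.random() < prob else t for t in tokens]


def token_deletion(tokens, prob=0.15, rng=None):
    """Delete individual tokens; the model must infer which positions are missing."""
    rng = rng or np.random.default_rng()
    return [t for t in tokens if rng.random() >= prob]


def text_infilling(tokens, mask_ratio=0.3, poisson_lambda=3.0, rng=None):
    """Replace spans of tokens with a single mask token per span.

    Span lengths are drawn from a Poisson distribution (lambda=3 in the paper);
    a zero-length span inserts a mask token without removing anything.
    """
    rng = rng or np.random.default_rng()
    tokens = list(tokens)
    budget = int(round(mask_ratio * len(tokens)))
    while budget > 0 and tokens:
        span = min(int(rng.poisson(poisson_lambda)), budget, len(tokens))
        start = int(rng.integers(0, len(tokens) - span + 1))
        tokens[start:start + span] = [MASK]
        budget -= max(span, 1)  # count a zero-length span as 1 so the loop terminates
    return tokens


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    print(text_infilling(sentence))
```

Something along these lines (operating on real tokenizer IDs rather than whitespace tokens) is what users currently have to write themselves before they can pretrain with this codebase.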

In short, I would appreciate it if the README.md for BART were updated not only to remove the reference to a pretraining script that does not exist, but also to clearly state that the codebase does not support pretraining out of the box, and that users must implement their own method for masking/corrupting input documents.

For anyone interested, a different neural machine translation toolkit, YANMTT, implements BART-style mask creation.

Lauler avatar Sep 27 '21 07:09 Lauler

Yes, that is a typo in the documentation. Thanks for the correction. Pretraining for BART is still a work in progress; we will add the pretraining feature in the future.

meatybobby avatar Sep 27 '21 19:09 meatybobby

Hi, any progress?

Hannibal046 avatar Jul 26 '22 08:07 Hannibal046