CogView2: Can you give an example of the first stage of training?

Dear all, thank you very much for your work. I don't know how to train the first stage. Can you give some examples of how it works?

Sep 26 '22 02:09 yangsenwxy

pretrain_coglm.py is the script used for pretraining the first stage. The launch scripts should be similar to those in https://github.com/THUDM/SwissArmyTransformer. You need to set the hyper-parameters and the environment yourself.

Sep 26 '22 02:09 Sleepychord

Hi, I want to train the model on my own dataset. How should I process the dataset? Thanks!

Jan 30 '23 02:01 victorup

@victorup The dataset consists of binary files created by the IcetkImageTextTask in https://github.com/Sleepychord/cogdata, which is the fastest way to load data during training. You should prepare zip/rar/tar archives of the images together with their text; see https://github.com/Sleepychord/cogdata/tree/dev/downloads/testcase/test_image_text_tokenization_task for format examples.
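
As a rough illustration of the "prepare archives for images and text" step only, here is a minimal sketch using the Python standard library. It does not call cogdata at all. The function name pack_dataset, the output names images.tar and captions.json, and the caption layout (a JSON object mapping image file name to text) are assumptions made for this example; check the linked test case for the layout cogdata actually expects before tokenizing.

```python
import json
import tarfile
from pathlib import Path

# Image extensions to include; adjust for your data (assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def pack_dataset(image_dir, captions, out_dir):
    """Pack captioned images into a tar archive and save the captions as JSON.

    `captions` maps image file name -> caption text. This layout is a
    hypothetical choice for the sketch, not a confirmed cogdata format.
    Returns (tar_path, text_path).
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / "images.tar"
    text_path = out_dir / "captions.json"

    kept = {}
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(image_dir.iterdir()):
            if path.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # not an image
            if path.name not in captions:
                continue  # skip images that have no text
            # Store the file under its bare name so it matches the caption key.
            tar.add(path, arcname=path.name)
            kept[path.name] = captions[path.name]

    text_path.write_text(
        json.dumps(kept, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return tar_path, text_path
```

Calling pack_dataset("raw_images", {"0001.jpg": "a red bicycle"}, "packed") would write packed/images.tar and packed/captions.json. Those two files would then be the inputs to the cogdata tokenization task, whose command-line parameters are not covered here.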

Feb 01 '23 06:02 Sleepychord

Can you please share complete scripts for processing the dataset and for training? The commands in https://github.com/Sleepychord/cogdata take a lot of parameters, and I'm not clear on how to set them. I also need a training script to run once the data has been processed. Thanks!

Feb 06 '23 16:02 victorup