Can you give an example of the first stage of training?
Dear all, thank you very much for your work. I don't know how to train the first stage. Could you give some examples of how it works?
pretrain_coglm.py is used for pretraining the first stage, and the launching scripts should be similar to those in https://github.com/THUDM/SwissArmyTransformer. You need to set the hyper-parameters and environment yourself.
Hi, I want to train the model on my own dataset. How should I process the dataset? Thanks!
@victorup The dataset consists of binary files created by the IcetkImageTextTask in https://github.com/Sleepychord/cogdata, which is the fastest way to load data during training. You should prepare zip/rar/tar files for the images and text; see https://github.com/Sleepychord/cogdata/tree/dev/downloads/testcase/test_image_text_tokenization_task for format examples.
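As a rough illustration of the "prepare an archive of images plus text" step, here is a minimal sketch using only the Python standard library. It is not taken from cogdata: the function name pack_image_text, the output names images.tar and captions.json, and the choice of a JSON mapping from image filename to caption are all assumptions made for illustration. Please check the linked testcase for the layout that IcetkImageTextTask actually expects.

```python
import io
import json
import tarfile
from pathlib import Path


def pack_image_text(pairs, out_dir):
    """Pack images into a tar archive and captions into a JSON file.

    pairs: iterable of (image_name, image_bytes, caption).
    NOTE: file names and the JSON layout are hypothetical; compare them
    with the cogdata testcase before using this on real data.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tar_path = out_dir / "images.tar"        # hypothetical archive name
    json_path = out_dir / "captions.json"    # hypothetical text file name

    captions = {}
    with tarfile.open(tar_path, "w") as tar:
        for name, data, caption in pairs:
            # Add each image from memory under its own file name.
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            # Assumed layout: image file name -> caption text.
            captions[name] = caption

    json_path.write_text(
        json.dumps(captions, ensure_ascii=False), encoding="utf-8"
    )
    return tar_path, json_path
```

In practice the image bytes would come from reading your own image files, for example Path("cat.jpg").read_bytes(), and the resulting archive and text file would then be passed to cogdata for tokenization.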
Could you please share complete scripts for processing datasets and training? There are a lot of parameters in the commands in https://github.com/Sleepychord/cogdata, and I'm not very clear about them. I also need a training script to use once I have the processed data. Thanks!