MFTCoder

How can I do continued pretraining?

Open hwaking opened this issue 1 year ago • 1 comments

Can you share the continued-pretraining scripts and sample data?

hwaking, Mar 18 '24 06:03

We apologize for the delayed response. To address your issue, please follow these steps:

  1. First, use the run_offline_tokenization.sh script to tokenize your data.
  2. Then, make the following modifications in the full_train_config.json file:
    • "data_paths": "[/workspace/data/data1,/workspace/data/data2]"
    • "data_weights": "[1.,1.]"
    • "tokenize_mode": "sst"
  3. In the ds_multinode_launch.sh file, modify the entry script from pefts/mft_accelerate.py to mpt/mpt_accelerate.py.
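
For anyone who prefers to script step 2, here is a minimal Python sketch that applies the three settings above to a JSON config file. It assumes full_train_config.json is a flat JSON object with these keys at the top level; the helper names `apply_overrides` and `patch_config_file` are hypothetical and not part of MFTCoder, and the data paths are just the example values from the reply.

```python
import json
from pathlib import Path

# The three settings quoted in step 2. The data paths are example values;
# point them at the output of your own offline tokenization run.
CPT_OVERRIDES = {
    "data_paths": "[/workspace/data/data1,/workspace/data/data2]",
    "data_weights": "[1.,1.]",
    "tokenize_mode": "sst",
}


def apply_overrides(config, overrides=None):
    """Return a copy of `config` with the overrides applied (hypothetical helper)."""
    updated = dict(config)
    updated.update(CPT_OVERRIDES if overrides is None else overrides)
    return updated


def patch_config_file(path):
    """Rewrite the JSON config at `path` in place and return the new contents.

    Assumes the file holds a single flat JSON object.
    """
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    updated = apply_overrides(config)
    config_path.write_text(json.dumps(updated, indent=4) + "\n", encoding="utf-8")
    return updated
```

Calling `patch_config_file("full_train_config.json")` from the directory that holds the config would update it in place; every other key in the file is left unchanged. Step 3 (switching the entry script in ds_multinode_launch.sh) still has to be done by hand.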

Let us know if you need any further clarification or have additional questions.

GoneZ5, Mar 19 '25 07:03