MFTCoder
How can I do continued pretraining?
Can you share the continued pretraining scripts and sample data?
We apologize for the delayed response. To address your issue, please follow these steps:
- First, use the `run_offline_tokenization.sh` script to tokenize your data.
- Then, make the following modifications in the `full_train_config.json` file:
  - "data_paths": "[/workspace/data/data1,/workspace/data/data2]"
  - "data_weights": "[1.,1.]"
  - "tokenize_mode": "sst"
- In the `ds_multinode_launch.sh` file, modify the entry script from `pefts/mft_accelerate.py` to `mpt/mpt_accelerate.py`.
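If you prefer not to edit the JSON by hand, the config changes listed above could be applied with a small helper along these lines. This is only a sketch: the function names `apply_pretrain_overrides` and `patch_config_file` are hypothetical and not part of MFTCoder, the data paths are the example values from the steps above and need to be replaced with your own tokenized data directories, and the sketch assumes the config file is a flat JSON object.

```python
import json
from pathlib import Path

# The three settings from the steps above. The data paths are the example
# values and are placeholders for your own tokenized data directories.
PRETRAIN_OVERRIDES = {
    "data_paths": "[/workspace/data/data1,/workspace/data/data2]",
    "data_weights": "[1.,1.]",
    "tokenize_mode": "sst",
}


def apply_pretrain_overrides(config: dict) -> dict:
    """Return a copy of config with the continued-pretraining settings applied.

    Hypothetical helper: all other keys are kept unchanged.
    """
    updated = dict(config)
    updated.update(PRETRAIN_OVERRIDES)
    return updated


def patch_config_file(path: str) -> None:
    """Rewrite the JSON config at path in place with the overrides applied.

    Hypothetical helper: assumes the file holds a single JSON object.
    """
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    patched = apply_pretrain_overrides(config)
    config_path.write_text(json.dumps(patched, indent=4) + "\n", encoding="utf-8")
```

Calling `patch_config_file("full_train_config.json")` from the directory that contains the config would update it in place. The entry-script change in `ds_multinode_launch.sh` still has to be made separately.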
Let us know if you need any further clarification or have additional questions.