Chen Jie
The first graph compares training with and without FlashAttention-2. The loss barely changes with FA2 enabled (the yellowish curve).
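For reference, here is a minimal sketch of how such a comparison can be set up with Hugging Face `transformers` (the `attn_implementation` argument requires the `flash-attn` package and a half-precision dtype; the model id below is just the Llama-3 base model used for illustration, not necessarily the exact checkpoint in the experiment):

```python
import torch
from transformers import AutoModelForCausalLM

# Baseline: the default attention implementation.
model_baseline = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative model id
    torch_dtype=torch.bfloat16,
)

# FlashAttention-2: same weights, faster fused attention kernels.
# Since FA2 computes exact (not approximate) attention, the loss
# trajectory is expected to match the baseline closely.
model_fa2 = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # illustrative model id
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```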
> will the training code be released?

We will organize and release the training code as soon as possible. The continual pre-training is run under the Slurm workload manager.
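As an illustration of what running under Slurm typically looks like on the Python side, here is a minimal sketch that initializes distributed training from the per-task environment variables Slurm sets. It assumes `MASTER_ADDR` and `MASTER_PORT` are exported in the sbatch script; the actual launch recipe in the released code may differ:

```python
import os
import torch
import torch.distributed as dist

def init_distributed_from_slurm() -> None:
    """Initialize torch.distributed from variables Slurm sets per task.

    Assumes MASTER_ADDR and MASTER_PORT are exported in the sbatch
    script; SLURM_PROCID/SLURM_NTASKS/SLURM_LOCALID are set by Slurm.
    """
    rank = int(os.environ["SLURM_PROCID"])        # global rank of this task
    world_size = int(os.environ["SLURM_NTASKS"])  # total number of tasks
    local_rank = int(os.environ["SLURM_LOCALID"]) # rank within this node

    torch.cuda.set_device(local_rank)  # one GPU per task
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

if __name__ == "__main__":
    init_distributed_from_slurm()
    print(f"rank {dist.get_rank()}/{dist.get_world_size()} initialized")
```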
We are sorry for the delayed release of the code. We have just released the [code](https://github.com/RUC-GSAI/Llama-3-SynE/blob/main/src) used for continual pre-training and data preparation; it includes detailed documentation comments.
[LLMBox](https://github.com/RUCAIBox/LLMBox).