kiseliu
Beyond the pre-norm vs. post-norm difference and the tokenizer difference mentioned in the paper, I compared the network structure of PLATO against that of PLATO-2 (the stage 2.1 PLATO model) and found some subtle differences: 1. When predicting the latent variable, PLATO 1 passes the final hidden state of the mask token through post_network, whereas PLATO-2...
For the following parameters in the config at https://github.com/PaddlePaddle/Knover/blob/develop/projects/PLATO-2/pretrain/24L_infer.conf: ``` init_params="./24L/Plato" nsp_init_params="./24L/NSP" ``` How can I get these two models? Do I need to convert them from the model at https://dialogue.bj.bcebos.com/Knover/projects/PLATO-2/24L.tar...
A month ago, I trained alpaca with 4 A100 GPUs (80 GB each) and `per_device_train_batch_size=4`, with `transformers==4.28.1`. Today I retrained alpaca with the same hardware and the same code,...
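One common source of divergence between two otherwise identical training runs is an unpinned dependency silently upgrading between runs. A minimal `requirements.txt` fragment that pins the version mentioned above (the other entries are illustrative assumptions, not taken from the original setup):

```
# Pin the exact transformers release used in the first run
transformers==4.28.1
# Hypothetical companion pins -- substitute the versions from your own environment
# torch==<version from first run>
# accelerate==<version from first run>
```

Recording the full environment with `pip freeze > requirements.txt` after a successful run makes it possible to reproduce it later with `pip install -r requirements.txt`.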
Thanks for your amazing work. Since I am not very familiar with memory-usage computation, I would like to know whether you could provide more details about `Table 1`...
Hi, thanks for sharing this codebase. After I run `bash scripts/run_pile.sh`, I obtain the following results: the generated domain reweights differ slightly from the released...