ivankxt issues

Results: 2 issues of ivankxt

Why does distributed training report "Variables not initialized"?

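Hello, I have a question. We are running distributed training (1 PS + 2 workers) with the ppi dataset model from the official site. Once worker-0 is launched, it starts computing immediately, as shown below: ==> /tmp/log.worker.0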

The model obtained after fine-tuning on the school_math_0.25M.json dataset gives very poor inference results. What is the cause?

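deepspeed --num_gpus=4 --master_port $MASTER_PORT main.py \ --deepspeed deepspeed.json \ --quantization_bit 8 \ ... Training runs on a V100 machine with 4 GPUs, with --quantization_bit 8 added to avoid OOM. After one epoch of training, the resulting model gives very poor results at inference. In addition, when the web service is started with web_demo2.py, answers often stop after only a small amount of output, even though the inference process itself appears to be running normally. `tokenizer = AutoTokenizer.from_pretrained("/xxx/ChatGLM2-6B/THUDM/chatglm2-6b-int4", trust_remote_code=True)` `model = AutoModel.from_pretrained("/xxx/ChatGLM2-6B/output/adgen-chatglm2-6b-ft-1e-4/checkpoint-15000", trust_remote_code=True).cuda(1)`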