hjc3613

Results 12 comments of hjc3613

I reprocessed the train/valid/test files with spaCy tokenization and retrained with OpenNMT-tf; now the results look normal. The only problem is that the ppl on the valid dataset...
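A minimal sketch of the preprocessing step described above (the file names and the `en_core_web_sm` model are illustrative assumptions, not from the original comment): tokenize each line with spaCy and write the tokens back out space-separated, which is the plain-text format OpenNMT-tf reads for its train/valid/test files.

```python
import spacy

# Load a spaCy pipeline; heavy components are disabled since only the
# tokenizer is needed here.
nlp = spacy.load("en_core_web_sm", disable=["parser", "ner", "lemmatizer"])

def retokenize(src_path: str, dst_path: str) -> None:
    """Rewrite one file with spaCy tokens separated by single spaces."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            doc = nlp(line.strip())
            dst.write(" ".join(tok.text for tok in doc) + "\n")

# Hypothetical file names for the three splits.
for split in ("train", "valid", "test"):
    retokenize(f"{split}.txt", f"{split}.tok.txt")
```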

> @hjc3613 sorry for the inconvenience, this feature is not well tested, that's why we didn't mention it much. If you are interested, would love to work with you and...

> > See #361

Thanks a lot!

> Does the error only occur under stage 3? Can stage 2 run?

I haven't run stage 2, because each card only has 12 GB of GPU memory and it would OOM; I have six 12 GB cards in total, so stage 3 seems to be the only option. I wanted to debug on the CPU, but adding --no_cuda has no effect and everything is still moved to the GPU. If it's convenient, could you tell me how to run on the CPU?
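One common workaround for this kind of situation (not proposed in the original thread, just a hedged sketch) is to hide all GPUs from the process before PyTorch initializes CUDA, so that even code paths that ignore `--no_cuda` cannot move tensors to the GPU:

```python
import os

# Must be set before torch/CUDA is initialized in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

import torch

# With no visible devices, everything stays on the CPU.
print(torch.cuda.is_available())  # expected: False
```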

> > Does the error only occur under stage 3? Can stage 2 run?
>
> I haven't run stage 2, because each card only has 12 GB of GPU memory and it would OOM; I have six 12 GB cards in total, so stage 3 seems to be the only option. I wanted to debug on the CPU, but adding --no_cuda has no effect and everything is still moved to the GPU. If it's convenient, could you tell me how to run on the CPU?

I am using the Alpaca model, that is, the original LLaMA merged with chinese_llama_lora_plus and chinese_alpaca_lora_plus, and then fine-tuned on the merged weights. Is there anything wrong with that step, or should I merge only LLaMA with chinese_llama_lora_plus and fine-tune on that, without adding the Alpaca adapter?
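For reference, the merge order being asked about can be expressed with the PEFT API roughly as follows (a sketch under assumed placeholder paths, not the exact scripts used in the thread): first merge chinese_llama_lora_plus into the base LLaMA weights, then optionally merge chinese_alpaca_lora_plus on top before fine-tuning.

```python
from transformers import LlamaForCausalLM
from peft import PeftModel

# Placeholder paths; substitute the real checkpoints.
base = LlamaForCausalLM.from_pretrained("path/to/original-llama")

# Merge the Chinese LLaMA LoRA into the base weights.
model = PeftModel.from_pretrained(base, "path/to/chinese_llama_lora_plus")
model = model.merge_and_unload()

# Optionally also merge the Alpaca adapter (the step the comment questions).
model = PeftModel.from_pretrained(model, "path/to/chinese_alpaca_lora_plus")
model = model.merge_and_unload()

model.save_pretrained("path/to/merged-model")
```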

> I found some more information: the problem comes from using stage 3, and others have hit the same issue: https://github.com/huggingface/transformers/issues/22705, https://github.com/microsoft/DeepSpeed/issues/842, https://discuss.huggingface.co/t/deepspeed-zero3-does-not-work-with-diffusion-models-does-anyone-know-how-to-fix-this/36293. If you find a solution, please let me know; it would be much appreciated.

> Hi, @hjc3613, you can offload to nvme instead of cpu memory, please check out [nvme offload](https://www.deepspeed.ai/tutorials/zero/#offloading-to-cpu-and-nvme-with-zero-infinity).

Thanks for your reply. I have tested nvme offload, but it failed; the related...
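For context, a ZeRO-Infinity NVMe-offload setup like the one in the linked tutorial looks roughly like the config below (expressed here as a Python dict; the `nvme_path` and batch size are placeholder values, not taken from the original issue):

```python
# Sketch of a DeepSpeed config with optimizer and parameter offload to NVMe.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "nvme",
            "nvme_path": "/local_nvme",   # placeholder NVMe mount point
            "pin_memory": True,
        },
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",
            "pin_memory": True,
        },
    },
    "train_micro_batch_size_per_gpu": 1,
}
```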

Pure bf16 is better than nvme offload, because it can keep all params, gradients, and Adam optimizer states in GPU memory and runs faster than any other offload method, as it...
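A hedged sketch of the "pure bf16, no offload" setup the comment prefers: the same ZeRO config, but with no `offload_optimizer`/`offload_param` sections, so everything stays on the GPU (values are illustrative):

```python
# DeepSpeed config sketch: bf16 training, ZeRO stage 3, no offload.
ds_config_bf16 = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # no offload_param / offload_optimizer: params, gradients, and
        # optimizer states remain in GPU memory
    },
    "train_micro_batch_size_per_gpu": 1,
}
```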

Thanks a lot, I will give it a try.

> You can achieve that by setting `fp32_optimizer_states=False` in the initialization of `DeepSpeedCPUAdam`; this param was added to DeepSpeed in version 0.14.3.
>
> note: if you are using transformers...
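A minimal sketch of the setting quoted above, assuming DeepSpeed >= 0.14.3; the model here is a placeholder:

```python
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

# Placeholder model standing in for the real one.
model = torch.nn.Linear(16, 16)

optimizer = DeepSpeedCPUAdam(
    model.parameters(),
    lr=1e-4,
    # Keep optimizer states in lower precision instead of fp32 copies,
    # reducing the CPU memory held by the optimizer.
    fp32_optimizer_states=False,
)
```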