WU Junyan

3 comments by WU Junyan

I hit the same error, and I found that the EMA model has the same weights as my SFT model, while the actor model was saved correctly with different weights, I guess...
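One way to confirm this kind of observation is to diff the saved checkpoints tensor by tensor. The snippet below is only a minimal sketch, assuming the checkpoints load as ordinary PyTorch state dicts; `diff_state_dicts` is a hypothetical helper, and the small `nn.Linear` modules are toy stand-ins for the real SFT, EMA, and actor models.

```python
import torch
from torch import nn


def diff_state_dicts(a, b, atol=0.0):
    """Return the names of tensors that differ between two state dicts.

    Hypothetical helper for illustration; not part of any library.
    """
    differing = []
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            differing.append(name)  # key exists in only one checkpoint
            continue
        ta, tb = a[name], b[name]
        if ta.shape != tb.shape or not torch.allclose(
            ta.float(), tb.float(), rtol=0.0, atol=atol
        ):
            differing.append(name)
    return differing


# Toy stand-ins for the three checkpoints (assumption: real ones would be
# loaded from disk with torch.load or a similar loader).
torch.manual_seed(0)
sft = nn.Linear(4, 2)

ema = nn.Linear(4, 2)
ema.load_state_dict(sft.state_dict())  # EMA copy that was never updated

actor = nn.Linear(4, 2)
actor.load_state_dict(sft.state_dict())
with torch.no_grad():
    actor.weight.add_(0.1)  # simulate weights that moved during training

ema_diff = diff_state_dicts(sft.state_dict(), ema.state_dict())
actor_diff = diff_state_dicts(sft.state_dict(), actor.state_dict())
print("EMA vs SFT differing tensors:", ema_diff)
print("actor vs SFT differing tensors:", actor_diff)
```

An empty list for the EMA comparison alongside a non-empty list for the actor would match the symptom described here.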

We usually resize the model's token embeddings when adding special tokens.
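A minimal sketch of that resize step, assuming the Hugging Face `transformers` library. The tiny GPT-2 config and the count of two added tokens are assumptions chosen so the example runs without downloading anything; with a real tokenizer the new size would normally be `len(tokenizer)` after calling `tokenizer.add_special_tokens(...)`.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialised model so the sketch needs no downloaded weights
# (assumption: any causal LM from transformers behaves the same way here).
config = GPT2Config(vocab_size=100, n_positions=16, n_embd=8, n_layer=1, n_head=1)
model = GPT2LMHeadModel(config)

# With a real tokenizer this count would be the return value of
# tokenizer.add_special_tokens({"additional_special_tokens": [...]}).
num_added = 2  # assumed number of new special tokens
new_vocab_size = config.vocab_size + num_added

# Grow the embedding matrix so the new token ids have rows to index into.
model.resize_token_embeddings(new_vocab_size)

rows = model.get_input_embeddings().weight.shape[0]
print("embedding rows after resize:", rows)
```

Without the resize, any token id at or beyond the old vocabulary size would index past the end of the embedding matrix.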

Maybe you are looking for [https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py)