WU Junyan
Results: 3 comments by WU Junyan
I hit the same error, and I found that the EMA model has the same weights as my SFT model, while the actor model was saved correctly with different weights, I guess...
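For anyone who wants to check this on their own checkpoints, here is a minimal sketch of how the comparison could be done. It assumes plain PyTorch state dicts. The helper `differing_params` and the toy `nn.Linear` models are hypothetical stand-ins, not part of any training framework; in practice you would load the real SFT, EMA, and actor checkpoints instead.

```python
import torch
from torch import nn


def differing_params(state_a, state_b):
    """Return the names of tensors that differ (or are missing) between two state dicts."""
    names = []
    for name in sorted(set(state_a) | set(state_b)):
        if name not in state_a or name not in state_b:
            names.append(name)
        elif state_a[name].shape != state_b[name].shape:
            names.append(name)
        elif not torch.equal(state_a[name], state_b[name]):
            names.append(name)
    return names


torch.manual_seed(0)

# Toy stand-ins for the three checkpoints (hypothetical, for illustration only).
sft = nn.Linear(4, 2)

ema = nn.Linear(4, 2)
ema.load_state_dict(sft.state_dict())  # an EMA copy that was never updated

actor = nn.Linear(4, 2)
actor.load_state_dict(sft.state_dict())
with torch.no_grad():
    actor.weight.add_(0.1)  # stand-in for real training updates

ema_diff = differing_params(sft.state_dict(), ema.state_dict())
actor_diff = differing_params(sft.state_dict(), actor.state_dict())

print("EMA vs SFT differences:", ema_diff)
print("actor vs SFT differences:", actor_diff)
```

An empty list for the EMA comparison would match what I saw: the EMA weights are still identical to the SFT weights, while the actor weights have changed.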
We usually resize the model's token embeddings when adding special tokens.
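A minimal sketch of what that typically looks like, assuming the Hugging Face `transformers` API (`add_special_tokens` followed by `resize_token_embeddings`). The tiny in-memory tokenizer, the small randomly initialized GPT-2 config, and the `[PAD]` token are assumptions chosen only so the example runs without downloading anything; with a real model you would load the tokenizer and model with `from_pretrained` instead.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import GPT2Config, GPT2LMHeadModel, PreTrainedTokenizerFast

# Toy tokenizer built in memory (hypothetical vocabulary, illustration only).
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
raw_tokenizer = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
raw_tokenizer.pre_tokenizer = Whitespace()
tokenizer = PreTrainedTokenizerFast(tokenizer_object=raw_tokenizer, unk_token="[UNK]")

# Tiny randomly initialized model whose vocabulary matches the tokenizer.
config = GPT2Config(
    vocab_size=len(tokenizer), n_embd=16, n_layer=1, n_head=2, n_positions=32
)
model = GPT2LMHeadModel(config)
old_rows = model.get_input_embeddings().weight.shape[0]

# Adding a special token grows the tokenizer, but not the model.
num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})

# Resize the embedding matrix so the new token id has a row to look up.
model.resize_token_embeddings(len(tokenizer))
new_rows = model.get_input_embeddings().weight.shape[0]

print(f"added {num_added} token(s): embedding rows {old_rows} -> {new_rows}")
```

Without the resize step, the new token id would fall outside the embedding matrix and the forward pass would fail with an index error.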
Maybe you are looking for [https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite/blob/main/modeling_deepseek.py)