zhenhao Shang
Results: 4 comments of zhenhao Shang
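Hey bro, I found the bug. 44 should be the number of classes in the dataset you are using, right? In teacher.py, the teacher model never uses the nc parameter when it is initialized, so it falls back to the default of 80, and that dimension of the forward-pass output is therefore 80. Just rewrite the initialization in teacher.py the same way train_distill initializes the student model, and it should work.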
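Here is a minimal sketch of the mismatch, not the actual code from teacher.py or train_distill. `ToyDetector` and `NUM_CLASSES` are hypothetical names made up for illustration; the only things taken from the comment above are the default of 80 classes and the dataset's 44 classes.

```python
class ToyDetector:
    """Hypothetical stand-in for a detection model (not the real teacher.py code)."""

    def __init__(self, nc=80):
        # nc is the number of classes; 80 is the default when nothing is passed.
        self.nc = nc

    def forward(self, batch_size):
        # Only the output shape matters here: one score per class per sample.
        return (batch_size, self.nc)


NUM_CLASSES = 44  # class count of the dataset, as in the comment above

# Buggy: the teacher is built without nc, so it silently uses the default 80.
teacher_buggy = ToyDetector()
# The student is built with the dataset's class count.
student = ToyDetector(nc=NUM_CLASSES)
# Fixed: build the teacher the same way the student is built.
teacher_fixed = ToyDetector(nc=NUM_CLASSES)

buggy_shape = teacher_buggy.forward(2)
student_shape = student.forward(2)
fixed_shape = teacher_fixed.forward(2)

print("buggy teacher:", buggy_shape)
print("student:      ", student_shape)
print("fixed teacher:", fixed_shape)
```

With the buggy teacher the class dimension differs from the student's, so a distillation loss comparing the two outputs cannot line up. Passing nc to the teacher makes the shapes match.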
> > huh there's no requirement d_state / head_dim % 8 == 0 there's d_model / head_dim % 8 == 0 you can try the dimensions similar to the language...
OK I understand, thank you