Weihang Wang

Results: 10 comments by Weihang Wang

Hello, I have received your email.

> Hey! Not a paper author here, but I'm currently working on reproducing the results of the OpenMoE paper, specifically on token routing. Take a look: https://github.com/Misterion777/moe-experiments/blob/main/notebooks/routing_eda.ipynb Would appreciate any collaboration!...
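
A minimal sketch of the kind of routing statistics such a reproduction might look at, assuming router logits of shape (num_tokens, num_experts); the tensor shapes, top-k value, and variable names are illustrative assumptions, not taken from the linked notebook.

```python
import torch

# Hypothetical router logits: (num_tokens, num_experts); values are random for illustration.
num_tokens, num_experts, top_k = 1024, 32, 2
router_logits = torch.randn(num_tokens, num_experts)

# Top-k routing: each token is sent to its k highest-scoring experts.
routing_probs = torch.softmax(router_logits, dim=-1)
topk_probs, topk_experts = routing_probs.topk(top_k, dim=-1)

# Per-expert load: how many tokens each expert receives.
expert_load = torch.bincount(topk_experts.flatten(), minlength=num_experts)
print("tokens per expert:", expert_load.tolist())
print("load imbalance (max/mean):", (expert_load.max() / expert_load.float().mean()).item())
```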

Hello, I have received your email.

Why have you added warnings only for the initialization process and not for renaming during loading as well? The model I'm using is timm's convnext (which is even the companion...
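
For illustration, a warning on key renaming at load time could look like the sketch below; the helper name, the prefix-renaming rule, and the logging setup are hypothetical and not the project's actual loading code.

```python
import logging

logger = logging.getLogger(__name__)

def rename_and_load(model, state_dict, prefix_map=None):
    """Load a checkpoint, warning about every key renamed along the way (hypothetical helper)."""
    prefix_map = prefix_map or {"backbone.": ""}  # illustrative rename rule
    renamed = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in prefix_map.items():
            if new_key.startswith(old):
                new_key = new + new_key[len(old):]
        if new_key != key:
            logger.warning("Renaming checkpoint key %s -> %s during loading", key, new_key)
        renamed[new_key] = value
    return model.load_state_dict(renamed, strict=False)
```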

> One more thing is that the model you are using is not quantized to FP8. It is FP16. Hello, thank you for your reply. My launch command follows the...

> One more thing is that the model you are using is not quantized to FP8. It is FP16. I'm curious about this. According to the calculations on the website...

> > Did you add special tokens to your tokenizer without resizing the lm_embedding? That leads to a mismatch between the label classes and lm_head. It seems that they are...
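
For context, the usual Hugging Face Transformers pattern is to resize the embedding matrix right after adding special tokens, so the vocabulary size, lm_head, and label ids stay consistent; the checkpoint and token strings below are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add special tokens, then resize embeddings so the label ids and lm_head stay in range.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|user|>", "<|assistant|>"]}
)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```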

> > If GPU memory reports OOM, it is recommended to reduce max_length and enable offload > > What's odd is that a simple memory estimate says a 32B model needs 32 × 16 = 512 GB; with the ZeRO-3 strategy, dividing by 8 gives 64 GB per card, meaning that even without counting activations, each card needs sixty-some GB, so an A800 should be enough. But it's OOMing, which is strange. Hello, may I ask whether there is a reference link for this calculation method?
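
A worked version of that estimate, assuming the common 16-bytes-per-parameter rule for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights, momentum, and variance) and that ZeRO-3 shards all of it evenly across GPUs; activations, temporary buffers, and fragmentation are excluded, which is likely where the extra memory that triggers OOM comes from.

```python
def zero3_per_gpu_gib(params_billion, num_gpus, bytes_per_param=16):
    """Weights + grads + fp32 Adam states under ZeRO-3, evenly sharded; activations excluded."""
    total_bytes = params_billion * 1e9 * bytes_per_param
    return total_bytes / 1024**3 / num_gpus

# 32B model on 8 GPUs, as in the comment above: 32 * 16 = 512 GB total, ~60 GiB per GPU.
print(f"per-GPU estimate: {zero3_per_gpu_gib(32, 8):.1f} GiB")
```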