wintersurvival
I encountered the same problem. Did you solve it?
Thanks. Did the OP use multi-GPU training?
When using 8 GPUs with Horovod, GPU 0 hosts 7 additional processes, so GPU 0 uses much more memory than the other GPUs. What causes that? Did I do something wrong?
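A common cause of this pattern (not confirmed for this repo) is that each worker touches CUDA before being pinned to its own GPU, so every process creates an extra context on GPU 0. The usual fix is to pin each process to the GPU matching its local rank before any CUDA work, e.g. `torch.cuda.set_device(hvd.local_rank())` with Horovod. A minimal stdlib-only sketch of the pinning logic, assuming a launcher that sets a per-process `LOCAL_RANK` environment variable:

```python
import os

def device_for_process(env=None):
    """Return the CUDA device string this worker should pin to.

    Hypothetical helper for illustration; with Horovod you would call
    hvd.local_rank() instead of reading LOCAL_RANK from the environment.
    Pinning before any CUDA call prevents every worker from opening a
    context (and allocating memory) on GPU 0.
    """
    if env is None:
        env = os.environ
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"

# With 8 workers, worker k is pinned to cuda:k, so GPU 0 holds only
# its own process instead of all eight.
print(device_for_process({"LOCAL_RANK": "3"}))  # cuda:3
```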
> Hi, I got the same problem, have you solved it?

Not yet. There may be a problem in the code. I'll look into the code if I have time. If you...
Commenting out the `importance_processor` in `transformers/models/bert/modeling_bert_moe.py` will work:

```python
#self.importance_processor = ImportanceProcessor(config, layer_idx, config.moebert_expert_num, 0)
#if not self.importance_processor.is_moe:
#    raise RuntimeError("Need to turn the model to a MoE first.")
```

@shadymcy @CaffreyR @Harry-zzh