Zhaofeng Lin
I also have the same question, but it looks like the scores are merged in [self.merge_scores](https://github.com/espnet/espnet/blob/0d0428d3498a904fc5ee63e218fa392da7807a9b/espnet/nets/beam_search.py#L371)
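For anyone else landing here, this is a minimal sketch of what that merging amounts to, assuming one running score per scorer and a weighted sum at the end (the real `merge_scores` in ESPnet takes extra arguments for partial scorers and index selection):

```python
# Simplified sketch of per-scorer score merging in beam search; the actual
# espnet merge_scores also handles partial scorers and token indexing.
def merge_scores(prev_scores, next_scores, weights):
    # keep a running total per scorer (e.g. decoder, ctc, lm)
    merged = {name: prev_scores[name] + score
              for name, score in next_scores.items()}
    # the hypothesis score used for ranking is the weighted sum over scorers
    total = sum(weights[name] * score for name, score in merged.items())
    return merged, total
```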
I also have this issue. I found that renaming "config.yaml" in the /configs folder works for me.
@ghostplant Thanks for the quick response. I removed `system.cache()` because I added the aux_loss alongside `return x, attn, aux_loss` in the transformer layer and stacked the aux_loss from each...
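For reference, this is roughly what I mean by stacking the per-layer losses (a minimal sketch with made-up layer/variable names, not the exact Fairseq code):

```python
import torch

def forward_all_layers(layers, x):
    aux_losses = []
    for layer in layers:
        # each transformer layer now returns its MoE balancing loss too
        x, attn, aux_loss = layer(x)
        aux_losses.append(aux_loss)
    # stack the per-layer aux losses and reduce them to a single term
    aux_total = torch.stack(aux_losses).mean()
    return x, aux_total

# training step (illustrative): loss = task_loss + aux_weight * aux_total
```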
Hi, thanks for the very detailed explanation! I really appreciate your help. I think I have made it work by setting `skip_allreduce` to True, and also `inequivalent_tokens=True` in forward (due...
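In case it helps others, this is roughly the setup that worked for me (a sketch based on Tutel's documented usage; keyword names may differ slightly between Tutel versions):

```python
import torch.nn.functional as F
from tutel import moe as tutel_moe

moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},
    model_dim=1024,
    experts={'type': 'ffn', 'count_per_node': 4,
             'hidden_size_per_expert': 4096,
             'activation_fn': lambda x: F.relu(x)},
    # tag expert parameters so DDP skips all-reducing them across ranks
    scan_expert_func=lambda name, param: setattr(param, 'skip_allreduce', True),
)

def moe_forward(x):
    # inequivalent_tokens=True tolerates different token counts per rank
    y = moe(x, inequivalent_tokens=True)
    return y, moe.l_aux  # l_aux is the gating/balancing loss
```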
However, I'm still confused about `parallel_type`. I set `parallel_type = "data"` and basically had `num_experts_per_device = int(args.moe_expert_num / num_gpus)`. In my current case, moe_expert_num=8 and num_gpus=2, so num_experts_per_device=4....
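Concretely, the partition I have in mind (with my own variable names):

```python
# Data-parallel expert partitioning as I understand it: the global expert
# pool is split evenly across GPUs, so each device hosts a local slice.
moe_expert_num = 8            # global number of experts
num_gpus = 2                  # world size
num_experts_per_device = moe_expert_num // num_gpus   # 8 // 2 = 4

# this local count is then what goes into the per-node expert config,
# e.g. experts={'type': 'ffn', 'count_per_node': num_experts_per_device, ...}
```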
Can I also ask about checkpoint saving when training on multiple GPUs? I'm looking at https://github.com/microsoft/Tutel/blob/main/doc/CHECKPOINT.md and some related issues. It seems the checkpoints need to be saved for different...
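My current understanding, as a hedged sketch (the file naming is my own; CHECKPOINT.md describes the authoritative workflow): because each rank owns a different slice of the experts, every rank has to write and later load its own file.

```python
import torch
import torch.distributed as dist

def save_moe_checkpoint(model, prefix):
    # each rank holds different expert weights, so each writes its own file
    rank = dist.get_rank()
    torch.save(model.state_dict(), f'{prefix}.rank{rank}.pt')

def load_moe_checkpoint(model, prefix):
    rank = dist.get_rank()
    state = torch.load(f'{prefix}.rank{rank}.pt', map_location='cpu')
    model.load_state_dict(state)
```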
Thanks loads for the help!!! I will try to figure out how to save checkpoints for different ranks in Fairseq.
Hi, I'm reopening this issue because I encountered some problems. As you mentioned before:

```
qkv_prog: 1.2   qkv_prog: 1.2   (expected to be identical)
gate.wg: -0.6   gate.wg: -0.6   (expected to be...
```
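This is the kind of per-rank check I'm running to compare those values (a debugging sketch; the parameter name substrings are just the ones from the printout above):

```python
import torch.distributed as dist

def dump_param_summaries(model, substrings=('gate.wg', 'qkv')):
    # print one scalar per matched parameter on every rank, so values that
    # should be identical across ranks (e.g. the gate) can be compared with
    # values that are allowed to differ (the local experts)
    rank = dist.get_rank()
    for name, param in model.named_parameters():
        if any(s in name for s in substrings):
            print(f'rank {rank}: {name}: {param.detach().sum().item():.4f}')
```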