Cerberous issues

Results: 9 issues



[QA] Does Internevo support tied_embedding?

### Describe the question. Does Internevo support tied_embedding? If so, how is it used?
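For context, "tied embeddings" means the input token embedding and the output (LM head) projection share one weight matrix. In PyTorch this is commonly done by assigning the same parameter to both modules (e.g. `lm_head.weight = embedding.weight`). Whether and how Internevo exposes this is not stated here. The pure-Python toy below (the class name `TiedEmbeddingLM` is hypothetical, not Internevo code) only illustrates the mechanism: a single matrix serves as the lookup table on input and as the projection on output.

```python
class TiedEmbeddingLM:
    """Toy model whose input embedding and output projection share one matrix."""

    def __init__(self, weight):
        # weight[v] is the hidden vector for token v (vocab_size x hidden_size)
        self.weight = weight

    def embed(self, token_id):
        # Input side: row lookup in the shared matrix
        return self.weight[token_id]

    def logits(self, hidden):
        # Output side: dot product of the hidden state with every row (hidden @ weight.T)
        return [sum(h * w for h, w in zip(hidden, row)) for row in self.weight]


model = TiedEmbeddingLM([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h = model.embed(2)
print(model.logits(h))  # [1.0, 1.0, 2.0]

# One update to the shared matrix affects both the input and output sides.
model.weight[0][0] = 3.0
print(model.embed(0), model.logits(h))  # [3.0, 0.0] [3.0, 1.0, 2.0]
```

Because there is only one matrix, the parameter count for embeddings is halved, and a checkpoint converter must write (or alias) the same tensor for both the embedding and the LM head.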

question

[Bug] MoE-to-HuggingFace conversion only supports GShard-mode MoE models

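### Describe the bug 1. The script provided earlier only converts MoE models trained in the GShard style to HF; if the model is trained with MegaBlock, the weight conversion script no longer applies. 2. There is still no script for converting already-trained Internevo weights into Internevo MoE weights. ### Environment Official image ### Other information _No response_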

bug

[Bug] NaN when training in bf16 and running inference in fp16

### Describe the bug Let me restate my problem: I trained with bf16 in internevo, then converted the model to hf and ran inference in fp16, which produced the following error: ``` Traceback (most recent call last): File "/InternLM/hf_test.py", line 15, in output = model.generate(**inputs, **gen_kwargs) File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File...

bug

[Bug] Models trained with internevo and converted to HF sometimes produce NaN when evaluated with opencompass

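### Describe the bug After training with internevo and converting to an HF model, evaluating PPL with opencompass sometimes produces NaN. opencompass evaluates in fp16 by default; is that the cause? Switching to bf16 fixes the problem, but other HF models do not show this issue. Is it related to use_fp32_norm? Training used bf16. ### Environment Official image ### Other information _No response_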
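One common mechanism behind "trained in bf16, NaN in fp16" is range, not precision: bf16 shares float32's 8-bit exponent (max about 3.4e38), while fp16's largest finite value is 65504. An activation that is perfectly fine in bf16 can overflow to inf in fp16, and a later `inf - inf` (for example the max subtraction in a numerically stable softmax) yields NaN. This is offered as a plausible explanation, not a confirmed diagnosis of these issues; `use_fp32_norm` is not modeled. The sketch below emulates both formats with the standard `struct` module; the activation value 70000.0 is hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE half-precision value


def to_fp16(x):
    """Round x to IEEE fp16; values too large to represent become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def to_bf16(x):
    """Round x to bfloat16: keep the top 16 bits of float32, round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


activation = 70000.0  # hypothetical large hidden-state value, fine in bf16 training
a16 = to_fp16(activation)
ab16 = to_bf16(activation)
print(a16, ab16)   # inf 70144.0
# Softmax-style max subtraction on an overflowed value: inf - inf is NaN.
print(a16 - a16)   # nan
```

This is consistent with the reports above that switching evaluation to bf16 avoids the NaN: bf16 loses mantissa bits (70000 becomes 70144) but does not overflow.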

bug

[Bug] There seems to be no script for converting internevo MoE weights to the HuggingFace format?

### Describe the bug I couldn't find a script for converting a model trained with internevo into the corresponding HF format. Is one provided? ### Environment Official code ### Other information _No response_

bug

[Bug] TFLOPS calculation is inaccurate

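### Describe the bug The tflops in the current Internevo code is computed directly from a formula, but when tp or pp is used the model is partitioned, so the reported tflops is inaccurate. ### Environment Official image code ### Other information _No response_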
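A minimal sketch of consistent accounting, assuming the widely used approximation of about 6N training FLOPs per token for an N-parameter dense model (2N forward, 4N backward; 8N with full activation recomputation; attention FLOPs ignored). The key point is to count FLOPs for the full model over the global token count and divide by all GPUs (dp × tp × pp). It is not known from this issue exactly what Internevo's formula does; the "naive" variant below is a hypothetical example of how mixing full-model FLOPs with per-rank bookkeeping overcounts by tp × pp. The function name and job numbers are hypothetical.

```python
def tflops_per_gpu(num_params, global_tokens_per_step, step_time_s,
                   world_size, recompute=False):
    """Approximate achieved model TFLOPS per GPU with the 6N / 8N rule.

    num_params: parameter count of the full, unsplit model.
    global_tokens_per_step: tokens consumed by the whole job in one optimizer step.
    world_size: total GPUs, i.e. dp * tp * pp.
    recompute: count an extra forward pass (8N instead of 6N) for activation checkpointing.
    """
    flops_per_token = (8 if recompute else 6) * num_params
    total_flops = flops_per_token * global_tokens_per_step
    return total_flops / (world_size * step_time_s) / 1e12


# Hypothetical job: 7B dense model, dp=4, tp=2, pp=1, 16384 tokens per DP rank per step.
dp, tp, pp = 4, 2, 1
num_params, tokens_per_dp_rank, step_time_s = 7e9, 16384, 10.0

correct = tflops_per_gpu(num_params, tokens_per_dp_rank * dp, step_time_s, dp * tp * pp)
# Naive variant: full-model FLOPs for a DP rank's tokens charged to a single GPU,
# ignoring that tp * pp GPUs share that work.
naive = 6 * num_params * tokens_per_dp_rank / step_time_s / 1e12
print(round(correct, 2), round(naive, 2), round(naive / correct, 2))  # 34.41 68.81 2.0
```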

bug

[Bug] TFLOPS extremely low when training MoE

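### Describe the bug When training an MoE model, the model's tflops is only in the tens, whereas regular training shows normal values. ### Environment Official image code ### Other information _No response_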

bug

[QA] I have already trained a 7B model with Internevo; how can I use these internevo weights to run MoE?

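### Describe the question. I trained a 7B model with internevo and have its internevo checkpoint. Now I want to train an MoE model starting from these weights, but loading them raises the following error. How can I fix it? AssertionError: /beegfs/workspace/nlp/leo/model_ckpt/7B_v7/715255/model_moe_layer0_expert0_tp0.pt is not found!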
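The error suggests the MoE loader expects separate per-expert files that a dense checkpoint does not contain. One known way to start an MoE from a dense model is "sparse upcycling": initialize every expert in a layer as a copy of that layer's dense MLP. The sketch below shows only that idea with plain dicts; the weight layout (`w1`, `w2`, ...) and the function name are hypothetical, and the file-name pattern is inferred from the single path in the error message, not confirmed against Internevo's loader.

```python
import copy


def upcycle_dense_to_moe(dense_mlps, num_experts, tp_rank=0):
    """Initialize every expert of every layer from that layer's dense MLP weights.

    dense_mlps: dense_mlps[i] holds layer i's MLP weights (hypothetical layout,
        e.g. {"w1": ..., "w2": ..., "w3": ...}).
    Returns {file_name: expert_weights}. The file-name pattern is copied from the
    AssertionError in this issue and is an assumption about what the loader expects.
    """
    shards = {}
    for layer_idx, mlp in enumerate(dense_mlps):
        for expert_idx in range(num_experts):
            name = f"model_moe_layer{layer_idx}_expert{expert_idx}_tp{tp_rank}.pt"
            # Independent copies so the experts can diverge once training resumes.
            shards[name] = copy.deepcopy(mlp)
    return shards


# Toy 2-layer "model" with nested lists standing in for tensors.
dense = [{"w1": [[0.1, 0.2]], "w2": [[0.3]]},
         {"w1": [[0.4, 0.5]], "w2": [[0.6]]}]
shards = upcycle_dense_to_moe(dense, num_experts=4)
print(len(shards))     # 8
print(min(shards))     # model_moe_layer0_expert0_tp0.pt
```

In a real conversion the values would be tensors written with `torch.save`, the router (gate) weights have no dense counterpart and would need fresh initialization, and the key names inside each file would have to match what Internevo's MoE layer loads.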

question

[QA] Does MoE in the Internevo framework support expert parallelism?

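### Describe the question. Could anyone tell me whether MoE in the Internevo framework supports expert parallelism? If so, how is it used? Otherwise, training MoE directly seems to give very low tflops.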
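As general background (not Internevo's API or config), expert parallelism splits the experts of each MoE layer across ranks, so each rank stores and computes only its share instead of every expert; tokens are then exchanged with an all-to-all so each token reaches the rank owning its routed expert. The sketch below assumes contiguous partitioning and top-1 routing, and the function names are hypothetical; it computes which experts a rank owns and which tokens would be sent to each rank.

```python
def local_experts(num_experts, ep_size, ep_rank):
    """Expert ids held by one rank when experts are split contiguously across ep_size ranks."""
    if num_experts % ep_size:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    return list(range(ep_rank * per_rank, (ep_rank + 1) * per_rank))


def dispatch_plan(expert_ids, num_experts, ep_size):
    """Group token indices by the rank owning their routed expert (the all-to-all send lists)."""
    per_rank = num_experts // ep_size
    plan = {rank: [] for rank in range(ep_size)}
    for token_idx, expert in enumerate(expert_ids):
        plan[expert // per_rank].append(token_idx)
    return plan


print(local_experts(8, 4, 1))                    # [2, 3]
print(dispatch_plan([0, 5, 3, 7, 2], 8, 4))      # {0: [0], 1: [2, 4], 2: [1], 3: [3]}
```

Without this split, every data-parallel rank holds and runs all experts, which is one plausible reason MoE throughput looks low; the actual cause in Internevo is not established by this issue.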

question