Echo Valor

Results: 12 issues of Echo Valor

**Describe the bug** Running inference with the llama-3-8B model reports that there is no model of this type. ![image](https://github.com/modelscope/swift/assets/28260618/5ce84365-e23b-4883-b43d-a37f25e33b6c) **Your hardware and system info** Is this because llama3 is not supported yet? The README says it is already supported. **Additional context**
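As an aside not taken from the issue above: a minimal sketch of how one might check whether the installed ms-swift release already registers llama3 model types. The `MODEL_MAPPING` registry name and its keying by model-type strings are assumptions based on recent ms-swift 2.x releases and may differ in other versions.

```python
# Hypothetical check, not an official ms-swift diagnostic. Assumption: swift.llm exposes a
# MODEL_MAPPING registry keyed by model_type strings (true for recent 2.x releases).
# An empty result suggests the installed version predates llama3 support and may need
# `pip install -U ms-swift`.
from importlib.metadata import version

print("ms-swift version:", version("ms-swift"))

from swift.llm import MODEL_MAPPING  # registry name is an assumption

print("registered llama3 model types:",
      sorted(name for name in MODEL_MAPPING if "llama3" in name))
```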

question

**Describe the bug** ![image](https://github.com/NVIDIA/NeMo/assets/28260618/74b49ff2-3f2e-4536-b136-3732d6bb3600) When pulling the NeMo training image (23.08.03), an "authentication required" error occurs. **Steps/Code to reproduce bug** **Expected behavior** How can this be resolved?
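An aside, not from the issue itself: NeMo release images are hosted on nvcr.io, which requires a `docker login` with an NGC API key, using the literal username `$oauthtoken`. Below is a minimal Python sketch of that flow; the API-key placeholder and the exact image tag are assumptions and should be replaced with the values from your NGC account and the tag reported above.

```python
# Sketch of the standard nvcr.io login flow (equivalent to running `docker login nvcr.io`
# by hand) followed by the image pull. Requires Docker to be installed locally.
import subprocess

NGC_API_KEY = "<your-ngc-api-key>"  # hypothetical placeholder, generated at ngc.nvidia.com

# Username for nvcr.io is the literal string "$oauthtoken"; the API key is the password.
subprocess.run(
    ["docker", "login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"],
    input=NGC_API_KEY.encode(),
    check=True,
)

# Pull the training image; the tag here is an assumption and should match the one you need.
subprocess.run(["docker", "pull", "nvcr.io/nvidia/nemo:23.08"], check=True)
```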

bug
stale

This is very valuable work, but I have two questions:
- Q1: The Weight Decay and Gradient Clip settings seem inconsistent with other open-source models, where Weight Decay is typically set to 0.01 and the gradient-clipping threshold to 1. What is the rationale behind your team's choices, and what is their physical meaning?
- Q2: Does your team plan to open-source the pretraining dataset?

Thanks
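For reference on the conventional settings the question mentions (not the team's actual training configuration): a minimal PyTorch sketch that applies weight decay 0.01 and a gradient-clipping threshold of 1.0 to a toy model.

```python
# Illustrative only: the commonly used defaults referenced in the question
# (weight_decay=0.01, max gradient norm=1.0), applied to a toy linear model.
import torch

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
loss.backward()

# Clip the global gradient norm to 1.0 before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```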

Which method in tiktoken can be called to obtain the vocabulary?
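A minimal sketch of one way to recover the vocabulary through tiktoken's public `Encoding` API; the `cl100k_base` encoding name is only an example.

```python
# Sketch: enumerate a tiktoken vocabulary by decoding each token id back to its bytes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # example encoding
print("vocab size:", enc.n_vocab)

vocab = {}
for token_id in range(enc.n_vocab):
    try:
        # Raw byte sequence for this id; not every sequence is valid UTF-8 on its own.
        vocab[token_id] = enc.decode_single_token_bytes(token_id)
    except KeyError:
        # Some ids (gaps between regular and special tokens) are unassigned.
        continue

print(len(vocab), "tokens recovered; example:", vocab[1000])
```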

How can multiple GPUs be used for editing? Even with the --checkpointing flag, OOM still occurs, e.g. with 13B or 70B models.
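This is not the tool's own multi-GPU mechanism, but a common way to avoid OOM with 13B/70B checkpoints is to shard the model across GPUs at load time via transformers and accelerate; a hedged sketch with a hypothetical model path.

```python
# Generic multi-GPU sharding sketch (requires the `accelerate` package); not specific to
# the editing tool in question. device_map="auto" spreads layers across all visible GPUs.
import torch
from transformers import AutoModelForCausalLM

model_path = "/path/to/llama-13b"  # hypothetical local checkpoint path

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # halve memory vs. fp32
    device_map="auto",          # shard layers across available GPUs
)
print(model.hf_device_map)      # shows which layers landed on which device
```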

**Your question** FlashAttention-3 has been released:
- Blogpost: https://tridao.me/blog/2024/flash3/
- Paper: https://tridao.me/publications/flash3/flash3.pdf

Could Megatron add support for FlashAttention-3 to improve training efficiency? FlashAttention-3 is optimized for Hopper GPUs...

- The error is as follows: ![Image](https://github.com/user-attachments/assets/ee57b438-6af6-4ea7-85aa-80fdcc66a90c)
- The training parameters are set as follows:
  python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/train.parquet \
    data.val_files=$HOME/test.parquet \
    data.train_batch_size=32 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=/private/online_llf/model/Qwen2.5-32B-Instruct...