Echo Valor

Results: 12 issues of Echo Valor

**Describe the bug** Running inference with the llama-3-8B model reports that there is no model of this type. ![image](https://github.com/modelscope/swift/assets/28260618/5ce84365-e23b-4883-b43d-a37f25e33b6c) **Your hardware and system info** Is this because llama3 is not supported yet? The README says it is already supported. **Additional context**
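As an aside not taken from the issue above: a minimal sketch of how one might check whether the installed ms-swift release already registers llama3 model types. The `MODEL_MAPPING` registry name and its keying by model-type strings are assumptions based on recent ms-swift 2.x releases and may differ in other versions.

```python
# Hypothetical check, not an official ms-swift diagnostic. Assumption: swift.llm exposes a
# MODEL_MAPPING registry keyed by model_type strings (true for recent 2.x releases).
# An empty result suggests the installed version predates llama3 support and may need
# `pip install -U ms-swift`.
from importlib.metadata import version

print("ms-swift version:", version("ms-swift"))

from swift.llm import MODEL_MAPPING  # registry name is an assumption

print("registered llama3 model types:",
      sorted(name for name in MODEL_MAPPING if "llama3" in name))
```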

question

**Describe the bug** ![image](https://github.com/NVIDIA/NeMo/assets/28260618/74b49ff2-3f2e-4536-b136-3732d6bb3600) When pulling the NeMo training image (23.08.03), an "authentication required" error occurs. **Steps/Code to reproduce bug** **Expected behavior** How can this be resolved?
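An aside, not from the issue itself: NeMo release images are hosted on nvcr.io, which requires a `docker login` with an NGC API key, using the literal username `$oauthtoken`. Below is a minimal Python sketch of that flow; the API-key placeholder and the exact image tag are assumptions and should be replaced with the values from your NGC account and the tag reported above.

```python
# Sketch of the standard nvcr.io login flow (equivalent to running `docker login nvcr.io`
# by hand) followed by the image pull. Requires Docker to be installed locally.
import subprocess

NGC_API_KEY = "<your-ngc-api-key>"  # hypothetical placeholder, generated at ngc.nvidia.com

# Username for nvcr.io is the literal string "$oauthtoken"; the API key is the password.
subprocess.run(
    ["docker", "login", "nvcr.io", "--username", "$oauthtoken", "--password-stdin"],
    input=NGC_API_KEY.encode(),
    check=True,
)

# Pull the training image; the tag here is an assumption and should match the one you need.
subprocess.run(["docker", "pull", "nvcr.io/nvidia/nemo:23.08"], check=True)
```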

bug
stale

This is very valuable work, but I have two questions:
- Q1: The Weight Decay and Gradient Clip settings seem inconsistent with other open-source models, where Weight Decay is typically set to 0.01 and the gradient-clipping threshold to 1. What is the rationale behind your team's choices, and what is their physical meaning?
- Q2: Does your team plan to open-source the pretraining dataset?

Thanks
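For reference on the conventional settings the question mentions (not the team's actual training configuration): a minimal PyTorch sketch that applies weight decay 0.01 and a gradient-clipping threshold of 1.0 to a toy model.

```python
# Illustrative only: the commonly used defaults referenced in the question
# (weight_decay=0.01, max gradient norm=1.0), applied to a toy linear model.
import torch

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
loss.backward()

# Clip the global gradient norm to 1.0 before the optimizer step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```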

Which method in tiktoken can be called to obtain the vocabulary?
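A minimal sketch of one way to recover the vocabulary through tiktoken's public `Encoding` API; the `cl100k_base` encoding name is only an example.

```python
# Sketch: enumerate a tiktoken vocabulary by decoding each token id back to its bytes.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # example encoding
print("vocab size:", enc.n_vocab)

vocab = {}
for token_id in range(enc.n_vocab):
    try:
        # Raw byte sequence for this id; not every sequence is valid UTF-8 on its own.
        vocab[token_id] = enc.decode_single_token_bytes(token_id)
    except KeyError:
        # Some ids (gaps between regular and special tokens) are unassigned.
        continue

print(len(vocab), "tokens recovered; example:", vocab[1000])
```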

How can multiple GPUs be used for editing? Even with the --checkpointing flag, OOM still occurs, e.g. with 13B or 70B models.
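This is not the tool's own multi-GPU mechanism, but a common way to avoid OOM with 13B/70B checkpoints is to shard the model across GPUs at load time via transformers and accelerate; a hedged sketch with a hypothetical model path.

```python
# Generic multi-GPU sharding sketch (requires the `accelerate` package); not specific to
# the editing tool in question. device_map="auto" spreads layers across all visible GPUs.
import torch
from transformers import AutoModelForCausalLM

model_path = "/path/to/llama-13b"  # hypothetical local checkpoint path

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # halve memory vs. fp32
    device_map="auto",          # shard layers across available GPUs
)
print(model.hf_device_map)      # shows which layers landed on which device
```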

**Your question** FlashAttention-3 has been released:
- Blogpost: https://tridao.me/blog/2024/flash3/
- Paper: https://tridao.me/publications/flash3/flash3.pdf

Could Megatron add support for FlashAttention-3 to improve training efficiency? FlashAttention-3 is optimized for Hopper GPUs...

- The error is as follows: ![Image](https://github.com/user-attachments/assets/ee57b438-6af6-4ea7-85aa-80fdcc66a90c)
- The training parameters are set as follows:
  python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/train.parquet \
    data.val_files=$HOME/test.parquet \
    data.train_batch_size=32 \
    data.max_prompt_length=1024 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=/private/online_llf/model/Qwen2.5-32B-Instruct...