Echo Valor
Echo Valor
很强,学习学习
**Describe the bug** 推理llama-3-8B模型显示没有这个类型的模型  **Your hardware and system info** 这个是因为还未支持llama3吗?看README说已经支持了 **Additional context**
**Describe the bug**  When pulling Nemo mirror for training: 23.08.03 image, there is an issue with authentication required. **Steps/Code to reproduce bug** **Expected behavior** How to solve this bug?
参数设置疑问
很有价值的工作,但还有两个问题想请教一下: - Q1:Weight Decay和Gradient Clip与其他开源模型似乎不一致,一般Weight Decay设置为0.01,梯度裁减设置为1,请问贵团队这样设计的理由及物理含义是什么? - Q2:贵团队有打算开源预训练数据集吗? Thanks
如题
在tiktoken中调用什么方法可以获得词表
如何使用多卡进行编辑?使用--checkpointing 参数也会OOM,如13B 70B等
**Your question** Flash attention 3 has been updated, link: - Blogpost: https://tridao.me/blog/2024/flash3/ - Paper: https://tridao.me/publications/flash3/flash3.pdf Megatron's support for Flash attention 3 to improve training efficiency。FlashAttention-3 is optimized for Hopper GPUs...
- The error is as follows:  - The training parameters are set as follows: python3 -m verl.trainer.main_ppo \ data.train_files=$HOME/train.parquet \ data.val_files=$HOME/test.parquet \ data.train_batch_size=32 \ data.max_prompt_length=1024 \ data.max_response_length=1024 \ actor_rollout_ref.model.path=/private/online_llf/model/Qwen2.5-32B-Instruct...