Anna Shors

19 comments by Anna Shors

Hi, we've made some changes to Megatron recently to remove the required dependency on Transformer Engine. You should no longer need to install Transformer Engine to run this script. The...

> > If only we can propagate this flag to `dist_checkpointing.load` then using this flag would be ideal
>
> Yeah, an example of how PL does that can be...

Hi, I just rebuilt the Dockerfile on the v0.3.1 branch and reran the Megatron 70b experiment, and my results matched what is reported in the blog. Could you share...

Hi, the cosine decay will happen across the length of the training run and depends on `max_num_steps`. If `max_num_steps` is large, the decay might happen very slowly. Could you try...
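
To make the dependence on `max_num_steps` concrete, here's a minimal sketch of a standard cosine annealing schedule (illustrative only; Megatron's actual scheduler also handles warmup and other options):

```python
import math

def cosine_lr(step: int, max_num_steps: int, lr: float = 5.0e-6, min_lr: float = 5e-9) -> float:
    # Cosine decay from lr down to min_lr over the full training run.
    progress = min(step, max_num_steps) / max_num_steps
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

# With a very large max_num_steps, the LR after a few hundred steps is still
# essentially the initial value, so the decay can look like it isn't happening.
print(cosine_lr(step=100, max_num_steps=1_000_000))  # ~5.0e-6
```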

Cosine decay seems to be working as expected for me. Here's my config:

```
sft:
  max_num_steps: 60
policy:
  megatron_cfg:
    optimizer:
      lr: 5.0e-6
      min_lr: 5e-9
      ...
    scheduler:
      start_weight_decay: ${policy.megatron_cfg.optimizer.weight_decay}
      end_weight_decay: ${policy.megatron_cfg.optimizer.weight_decay}
      weight_decay_incr_style: ...
```

Thanks for the PR @sanjana-inflection! Overall, it looks great. Do you have any loss curves or experiments that you used to validate this PR that you can share as well?

Thanks for the results @sanjana-inflection! I'd like to test this PR a bit more thoroughly before merging. I'll run some experiments on my end and update here if everything looks...

I think it makes sense to set preference loss type during `__init__` since presumably we wouldn't want to use the same `PreferenceLoss` instance with different loss types.
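
Something along these lines is what I have in mind (a rough sketch with made-up names, not the actual implementation):

```python
import torch
import torch.nn.functional as F

class PreferenceLoss:
    """Sketch: the loss type is fixed once at construction time."""

    def __init__(self, preference_loss_type: str = "dpo", beta: float = 0.1):
        self.preference_loss_type = preference_loss_type
        self.beta = beta

    def __call__(self, chosen_logratios: torch.Tensor, rejected_logratios: torch.Tensor) -> torch.Tensor:
        diff = chosen_logratios - rejected_logratios
        if self.preference_loss_type == "dpo":
            # Sigmoid (DPO-style) preference loss.
            return -F.logsigmoid(self.beta * diff).mean()
        if self.preference_loss_type == "ipo":
            # IPO-style squared loss with a 1/(2*beta) target margin.
            return ((diff - 1.0 / (2.0 * self.beta)) ** 2).mean()
        raise ValueError(f"Unknown preference loss type: {self.preference_loss_type}")
```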

Hi, we provide an importer that automatically converts the HF Qwen2VL model into NeMo 2.0 format: https://github.com/NVIDIA/NeMo/blob/5421d66f3874c74609f68e9f3a25fe1493781125/nemo/collections/vlm/qwen2vl/model/qwen2vl.py#L79
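
For reference, the usual NeMo 2.0 import flow looks roughly like the sketch below; the model/config class names are my assumptions here, so please check the linked module for the exact ones:

```python
from nemo.collections import llm, vlm

# Converts the Hugging Face checkpoint into NeMo 2.0 format via the registered importer.
# Qwen2VLModel / Qwen2VLConfig7B are assumed names; see the linked qwen2vl.py for the real ones.
llm.import_ckpt(
    model=vlm.Qwen2VLModel(config=vlm.Qwen2VLConfig7B()),
    source="hf://Qwen/Qwen2-VL-7B-Instruct",
)
```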