akashc1

4 comments by akashc1

> I found this to be reproducible with the following settings:

@zhanwenchen thanks for the pointer. Could you please clarify how that comment addresses this issue? Are you proposing that...

@felipemello1 @RdoubleA thank you for the comments. Can you please check the updated implementation? I set the correct default from the [HF model config](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct/blob/main/config.json#L18) for Llama 3.1 models in the...
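
For context, a minimal sketch of where that default would come from, assuming the field at the linked line of the HF config is `max_position_embeddings` (131072 for Llama 3.1, i.e. 128K context); the file path here is a local download used purely for illustration:

```python
import json

# Load the model config shipped with the HF checkpoint (the config.json
# linked above), downloaded locally for this example.
with open("config.json") as f:
    hf_config = json.load(f)

# Assumption: the referenced field is `max_position_embeddings`.
# This is the value a model builder would use as its max_seq_len default.
max_seq_len = hf_config["max_position_embeddings"]
print(max_seq_len)  # expected: 131072 for Llama-3.1-405B-Instruct
```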

@felipemello1 yes, I understand that; however, the transformer implementation [does throw an error if it gets a `seq_len` longer than it was expecting from init](https://github.com/pytorch/torchtune/blob/main/torchtune/modules/transformer.py#L528-L532). I've run into this when...
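
To illustrate the failure mode, here is a schematic of that kind of guard, assuming only that `max_seq_len` is fixed when the module is constructed; `BoundedSeqLenModule` is a hypothetical stand-in, not the actual torchtune class:

```python
import torch
import torch.nn as nn

class BoundedSeqLenModule(nn.Module):
    """Hypothetical stand-in showing the guard, not the torchtune code."""

    def __init__(self, max_seq_len: int):
        super().__init__()
        # max_seq_len is fixed at init time, e.g. to size positional caches
        self.max_seq_len = max_seq_len

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: [batch_size, seq_len]
        seq_len = tokens.shape[1]
        if seq_len > self.max_seq_len:
            # mirrors the linked check: inputs longer than the init-time
            # bound are rejected rather than silently truncated
            raise ValueError(
                f"seq_len ({seq_len}) exceeds max_seq_len ({self.max_seq_len})"
            )
        return tokens  # real layers (attention, FFN) would run here
```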

@felipemello1 @joecummings I have a fix for the wandb one here: #2196. I'm happy to add the other changes to that PR too if you'd like; let me know! I definitely...