Prashant Tandon

4 comments by Prashant Tandon

I've encountered the same error. I tried it on Linux as well as Windows.

@hkproj I think the implementation deviates from the architecture proposed in the paper. The paper states that normalization is applied after each sublayer, i.e. there is the output of the...
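
For readers following the point above, here is a minimal sketch of the two orderings being contrasted. It assumes the paper in question is "Attention Is All You Need", which gives the output of each sublayer as LayerNorm(x + Sublayer(x)). The function names and the toy sublayer are hypothetical and are not taken from the repository under discussion. The learnable gain and bias, and dropout, are left out for brevity.

```python
import math

def layer_norm(x, eps=1e-6):
    # Normalize a vector to zero mean and (approximately) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def post_norm_block(x, sublayer):
    # Post-norm: LayerNorm(x + Sublayer(x)), normalization after the residual add.
    residual = [a + b for a, b in zip(x, sublayer(x))]
    return layer_norm(residual)

def pre_norm_block(x, sublayer):
    # Pre-norm: x + Sublayer(LayerNorm(x)), normalization before the sublayer.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def toy_sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward layer.
    return [2.0 * v for v in x]

x = [1.0, 2.0, 4.0]
post = post_norm_block(x, toy_sublayer)
pre = pre_norm_block(x, toy_sublayer)
print("post-norm:", post)
print("pre-norm: ", pre)
```

With post-norm the block output is itself normalized, while with pre-norm the unnormalized input passes straight through the residual path, so the two variants produce different outputs for the same sublayer.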

@dhantule @laxmareddyp I'd like to work on adding Magistral.

> [@dhantule](https://github.com/dhantule) [@laxmareddyp](https://github.com/laxmareddyp) I'd like to work on adding Magistral.

Please confirm if these are the appropriate references:
- Paper link - [Magistral](https://arxiv.org/pdf/2506.10910)
- HF link for the model - ...