Rafi Ayub

Results 141 comments of Rafi Ayub

@noforit Yes, based on those benchmarks I didn't observe a significant difference between packing with relative position encoding vs. absolute. Actually, I realized some of the numbers in the PR...

Thanks for this PR! You are right that Gemma does not expose the output projection in the transformer decoder, which makes it hard to replace it with a linear classifier...

> why not just create a new checkpoint with randomly-initialized output projection and load it into e.g. a llama2-style builder?

Yeah actually I'm in favor of this approach for the...

@Optimox let us know how we can help out with what you're trying to achieve!

@xingyaoww Looks like it's just a lint error, which you should be able to fix with `pre-commit`

Thanks for opening this feature request. Indeed, this very thing is being worked on in #875. I am currently investigating how to make the sample masking work with flash attention...

Currently being worked on in #1115

covered by #1451

Hi @Titus-von-Koeller, it's been some time so pinging this thread again to check if you're still open to us adding a torchtune integration to your docs page?