Guillaume Lample
Hmm, I never tried it on Windows. In the `run.sh` script there is `$PWD` at the beginning of the model location, but this may not work on Windows the way it works...
Looks like some people have been complaining about the link. It will need more seeders before we can merge it into main.
If by "lm head" you are referring to the output layer on top of the transformer (the `Linear(hidden_dim, vocab_size)`), then no, it is not shared with the input word embeddings.
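To illustrate the difference, here is a minimal numpy sketch (hypothetical sizes, not the model's real dimensions or code) contrasting an untied output head with the weight-tied variant used by some other models:

```python
import numpy as np

# Hypothetical sizes for illustration only.
vocab_size, hidden_dim = 1000, 64
rng = np.random.default_rng(0)

# Input word embeddings: one row per token id.
tok_embeddings = rng.standard_normal((vocab_size, hidden_dim))

# Untied "lm head": an independent Linear(hidden_dim, vocab_size) weight.
output_proj = rng.standard_normal((hidden_dim, vocab_size))

# With weight tying, the head would instead reuse the embedding matrix:
tied_proj = tok_embeddings.T  # a view sharing storage with tok_embeddings

h = rng.standard_normal(hidden_dim)  # a hidden state from the transformer
logits_untied = h @ output_proj      # separate parameters
logits_tied = h @ tied_proj          # same parameters as the input embeddings

# The untied head has its own storage; the tied one does not.
assert not np.shares_memory(output_proj, tok_embeddings)
assert np.shares_memory(tied_proj, tok_embeddings)
```

Untying roughly doubles the embedding-related parameter count but lets the input and output representations specialize independently.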
It was trained with 2048 tokens, so you can use up to that. If you want to use more tokens, you will need to fine-tune the model so that it...
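A trivial sketch of staying within that limit (a hypothetical helper, not code from the repo): keep only the most recent tokens so the prompt never exceeds the trained context length.

```python
# The model was trained with a 2048-token context.
MAX_SEQ_LEN = 2048

def truncate_prompt(token_ids, max_seq_len=MAX_SEQ_LEN):
    """Keep the last max_seq_len tokens so the tail of the prompt survives."""
    return token_ids[-max_seq_len:]

long_prompt = list(range(5000))
assert len(truncate_prompt(long_prompt)) == 2048
assert truncate_prompt([1, 2, 3]) == [1, 2, 3]  # short prompts are unchanged
```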
Yes, the next versions will come with multi-query attention.
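For context, a minimal numpy sketch of the multi-query idea (illustrative shapes only, not the actual implementation): every query head attends against a single shared key/value head, which shrinks the KV cache by a factor of the head count at inference time.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes for illustration only.
seq, n_heads, head_dim = 8, 4, 16
rng = np.random.default_rng(0)

# Multi-head attention: each head has its own Q, K, V.
# Multi-query attention: each head keeps its own Q, but K and V are shared.
q = rng.standard_normal((n_heads, seq, head_dim))  # per-head queries
k = rng.standard_normal((seq, head_dim))           # single shared key head
v = rng.standard_normal((seq, head_dim))           # single shared value head

scores = q @ k.T / np.sqrt(head_dim)  # (n_heads, seq, seq), K/V reused per head
out = softmax(scores) @ v             # (n_heads, seq, head_dim)
assert out.shape == (n_heads, seq, head_dim)
```

Only `k` and `v` need caching during generation, instead of one K/V pair per head.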