Guillaume Lample
Hmm, I never tried it on Windows. In the `run.sh` script there is `$PWD` at the beginning of the model location, but this may not work on Windows the way it works...
Looks like some people have been complaining about the link. It will need more seeders before we can merge it into main.
If by "lm head" you are referring to the output layer on top of the transformer (the `Linear(hidden_dim, vocab_size)`), then no, it is not shared with the input word embeddings.
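To illustrate the difference, here is a minimal numpy sketch (hypothetical sizes, not the model's real dimensions or code) contrasting an untied output head with the weight-tied variant used by some other models:

```python
import numpy as np

# Hypothetical sizes for illustration only.
vocab_size, hidden_dim = 1000, 64
rng = np.random.default_rng(0)

# Input word embeddings: one row per token id.
tok_embeddings = rng.standard_normal((vocab_size, hidden_dim))

# Untied "lm head": an independent Linear(hidden_dim, vocab_size) weight.
output_proj = rng.standard_normal((hidden_dim, vocab_size))

# With weight tying, the head would instead reuse the embedding matrix:
tied_proj = tok_embeddings.T  # a view sharing storage with tok_embeddings

h = rng.standard_normal(hidden_dim)  # a hidden state from the transformer
logits_untied = h @ output_proj      # separate parameters
logits_tied = h @ tied_proj          # same parameters as the input embeddings

# The untied head has its own storage; the tied one does not.
assert not np.shares_memory(output_proj, tok_embeddings)
assert np.shares_memory(tied_proj, tok_embeddings)
```

Untying roughly doubles the embedding-related parameter count but lets the input and output representations specialize independently.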
It was trained with 2048 tokens, so you can use up to that. If you want to use more tokens, you will need to fine-tune the model so that it...
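A trivial sketch of staying within that limit (a hypothetical helper, not code from the repo): keep only the most recent tokens so the prompt never exceeds the trained context length.

```python
# The model was trained with a 2048-token context.
MAX_SEQ_LEN = 2048

def truncate_prompt(token_ids, max_seq_len=MAX_SEQ_LEN):
    """Keep the last max_seq_len tokens so the tail of the prompt survives."""
    return token_ids[-max_seq_len:]

long_prompt = list(range(5000))
assert len(truncate_prompt(long_prompt)) == 2048
assert truncate_prompt([1, 2, 3]) == [1, 2, 3]  # short prompts are unchanged
```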
Yes, the next versions will come with multi-query attention.
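For context, a minimal numpy sketch of the multi-query idea (illustrative shapes only, not the actual implementation): every query head attends against a single shared key/value head, which shrinks the KV cache by a factor of the head count at inference time.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sizes for illustration only.
seq, n_heads, head_dim = 8, 4, 16
rng = np.random.default_rng(0)

# Multi-head attention: each head has its own Q, K, V.
# Multi-query attention: each head keeps its own Q, but K and V are shared.
q = rng.standard_normal((n_heads, seq, head_dim))  # per-head queries
k = rng.standard_normal((seq, head_dim))           # single shared key head
v = rng.standard_normal((seq, head_dim))           # single shared value head

scores = q @ k.T / np.sqrt(head_dim)  # (n_heads, seq, seq), K/V reused per head
out = softmax(scores) @ v             # (n_heads, seq, head_dim)
assert out.shape == (n_heads, seq, head_dim)
```

Only `k` and `v` need caching during generation, instead of one K/V pair per head.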