Samuel Kriman issues


2 issues


Samuel Kriman

Number of training steps

I have been trying to replicate the results from the paper, but I'm confused about the number of training steps. The paper mentions 240k steps, but when running this code...

Issue with only adding sink tokens in cache

It seems that in this implementation you are only adding the "sink" token to the cache and not using it in the original forward pass, so if you are using windowed...
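The issue text is truncated, so this is only a sketch of the general idea and not the repository's code. It assumes causal sliding-window attention, where sink tokens help only if every query can attend to them during the forward pass, not merely find them in the cache. The helper names `sink_window_mask` and `print_mask` and the parameters `window` and `n_sink` are hypothetical.

```python
from typing import List


def sink_window_mask(seq_len: int, window: int, n_sink: int = 1) -> List[List[bool]]:
    """Return mask where mask[i][j] is True if query i may attend to key j.

    Hypothetical helper: causal sliding-window attention in which the first
    `n_sink` positions (the "sink" tokens) stay visible to every query,
    in addition to the `window` most recent positions (including i itself).
    """
    if window < 1 or n_sink < 0:
        raise ValueError("window must be >= 1 and n_sink must be >= 0")
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i            # no attending to future tokens
            in_window = i - j < window  # within the local sliding window
            is_sink = j < n_sink        # always-visible sink position
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask


def print_mask(mask: List[List[bool]]) -> None:
    """Print 'x' where attention is allowed and '.' where it is masked."""
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))


if __name__ == "__main__":
    print_mask(sink_window_mask(seq_len=6, window=2, n_sink=1))
```

With `seq_len=6`, `window=2`, and `n_sink=1`, the script prints `x.....`, `xx....`, `xxx...`, `x.xx..`, `x..xx.`, `x...xx`. The first column stays visible to every query. With `n_sink=0`, the last row would be `....xx`, so the sink token would drop out of the forward pass even if it were still held in a cache.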