Thirvin

I trained Infini-llama on arXiv papers. My result is similar to yours: the model can't handle the attention states compressed into memory, and its outputs have little relation to the content I provided.