Thirvin

I trained Infini-llama on arXiv papers. My result is similar to yours: the model can't handle the attention states compressed into memory, and its outputs have little relation to the content I provided.