Piotr Nawrot

9 comments by Piotr Nawrot

We have released [nanoT5](https://github.com/PiotrNawrot/nanoT5) for pre-training and evaluating T5-style (Encoder-Decoder) models. You can use it to pre-train your own model in one day on a single GPU :).
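To give a rough idea of the objective nanoT5 pre-trains with, here is a minimal, illustrative sketch of a single T5-style denoising step using plain HuggingFace `transformers`. The checkpoint name and the toy example are mine; this is not nanoT5's actual training loop, which handles the data pipeline, span corruption, and optimizer setup for you.

```python
import torch
from transformers import AutoTokenizer, T5Config, T5ForConditionalGeneration

# Randomly initialised T5 (pre-training starts from scratch, not from a checkpoint).
config = T5Config.from_pretrained("google/t5-v1_1-base")
model = T5ForConditionalGeneration(config)
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-base")

# Toy span-corruption pair: sentinel tokens mark the masked spans.
inputs = tokenizer(
    "The <extra_id_0> fox jumps over the <extra_id_1> dog.", return_tensors="pt"
)
labels = tokenizer(
    "<extra_id_0> quick brown <extra_id_1> lazy", return_tensors="pt"
).input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
print(f"denoising loss: {loss.item():.3f}")
```

The snippet is only meant to show what "T5-style pre-training" refers to here; the repo runs this end-to-end on real data.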

We've released [nanoT5](https://github.com/PiotrNawrot/nanoT5), which reproduces T5 (similar to BART) pre-training in PyTorch (not Flax). You can take a look! Any suggestions are more than welcome.

Another relevant paper on hierarchical processing in Transformer decoder models is [this one](https://arxiv.org/pdf/2211.09761.pdf).

+1, I'm getting exactly the same results

Hey @Kyriection - Thanks a lot for your response and extra clarification. I'm having one more issue with reproducing Figure 8 from the latest version of the paper. I followed...

Moreover, I'm also having issues with reproducing the Table 2 results from the paper for OPT-30B. Again, I believe I'm strictly following the commands from the README. It would be...

> "and for practical use, you can use the accumulation attention scores obtained from the whole prefilling stage" Did you use scores from prefilling stage for any of the downstream...

Yes, I understand - is this logic implemented somewhere in the code? Also, do you have any idea what could be the reason behind my suboptimal results?
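For context, this is roughly the selection logic I had in mind (a minimal sketch in PyTorch; the function name, shapes, and top-k selection are my own assumptions, not code from the repo), in case I'm misunderstanding the intended behaviour:

```python
import torch

def select_heavy_hitters(attn_probs: torch.Tensor, num_keep: int) -> torch.Tensor:
    """Pick key positions to keep based on attention accumulated over prefilling.

    attn_probs: softmax attention from the prefilling pass,
                shaped [num_heads, prompt_len, prompt_len] (queries x keys).
    num_keep:   number of KV-cache positions to retain per head.
    Returns:    [num_heads, num_keep] indices of the retained key positions.
    """
    # Accumulate the attention mass each key receives across all prefill queries.
    accumulated = attn_probs.sum(dim=1)                 # [num_heads, prompt_len]
    # Keep the keys with the highest accumulated scores ("heavy hitters").
    return accumulated.topk(num_keep, dim=-1).indices
```

I then gather the KV cache with these indices before generation starts; please let me know if the intended accumulation differs from this.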