Piotr Nawrot
We have released [nanoT5](https://github.com/PiotrNawrot/nanoT5) for pre-training and evaluating T5-style (Encoder-Decoder) models. You can use it to pre-train your own model in one day on a single GPU :).
We've released [nanoT5](https://github.com/PiotrNawrot/nanoT5), which reproduces T5 (an encoder-decoder model, similar to BART) pre-training in PyTorch (not Flax). You can take a look! Any suggestions are more than welcome.
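For anyone curious what pre-training a T5-style encoder-decoder boils down to, here is a minimal, illustrative sketch of a single training step in plain PyTorch with Hugging Face Transformers. This is not nanoT5's actual training loop; the checkpoint, toy batch, and learning rate below are assumptions for illustration, so please see the repo for the real setup.

```python
# Minimal, illustrative sketch of one pre-training step for a T5-style
# encoder-decoder model in PyTorch; NOT nanoT5's actual training loop.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # assumed hyperparameters

# Toy span-corruption-style batch: sentinel tokens (<extra_id_n>) mark masked spans.
inputs = tokenizer("The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt")
labels = tokenizer("<extra_id_0> cute dog <extra_id_1> the <extra_id_2>",
                   return_tensors="pt").input_ids

outputs = model(**inputs, labels=labels)  # encoder-decoder forward + cross-entropy loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```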
Another relevant paper on hierarchical processing in Transformer decoder models is [this one](https://arxiv.org/pdf/2211.09761.pdf).
+1, I'm getting exactly the same results
Hey @Kyriection - Thanks a lot for your response and extra clarification. I'm having one more issue with reproducing Figure 8 from the latest version of the paper. I followed...
Moreover, I'm also having issues with reproducing the Table 2 results from the paper for OPT-30B. Again, I believe that I'm strictly following the commands from the README. It would be...
> "and for practical use, you can use the accumulation attention scores obtained from the whole prefilling stage" Did you use scores from prefilling stage for any of the downstream...
Yes, I understand - is this logic implemented somewhere in the code? Also, do you have any idea what could be the reason behind my suboptimal results?
This is still an issue for me as well!