Vincent Hellendoorn
Hi, that's a surprising error: it looks like the model is trying to predict a token (index 50,269) that is outside of its vocabulary (size 50,267). That is technically possible...
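For anyone debugging a similar crash, a quick sanity check is to compare the tokenizer's vocabulary size against the model's embedding matrix and the largest token index in a batch. This is only a sketch; the checkpoint path and the offending index are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

ckpt = "path/to/checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

# The embedding matrix determines which token indices the model can handle.
embedding_rows = model.get_input_embeddings().weight.shape[0]
print("tokenizer vocab size:", len(tokenizer))
print("model embedding rows:", embedding_rows)

# Any input or label index >= embedding_rows will trigger an out-of-range
# error (often surfacing as a CUDA device-side assert) in the forward/loss step.
batch = torch.tensor([[50_269]])  # example offending index from the report
print("max index:", batch.max().item(), "fits:", batch.max().item() < embedding_rows)
```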
Hi Aftab, Thanks for submitting this issue. It took a while to debug; the main thing I have found so far is that I can run this just fine on...
Hi, that's great to hear. The basic steps should be the following: 1. Download a checkpoint and convert it to the HuggingFace format. [This PR](https://github.com/EleutherAI/gpt-neox/pull/480) contains a file named [`convert_to_huggingface.py`](https://github.com/EleutherAI/gpt-neox/pull/480/files#diff-503107e2e8659542f2aca1df0f1ba8fbff76845eac37cc1c867c91f5b6d41d27)...
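Once the conversion has run, loading and sampling from the converted checkpoint should be standard HuggingFace usage. A minimal sketch (the checkpoint path and prompt are placeholders):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Directory produced by the conversion script (placeholder).
ckpt_dir = "path/to/converted_checkpoint"
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir)

prompt = "def binary_search(arr, target):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```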
Yes, thanks @NinedayWang! I'll try it out as soon as I have some time. In terms of next steps: if this just works with the HF classes, which it sounds...
Hi, a few others have had this error. It is typically either an out-of-memory issue or a matter of a mismatch between the CUDA version within and outside the container....
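To compare the two CUDA versions quickly, something like the following (run both inside and outside the container) usually narrows it down; the `nvidia-smi` parsing is just a convenience and assumes the standard header format:

```python
import subprocess
import torch

# CUDA runtime this PyTorch build was compiled against (container side).
print("torch.version.cuda:", torch.version.cuda)
print("torch.cuda.is_available():", torch.cuda.is_available())

# Driver-level CUDA version reported by nvidia-smi (host side); a large
# mismatch between the two is a common cause of this kind of failure.
smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print([line for line in smi.splitlines() if "CUDA Version" in line])
```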
Sounds good! No problem; I am using it in my fork for now. I also just realized the initial PR version had a wrong condition that I'd fixed locally (hence...
Hi, the repository we used to parse Python code and generate program graphs has been open-sourced [here](https://github.com/google-research/python-graphs). This won't output samples in exactly the same format as in this dataset,...
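For a quick start with that library, a minimal sketch (assuming its `get_program_graph` entry point; as noted above, the output format will differ from this dataset's):

```python
from python_graphs import program_graph

def example(x):
    # Toy function to analyze.
    if x > 0:
        return x
    return -x

# Builds a program graph combining the AST with control- and data-flow edges.
graph = program_graph.get_program_graph(example)
print(graph)
```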
Sounds good. FWIW, I just noticed that this PR messes with the printed loss [here](https://github.com/karpathy/nanoGPT/blob/master/train.py#L249) because each loss term is normalized. One obvious fix is to scale that loss back...
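To illustrate with a self-contained toy (not nanoGPT's actual loop): dividing each micro-batch loss by the number of accumulation steps keeps the accumulated gradient equivalent to a full-batch one, but it also shrinks the value that gets printed, so scaling it back up for logging keeps the numbers comparable across configurations:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()
gradient_accumulation_steps = 4

opt.zero_grad()
for micro_step in range(gradient_accumulation_steps):
    x, y = torch.randn(8, 4), torch.randn(8, 4)
    # Normalizing keeps the summed gradient equal to a full-batch gradient...
    loss = criterion(model(x), y) / gradient_accumulation_steps
    loss.backward()
opt.step()

# ...but it also makes the raw loss value misleadingly small, so scale it
# back up purely for logging.
print("raw last micro-loss:  ", loss.item())
print("loss as it should log:", loss.item() * gradient_accumulation_steps)
```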
Great, glad I could help! Minor note: I realized on my own end that the number of eval steps is also affected by this, in that it now refers to...
This is useful info! I hadn't used DDP yet (training a sweep of smaller models instead), but it's nice that the sync overhead becomes negligible with more accumulation steps. I...
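In case it's useful to others reading along, the usual pattern for keeping that overhead down is to skip gradient synchronization on all but the last micro-step via DDP's `no_sync()`. A sketch, assuming the process group, model wrapping, and data loading are set up elsewhere (the function and argument names here are placeholders):

```python
from torch.nn.parallel import DistributedDataParallel as DDP

def accumulation_step(ddp_model: DDP, optimizer, get_batch, criterion, accum_steps: int):
    """One optimizer step with gradient accumulation under DDP."""
    optimizer.zero_grad(set_to_none=True)
    for micro_step in range(accum_steps):
        x, y = get_batch()
        if micro_step < accum_steps - 1:
            # Defer the gradient all-reduce: gradients are only synchronized on
            # the last micro-step, which is why the relative sync cost shrinks
            # as accum_steps grows.
            with ddp_model.no_sync():
                loss = criterion(ddp_model(x), y) / accum_steps
                loss.backward()
        else:
            loss = criterion(ddp_model(x), y) / accum_steps
            loss.backward()
    optimizer.step()
```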