Vincent Hellendoorn

15 comments by Vincent Hellendoorn

Hi, that's a surprising error: it looks like the model is trying to predict a token (index 50,269) that is outside of its vocabulary (of size 50,267). That is technically possible...
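
For anyone hitting something similar, a quick way to check for such a mismatch is sketched below (assuming a HuggingFace-style checkpoint; the paths are placeholders):

```python
# Minimal sketch: compare the tokenizer's largest token id against the size of the
# model's embedding table. Paths below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/path/to/checkpoint")
tokenizer = AutoTokenizer.from_pretrained("/path/to/checkpoint")

num_embeddings = model.get_input_embeddings().weight.shape[0]
max_token_id = max(tokenizer.get_vocab().values())
print(f"embedding rows: {num_embeddings}, max tokenizer id: {max_token_id}")

# If added special tokens pushed ids past the original vocabulary, resize to match.
if max_token_id >= num_embeddings:
    model.resize_token_embeddings(len(tokenizer))
```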

Hi Aftab, thanks for submitting this issue. It took a while to debug; the main thing I have found so far is that I can run this just fine on...

Hi, that's great to hear. The basic steps should be the following: 1. Download a checkpoint and convert it to the HuggingFace format. [This PR](https://github.com/EleutherAI/gpt-neox/pull/480) contains a file named [`convert_to_huggingface.py`](https://github.com/EleutherAI/gpt-neox/pull/480/files#diff-503107e2e8659542f2aca1df0f1ba8fbff76845eac37cc1c867c91f5b6d41d27)...
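
Once converted, the checkpoint should load through the standard HuggingFace API. A minimal sketch (the path is a placeholder; the conversion script's own flags are documented in the linked PR):

```python
# Minimal sketch: load a converted checkpoint with the standard HuggingFace classes
# and sample a short completion. The path below is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/path/to/converted_checkpoint")
tokenizer = AutoTokenizer.from_pretrained("/path/to/converted_checkpoint")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0]))
```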

Yes, thanks @NinedayWang! I'll try it out as soon as I have some time. In terms of next steps: if this just works with the HF classes, which it sounds...

Hi, a few others have had this error. It is typically either an out-of-memory issue or a mismatch between the CUDA versions inside and outside the container....
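
A quick way to narrow it down from inside the container is the sketch below; it just reports what PyTorch was built against and what the GPU offers:

```python
# Sanity check inside the container: the CUDA runtime PyTorch was built against
# must be supported by the host driver, and the GPU must have enough memory.
import torch

print("torch version:    ", torch.__version__)
print("built with CUDA:  ", torch.version.cuda)
print("GPU available:    ", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:           ", props.name)
    print("total memory (GB):", props.total_memory / 1e9)
```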

Sounds good! No problem; I am using it in my fork for now. I also just realized the initial PR version had a wrong condition that I'd fixed locally (hence...

Hi, the repository we used to parse Python code and generate program graphs has been open-sourced [here](https://github.com/google-research/python-graphs). This won't output samples in exactly the same format as in this dataset,...
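
For reference, generating a graph with that library looks roughly like the sketch below (based on its README; the exact entry points may differ, so check the repo):

```python
# Sketch based on the python-graphs README: build a program graph for a function.
# Note the resulting object is not in the same format as the samples in this dataset.
from python_graphs import program_graph


def example(x):
    total = 0
    for i in range(x):
        total += i
    return total


graph = program_graph.get_program_graph(example)
print(graph)  # a ProgramGraph object; see the library for traversal/rendering helpers
```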

Sounds good. FWIW, I just noticed that this PR messes with the printed loss [here](https://github.com/karpathy/nanoGPT/blob/master/train.py#L249) because each loss term is normalized. One obvious fix is to scale that loss back...
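
To make the fix concrete, here is a toy sketch (not nanoGPT's actual training loop) of scaling the logged value back:

```python
# Toy illustration: each micro-step loss is divided by gradient_accumulation_steps
# so the accumulated gradient is correct, but that also shrinks the printed value.
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
gradient_accumulation_steps = 8

optimizer.zero_grad()
for micro_step in range(gradient_accumulation_steps):
    x, y = torch.randn(16, 4), torch.randn(16, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss = loss / gradient_accumulation_steps  # needed for a correct gradient
    loss.backward()
optimizer.step()

# Scale back when logging, otherwise the reported loss is ~8x too small here.
print(f"loss {loss.item() * gradient_accumulation_steps:.4f}")
```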

Great, glad I could help! Minor note: I realized on my own end that the number of eval steps is also affected by this, in that it now refers to...

This is useful info! I hadn't used DDP yet (training a sweep of smaller models instead), but it's nice that the sync overhead becomes negligible with more accumulation steps. I...
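
For anyone curious why the overhead amortizes: DDP all-reduces gradients on every backward pass unless told not to, so synchronizing only on the last micro-step keeps the communication cost per optimizer step roughly constant. A minimal sketch (launch with torchrun; this is not the actual training script):

```python
# Sketch: skip the gradient all-reduce on intermediate micro-steps with no_sync(),
# so only one synchronization happens per optimizer step.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(rank)

model = DDP(torch.nn.Linear(4, 1).cuda(rank), device_ids=[rank])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
gradient_accumulation_steps = 8

optimizer.zero_grad()
for micro_step in range(gradient_accumulation_steps):
    x, y = torch.randn(16, 4).cuda(rank), torch.randn(16, 1).cuda(rank)
    loss = torch.nn.functional.mse_loss(model(x), y) / gradient_accumulation_steps
    if micro_step < gradient_accumulation_steps - 1:
        with model.no_sync():  # no all-reduce on intermediate micro-steps
            loss.backward()
    else:
        loss.backward()  # gradients are synchronized here only
optimizer.step()
dist.destroy_process_group()
```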