Ashvini Jindal
Hi @mmliang, apologies for the late response. Have you already made changes to the existing codebase to predict arc_label as well? A few things that will change are data reading, the...
Hi @shihe123, apologies for the late response. The model will be saved under `data/params_*`. Please have a look at the method `def compute_dependencies()`.
Hi @yohan-pg, on a 1x 4090 GPU I am getting 29% MFU while training the GPT-2 124M model. What is your GPU setup?
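For anyone comparing MFU numbers across GPUs, here is a minimal sketch of how such a figure can be estimated. It assumes the common approximation of `6*N + 12*L*H*Q*T` FLOPs per token (N parameters, L layers, H heads, Q head dimension, T sequence length). The function name, argument names, and example values are hypothetical, and the peak FLOPS value is something you have to supply for your own GPU and precision; it is not taken from this thread.

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, seq_len,
                 seqs_per_iter, dt, peak_flops):
    """Estimate model FLOPs utilization (MFU) as a fraction of peak_flops.

    dt is the measured wall-clock time of one training iteration in seconds.
    peak_flops is an assumption supplied by the caller (vendor peak for the
    GPU and dtype in use).
    """
    # Forward + backward FLOPs per token: weight matmuls plus attention.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * seq_len
    # One sequence holds seq_len tokens.
    flops_per_seq = flops_per_token * seq_len
    # Sequences processed per iteration (batch size * grad accumulation steps).
    flops_per_iter = flops_per_seq * seqs_per_iter
    flops_achieved = flops_per_iter / dt
    return flops_achieved / peak_flops


# Hypothetical usage: a 124M-parameter model with 12 layers, 12 heads,
# head dimension 64, and a 1024-token context. The iteration time and the
# peak FLOPS below are placeholders, not measurements.
mfu = estimate_mfu(
    n_params=124_000_000, n_layer=12, n_head=12, head_dim=64, seq_len=1024,
    seqs_per_iter=480, dt=9.0, peak_flops=165e12,
)
print(f"estimated MFU: {mfu * 100:.1f}%")
```

Because the result scales directly with `peak_flops`, two people quoting MFU for the same run can disagree simply by assuming a different peak, so it helps to state which peak value was used.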
I am also seeing a similar issue where the loss is trending downwards but is quite unstable, and the model seems to learn very slowly. I am running full fine-tuning of the latest Phi2 model...
Hi @vgoklani, I am running CUDA 11.8 (nvidia-smi shows the maximum supported CUDA version rather than the actual version being used). I installed pytorch 2.0 via conda with `python 3.9` ...
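To check which CUDA version a PyTorch install was actually built against (as opposed to the maximum version reported by nvidia-smi), a small sketch like the following may help. The helper name `cuda_build_info` is hypothetical; it only reads `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it returns empty values if PyTorch is not installed.

```python
def cuda_build_info():
    """Report the CUDA version PyTorch was built with, if PyTorch is present."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        # CUDA toolkit version of the PyTorch build; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # Whether a usable CUDA device and driver are visible at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


print(cuda_build_info())
```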
Hi @rsliu94, thank you for pointing this out. I was able to train `shakespeare_char` in 3 minutes on a 1x 4090 GPU after removing the changes from the commit.
Hi @darien-schettler, I am running on a 1x 4090 GPU and using the pytorch 2.0 nightly build. Running `python train.py config/train_shakespeare_char.py` with default parameters: > eval_interval = 250 # keep frequent...
@otaviogood is your 5.5 minutes in the above logs measured on 1x A100 or 8x A100? Maybe I misunderstood, but Andrej's 3 mins training time was on 8x A100 and...
Sorry for the typo, I meant 1x A100. My primary question was about the number of iterations, since it is not mentioned in the Readme. @otaviogood I was using flash attention before and...
@otaviogood it makes a lot of sense now! After reverting your commit changes, I am able to run full training in 2.5 minutes on 1x 4090. Like you said, it is...