gramesh-amd

Results: 11 comments of gramesh-amd

@ZhiyuLi-goog Thanks again for your help with the other issues. Do you see any problems with the config, or do you know why the loss is so much higher?

With `attention: "dot_product"`:

```
completed step: 4000, seconds: 91.772, TFLOP/s/device: 24.021, Tokens/s/device: 22.316, total_weights: 65504, loss: 7.644, perplexity: 2088.295
To see full metrics 'tensorboard --logdir=/ckpts/paxml/gpt3-conversion/gpt3-conversion/tensorboard/'
completed step: 4001, seconds: 39.677, ...
```

I tested these out. First running

```
python3 MaxText/train.py MaxText/configs/base.yml run_name="${RUNNAME}" model_name=gpt3-175b
```

and then also adding the other relevant flags you posted one by one, and all of them...

Thanks. Here are the logs: [maxtext_gpt3_logs.txt](https://github.com/user-attachments/files/17023063/maxtext_gpt3_logs.txt)

Thanks for checking. Yeah, it's strange that it's starting with a bad loss. I also tried testing the tokenizer, and it seems fine.
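
For reference, the tokenizer test was just a round-trip sanity check along these lines (a minimal sketch, assuming a SentencePiece vocab; the vocab path below is a placeholder, not the exact file we used):

```python
# Minimal round-trip sanity check for the tokenizer.
# NOTE: the vocab path is a placeholder, not the actual file used in the run.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="path/to/vocab.model")  # placeholder

text = "The quick brown fox jumps over the lazy dog."
ids = sp.encode(text, out_type=int)
print("num tokens:", len(ids))
print("ids:", ids)
print("decoded:", sp.decode(ids))  # should reproduce the input text
```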

Tried weight_dtype as float32 as well; same problem. I'm wondering if we can send you our converted ckpt for you to load and verify whether it's a ckpt problem?
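
One cheap check we can run locally before shipping the checkpoint is to dump per-parameter statistics from the converted checkpoint and look for anything obviously off (NaNs, zeroed tensors, wrong scale). A minimal sketch, assuming the converted ckpt is a plain Orbax PyTree checkpoint and that the copy being inspected fits in host memory; the path is a placeholder:

```python
# Print basic statistics for every parameter in the converted checkpoint so a
# corrupted or mis-mapped tensor stands out before running a full train step.
# NOTE: CKPT_DIR is a placeholder path.
import numpy as np
import jax
import orbax.checkpoint as ocp

CKPT_DIR = "/path/to/converted_ckpt/0/items"  # placeholder

restored = ocp.PyTreeCheckpointer().restore(CKPT_DIR)

leaves_with_paths, _ = jax.tree_util.tree_flatten_with_path(restored)
for path, leaf in leaves_with_paths:
    arr = np.asarray(leaf)
    name = jax.tree_util.keystr(path)
    print(f"{name}: shape={arr.shape} dtype={arr.dtype} "
          f"mean={arr.mean():+.4e} std={arr.std():.4e} "
          f"has_nan={bool(np.isnan(arr).any())}")
```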

I'm not sure if it will be useful. We also loaded the pax ckpt directly in paxml, and the ckpt starts at the right loss. So at this point, we...

Great, we will share the converted ckpt and the conversion logs. Do you have a gcloud bucket that I could push it to, or do you recommend some other way?

OK, let me do that. We tried both versions, and with both we are getting the same problem.

We have created the bucket and will share access with you soon (I got your Google email from one of your [commits](https://github.com/mlcommons/training_results_v4.0/commit/62f111b7690f163b269f32e4f93dcaaa13717c9c)).
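
For the access part, the plan is just to add your account as a reader on the bucket, roughly like this (a minimal sketch using the google-cloud-storage client; the bucket name and email below are placeholders):

```python
# Grant a single user read access to the bucket holding the converted ckpt.
# NOTE: BUCKET_NAME and MEMBER are placeholders.
from google.cloud import storage

BUCKET_NAME = "our-gpt3-conversion-bucket"  # placeholder
MEMBER = "user:reviewer@example.com"        # placeholder email

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {MEMBER},
})
bucket.set_iam_policy(policy)
print(f"Granted roles/storage.objectViewer on gs://{BUCKET_NAME} to {MEMBER}")
```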