Results 13 comments of Sharath TS

The small model might work if you can come up with a config for it based on the model parameters. A PR would definitely be appreciated. Feel free to make...

You only need to extract the generator or discriminator from the pretrained checkpoint for conversion. Typically the discriminator. Follow the steps listed here to extract the individual parts. 1. Step 6...
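The extraction step above can be sketched as filtering a checkpoint's state dict by prefix. This is an illustrative helper, not the repo's actual script; the key prefix (`"discriminator."`) and the flat-dict checkpoint layout are assumptions.

```python
def extract_submodule(ckpt_state, prefix):
    """Keep only the keys under `prefix` (e.g. 'discriminator.') and strip
    the prefix so the extracted part can be loaded standalone."""
    return {
        key[len(prefix):]: value
        for key, value in ckpt_state.items()
        if key.startswith(prefix)
    }
```

The stripped keys can then be loaded into a standalone discriminator (or generator) module before conversion.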

You don't have to modify the script. When `mode="prediction"`, only predictions are output. When `mode="train eval"` or `mode="eval"`, metrics are computed and output.
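The gating described above can be sketched as follows. This is a hypothetical illustration of the control flow, not the script's actual code; the function name and the output dictionary shape are made up.

```python
def run(mode):
    """Sketch: metrics are only computed when evaluation is requested."""
    outputs = {"predictions": ["label_a", "label_b"]}
    if mode in ("train eval", "eval"):
        # Evaluation modes additionally compute and emit metrics.
        outputs["metrics"] = {"accuracy": 0.5}
    return outputs
```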

Are you trying to build on ARM? This repo doesn't support ARM currently.

For containers > 21.11, you will need to add an additional `torch._C._jit_set_autocast_mode(True)` [here](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/run_pretraining.py#L59). Also note that in the image shared, batch size per GPU = 512 for phase 1 and batch size per GPU = 56 for...

The benchmarking script does not set the clock frequency, nor any other system setting; ensuring the system is configured as expected is up to you. Can you match the reported performance on...

It's not a bug in the code, but an artifact of this particular run. In your log, after step 204000, no actual training step (weight update) has happened...

Whenever the skipped_steps count increases, the loss scaler is divided by 2. The training is progressing as long as skipped_steps doesn't increase while the Training_Iteration step count does, as...
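The dynamic loss scaling behavior described above can be sketched as follows. This is a minimal illustration of the technique, not the actual scaler used in the repo (real implementations, e.g. in apex, differ in detail); the class name, default values, and growth rule are assumptions.

```python
class DynamicLossScaler:
    """Sketch of dynamic loss scaling: halve the scale and skip the weight
    update on overflow; grow it back after a run of good steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.skipped_steps = 0
        self._good_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            # Gradient overflow: divide the scale by 2 and skip this step,
            # so skipped_steps increases but no weight update happens.
            self.scale /= 2
            self.skipped_steps += 1
            self._good_steps = 0
            return False  # step skipped
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # After enough consecutive good steps, try a larger scale again.
            self.scale *= 2
        return True  # weight update applied
```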

This seems like an incorrect checkpoint load: a few keys are missing or mismatched. Could you inspect the weight names in the model and in the checkpoint?
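One way to do that inspection is to diff the two sets of parameter names. The helper below is an illustrative sketch (not part of the repo); in practice you would pass `model.state_dict()` and the loaded checkpoint's state dict.

```python
def diff_state_dicts(model_state, ckpt_state):
    """Report parameter names present in one state dict but not the other."""
    model_keys = set(model_state)
    ckpt_keys = set(ckpt_state)
    return {
        "missing_from_checkpoint": sorted(model_keys - ckpt_keys),
        "unexpected_in_checkpoint": sorted(ckpt_keys - model_keys),
    }
```

A common culprit is a prefix mismatch (e.g. keys saved under `module.` by DataParallel), which shows up immediately in the two lists.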

The amount of training is determined by the parameters `max_steps` and `num_train_epochs`, whichever yields fewer steps. The former defaults to -1. The throughput computation accounts for these parameters in the `if...
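The resolution rule above can be sketched as a small function. This is a hypothetical illustration of the logic (the function name and `steps_per_epoch` argument are assumptions), not the script's actual code.

```python
def total_training_steps(max_steps, num_train_epochs, steps_per_epoch):
    """Sketch: max_steps defaults to -1, meaning 'derive the length from
    epochs'; otherwise the smaller of the two bounds wins."""
    epoch_steps = num_train_epochs * steps_per_epoch
    if max_steps < 0:
        return epoch_steps
    return min(max_steps, epoch_steps)
```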