Results 13 comments of Sharath TS

The small model might work if you can come up with a config for it based on the model parameters. A PR would definitely be appreciated. Feel free to make...

You only need to extract the generator or discriminator from the pretrained checkpoint for conversion. Typically the discriminator. Follow the steps listed here to extract the individual parts. 1. Step 6...
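The extraction step above can be sketched as filtering a checkpoint's state dict by prefix. This is an illustrative helper, not the repo's actual script; the key prefix (`"discriminator."`) and the flat-dict checkpoint layout are assumptions.

```python
def extract_submodule(ckpt_state, prefix):
    """Keep only the keys under `prefix` (e.g. 'discriminator.') and strip
    the prefix so the extracted part can be loaded standalone."""
    return {
        key[len(prefix):]: value
        for key, value in ckpt_state.items()
        if key.startswith(prefix)
    }
```

The stripped keys can then be loaded into a standalone discriminator (or generator) module before conversion.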

You don't have to modify the script. When `mode="prediction"`, only predictions are output. When `mode="train eval"` or `mode="eval"`, metrics are computed and output.
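The gating described above can be sketched as follows. This is a hypothetical illustration of the control flow, not the script's actual code; the function name and the output dictionary shape are made up.

```python
def run(mode):
    """Sketch: metrics are only computed when evaluation is requested."""
    outputs = {"predictions": ["label_a", "label_b"]}
    if mode in ("train eval", "eval"):
        # Evaluation modes additionally compute and emit metrics.
        outputs["metrics"] = {"accuracy": 0.5}
    return outputs
```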

Are you trying to build on ARM? This repo doesn't support ARM currently.

For containers > 21.11, you will need to add an additional `torch._C._jit_set_autocast_mode(True)` [here](https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/LanguageModeling/BERT/run_pretraining.py#L59). Also note that in the image shared, batch size per GPU = 512 for phase 1 and batch size per GPU = 56 for...

The benchmarking script does not set the clock frequency, nor any other system setting; ensuring the system is configured as expected is up to you. Can you match the reported performance on...

It's not a bug in the code, but an artifact of this particular run. In your log, after step 204000, no actual training step (weight update) has happened...

Whenever the skipped_steps count increases, the loss scaler is divided by 2. The training is progressing as long as skipped_steps doesn't increase while the Training_Iteration step count does, as...
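The dynamic loss scaling behavior described above can be sketched as follows. This is a minimal illustration of the technique, not the actual scaler used in the repo (real implementations, e.g. in apex, differ in detail); the class name, default values, and growth rule are assumptions.

```python
class DynamicLossScaler:
    """Sketch of dynamic loss scaling: halve the scale and skip the weight
    update on overflow; grow it back after a run of good steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.skipped_steps = 0
        self._good_steps = 0

    def update(self, found_overflow):
        if found_overflow:
            # Gradient overflow: divide the scale by 2 and skip this step,
            # so skipped_steps increases but no weight update happens.
            self.scale /= 2
            self.skipped_steps += 1
            self._good_steps = 0
            return False  # step skipped
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            # After enough consecutive good steps, try a larger scale again.
            self.scale *= 2
        return True  # weight update applied
```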

This seems like an incorrect checkpoint load: a few keys are missing or mismatched. Could you inspect the weight names in the model and in the checkpoint?
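One way to do that inspection is to diff the two sets of parameter names. The helper below is an illustrative sketch (not part of the repo); in practice you would pass `model.state_dict()` and the loaded checkpoint's state dict.

```python
def diff_state_dicts(model_state, ckpt_state):
    """Report parameter names present in one state dict but not the other."""
    model_keys = set(model_state)
    ckpt_keys = set(ckpt_state)
    return {
        "missing_from_checkpoint": sorted(model_keys - ckpt_keys),
        "unexpected_in_checkpoint": sorted(ckpt_keys - model_keys),
    }
```

A common culprit is a prefix mismatch (e.g. keys saved under `module.` by DataParallel), which shows up immediately in the two lists.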

The amount of training is determined by the parameters `max_steps` and `num_train_epochs`, whichever yields fewer steps. The former defaults to -1. The throughput computation accounts for these parameters in the `if...
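The resolution rule above can be sketched as a small function. This is a hypothetical illustration of the logic (the function name and `steps_per_epoch` argument are assumptions), not the script's actual code.

```python
def total_training_steps(max_steps, num_train_epochs, steps_per_epoch):
    """Sketch: max_steps defaults to -1, meaning 'derive the length from
    epochs'; otherwise the smaller of the two bounds wins."""
    epoch_steps = num_train_epochs * steps_per_epoch
    if max_steps < 0:
        return epoch_steps
    return min(max_steps, epoch_steps)
```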