deep-learning-containers
deep-learning-containers copied to clipboard
[bug] TF imagenet performance script does not respect timeout
Checklist
- [ ] I've prepended issue tag with type of change: [bug]
- [ ] (If applicable) I've attached the script to reproduce the bug
- [ ] (If applicable) I've documented below the DLC image/dockerfile this relates to
- [ ] (If applicable) I've documented below the tests I've run on the DLC image
- [ ] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
- [ ] I've built my own container based off DLC (and I've attached the code used to build my own image)
Concise Description: The TF imagenet performance script run_tensorflow_training_performance_gpu_imagenet does not respect the timeout
timeout "$TRAINING_TIME" python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py .......
Even if the time goes over the supplied $TRAINING_TIME, the script is not able to exit. This leads to timeout on the test build in PR context.
Expected behavior:
The TF imagenet performance script run_tensorflow_training_performance_gpu_imagenet respects the timeout.