Karan Jariwala
- Data was not sharded across GPUs when running Horovod distributed training. This PR fixes that issue (a sketch of per-rank sharding is shown below).
- Fixed the issue with the `validation` condition where it was doing...
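The sketch below illustrates the kind of per-rank sharding the first bullet refers to, assuming a Horovod + PyTorch setup with a `DistributedSampler`; the dataset and hyperparameters are hypothetical, and the actual PR may implement the sharding differently.

```python
import horovod.torch as hvd
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

hvd.init()  # one process per GPU

# Hypothetical stand-in for the real training data.
dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))

# Shard the data so each Horovod rank sees a disjoint slice:
# num_replicas = total number of workers, rank = this worker's index.
sampler = DistributedSampler(dataset, num_replicas=hvd.size(), rank=hvd.rank())
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # reshuffle the shards each epoch
    for features, labels in loader:
        pass  # forward/backward pass would go here
```

Without the sampler, every rank iterates over the full dataset, so adding GPUs does not reduce the work per worker.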
Hi, I am running the `lstm_benchmark.py` test on CPU and on multi-GPU devices (Amazon EC2), and I am not getting the expected scaling. Below is the relevant information: **Instance:** P3.8xLarge (Amazon...
## Description of changes:
- Moved local directory creation and existence check from CloudUploader to the Writer class

## Issue #, if available:

## Merge Checklist:
_Put an `x` without space...
**Environment**
- OS: Ubuntu 20.04
- Hardware (GPU, or instance type): A100, >= 2 GPUs

**To reproduce**
Steps to reproduce the behavior:
When trying to run [examples/bert mlm training](https://github.com/mosaicml/examples/tree/main/examples/bert#mlm-pre-training) (using...