Chinthaka
dump_tensorflow_weights.py works well with the downloaded ssdlite_mobilenet_v2_coco_2018_05_09 model, but I am stuck at converting it to a Caffe model using the load_caffe_weights.py script. Traceback (most recent call last): File "load_caffe_weights.py", line 82, in...
To train much larger model variants (2B, 7B, etc.), we need larger GPU memory allocations for parameters, optimizer states, and gradients. The [Zero Redundancy Optimizer](https://www.deepspeed.ai/tutorials/zero/) introduces a methodology to shard these...
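A minimal sketch of the ZeRO stage-1 idea (each data-parallel rank owns only a shard of the optimizer state); all names here are illustrative, not llm.c's or DeepSpeed's actual implementation:

```python
# ZeRO-1 sketch: every rank holds the full parameters but only a
# 1/world_size shard of the Adam moments. Names are illustrative.
import numpy as np

def shard_range(num_params, rank, world_size):
    """Contiguous slice of the flat parameter vector owned by this rank."""
    per_rank = (num_params + world_size - 1) // world_size  # ceil division
    lo = min(rank * per_rank, num_params)
    hi = min(lo + per_rank, num_params)
    return lo, hi

class AdamShard:
    """Adam moments kept only for this rank's shard of the parameters."""
    def __init__(self, num_params, rank, world_size):
        self.lo, self.hi = shard_range(num_params, rank, world_size)
        n = self.hi - self.lo
        self.m = np.zeros(n, dtype=np.float32)  # first-moment shard
        self.v = np.zeros(n, dtype=np.float32)  # second-moment shard

    def step(self, params, grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Bias correction omitted for brevity; this updates only the
        # slice of params that this rank owns.
        g = grads[self.lo:self.hi]
        self.m = beta1 * self.m + (1 - beta1) * g
        self.v = beta2 * self.v + (1 - beta2) * g * g
        params[self.lo:self.hi] -= lr * self.m / (np.sqrt(self.v) + eps)
        # In a real run an all-gather would follow, so every rank sees
        # the fully updated parameters for the next forward pass.
```

The memory win is that the O(2x) Adam state is divided by the world size, while parameters and activations stay replicated.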
mpirun with multiple GPUs hangs after `allocated 474 MiB for master copy of params`, most probably due to the introduction of CUDA streams. @karpathy @PeterZhizhin
Fix for #369
Scheduling jobs using Slurm seems much easier in a multi-node training setup than setting up MPI for the cluster. This draft contains the changes to use mpirun for single-node...
Additional feature to checkpoint optimizer state and model parameters using a non-blocking background thread. Memcpy the device buffers to a pinned host buffer in one shot and let the background thread...
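A minimal Python sketch of the scheme, with a plain numpy array standing in for the pinned CUDA host buffer and a hypothetical file path; the actual change is in llm.c's C/CUDA code:

```python
# Sketch of non-blocking checkpointing: copy device buffers into one host
# staging buffer, then let a background thread write it to disk while
# training continues. numpy stands in for a pinned (cudaMallocHost)
# buffer here; the helpers and file name are hypothetical.
import threading
import numpy as np

class AsyncCheckpointer:
    def __init__(self, num_bytes):
        # One host staging buffer, reused across checkpoints. Pinned
        # memory would make the device-to-host memcpy a single fast copy.
        self.staging = np.empty(num_bytes, dtype=np.uint8)
        self.thread = None

    def save(self, device_buffers, path):
        self.wait()  # don't overwrite staging while a previous write runs
        offset = 0
        for buf in device_buffers:  # one-shot copy into the staging buffer
            raw = buf.view(np.uint8).ravel()
            self.staging[offset:offset + raw.size] = raw
            offset += raw.size
        # Background thread does the slow disk write; training resumes now.
        self.thread = threading.Thread(
            target=lambda: self.staging[:offset].tofile(path))
        self.thread.start()

    def wait(self):
        if self.thread is not None:
            self.thread.join()
            self.thread = None
```

The key design choice is reusing one staging buffer and joining the previous writer before the next checkpoint, so the main loop only pays for the memcpy, never for the disk I/O.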
When the number of processes is high, the eval dataloader goes out of bounds when processing the 10042 HellaSwag samples. Recreated the issue with debug printfs using 200 processes. Samples per proc become...
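A small reproduction of the arithmetic, assuming ceil-division sharding (the dataloader's exact split may differ):

```python
# 10042 HellaSwag samples over 200 processes with ceil division gives
# 51 samples per process, but 51 * 200 = 10200 > 10042, so the last
# few ranks index past the end of the dataset.
total, nproc = 10042, 200
per_proc = (total + nproc - 1) // nproc  # ceil(10042 / 200) = 51
for rank in range(nproc):
    start = rank * per_proc
    end = start + per_proc
    if end > total:
        print(f"rank {rank}: [{start}, {end}) overruns {total} samples")
# A safe split clamps the end of each shard: min(start + per_proc, total)
```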
Just an additional script to visualize and track metrics in realtime using wandb. This will be useful for longer training runs and multi-node training lasting many hours. **Metric graphs...
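A minimal sketch of such a script, tailing the training log and pushing metrics to wandb; the log path, line format, and project name below are assumptions, not llm.c's exact output:

```python
# Tail the training log and push metrics to wandb in realtime.
# "train.log" and the "step N ... loss X" line format are assumptions.
import re
import time
import wandb

wandb.init(project="llmc-training")  # hypothetical project name
pattern = re.compile(r"step (\d+) .* loss ([\d.]+)")

with open("train.log") as f:
    while True:
        line = f.readline()
        if not line:
            time.sleep(1.0)  # wait for the training job to append more
            continue
        m = pattern.search(line)
        if m:
            wandb.log({"train/loss": float(m.group(2))},
                      step=int(m.group(1)))
```

Keeping this as a separate tailer process means the training binary needs no wandb dependency and a logger crash cannot take down a multi-hour run.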