Training time for ImageNet and CIFAR on 4xV100 GPUs
Hi there! May I ask how long training normally takes on CIFAR-10 and ImageNet? I'm using 4x 16GB V100 GPUs.
I used the following settings:
export OPENAI_LOGDIR="improved_diffusion"
MODEL_FLAGS="--image_size 64 --num_channels 192 --num_res_blocks 3 --learn_sigma True --class_cond True"
DIFFUSION_FLAGS="--diffusion_steps 4000 --noise_schedule cosine --rescale_learned_sigmas False --rescale_timesteps False"
TRAIN_FLAGS="--lr 3e-4 --batch_size 256 --microbatch 16"
NUM_GPUS=4
DATA_DIR="cifar_train/"
CUDA_VISIBLE_DEVICES=0,1,2,3 mpiexec -n $NUM_GPUS python scripts/image_train.py --data_dir $DATA_DIR $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS
and it outputs only the following:
Logging to improved_diffusion/
creating model and diffusion...
creating data loader...
training...
and then it stays like this indefinitely, with no new output.
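For reference, here is a rough sketch of the per-step workload these flags imply. It assumes that --batch_size is counted per process (one process per GPU) and that --microbatch splits each batch into chunks of at most that many samples. Both are assumptions about how the training script treats these flags, and step_workload is a made-up helper for illustration, not part of the repo.

```python
import math

def step_workload(batch_size, microbatch, num_processes):
    """Return (forward/backward passes per step, samples per step overall).

    Assumption: batch_size is per process, and a microbatch <= 0 means
    the batch is not split at all.
    """
    chunk = batch_size if microbatch <= 0 else microbatch
    passes = math.ceil(batch_size / chunk)  # chunks processed per optimizer step
    return passes, batch_size * num_processes

passes, global_batch = step_workload(batch_size=256, microbatch=16, num_processes=4)
print(passes, global_batch)  # 16 1024
```

If those assumptions hold, every optimizer step runs 16 forward/backward passes on each GPU, so the first log lines after "training..." could take a while to show up. That is only a guess at why the output looks stuck, not a confirmed diagnosis.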
Hello, how did you set up your hyperparameters, or where can I find them? I'm trying to set up mine to train the model. Thanks.
Hello, I ran into the same problem. Have you solved it yet? Thanks.