
IndexError: tuple index out of range

JanineCHEN opened this issue 5 years ago

Hey, I am a student trying to reproduce the training process on my own dataset. I got the following error right after the first epoch of training finished:

Traceback (most recent call last):
  File "train.py", line 227, in <module>
    main()
  File "train.py", line 224, in main
    run(config)
  File "train.py", line 184, in run
    metrics = train(x, y)
  File "/home/projects/BIGGAN/train_fns.py", line 41, in train
    x[counter], y[counter], train_G=False, 
IndexError: tuple index out of range

I launch training with sh scripts/launch_BigGAN_bs256x8.sh on my own dataset, which was converted to HDF5 format without any errors. This is the content of the launch_BigGAN_bs256x8.sh I used:

#!/bin/bash
python train.py \
--dataset I128_hdf5 --parallel --shuffle  --num_workers 8 --batch_size 128 --load_in_mem  \
--num_G_accumulations 16 --num_D_accumulations 16 \
--num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 \
--G_attn 64 --D_attn 64 \
--G_nl inplace_relu --D_nl inplace_relu \
--SN_eps 1e-6 --BN_eps 1e-5 --adam_eps 1e-6 \
--G_ortho 0.0 \
--G_shared \
--G_init ortho --D_init ortho \
--hier --dim_z 120 --shared_dim 128 \
--G_eval_mode \
--which_best FID \
--G_ch 32 --D_ch 32 \
--ema --use_ema --ema_start 20000 \
--test_every 200 --save_every 100 --num_best_copies 5 --num_save_copies 2 --seed 0 \
--use_multiepoch_sampler \

I am not sure whether this has something to do with the size of my dataset or the number of classes. If so, how should I adjust the parameters? Or does anyone have another idea why this error arises and how to tackle it? Any help would be very much appreciated! Thanks a bunch in advance.

JanineCHEN — Sep 15 '20 04:09

What was the solution?

Baran-phys — Sep 21 '20 17:09

> What was the solution?

Hi, it was the residual batch that caused the problem. You can either set drop_last when constructing the dataloader, or increase the number of epochs so that training never lands on the smaller last batch.
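To make the failure mode concrete, here is a minimal sketch in plain Python (names and sizes are illustrative, not taken from the repo): the training step expects one chunk per accumulation, but a residual final batch splits into fewer chunks, so indexing past the end raises the same "tuple index out of range".

```python
def split(batch, chunk_size):
    """Mimic torch.split: slice a batch into chunks of at most chunk_size."""
    return tuple(batch[i:i + chunk_size]
                 for i in range(0, len(batch), chunk_size))

batch_size = 8            # per-accumulation batch size (illustrative)
num_accumulations = 16    # cf. --num_D_accumulations 16
loader_batch = batch_size * num_accumulations  # 128 samples per load

full_batch = list(range(loader_batch))  # a full batch -> 16 chunks
residual = list(range(40))              # smaller final batch -> only 5 chunks

print(len(split(full_batch, batch_size)))  # 16, one chunk per accumulation
chunks = split(residual, batch_size)
print(len(chunks))                         # 5, too few

try:
    for counter in range(num_accumulations):
        x = chunks[counter]  # fails once counter reaches 5
except IndexError as err:
    print("IndexError:", err)  # IndexError: tuple index out of range
```

Dropping the residual batch (or sizing the dataset so every batch is full) keeps the chunk count equal to the number of accumulations, which is why drop_last fixes it.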

JanineCHEN — Sep 23 '20 16:09

Well, neither of these solved the error on my side. I get it whenever I set num_G_accumulations or num_D_accumulations higher than 2.

Baran-phys — Oct 27 '20 07:10

I use drop_last and it works. I am using 4 GPUs and a batch size of 52.
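For reference, a minimal sketch of passing drop_last=True when building the loader (a toy TensorDataset stands in for the HDF5 dataset; the repo constructs its real loaders elsewhere):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of 1000 samples (size is illustrative).
data = TensorDataset(torch.zeros(1000, 1),
                     torch.zeros(1000, dtype=torch.long))

# drop_last=True discards the residual 1000 % 128 = 104 samples,
# so every batch has exactly batch_size elements.
loader = DataLoader(data, batch_size=128, shuffle=True, drop_last=True)

sizes = [len(xb) for xb, yb in loader]
print(sizes)  # 7 batches, each exactly 128
```

With every batch guaranteed full, the split into accumulation chunks always yields the expected count, at the cost of skipping up to batch_size - 1 samples per epoch.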

datduong — Nov 26 '20 03:11