net2net

GPU memory

Open joaanna opened this issue 5 years ago • 1 comment

Hi,

Thank you for your amazing work! I am trying to replicate your results by training with `python translation.py --base configs/translation/sbert-to-biggan256.yaml -t --gpus 0,`. Which GPU did you use to train your model, and what batch size did you use? I am only able to fit batch_size=2 on a TITAN Xp. The default batch_size in the config is 16, but I am not able to launch it using 4 TITAN Xp GPUs without running into memory issues. Is BigGAN or the Sentence Transformer fine-tuned during training (from your paper it seems they are not)? Do you have any insight into what I am missing?

Thank you in advance

joaanna avatar Jan 09 '21 17:01 joaanna

Also, when running the command, it seems like the model weights are not correctly loaded. I get:

Missing keys in state-dict: ['encoder.resnet.1.num_batches_tracked', 'encoder.resnet.4.0.bn1.num_batches_tracked', 'encoder.resnet.4.0.bn2.num_batches_tracked', 'encoder.resnet.4.0.bn3.num_batches_tracked', 'encoder.resnet.4.0.downsample.1.num_batches_tracked', 'encoder.resnet.4.1.bn1.num_batches_tracked', 'encoder.resnet.4.1.bn2.num_batches_tracked', 'encoder.resnet.4.1.bn3.num_batches_tracked', 'encoder.resnet.4.2.bn1.num_batches_tracked', 'encoder.resnet.4.2.bn2.num_batches_tracked', 'encoder.resnet.4.2.bn3.num_batches_tracked', 'encoder.resnet.5.0.bn1.num_batches_tracked', 'encoder.resnet.5.0.bn2.num_batches_tracked', 'encoder.resnet.5.0.bn3.num_batches_tracked', 'encoder.resnet.5.0.downsample.1.num_batches_tracked', 'encoder.resnet.5.1.bn1.num_batches_tracked', 'encoder.resnet.5.1.bn2.num_batches_tracked', 'encoder.resnet.5.1.bn3.num_batches_tracked', 'encoder.resnet.5.2.bn1.num_batches_tracked', 'encoder.resnet.5.2.bn2.num_batches_tracked', 'encoder.resnet.5.2.bn3.num_batches_tracked', 'encoder.resnet.5.3.bn1.num_batches_tracked', 'encoder.resnet.5.3.bn2.num_batches_tracked', 'encoder.resnet.5.3.bn3.num_batches_tracked', 'encoder.resnet.6.0.bn1.num_batches_tracked', 'encoder.resnet.6.0.bn2.num_batches_tracked', 'encoder.resnet.6.0.bn3.num_batches_tracked', 'encoder.resnet.6.0.downsample.1.num_batches_tracked', 'encoder.resnet.6.1.bn1.num_batches_tracked', 'encoder.resnet.6.1.bn2.num_batches_tracked', 'encoder.resnet.6.1.bn3.num_batches_tracked', 'encoder.resnet.6.2.bn1.num_batches_tracked', 'encoder.resnet.6.2.bn2.num_batches_tracked', 'encoder.resnet.6.2.bn3.num_batches_tracked', 'encoder.resnet.6.3.bn1.num_batches_tracked', 'encoder.resnet.6.3.bn2.num_batches_tracked', 'encoder.resnet.6.3.bn3.num_batches_tracked', 'encoder.resnet.6.4.bn1.num_batches_tracked', 'encoder.resnet.6.4.bn2.num_batches_tracked', 'encoder.resnet.6.4.bn3.num_batches_tracked', 'encoder.resnet.6.5.bn1.num_batches_tracked', 'encoder.resnet.6.5.bn2.num_batches_tracked', 'encoder.resnet.6.5.bn3.num_batches_tracked', 'encoder.resnet.6.6.bn1.num_batches_tracked', 'encoder.resnet.6.6.bn2.num_batches_tracked', 'encoder.resnet.6.6.bn3.num_batches_tracked', 'encoder.resnet.6.7.bn1.num_batches_tracked', 'encoder.resnet.6.7.bn2.num_batches_tracked', 'encoder.resnet.6.7.bn3.num_batches_tracked', 'encoder.resnet.6.8.bn1.num_batches_tracked', 'encoder.resnet.6.8.bn2.num_batches_tracked', 'encoder.resnet.6.8.bn3.num_batches_tracked', 'encoder.resnet.6.9.bn1.num_batches_tracked', 'encoder.resnet.6.9.bn2.num_batches_tracked', 'encoder.resnet.6.9.bn3.num_batches_tracked', 'encoder.resnet.6.10.bn1.num_batches_tracked', 'encoder.resnet.6.10.bn2.num_batches_tracked', 'encoder.resnet.6.10.bn3.num_batches_tracked', 'encoder.resnet.6.11.bn1.num_batches_tracked', 'encoder.resnet.6.11.bn2.num_batches_tracked', 'encoder.resnet.6.11.bn3.num_batches_tracked', 'encoder.resnet.6.12.bn1.num_batches_tracked', 'encoder.resnet.6.12.bn2.num_batches_tracked', 'encoder.resnet.6.12.bn3.num_batches_tracked', 'encoder.resnet.6.13.bn1.num_batches_tracked', 'encoder.resnet.6.13.bn2.num_batches_tracked', 'encoder.resnet.6.13.bn3.num_batches_tracked', 'encoder.resnet.6.14.bn1.num_batches_tracked', 'encoder.resnet.6.14.bn2.num_batches_tracked', 'encoder.resnet.6.14.bn3.num_batches_tracked', 'encoder.resnet.6.15.bn1.num_batches_tracked', 'encoder.resnet.6.15.bn2.num_batches_tracked', 'encoder.resnet.6.15.bn3.num_batches_tracked', 'encoder.resnet.6.16.bn1.num_batches_tracked', 'encoder.resnet.6.16.bn2.num_batches_tracked', 'encoder.resnet.6.16.bn3.num_batches_tracked', 'encoder.resnet.6.17.bn1.num_batches_tracked', 'encoder.resnet.6.17.bn2.num_batches_tracked', 'encoder.resnet.6.17.bn3.num_batches_tracked', 'encoder.resnet.6.18.bn1.num_batches_tracked', 'encoder.resnet.6.18.bn2.num_batches_tracked', 'encoder.resnet.6.18.bn3.num_batches_tracked', 'encoder.resnet.6.19.bn1.num_batches_tracked', 'encoder.resnet.6.19.bn2.num_batches_tracked', 'encoder.resnet.6.19.bn3.num_batches_tracked', 'encoder.resnet.6.20.bn1.num_batches_tracked', 'encoder.resnet.6.20.bn2.num_batches_tracked', 'encoder.resnet.6.20.bn3.num_batches_tracked', 'encoder.resnet.6.21.bn1.num_batches_tracked', 'encoder.resnet.6.21.bn2.num_batches_tracked', 'encoder.resnet.6.21.bn3.num_batches_tracked', 'encoder.resnet.6.22.bn1.num_batches_tracked', 'encoder.resnet.6.22.bn2.num_batches_tracked', 'encoder.resnet.6.22.bn3.num_batches_tracked', 'encoder.resnet.7.0.bn1.num_batches_tracked', 'encoder.resnet.7.0.bn2.num_batches_tracked', 'encoder.resnet.7.0.bn3.num_batches_tracked', 'encoder.resnet.7.0.downsample.1.num_batches_tracked', 'encoder.resnet.7.1.bn1.num_batches_tracked', 'encoder.resnet.7.1.bn2.num_batches_tracked', 'encoder.resnet.7.1.bn3.num_batches_tracked', 'encoder.resnet.7.2.bn1.num_batches_tracked', 'encoder.resnet.7.2.bn2.num_batches_tracked', 'encoder.resnet.7.2.bn3.num_batches_tracked']

joaanna avatar Jan 09 '21 19:01 joaanna
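
Every key in the missing-keys list ends in `num_batches_tracked`. In PyTorch this is a buffer that BatchNorm layers register to count the batches seen during training; it was introduced in PyTorch 0.4.1, so checkpoints saved with older versions typically do not contain it. Whether that is the cause here is an assumption, not something confirmed in this thread. A minimal sketch for checking that nothing else is missing, assuming the reported keys are available as a list of strings (`split_missing_keys` is a hypothetical helper, not part of net2net):

```python
def split_missing_keys(missing_keys):
    """Separate BatchNorm batch counters from all other missing keys."""
    suffix = ".num_batches_tracked"
    counters = [k for k in missing_keys if k.endswith(suffix)]
    others = [k for k in missing_keys if not k.endswith(suffix)]
    return counters, others


# A small subset of the keys reported in the issue, for illustration only.
example = [
    "encoder.resnet.1.num_batches_tracked",
    "encoder.resnet.4.0.bn1.num_batches_tracked",
    "encoder.resnet.4.0.downsample.1.num_batches_tracked",
]

counters, others = split_missing_keys(example)
print(f"{len(counters)} BatchNorm counters, {len(others)} other missing keys")
```

If the second list is empty, no weights, biases, or running statistics are missing. The counter itself only influences the running-statistic update when a BatchNorm layer is created with `momentum=None`.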