Can I train this model with multiple GPUs?
Yes, you can do that by wrapping the model in nn.DataParallel. Check out the parallelism tutorials: 1, 2. (Note that the code in this repo does not currently use nn.DataParallel.)
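As a minimal sketch of what that wrapping looks like (using a generic placeholder model rather than the model defined in this repo), the standard pattern is:

```python
import torch
import torch.nn as nn

# Placeholder model; substitute your own nn.Module here.
model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

if torch.cuda.device_count() > 1:
    # Replicates the model on each visible GPU and splits every
    # input batch across them along the first (batch) dimension.
    model = nn.DataParallel(model)

model = model.to("cuda")

# Forward pass: inputs are scattered to the GPUs, outputs gathered on GPU 0.
x = torch.randn(64, 128, device="cuda")
out = model(x)
```

The rest of the training loop stays the same; nn.DataParallel handles the scatter/gather transparently, so only the wrapping and the .to("cuda") call need to change.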