JarvisKevin issues



JarvisKevin

Hi @VeritasYin, why do you pad x with zeros rather than use a 1×1 convolution when the number of input channels is smaller than the number of output channels?
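For reference, here is a minimal NumPy sketch comparing the two channel-alignment options in the question. It assumes a (batch, channels, time, nodes) tensor layout. The function names `align_zero_pad` and `align_conv1x1` are hypothetical and are not taken from the repository being discussed.

```python
import numpy as np


def align_zero_pad(x, c_out):
    """Hypothetical helper: match channels by appending zero-filled channels.

    Assumes x has shape (batch, c_in, time, nodes) with c_in <= c_out.
    Adds no learnable parameters.
    """
    b, c_in, t, n = x.shape
    if c_in > c_out:
        raise ValueError("zero padding can only increase the channel count")
    pad = np.zeros((b, c_out - c_in, t, n), dtype=x.dtype)
    return np.concatenate([x, pad], axis=1)


def align_conv1x1(x, weight, bias):
    """Hypothetical helper: match channels with a 1x1 convolution.

    A 1x1 convolution is a learned linear map over the channel axis, applied
    independently at every (time, node) position.
    weight has shape (c_out, c_in); bias has shape (c_out,).
    """
    out = np.einsum("oc,bctn->botn", weight, x)
    return out + bias[None, :, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 3, 5, 4))  # batch=2, c_in=3, time=5, nodes=4
    c_out = 8
    padded = align_zero_pad(x, c_out)
    w = rng.standard_normal((c_out, 3))
    bias = np.zeros(c_out)
    projected = align_conv1x1(x, w, bias)
    print(padded.shape, projected.shape)  # (2, 8, 5, 4) (2, 8, 5, 4)
    # Zero padding passes the original channels through unchanged.
    print(np.allclose(padded[:, :3], x))  # True
    print("1x1 conv params:", w.size + bias.size)  # 32
```

Zero padding adds no parameters and keeps the original channels as an exact identity path. The 1×1 convolution adds c_out × c_in + c_out learnable parameters (32 in this example) and can mix channels. Zero padding is equivalent to a 1×1 convolution whose weights are fixed to an identity block followed by zero rows, with zero bias.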

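lr=0.0001*(samples_per_gpu/2)*num_nodes: doesn't this ignore single-machine multi-GPU training? As far as I can tell, num_nodes is described as "for multi-machine parallel training", so is it the number of machines?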
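As a worked example of the question, the sketch below compares the formula as written with a variant that also counts GPUs per machine. The variant follows the linear scaling rule, where the learning rate is proportional to the total batch size. `gpus_per_node`, the reference batch `base_batch=2` (read off the `/2` in the formula), and both function names are assumptions for illustration. They are not the project's actual configuration.

```python
def lr_as_written(samples_per_gpu, num_nodes, base_lr=0.0001):
    """The formula quoted above: it has no term for the number of GPUs per machine."""
    return base_lr * (samples_per_gpu / 2) * num_nodes


def lr_linear_scaling(samples_per_gpu, gpus_per_node, num_nodes,
                      base_lr=0.0001, base_batch=2):
    """Hypothetical variant: scale with the total batch over all GPUs on all machines.

    Assumes base_lr was tuned for a total batch of base_batch samples.
    """
    total_batch = samples_per_gpu * gpus_per_node * num_nodes
    return base_lr * total_batch / base_batch


if __name__ == "__main__":
    # One machine with 8 GPUs and 2 samples per GPU gives a total batch of 16.
    print(lr_as_written(samples_per_gpu=2, num_nodes=1))  # 0.0001
    print(lr_linear_scaling(samples_per_gpu=2, gpus_per_node=8, num_nodes=1))  # 0.0008
```

With one GPU per machine the two functions agree. With several GPUs per machine, the formula as written yields a learning rate gpus_per_node times smaller than linear scaling would. Whether that is intended depends on how the launcher sets samples_per_gpu, which the quoted line does not show.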