JarvisKevin
Results: 2 issues of JarvisKevin
Hi @VeritasYin, why do you pad x with zeros, rather than use a 1×1 convolution, when the number of input channels is less than the number of output channels?
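For readers unfamiliar with the two options being contrasted, here is a minimal pure-Python sketch. It works on the channel vector at a single position, since a 1×1 convolution is just the same linear map over channels applied at every position. The function names `pad_channels` and `conv1x1` and the example values are hypothetical and are not taken from @VeritasYin's code.

```python
def pad_channels(channels, c_out):
    """Zero-pad a channel vector up to c_out entries (no learned parameters)."""
    return channels + [0.0] * (c_out - len(channels))


def conv1x1(channels, weight):
    """Apply a 1x1 convolution at one position.

    weight has c_out rows of c_in values each (hypothetical layout);
    each output channel is a weighted sum of the input channels.
    """
    return [sum(w * c for w, c in zip(row, channels)) for row in weight]


x = [3.0, -1.0]                      # c_in = 2 at one position
padded = pad_channels(x, 4)          # c_out = 4, extra channels are zero

# Zero padding is the special case of a 1x1 convolution whose weight is an
# identity block stacked on top of zero rows.
identity_like = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
projected = conv1x1(x, identity_like)
```

With this particular weight the two results coincide; a learned 1×1 convolution would instead mix the input channels and add `c_in * c_out` trainable weights.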
lr=0.0001*(samples_per_gpu/2)*num_nodes: does this not take single-machine multi-GPU training into account? I see that num_nodes is "for multi-machine parallel training"; is it the number of machines?