Yezipiaomu
I also cannot reproduce the mAP, @zhreshold. Would you mind sharing the detailed training and optimizer settings?
VGG model: mAP 0.738, batch size 32, starting lr 0.004, on four GPUs.
Is it the frozen parameters that lead to this result?
77.3 mAP with ResNet-50 at data_shape 512. Would you mind telling me how to improve it?
77.142 mAP with VGG-reduced 512, lr = 0.004, batch_size = 32. @zhreshold, I also cannot reproduce the result.
Yes, I know. Your MakeLoss layer uses the valid normalization, which I suppose rescales by 1/[num of valid] = 1/[the sum of the num of valid in each sample]...
But MXNet averages the arg params across devices, which implies a division by len(ctx).
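
To make the concern concrete, here is a small sketch of the arithmetic only. It is not MXNet code: the function names and the numbers are made up for illustration, and it assumes each device normalizes by its own valid count before the results are averaged over len(ctx).

```python
# Illustration only: hypothetical helpers, not part of MXNet.
# Each list entry stands for one device (one element of ctx).

def per_device_valid_norm(loss_sums, valid_counts):
    """Divide each device's summed loss by that device's own valid count."""
    return [s / max(v, 1) for s, v in zip(loss_sums, valid_counts)]

def averaged_over_devices(loss_sums, valid_counts):
    """Valid-normalize per device, then average across devices (divide by len(ctx))."""
    per_dev = per_device_valid_norm(loss_sums, valid_counts)
    return sum(per_dev) / len(per_dev)

def global_valid_norm(loss_sums, valid_counts):
    """Divide the total summed loss by the total valid count over all devices."""
    return sum(loss_sums) / max(sum(valid_counts), 1)

# Made-up numbers for four devices with unequal valid counts.
loss_sums = [8.0, 2.0, 6.0, 4.0]
valid_counts = [4, 1, 2, 2]

per_device_then_avg = averaged_over_devices(loss_sums, valid_counts)
global_norm = global_valid_norm(loss_sums, valid_counts)
print(per_device_then_avg, global_norm)
```

With equal valid counts on every device the two values coincide; they only drift apart when the counts differ between devices, so the effective scale can depend on how the batch is split.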