Tongcheng Li

14 comments by Tongcheng Li

Hello @shamangary, regarding the memory cost of feature maps, we currently have a Caffe implementation which tries to address the memory-hungry problem (listed under much more spatially efficient...
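
A minimal NumPy sketch of the general idea behind a memory-efficient dense block (illustrative only, not the actual Caffe code): instead of materializing a fresh concatenated copy of all previous feature maps at every layer, pre-allocate one shared buffer for the whole block and write each layer's output into its own slice. The `layers` callables and shapes below are assumptions for the example.

```python
import numpy as np

def naive_block(x, layers, growth_rate):
    """Each step re-concatenates all previous outputs into a fresh buffer."""
    features = x
    for layer in layers:
        out = layer(features)                                # (N, growth_rate, H, W)
        features = np.concatenate([features, out], axis=1)   # new copy every time
    return features

def shared_buffer_block(x, layers, growth_rate):
    """Write each layer's output into a slice of one pre-allocated buffer."""
    n, c0, h, w = x.shape
    total_c = c0 + growth_rate * len(layers)
    buf = np.empty((n, total_c, h, w), dtype=x.dtype)
    buf[:, :c0] = x
    offset = c0
    for layer in layers:
        out = layer(buf[:, :offset])                 # read the slice filled so far
        buf[:, offset:offset + growth_rate] = out    # fill the next slice in place
        offset += growth_rate
    return buf
```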

@liuzhuang13 @shicai I think there might be differences in the EMA procedure of BN between Torch and Caffe. The default cudnn-torch BN has momentum parameter = 0.1 for EMA of...
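
A minimal sketch of the running-statistics update being discussed, assuming the torch/cuDNN convention in which momentum = 0.1 is the weight given to the current batch statistics; Caffe parameterizes this decay differently, which is one way the two frameworks can drift apart at test time.

```python
import numpy as np

def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    """Torch-style EMA: new = (1 - momentum) * old + momentum * batch_statistic."""
    batch_mean = batch.mean(axis=0)
    batch_var = batch.var(axis=0)
    running_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    running_var = (1.0 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

# usage (activations_batch: array of shape (N, 64)):
# running_mean, running_var = update_running_stats(
#     np.zeros(64), np.ones(64), activations_batch)
```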

@jiangxuehan Hi! Actually, because in my implementation of the model I can specify an entire DenseBlock (tens of transitions) as one layer, the entire DenseBlock was manually created by...

@jiangxuehan Thanks for pointing that out! I currently have the same result, which is about 0.8% lower than the Torch counterpart. This is actually a known issue: https://github.com/liuzhuang13/DenseNet/issues/10 . In my Caffe,...

@jiangxuehan It turns out Caffe's DataLayer was feeding data without permutation; I have now added a flag to permute the data, which brings the accuracy to 95.2%.

@jiangxuehan Currently I have no definitive conclusion about the remaining 0.3% divergence, but there are several hypotheses: (1) Source of randomness: besides the different random seeds, one additional source of randomness...

@jiangxuehan Also, I think my DataLayer with the random option should be superior to the default ImageDataLayer implementation, because ImageDataLayer does its shuffling on a vector of Datum, which are quite...
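
A hypothetical sketch of the design choice described here: re-permute a lightweight index array each epoch rather than shuffling the heavy image records themselves. The names (`records`, `iterate_epoch`) are illustrative, not part of the actual DataLayer code.

```python
import numpy as np

def iterate_epoch(records, batch_size, rng):
    """Yield batches in a fresh random order; only indices are shuffled."""
    order = rng.permutation(len(records))          # cheap: permutes integers, not records
    for start in range(0, len(records), batch_size):
        batch_idx = order[start:start + batch_size]
        yield [records[i] for i in batch_idx]

# usage:
# rng = np.random.default_rng(0)
# for batch in iterate_epoch(dataset, 64, rng):
#     train_step(batch)
```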

Hi @John1231983, the Torch version uses the cuDNN version of BatchNormalization, which already includes the scale layer in the function, so in my modified Caffe BatchNorm there is...
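
An illustrative sketch of the difference being described (not the actual layer code): plain Caffe BatchNorm only normalizes, and the learned scale/shift lives in a separate Scale layer, whereas the cuDNN/torch-style BN applies the affine scale (gamma) and shift (beta) inside the same operation.

```python
import numpy as np

def batchnorm_normalize_only(x, mean, var, eps=1e-5):
    """Caffe-style BatchNorm: normalization only; Scale is a separate layer."""
    return (x - mean) / np.sqrt(var + eps)

def batchnorm_with_scale(x, mean, var, gamma, beta, eps=1e-5):
    """cuDNN/torch-style BN: normalization plus the affine scale/shift."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```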

Hello @WenzhMicrosoft, I am not sure I understand the question, but my understanding is that during layer initialization, it only reads the configuration from NeuralNetwork's proto...

Hello @GuohongWu, good question: for DenseNet-C, it is coded as a ConvolutionLayer in the .prototxt whose numOutput is smaller. For DenseNet-B, we implicitly assume that the bottleneck channel count = 4*growthRate,...
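
A hedged PyTorch-style sketch of the convention described above, assuming the standard DenseNet-B composite layer: the 1x1 bottleneck convolution outputs 4 * growth_rate channels before the 3x3 convolution produces growth_rate channels. This is only an illustration of the channel arithmetic, not the Caffe prototxt itself.

```python
import torch.nn as nn

def bottleneck_layer(in_channels, growth_rate):
    """DenseNet-B composite layer with the implicit 4 * growth_rate bottleneck width."""
    inter_channels = 4 * growth_rate
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(inter_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )
```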