
It is slow when the number of categories is large

Open chengchengowen opened this issue 10 years ago • 12 comments

@winstywang Thanks for this great work! I tried it on 4 Titan X GPUs with Inception-BN.conf. With 4490 categories, training 40 batches takes 43 sec. However, with 44900 categories, it takes 200 sec.

chengchengowen avatar May 27 '15 11:05 chengchengowen

This makes sense: with 44900 categories you need a much larger weight matrix in the last fully-connected layer.
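A back-of-envelope calculation shows how the last layer scales (a sketch; the 1024-dimensional pooled feature assumed here is typical of Inception-BN but is not stated in the thread, so check your conf):

```python
# Rough size of the final fullc layer for the two category counts.
# feature_dim = 1024 is an assumption (Inception-BN's global-pooled output).
feature_dim = 1024

small = feature_dim * 4490    # ~4.6M weights
large = feature_dim * 44900   # ~46M weights

print(large / small)          # 10.0
```

Ten times the weights means roughly ten times the GEMM work (and gradient traffic) in that one layer, so it can come to dominate the step time even though the rest of the network is unchanged.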

winstywang avatar May 27 '15 11:05 winstywang

@winstywang I also tried it with Caffe using GoogLeNet (without batch normalization); 4490 categories and 44900 categories take 20 sec and 32 sec respectively to train 40 batches. There is little difference between the two cases.

chengchengowen avatar May 28 '15 01:05 chengchengowen

It is a bit hard to tell which part causes this issue from your description alone... Could you pass me a minimal set of samples that reproduces it?

winstywang avatar May 28 '15 10:05 winstywang

Although my CUDA GPU is a GTX 970, training on 40x40 color images with batch size 32 is very slow; it takes a long time and stopped in round 0: `round 0:[ 2000]246 sec escaped`

My full config:

```
# training iterator
data = train
iter = img
  image_list = "./image_list_train.txt"
  image_root = "./data2/train/"
  input_flat = 0
  divideby = 256
  shuffle = 0
iter = end

# evaluation iterator
eval = test
iter = img
  input_flat = 0
  image_list = "./image_list_test.txt"
  image_root = "./data2/test/"
  divideby = 256
  shuffle = 0
iter = end

# global parameters
label_width = 10
label_vec[0,10) = landmarks

netconfig=start
#3_40_40
layer[0->1] = conv:cv1
  kernel_size = 5
  nchannel = 30
  stride = 2
layer[1->2] = relu:relu1
layer[2->3] = max_pooling:mp1
  kernel_size = 2
  stride = 2
#30_18_18
layer[3->4] = conv:cv2
  kernel_size = 3
  nchannel = 30
  no_bias=0
layer[4->5] = relu:relu2
layer[5->6] = max_pooling:mp2
  kernel_size = 2
  stride = 2
layer[6->7] = flatten
#layer[7->7] = dropout
threshold = 0.5
layer[7->8] = fullc:fc1
  nhidden = 100
  init_sigma = 0.01
layer[8->9] = sigmoid:se1
layer[9->10] = fullc:fc2
  nhidden = 10
  init_sigma = 0.01
layer[10->10] = l2_loss
  target = landmarks
netconfig=end

# input shape, not including batch
input_shape = 3,40,40
batch_size = 32

# global parameters
dev = gpu
save_model = 100
max_round = 15
num_round = 15
train_eval = 1
random_type = xavier
#random_type = gaussian

# learning parameters
eta = 0.1
momentum = 0.9
wd = 0.0

# evaluation metric
metric = error
eval_train = 1
# end of config
```

wguo68 avatar May 28 '15 15:05 wguo68

I wish you could provide more details for us to reproduce this issue...

winstywang avatar May 28 '15 15:05 winstywang

I changed the conf and updated the details in the last post. Now:

```
round 0:[ 2400]300 sec escaped [1], train-error:1 test-error:1
round 1:[ 2400]586 sec escaped [2], train-error:1 test-error:1
```

Something must be wrong. My train_image_list looks like this:

```
1 0.0851194 0.150562 0.130048 0.10518 0.160999 0.0888869 0.0784231 0.135541 0.167709 0.159056 Aaron_Eckhart_0001_0000000.jpg
2 0.149247 0.0820331 0.105284 0.131967 0.0746481 0.0844444 0.0764162 0.133879 0.165702 0.159134 Aaron_Eckhart_0001_0000001.jpg
3 0.174822 0.106676 0.131367 0.159006 0.100897 0.0902925 0.0835475 0.141192 0.172833 0.167361 Aaron_Eckhart_0001_0000002.jpg
4
```

wguo68 avatar May 28 '15 15:05 wguo68

Is your problem related to @chengchengowen's problem? Seems you only have 10 classes in total.

winstywang avatar May 28 '15 15:05 winstywang

It may not be related to @chengchengowen's problem; I posted here just because training is slow and didn't converge. My problem is multi-label regression. I am tired and will go to sleep now.

wguo68 avatar May 28 '15 15:05 wguo68

First, I am not sure about the speed of a GTX 970; 260 pics/s seems reasonable to me. To diagnose the problem, first check that IO is not the bottleneck, since you are using the img list iterator. You can check GPU usage with nvidia-smi to see whether the GPU is fully occupied.
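One rough way to rule IO out (a sketch, not cxxnet code; it times raw file reads only, so it gives a lower bound on iterator cost since JPEG decoding is excluded):

```python
import time

def files_per_second(paths, repeat=3):
    """Time raw reads of the listed image files and return files/sec.
    If this is far above the observed training speed (~260 pics/s here),
    raw disk IO is probably not the bottleneck."""
    start = time.time()
    count = 0
    for _ in range(repeat):
        for path in paths:
            with open(path, "rb") as f:
                f.read()
            count += 1
    elapsed = time.time() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Run it on a few hundred paths from image_list_train.txt, and watch nvidia-smi during training at the same time: a fast disk plus an idle GPU points at decoding or the iterator, while a busy GPU points at the network itself.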

Second, if the network does not converge, first try a smaller learning rate to see whether it helps. Since I cannot access your data, I cannot tell what the exact problem is.

winstywang avatar May 28 '15 16:05 winstywang

I am using your old version of cxxnet, from the time you released the multi-label training doc. With the new version of cxxnet, training crashes at the line `net_trainer->Update(itr_train->Value());` in round 0. I used nvidia-smi to check whether CUDA is used, and it seems it is not: the GPU utilization fields are all N/A while cxxnet is training:

```
Utilization:
    GPU     N/A
    Memory  N/A
    Encoder N/A
    Decoder N/A
```

What's the problem?

wguo68 avatar May 29 '15 01:05 wguo68

@winstywang I am sorry, our database is non-public. I suggest randomly generating some labels from ImageNet to reproduce this issue.

chengchengowen avatar May 29 '15 01:05 chengchengowen

@chengchengowen Stay tuned. We will try full imagenet in the following month.

winstywang avatar May 29 '15 01:05 winstywang