py-R-FCN: Very low test accuracy when training with OHEM

Hi, I trained R-FCN on my own data and got normal accuracy. But when I trained R-FCN with OHEM, the training accuracy and loss looked like this:

I1122 22:07:07.366756  5303 solver.cpp:228]     Iteration 108240, loss = 0.142503
I1122 22:07:07.366806  5303 solver.cpp:244]     Train net output #0: accuarcy = 1
I1122 22:07:07.366821  5303 solver.cpp:244]     Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1122 22:07:07.366824  5303 solver.cpp:244]     Train net output #2: loss_cls = 3.44035e-06 (* 1 = 3.44035e-06 loss)
I1122 22:07:07.366829  5303 solver.cpp:244]     Train net output #3: rpn_cls_loss = 0.00108962 (* 1 = 0.00108962 loss)
I1122 22:07:07.366834  5303 solver.cpp:244]     Train net output #4: rpn_loss_bbox = 0.201997 (* 1 = 0.201997 loss)
I1122 22:07:07.366838  5303 sgd_solver.cpp:106] Iteration 108240, lr = 0.0001
I1122 22:07:16.283577  5303 solver.cpp:228]     Iteration 108260, loss = 0.221437
I1122 22:07:16.283648  5303 solver.cpp:244]     Train net output #0: accuarcy = 1
I1122 22:07:16.283655  5303 solver.cpp:244]     Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1122 22:07:16.283660  5303 solver.cpp:244]     Train net output #2: loss_cls = 1.05492e-05 (* 1 = 1.05492e-05 loss)
I1122 22:07:16.283664  5303 solver.cpp:244]     Train net output #3: rpn_cls_loss = 0.00136944 (* 1 = 0.00136944 loss)
I1122 22:07:16.283668  5303 solver.cpp:244]     Train net output #4: rpn_loss_bbox = 0.317057 (* 1 = 0.317057 loss)
I1122 22:07:16.283674  5303 sgd_solver.cpp:106] Iteration 108260, lr = 0.0001
I1122 22:07:25.176756  5303 solver.cpp:228]     Iteration 108280, loss = 0.123934
I1122 22:07:25.176801  5303 solver.cpp:244]     Train net output #0: accuarcy = 1
I1122 22:07:25.176810  5303 solver.cpp:244]     Train net output #1: loss_bbox = 0 (* 1 = 0 loss)
I1122 22:07:25.176815  5303 solver.cpp:244]     Train net output #2: loss_cls = 3.25407e-06 (* 1 = 3.25407e-06 loss)
I1122 22:07:25.176818  5303 solver.cpp:244]     Train net output #3: rpn_cls_loss = 0.00123868 (* 1 = 0.00123868 loss) 
I1122 22:07:25.176822  5303 solver.cpp:244]     Train net output #4: rpn_loss_bbox = 0.030079 (* 1 = 0.030079 loss)

and the final testing accuracy was also very low:

Results:
0.065
0.006
0.010
0.003
0.021

To train with OHEM, I followed the instructions and changed batch_size in config.py to -1. I would like to know what happened and how I can fix it. Are there any other changes I omitted? @YuwenXiong, I really need your kind help. Suggestions from anyone in the same boat are welcome. Thank you very much!

Nov 23 '17 02:11 MissDores

loss_bbox = 0 means all the boxes are background boxes. Your data might be extremely unbalanced.

Nov 23 '17 05:11 YuwenXiong
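
To illustrate why loss_bbox = 0 points at "no foreground boxes": in Fast R-CNN style training, the box regression loss is only accumulated over ROIs labelled as foreground. Below is a minimal pure-Python sketch of that idea, not the actual Caffe layer. The function names and the normalization (dividing by the number of ROIs) are assumptions made for the example.

```python
def smooth_l1(x):
    """Smooth L1 (Huber-like) penalty for a single residual."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def bbox_loss(labels, pred_deltas, target_deltas):
    """Box regression loss where background ROIs (label 0) contribute nothing.

    labels        : list of int, 0 = background, >0 = object class
    pred_deltas   : list of 4-element lists (predicted box offsets)
    target_deltas : list of 4-element lists (regression targets)
    Normalizing by the ROI count is an assumption for this sketch.
    """
    total = 0.0
    for label, pred, target in zip(labels, pred_deltas, target_deltas):
        if label == 0:
            continue  # background ROI: no regression target, no loss
        total += sum(smooth_l1(p - t) for p, t in zip(pred, target))
    return total / max(len(labels), 1)


zeros = [0.0, 0.0, 0.0, 0.0]

# Every sampled ROI is background: the loss is exactly 0, whatever is predicted.
all_background = bbox_loss([0, 0, 0], [[3.0, 1.0, 2.0, 5.0]] * 3, [zeros] * 3)

# One foreground ROI with non-zero residuals: the loss becomes non-zero.
one_foreground = bbox_loss([0, 1], [zeros, [0.5, 0.0, 0.0, 2.0]], [zeros, zeros])
```

A loss_bbox of exactly 0 over many iterations, as in the log above, is therefore consistent with the mini-batches containing no foreground ROIs at all, rather than with perfect box predictions.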

Hi Yuwen, thanks for your quick comment. I am actually using a publicly available vehicle dataset, and it does not seem to have a problem with too many background objects. What's more, training without OHEM behaves normally. I noticed that in /py-R-FCN/experiments/cfgs/rfcn_end2end_ohem.yml, batch_size is already set to -1. Do I still need to change batch_size in /py-R-FCN/lib/fast_rcnn/config.py to -1? I have made that change, but have not changed anything in other files. What other problems could there be? @YuwenXiong

Nov 23 '17 06:11 MissDores
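
On the question of the .yml file versus config.py: a rough sketch of how the two might interact, assuming py-R-FCN follows the config-merging approach of py-faster-rcnn, where values from a YAML file passed on the command line are merged over the defaults defined in config.py. The helper name `merge_config` and the default values below are hypothetical and only for illustration. The real settings live in the two files mentioned above.

```python
import copy


def merge_config(overrides, defaults):
    """Return a copy of `defaults` with `overrides` applied recursively.

    Hypothetical helper: unknown keys are rejected so that a typo in the
    override file does not silently create a new, unused option.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError("{} is not a valid config key".format(key))
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_config(value, merged[key])
        else:
            merged[key] = value
    return merged


# Illustrative stand-ins for the defaults in config.py (values are assumptions).
DEFAULTS = {"TRAIN": {"BATCH_SIZE": 128, "IMS_PER_BATCH": 2}}

# Illustrative stand-in for the relevant entry of rfcn_end2end_ohem.yml.
OHEM_YML = {"TRAIN": {"BATCH_SIZE": -1}}

effective = merge_config(OHEM_YML, DEFAULTS)
print(effective["TRAIN"]["BATCH_SIZE"])  # prints -1
```

If the merge works this way, editing the default in config.py would be redundant whenever the OHEM .yml file is actually passed to the training script. Checking the effective configuration at the start of the training run would confirm which value is in use.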

Maybe #85 has the answer to your problem.

Mar 26 '18 12:03 starxhong