
The results were not satisfactory

Open AndyHon opened this issue 6 years ago • 16 comments

Hello, thanks for your help. I have finished the experiment. However, my result on Avenue is only 0.448 AUC, which is far below the reported result. May I ask what result you got in your tests? And how can I improve mine?

AndyHon avatar Aug 27 '19 01:08 AndyHon

The number of objects extracted from Avenue is only about 5,500, which hurts the result, though the author confirmed that this is exactly the number they got.

I don't know what params you used; in my experiments, no BN, no normalization, and no class_add gave better results. However, none of them perform as well as the author's. Mine are below:

| Dataset | AUC |
| --- | --- |
| Ped2 | 86.51% |
| Avenue | 64.11% |
| ShanghaiTech | 80.35% |

Maybe the last part, the SVM, is where the problem lies. You could use cffi to link against the vlfeat dynamic library; that may work better. Since the framework consists of 3 stages, it may also help to unit test each stage. Sorry that I have no time to improve it. If you are interested, you can improve it yourself.
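For anyone who wants to try the cffi route, here is a minimal sketch of driving vlfeat's linear SVM solver from Python. The declarations and the solver constant are hand-copied assumptions based on vl/svm.h (vlfeat 0.9.21); verify them against your own checkout before trusting any results.

```python
# Minimal sketch: calling vlfeat's linear SVM through cffi.
# The cdef block and the solver constant are assumptions taken from
# vl/svm.h of vlfeat 0.9.21 -- check them against your local headers.
import numpy as np
from cffi import FFI

ffi = FFI()
ffi.cdef("""
    typedef unsigned long long vl_size;
    typedef struct VlSvm_ VlSvm;
    VlSvm * vl_svm_new (int solver,
                        const double * data, vl_size dimension, vl_size numData,
                        const double * labels, double lambda);
    void vl_svm_train (VlSvm * self);
    const double * vl_svm_get_model (const VlSvm * self);
    double vl_svm_get_bias (const VlSvm * self);
    void vl_svm_delete (VlSvm * self);
""")
vl = ffi.dlopen("libvl.so")  # path to your compiled vlfeat shared library

def train_linear_svm(features, labels, lam=0.01):
    """features: (N, D) array; labels: (N,) array of +1/-1."""
    data = np.ascontiguousarray(features, dtype=np.float64)  # one sample per contiguous row
    y = np.ascontiguousarray(labels, dtype=np.float64)
    n, d = data.shape
    svm = vl.vl_svm_new(1,  # 1 should be VlSvmSolverSgd per vl/svm.h (assumption)
                        ffi.cast("const double *", data.ctypes.data), d, n,
                        ffi.cast("const double *", y.ctypes.data), lam)
    vl.vl_svm_train(svm)
    # copy the weight vector out (d doubles, 8 bytes each) before freeing the solver
    w = np.frombuffer(ffi.buffer(vl.vl_svm_get_model(svm), 8 * d), dtype=np.float64).copy()
    b = vl.vl_svm_get_bias(svm)
    vl.vl_svm_delete(svm)
    return w, b
```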

fjchange avatar Aug 27 '19 05:08 fjchange

I really appreciate your guidance, and I admire your successful implementation of this paper. I still have a few questions; please help me answer them. There is a function compute_auc_average in the evaluate.py file with the following code:

```python
for sub_loss_file in loss_file_list:
    # the name of dataset, loss, and ground truth
    dataset, psnr_records, gt = load_psnr_gt(loss_file=sub_loss_file)
    if dataset == 'shanghaitech':
        gt[51][5] = 0
    elif dataset == 'ped2':
        for i in range(7, 11):
            gt[i][0] = 0
    elif dataset == 'ped1':
        gt[13][0] = 0
    # the number of videos
    num_videos = len(psnr_records)
```

I don't quite understand why the ground truth is set to 0 at different positions for different datasets. Could you please explain it?

  1. Why are these entries set to 0?
  2. How were these positions determined for each dataset?
  3. How do I know where to set 0 in the ground truth for my own dataset?

In addition, I see that you directly use the model pre-trained on the COCO dataset. If I retrain an SSD-ResNet50 detector on the experimental dataset, will the detected boxes be more accurate and the experimental results better? I'm sorry to bother you again and again. I sincerely appreciate your help and wish you a happy life.

AndyHon avatar Aug 28 '19 15:08 AndyHon

  1. The code you showed is a crude way to avoid NaN when calculating the AUC of a video in which every frame is annotated as abnormal (see the sketch after this list). There is no trick there, and it has little influence on the result. You can use the original annotation, which doesn't have this problem. That part of the code calculates the AUC the way the author did, which is not really reasonable.
  2. I can assure you that a better object detector gives a better result. However, none of the anomaly detection datasets have object annotations, so maybe you could do some domain adaptation work; that should help. For the sake of speed, this paper uses SSD-ResNet50 to reduce computation and make online operation possible. If you don't care about speed, a heavier but better detector can lead to nicer results, especially on the Avenue dataset.
  3. I will spend some time over the next two days rewriting the last two stages, clustering and the SVM, in MATLAB as the author did. Thanks for your attention to my work; you are welcome.
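To make point 1 concrete, here is a small sketch (my own illustration, not the repo's code) of why an all-abnormal video breaks the per-video AUC average: sklearn's roc_auc_score needs both classes present, so zeroing one ground-truth entry, as in the snippet above, is a crude way to dodge the resulting error. Skipping such videos is the cleaner alternative.

```python
# Illustration only: a video with every frame labeled abnormal makes the
# per-video AUC undefined; skipping such videos avoids the error.
import numpy as np
from sklearn.metrics import roc_auc_score

def average_per_video_auc(scores, gts):
    """scores/gts: lists of per-frame arrays, one pair per test video."""
    aucs = []
    for s, g in zip(scores, gts):
        if np.unique(g).size < 2:
            # all-normal or all-abnormal video: AUC is undefined here
            # (roc_auc_score would raise ValueError), so skip it
            continue
        aucs.append(roc_auc_score(g, s))
    return float(np.mean(aucs))
```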

fjchange avatar Aug 29 '19 09:08 fjchange

Thank you both for your work and useful insights. @fjchange, what results do you get from the reimplementation when scoring as Liu et al. do?

amirmk89 avatar Sep 05 '19 15:09 amirmk89

I have tested the author's results scored as Liu et al. do, shown below. (He sent me the anomaly_scores.txt by email.)

| Dataset | AUC | Paper reports | Gap |
| --- | --- | --- | --- |
| Avenue | 86.56% | 90.4% | -3.84% |
| ShanghaiTech | 78.5645% | 84.9% | -6.3356% |

They have confirmed these numbers.

fjchange avatar Sep 06 '19 05:09 fjchange

Thank you. And what are the best results you've been able to achieve with your reimplementation, both scored as the authors do and as Liu et al. do?

amirmk89 avatar Sep 08 '19 06:09 amirmk89

At my best, scored as the authors do:

  • Avenue 64.11% (only about 5,500 objects detected in the training set)
  • Ped2 86.51%
  • ShanghaiTech 80.35%

Settings: no normalization, smoothed (the parameter may change), no class add (though the author says there should be), not in MATLAB.

fjchange avatar Sep 12 '19 06:09 fjchange

I followed your steps using the newest code, but I can't reproduce your AUC on ShanghaiTech, all with default params and not in MATLAB. Have you tested the newest version of your code?

xiadingZ avatar Sep 12 '19 06:09 xiadingZ

> At my best, scored as the authors do:
> • Avenue 64.11% (only about 5,500 objects detected in the training set)
> • Ped2 86.51%
> • ShanghaiTech 80.35%

Thank you again, but scoring properly as Liu et al. do (where the authors' own scores give 78.56% AUC), what is your result?

amirmk89 avatar Sep 12 '19 08:09 amirmk89

When I train the model on ShanghaiTech, the three streams' losses are all about 0.0010–0.0014. Are they correct?

xiadingZ avatar Sep 15 '19 09:09 xiadingZ

Hello, I found a problem today. Since my detection results were not good, I redrew the rectangular boxes from the box coordinates and found that the boxes did not correspond to the objects. Line 133 of test.py reads:

```python
box = [int(box[0] * image_height), int(box[1] * image_height),
       int(box[2] * image_height), int(box[3] * image_width)]
```

But I have some questions about this. SSD's box coordinates are upper-left and lower-right, so I think it should be:

```python
box = [int(box[0] * image_width), int(box[1] * image_height),
       int(box[2] * image_width), int(box[3] * image_height)]
```

I look forward to your valuable suggestions.
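For reference, here is how I would denormalize the boxes under one assumption: that the SSD checkpoint follows the TensorFlow Object Detection API convention of normalized [ymin, xmin, ymax, xmax], in which case the y-coordinates scale by height and the x-coordinates by width.

```python
# Sketch assuming the TF Object Detection API convention of normalized
# [ymin, xmin, ymax, xmax]: y scales by height, x scales by width.
def denormalize_box(box, image_height, image_width):
    ymin, xmin, ymax, xmax = box
    return [int(ymin * image_height), int(xmin * image_width),
            int(ymax * image_height), int(xmax * image_width)]
```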

AndyHon avatar Sep 17 '19 06:09 AndyHon

In util.py:

```python
import cv2
import numpy as np

def box_image_crop(image_path, box, target_size=64):
    # read the frame as grayscale; convert box from [y, x, h, w] to [y1, x1, y2, x2]
    image = cv2.imread(image_path, 0)
    box = [box[0], box[1], box[0] + box[2], box[1] + box[3]]
    crop_image = image[box[0]:box[2], box[1]:box[3]]
    # resize the object crop to target_size x target_size and scale to [0, 1]
    crop_image = cv2.resize(crop_image, dsize=(target_size, target_size))
    crop_image = np.array(crop_image).reshape((target_size, target_size, 1)).astype(np.float32) / 255.0
    return crop_image
```

Why do you resize the crop to 64 here? And is the SSD coordinate format the upper-left corner plus width and height?

AndyHon avatar Sep 18 '19 03:09 AndyHon

@AndyHon 1. Using TensorBoard can help you inspect the boxes; see the sketch below. 2. 64×64 is the input size the author specifies in the paper.
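Something like this (TF 1.x style, which this repo targets; treat it as a sketch rather than code from the repo) will render the boxes onto a frame so you can check the coordinate convention visually:

```python
# Sketch (TF 1.x): draw normalized [ymin, xmin, ymax, xmax] boxes on a
# frame and log the result as an image summary for TensorBoard.
import tensorflow as tf

image_ph = tf.placeholder(tf.float32, [1, None, None, 3])  # one frame, values in [0, 1]
boxes_ph = tf.placeholder(tf.float32, [1, None, 4])        # normalized [ymin, xmin, ymax, xmax]
drawn = tf.image.draw_bounding_boxes(image_ph, boxes_ph)
summary_op = tf.summary.image('detections', drawn)

# with tf.Session() as sess:
#     writer = tf.summary.FileWriter('./logs', sess.graph)
#     summ = sess.run(summary_op, feed_dict={image_ph: frame, boxes_ph: boxes})
#     writer.add_summary(summ, global_step=0)
```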

fjchange avatar Sep 19 '19 07:09 fjchange

@fjchange Thank you for your work. I have a question about the gradients. The paper says: "For each object, we obtain two image gradients, one representing the change in motion from frame t−3 to frame t and one representing the change in motion from frame t to frame t+3." But in your code, the gradients are calculated only from frames t−3 and t+3. Is that right?
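For context, my reading of that sentence in the paper is two separate difference images over the object crop, something like the sketch below (an interpretation of the paper, not this repo's code):

```python
# Interpretation of the paper's two motion gradients as frame differences
# over the object crop; an illustration, not this repo's code.
import numpy as np

def motion_gradients(crop_tm3, crop_t, crop_tp3):
    """Each crop: same-sized grayscale patch of one object at t-3, t, t+3."""
    grad_backward = np.abs(crop_t.astype(np.float32) - crop_tm3.astype(np.float32))  # t-3 -> t
    grad_forward = np.abs(crop_tp3.astype(np.float32) - crop_t.astype(np.float32))   # t -> t+3
    return grad_backward, grad_forward
```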

fanzijuan0625 avatar Nov 26 '19 03:11 fanzijuan0625

Hi, I have some problems improving the result. Could you share your code for my reference? Thank you!

horizonly avatar Dec 16 '20 07:12 horizonly


@AndyHon Hello, what's your final AUC on Avenue?

lss0510 avatar Apr 13 '21 12:04 lss0510