Sorry to disturb you! I'm not quite clear about line 28 of ActorObserverFC7.py.
Should base_x be replaced with base_y here?
w_y = self.third_fc(base_x).view(-1) * torch.exp(self.third_scale)
Also, what does this function do? (in tasks.py)
def best_one_sec_moment(mat, winsize=6):
Thank you!
You are correct. Fixed in c26adeb :) These weights for alignment didn't actually end up being used for the alignment task in the final paper, but I experimented with them a bit, and they seem promising.
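For reference, here is a minimal sketch of the corrected computation. The module structure and the names first_fc and first_scale are assumptions for illustration (only third_fc and third_scale appear in the quoted line); the actual model in the repo differs.

```python
import torch
import torch.nn as nn

class WeightHeads(nn.Module):
    """Hypothetical sketch: one scalar-weight head per view."""

    def __init__(self, dim=4096):
        super().__init__()
        self.first_fc = nn.Linear(dim, 1)    # assumed first-person head
        self.third_fc = nn.Linear(dim, 1)    # third-person head
        self.first_scale = nn.Parameter(torch.zeros(1))
        self.third_scale = nn.Parameter(torch.zeros(1))

    def forward(self, base_x, base_y):
        # Each weight should come from its own view's features:
        w_x = self.first_fc(base_x).view(-1) * torch.exp(self.first_scale)
        # The fix: use base_y (third-person features), not base_x.
        w_y = self.third_fc(base_y).view(-1) * torch.exp(self.third_scale)
        return w_x, w_y
```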
best_one_sec_moment is the "Alignment and localization" experiment for the paper. That is, given a matrix of distances between all possible pairs of frames in the two videos (one first person, and one third person), it first smooths this matrix to obtain the distance between all 1 second clips in the two videos, and then finds the "best match".
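Based on that description, a rough sketch of what best_one_sec_moment presumably computes (assuming a box-filter average over winsize x winsize windows and an argmin over the smoothed matrix; the actual implementation in tasks.py may differ) could look like:

```python
import numpy as np

def best_one_sec_moment(mat, winsize=6):
    """Sketch (assumed behavior): given a frame-to-frame distance matrix
    `mat`, average it over winsize x winsize windows so each entry is the
    distance between two 1-second clips, then return the indices and the
    distance of the best (lowest-distance) match."""
    n, m = mat.shape
    # Integral image for fast box averaging.
    c = np.zeros((n + 1, m + 1))
    c[1:, 1:] = np.cumsum(np.cumsum(mat, axis=0), axis=1)
    h, w = n - winsize + 1, m - winsize + 1
    smoothed = (c[winsize:, winsize:] - c[:h, winsize:]
                - c[winsize:, :w] + c[:h, :w]) / winsize ** 2
    i, j = np.unravel_index(np.argmin(smoothed), smoothed.shape)
    return i, j, smoothed[i, j]
```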
Hope that helps! Let me know if you have any questions.
PS. ActorObserver is a part of the new PyVideoResearch codebase, that should be better organized and have more features.
Thank you for your kind answer! I have another problem. https://github.com/gsig/actor-observer/blob/c26adeb47f8905fd6d69e351386f11b4bdce9c03/models/layers/ActorObserverLoss.py#L68 When I train, the weight median always equals 1 and all the weights are 1. Do you know what the reason might be?
Yeah, I ran into that problem sometimes. I'm not really sure what the reason is. If I remember correctly, it doesn't happen when you use '--arch', 'ActorObserverBaseNoShare', but it would be interesting to understand why the network doesn't use the weights in this case. You could try the improved codebase in PyVideoResearch and see if it happens there as well.
Hope that helps!
During training, the accuracy of the output sometimes becomes 1 after a few iterations. What is the reason for this? Is it overfitting? For example:
alignment 8.5 top1 0.498481507273 top1val 0.496781428405 topk1 1.0 topk10 1.0 topk2 1.0 topk5 1.0 topk50 0.778544797856 wtop1 0.978047796308 wtop1val 0.927929341565
The network is giving a weight to each of the test examples, and then evaluating just on the top fraction. topk1, topk2, topk5, and topk10 are the top 1%, top 2%, top 5%, and top 10% highest-weighted examples of the test set, respectively. Since this is the test set (I'm assuming?), it just means that the network has gotten really good at classifying the "easiest" 10% of the examples. Note that for the top 50% (topk50) the performance is still only 77.85%.
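The metric described above can be sketched roughly like this (topk_fraction_accuracy is a hypothetical name for illustration; the actual evaluation code in the repo may differ):

```python
import numpy as np

def topk_fraction_accuracy(weights, correct, fraction):
    """Sketch (assumed metric): keep only the top `fraction` of test
    examples ranked by predicted weight, and report accuracy on them."""
    k = max(1, int(round(len(weights) * fraction)))
    idx = np.argsort(weights)[::-1][:k]  # indices of highest-weight examples
    return float(np.mean(correct[idx]))
```

So topk1 would correspond to fraction=0.01, topk50 to fraction=0.5, and so on; topk1 being 1.0 means the 1% of examples the network is most confident about are all classified correctly.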