TemporalAlignNet

About video timestamps in HTM_AA dataset

Open zjr2000 opened this issue 3 years ago • 0 comments

Hi, thank you for the fantastic work! I have some questions about the timestamp in the HTM_AA dataset and hope you can help me.

As mentioned in issue #3, the timestamp provided by your model is the center timestamp of the sentence. Sec. 3.4.2 of your paper also states that the shifted timestamps have the same duration as the original ASR timestamps. To build timestamps in the format [t_start, t_end], I use the following steps:

For the $i$-th sentence in video $v$:

$t^i = [t^i_{center} - l^i_v / 2, t^i_{center} + l^i_v / 2]$

where $l^i_v$ is the duration of the sentence's original ASR timestamp, queried from HTM-1.2M, and $t^i_{center}$ is the center timestamp queried from HTM_AA.
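
A minimal Python sketch of the steps above, assuming the sentences of one video are keyed by their text and that both datasets have already been loaded into plain dictionaries. The names `center_to_interval`, `build_intervals`, `centers`, and `durations` are hypothetical and not part of either dataset's tooling, and the values in the usage note are toy numbers, not real annotations.

```python
from typing import Dict, List, Tuple


def center_to_interval(t_center: float, duration: float) -> Tuple[float, float]:
    """Return (t_start, t_end) centred on t_center with the given duration."""
    half = duration / 2.0
    return t_center - half, t_center + half


def build_intervals(
    centers: Dict[str, float],    # sentence -> center timestamp (assumed to come from HTM_AA)
    durations: Dict[str, float],  # sentence -> ASR duration (assumed to come from HTM-1.2M)
) -> Tuple[Dict[str, Tuple[float, float]], List[str]]:
    """Build intervals for every sentence whose duration can be looked up.

    Sentences without a duration entry are collected in `missing`
    instead of being given a guessed interval.
    """
    intervals: Dict[str, Tuple[float, float]] = {}
    missing: List[str] = []
    for sentence, t_center in centers.items():
        duration = durations.get(sentence)
        if duration is None:
            missing.append(sentence)
            continue
        intervals[sentence] = center_to_interval(t_center, duration)
    return intervals, missing
```

For example, with `centers = {"add the flour": 12.0, "stir well": 20.5}` and `durations = {"add the flour": 4.0}`, the first sentence maps to `(10.0, 14.0)` and `"stir well"` ends up in `missing`, which is the case I am asking about.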

However, I found that not all sentences in HTM_AA can be queried from HTM-1.2M. Could you please explain why? (Sorry if I missed some details in your paper.)

zjr2000 • Sep 06 '22 11:09