About video timestamps in HTM_AA dataset
Hi, thank you for the fantastic work! I have some questions about the timestamps in the HTM_AA dataset and hope you can help me.
As mentioned in issue #3, the timestamp provided by your model is the center timestamp of the sentence.
The shifted timestamps also keep the same duration as the original ASR timestamps, as described in Sec. 3.4.2 of your paper. To build timestamps in the format [t_start, t_end], I use the following steps:
For the i-th sentence in video $v$:
$t^i = [t^i_{center} - l^i_v / 2, t^i_{center} + l^i_v / 2]$,
where $l^i_v$ is the duration of the original ASR timestamp queried from HTM-1.2M and $t^i_{center}$ is the center timestamp queried from HTM_AA.
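For concreteness, here is a minimal Python sketch of the construction above. The function names and the dict-based lookup keyed by sentence text are my own assumptions for illustration, not the actual file formats of HTM_AA or HTM-1.2M. Sentences whose duration cannot be found are mapped to `None` instead of raising an error.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]


def build_interval(t_center: float, duration: float) -> Interval:
    """Return [t_start, t_end] centered on t_center with the given duration."""
    half = duration / 2.0
    return (t_center - half, t_center + half)


def intervals_for_video(
    centers: Dict[str, float],    # hypothetical: sentence -> center time (HTM_AA)
    durations: Dict[str, float],  # hypothetical: sentence -> ASR duration (HTM-1.2M)
) -> Dict[str, Optional[Interval]]:
    """Build an interval per sentence; None marks a sentence with no duration."""
    result: Dict[str, Optional[Interval]] = {}
    for sentence, t_center in centers.items():
        duration = durations.get(sentence)
        if duration is None:
            # The sentence could not be queried from the duration source.
            result[sentence] = None
        else:
            result[sentence] = build_interval(t_center, duration)
    return result
```

With this sketch, a sentence with center 10.0 s and duration 4.0 s gets the interval (8.0, 12.0), and any sentence missing from `durations` is returned as `None`.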
However, I found that not all sentences in HTM_AA can be found in HTM-1.2M, so their durations cannot be queried. Could you please explain why? (Sorry if I missed some details in your paper.)