Roadmap and Feedback

Open sming256 opened this issue 1 year ago • 17 comments

We keep this issue open to collect feature requests and feedback from users, and thus keep improving this codebase.

If you don't find the features you need in the roadmap, please leave a message here.

Thank you!

sming256 avatar Mar 28 '24 12:03 sming256

cool

rixejzvdl649 avatar Apr 01 '24 06:04 rixejzvdl649

Hi @sming256, thank you for sharing the toolkit, it's really amazing. I wanted to ask: the AdaTAD paper also has a parallel-adapter variant, AdaTAD' (75.4 mAP in the paper), but when implementing it with the shared codebase I only reach around 73.4 mAP. Could you please share whether any parameters differ between AdaTAD and AdaTAD'?

akshitac8 avatar May 03 '24 18:05 akshitac8

Only a few hyperparameters are changed: the MLP ratio in the adapter is changed from 1/4 to 1/8, and the learning rate is searched between 5e-5 and 1e-4. I will update the parallel backbone and checkpoint later.
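To make the two differences above concrete, here is a minimal sketch of a config override. The key names (`mlp_ratio`, `lr_search_range`) are illustrative assumptions, not OpenTAD's actual config schema; only the values (1/8 bottleneck ratio, learning rate searched in [5e-5, 1e-4]) come from the comment above.

```python
# Hypothetical override capturing the AdaTAD' differences described above.
# Field names are assumptions for illustration, not OpenTAD's real config keys.
adatad_parallel_overrides = {
    "adapter": {
        # MLP bottleneck ratio reduced from 1/4 to 1/8 for the parallel adapter
        "mlp_ratio": 1 / 8,
    },
    "optimizer": {
        # Learning rate is searched within this range rather than fixed
        "lr_search_range": (5e-5, 1e-4),
    },
}

print(adatad_parallel_overrides["adapter"]["mlp_ratio"])  # 0.125
```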

sming256 avatar May 03 '24 19:05 sming256

That would be really helpful @sming256 for reproducing the results 😃.

akshitac8 avatar May 05 '24 01:05 akshitac8

Hi @sming256 wanted to check if you could please upload the parallel backbone code as well that would be great.

akshitac8 avatar May 06 '24 17:05 akshitac8

Are you planning to also release tridetplus in this toolkit? (https://github.com/dingfengshi/tridetplus) thanks!

Caspeerrr avatar May 23 '24 09:05 Caspeerrr

@Caspeerrr Thanks for your suggestion! Integrating TriDetPlus into OpenTAD seems straightforward. However, TriDetPlus only released the VideoMAEv2 features, not the DINOv2 features. This is the only reason we haven't integrated it yet.

sming256 avatar May 27 '24 18:05 sming256

Hi @akshitac8, the side-tuning model is released here, and we provide a training example here. When implementing side-tuning with the latest OpenTAD, we found a performance drop of around 1% on THUMOS. With our released checkpoint, we achieve 74.65% mAP using VideoMAEv2-g.

sming256 avatar Jun 06 '24 08:06 sming256

Hi @sming256, thank you for the excellent work. I have a question about open-source datasets not yet included in this repo.

You mentioned the following: "Support multiple TAD datasets. We support 9 TAD datasets, including ActivityNet-1.3, THUMOS-14, HACS, Ego4D-MQ, EPIC-Kitchens-100, FineAction, Multi-THUMOS, Charades, and EPIC-Sounds Detection datasets."

What about other open-source datasets (e.g., AVA2, UCF24)?

aidevveloper avatar Aug 19 '24 05:08 aidevveloper

Hi @Harry-KIT, thanks for your question. OpenTAD is designed for the temporal action detection task. I think AVA2 and UCF24 are datasets for the spatio-temporal action detection task, which additionally requires spatial bounding boxes.
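The distinction above can be illustrated with two toy annotation records. All field names, file names, and values below are hypothetical; they only show that TAD labels are purely temporal (start/end times), while spatio-temporal detection also needs per-frame boxes.

```python
# Hypothetical TAD annotation: actions are localized in time only.
tad_annotation = {
    "video": "video_001.mp4",
    "segments": [
        {"start": 12.5, "end": 18.0, "label": "long_jump"},
    ],
}

# Hypothetical spatio-temporal annotation (in the spirit of AVA/UCF24):
# each action also carries per-frame bounding boxes [x1, y1, x2, y2].
stad_annotation = {
    "video": "video_001.mp4",
    "tubes": [
        {
            "label": "long_jump",
            "boxes": {
                30: [0.10, 0.20, 0.55, 0.90],
                31: [0.12, 0.21, 0.57, 0.91],
            },
        }
    ],
}
```

This is why a TAD toolkit cannot consume AVA2/UCF24 directly: its data pipeline and heads have no notion of spatial boxes.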

sming256 avatar Aug 19 '24 06:08 sming256

I see. Thank you very much

aidevveloper avatar Aug 19 '24 09:08 aidevveloper