jw xiong

9 issues by jw xiong

Hello, I'm hitting the error below when compiling dll_detector on Windows 10. How can I fix it?

Environment:
- CUDA: 11.1
- TensorRT: TensorRT-7.2.3.4.Windows10.x86_64.cuda-11.1.cudnn8.1
- OpenCV: 4.5.2

![image](https://user-images.githubusercontent.com/35716657/154103470-10bca042-388d-4baa-a708-6be0e876d9b2.png)

Hello, I recently tried the UFO model and trained it on the Co-Segmentation and Video Salient Object Detection tasks. During testing I ran into something confusing, so I'm asking here.

1. The test results from the original model weights do not match the test results of my retrained models.

- Co-Segmentation task

  Test results with the original weights
  ![image](https://github.com/suyukun666/UFO/assets/35716657/4ff2d272-0d3d-4755-a278-335495960627)

  Test results with the retrained model
  ![image](https://github.com/suyukun666/UFO/assets/35716657/ba1e30a7-fcc5-4151-9398-0ef691d15041)

- Video Salient Object Detection task

  Test results with the original weights
  ![image](https://github.com/suyukun666/UFO/assets/35716657/fc08c039-058c-48ad-91d6-f515c13c3e4d)

  Test results with the retrained model
  ![image](https://github.com/suyukun666/UFO/assets/35716657/0756fcca-89b6-4b1d-90f9-98a4402ef104)

The gap between the two is fairly large. The retrained models used the configuration from this repository without any changes, and I have not yet figured out the cause.
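(One common source of train-to-train gaps like this is unseeded randomness, so it may be worth ruling out. A minimal sketch of pinning the usual seeds in a PyTorch pipeline; the helper name `seed_everything` is mine, and whether the repository already seeds somewhere is an assumption I have not verified:)

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin the usual sources of randomness in a PyTorch pipeline."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Deterministic cuDNN kernels trade speed for repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(42)  # call once, before building the model and data loaders
```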

Hi, when will the training code be released?

Can you explain why encoder_hidden_states is used in the motion module? The motion module as described in the paper is vanilla temporal attention, not cross-attention. ![image](https://github.com/guoyww/AnimateDiff/assets/35716657/7ed76732-c7b7-4597-83da-b48ff19b5724) https://github.com/guoyww/AnimateDiff/blob/cf80ddeb47b69cf0b16f225800de081d486d7f21/animatediff/models/unet_blocks.py#L411
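(To make the distinction concrete, here is a minimal PyTorch sketch of the two variants. This is illustrative only, not the repository's module; the shapes and the `TemporalAttention` class are my own. When `encoder_hidden_states` is `None` the block reduces to the vanilla temporal self-attention the paper describes:)

```python
from typing import Optional

import torch
import torch.nn as nn


class TemporalAttention(nn.Module):
    """Illustrative only -- not the repository's module.

    Video features reach the temporal block as (batch * height * width,
    frames, channels), so attending over dim 1 mixes information across time.
    """

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor,
                encoder_hidden_states: Optional[torch.Tensor] = None) -> torch.Tensor:
        # Vanilla temporal attention (what the paper describes): queries,
        # keys, and values all come from the video tokens themselves.
        # Cross-attention (what the code does when encoder_hidden_states is
        # passed): keys and values come from the text embeddings instead.
        context = x if encoder_hidden_states is None else encoder_hidden_states
        out, _ = self.attn(query=x, key=context, value=context)
        return out


x = torch.randn(4, 16, 320)      # (b*h*w, frames, channels) -- toy sizes
text = torch.randn(4, 77, 320)   # hypothetical text context projected to the same width
attn = TemporalAttention(320)
print(attn(x).shape)             # torch.Size([4, 16, 320]) -- self-attention over time
print(attn(x, text).shape)       # torch.Size([4, 16, 320]) -- cross-attention over text
```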

I'm curious: the total size of the test dataset is smaller than the batch size of 512 reported in the paper. How did you train it? Is there any other data...
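(To illustrate what I mean by "not enough": a batch of 512 can only be drawn from a smaller dataset if samples repeat, e.g. via sampling with replacement. A toy sketch of that possibility; I have no evidence this is what the paper actually did:)

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# A toy dataset smaller than the batch size in question.
dataset = TensorDataset(torch.randn(300, 8))

# RandomSampler with replacement can draw more samples per epoch than
# the dataset contains, so a batch of 512 is still possible.
sampler = RandomSampler(dataset, replacement=True, num_samples=512 * 10)
loader = DataLoader(dataset, batch_size=512, sampler=sampler)

for (batch,) in loader:
    print(batch.shape)  # torch.Size([512, 8]) except possibly the last batch
    break
```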

I'm curious why encoder_hidden_states is used in the motion module. The motion module as described in the paper is vanilla temporal attention, not cross-attention. https://github.com/guoqincode/Open-AnimateAnyone/blob/f3e014e0c985cd06e1955169cb381aa61482a968/models/unet_3d_blocks.py#L391-L392
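(A shape trace of what passing `encoder_hidden_states` into a temporal block implies, as I understand it; all dimensions below are hypothetical, not taken from the repository:)

```python
import torch

b, c, f, h, w = 2, 320, 16, 8, 8
hidden_states = torch.randn(b, c, f, h, w)
encoder_hidden_states = torch.randn(b, 77, 768)  # e.g. CLIP text embeddings

# Temporal blocks fold space into the batch dim: (b*h*w, frames, channels).
tokens = hidden_states.permute(0, 3, 4, 2, 1).reshape(b * h * w, f, c)

# For cross-attention, the text context must be repeated once per spatial
# position so the batch dims line up: (b*h*w, 77, 768).
context = encoder_hidden_states.repeat_interleave(h * w, dim=0)
print(tokens.shape, context.shape)  # torch.Size([128, 16, 320]) torch.Size([128, 77, 768])
```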

Can you share how to obtain the LAION-Human dataset used in the first training stage of the paper?

Thank you for providing this amazing work. I ran the demo test without any problems, but I want to test on other data. The code of how to construct...

How does the quality hold up when training with more than 100 frames?
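(The reason I ask: with plain softmax attention, the temporal attention cost grows quadratically in the number of frames. A back-of-the-envelope sketch, assuming no attention-cost optimizations; constant factors are omitted:)

```python
def temporal_attention_flops(frames: int, channels: int, spatial_tokens: int) -> float:
    """Rough FLOP count for one plain temporal self-attention layer:
    QK^T and the attention-weighted V each cost ~frames^2 * channels
    per spatial position."""
    return 2 * spatial_tokens * frames**2 * channels


base = temporal_attention_flops(frames=16, channels=320, spatial_tokens=64)
long = temporal_attention_flops(frames=128, channels=320, spatial_tokens=64)
print(long / base)  # 64.0 -- 8x more frames => 64x the attention cost
```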