LokiXun issues

Results: 14 issues of LokiXun

Is there anywhere to get the test and validation sets for the dataset?

Hi, this is such amazing work! I am trying to train the model on a different training set, and I am wondering whether there is anywhere to get the test and validation sets for...

Loss explosion when training on a custom dataset

Hi, this is awesome work! May I ask for some help? I ran into problems when training the model on the REDS video dataset. After about 40K training iterations, the...

Any support for torch 2.x with CUDA 11.7+?

Hi, I have a question about using DCNv2 with torch 2.0.0 + CUDA 11.7. Is there any DCNv2 repository that supports this configuration? If not, what is the reason for the incompatibility? And what...

Video denoising: failure in later patches

Hi, I am using the pretrained model for video denoising. From the code, the model splits the frames into overlapping patches for processing. However, in the result, the first few patches...

Question about the extra conv layers in the Encoder

Hi, I have a question: **why add extra conv layers in the Encoder?** This encoder structure is also used in later SOTA methods such as E2FGVI and ProPainter. In...

Question about the paper: why does the text-prompt embedding use the penultimate text embeddings of a CLIP ViT-H/14 text encoder?

Hi, I am wondering why the prompt embedding in Stable Diffusion is extracted from the penultimate layer of the CLIP ViT-H/14 text encoder. Why not use the original CLIP feature, just like the image feature...

Question about the paper: 1. how to get the mask region as input; 2. any comparison with ProPainter? Just curious

For Q1 (how to get the mask region as input): maybe it is just a frame-wise subtraction from the GT, followed by binarizing the result into a mask?
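
The guess in Q1 can be sketched in a few lines of Python. This is only an illustration of frame-wise differencing against the ground truth followed by binarization, not the paper's confirmed procedure. The function name `difference_mask`, the `threshold` parameter, and the representation of frames as nested lists of grayscale values are all assumptions made for this example.

```python
def difference_mask(corrupted, ground_truth, threshold=0):
    """Hypothetical helper: one binary mask per frame.

    Each frame is a list of rows of grayscale values (an assumption for
    this sketch). A mask pixel is 1 where the corrupted frame differs
    from the ground truth by more than `threshold`, otherwise 0.
    """
    masks = []
    for frame, gt_frame in zip(corrupted, ground_truth):
        mask = [
            [1 if abs(p - q) > threshold else 0 for p, q in zip(row, gt_row)]
            for row, gt_row in zip(frame, gt_frame)
        ]
        masks.append(mask)
    return masks


# One 2x2 frame: only the bottom-right pixel differs by more than 5.
corrupted = [[[10, 10], [0, 200]]]
ground_truth = [[[10, 12], [0, 50]]]
masks = difference_mask(corrupted, ground_truth, threshold=5)
```

A non-zero `threshold` keeps small differences, such as compression noise, from being marked as masked pixels; the right value depends on the data.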

Question about the paper: the ablation study "Compare ControlNet and LAControlNet"

Hi, I have a question about the paper, specifically the ablation study "Compare ControlNet and LAControlNet". In Figure 7(c), the w/ ControlNet result is trained from scratch...

Question about Visual Grounding: what format should I give the region in?

Hi, I have a question about visual grounding. I have a 720x1280 image and I want to describe the region `[0,0, 512,512]` (x1,y1, x2,y2), so I followed CogVLM1's suggestion...

Question about the paper: the accelerated sampling formula

Hi, this is work with great impact. I had trouble understanding the formula in eq52 (accelerated sampling). May I ask: 1. why does eq52 have an additional...