How is the consistency measurement computed?
For the consistency metric $E(O_i, O_j) = \mathrm{LPIPS}(O_i, M_{i,j}, W_{i,j}(O_j))$, how do you obtain the mask $M_{i,j}$, and how do you apply it to LPIPS?
We followed the code in this repository (https://github.com/phoenix104104/fast_blind_video_consistency) exactly to calculate LPIPS. Please see the details there. Thanks.
I have read their code. In evaluate_LPIPS.py they use LPIPS to measure the perceptual distance between the processed image P and their model output O, but P and O are the same frame of the video. In evaluate_WarpError.py they use the optical flow predicted by FlowNet2 between frame 1 and frame 2 to warp frame 2 to frame 1, then compute the L2 distance over the non-occluded pixels. They do not apply a mask in the LPIPS metric. As far as I know, LPIPS uses a VGG/SqueezeNet/AlexNet backbone to extract feature maps from several layers for the two input images, then computes the L2 distance between them. So I am really confused about the mask $M_{i,j}$ used in the equation. Could you please explain this detail more clearly? Thank you.
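To make the question concrete, here is a minimal sketch of the two computations discussed above. It is not the paper's or the linked repository's code. `masked_warp_error` is an L2 error restricted to non-occluded pixels, in the spirit of what evaluate_WarpError.py is described as doing. `masked_perceptual_distance` shows one possible (assumed, unconfirmed) reading of the mask in the equation: zero out the occluded pixels in both images before handing them to a perceptual metric. The function names, the `distance_fn` parameter, and the assumption that the warped frame and the mask are already available as arrays are all hypothetical.

```python
import numpy as np


def masked_warp_error(frame_i, warped_j, mask):
    """Mean squared error between frame_i and the warped frame_j,
    averaged only over pixels where mask == 1 (non-occluded).

    frame_i, warped_j: float arrays of shape (H, W, C)
    mask:              array of shape (H, W), 1 = valid, 0 = occluded
    """
    m = mask.astype(np.float64)[..., None]            # (H, W, 1), broadcasts over channels
    sq_diff = (frame_i.astype(np.float64) - warped_j.astype(np.float64)) ** 2
    n_valid = m.sum() * frame_i.shape[-1]             # number of valid scalar entries
    if n_valid == 0:
        return 0.0                                    # no valid pixels: define the error as 0
    return float((sq_diff * m).sum() / n_valid)


def masked_perceptual_distance(frame_i, warped_j, mask, distance_fn):
    """ASSUMPTION: apply the mask by zeroing occluded pixels in both
    images, then call a perceptual metric on the masked pair.

    distance_fn is any callable (img_a, img_b) -> scalar. An LPIPS model
    could be wrapped here after converting the arrays to the tensor
    layout and value range that model expects.
    """
    m = mask.astype(np.float64)[..., None]
    return float(distance_fn(frame_i * m, warped_j * m))
```

With this reading, occluded pixels contribute nothing because both inputs are identical (zero) there. Whether the authors mask the inputs, mask the spatial LPIPS map, or do something else is exactly what remains unanswered in this thread.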
@kigane Have you solved this problem? It's strange that none of StylizedNeRF, StyleRF, Learning to Stylize Novel Views, etc. provide a method for calculating consistency.
I have the same question. Why hasn't the calculation method for the quantitative metric been provided, even though it's the only evaluation criterion?
Have you tried testing the generated results using the code from "warperror.py"? If so, are the results close to those in the paper?