Strange numbers for different dimensions
Hi, I am trying to run VBench for several metrics over some Mochi generations from Google Drive, and I am getting strange numbers for almost all metrics.
python evaluate.py --dimension $dimension --videos_path $videopath
for example:
{
"overall_consistency": [
323.9935025374095,
[
{
"video_path": "Flying through fantasy landscapes.-1.mp4",
"video_results": 3.139169692993164
},
{
"video_path": "Splash of turquoise water in extreme slow motion, alpha channel included.-0.mp4",
"video_results": 2.416114568710327
},
{
"video_path": "Ashtray full of butts on table, smoke flowing on black background, close-up-2.mp4",
"video_results": 0.885326623916626
},
{
"video_path": "A boat sailing leisurely along the Seine River with the Eiffel Tower in background by Vincent van Gogh-4.mp4",
"video_results": -6.3663105964660645
},
{
"video_path": "A future where humans have achieved teleportation technology-1.mp4",
"video_results": -26.551517486572266
},
{
"video_path": "An oil painting of a couple in formal evening wear going home get caught in a heavy downpour with umbrellas-1.mp4",
"video_results": 1970.438232421875
}
]
]
}
or
{
"appearance_style": [
4.6276485505523374e-08,
[
{
"video_path": "A beautiful coastal beach in spring, waves lapping on sand, Van Gogh style-4.mp4",
"video_results": 0.0,
"frame_results": [
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0, ..., 0.0
],
"cur_sim": 0.0
},
{
"video_path": "A cute happy Corgi playing in park, sunset by Hokusai, in the style of Ukiyo-0.mp4",
"video_results": 5.902761330633807e-07,
"frame_results": [
9.59014892578125e-05,
1.8405914306640624e-06,
-1.4293193817138672e-06,
-7.092952728271484e-08,
-1.9669532775878907e-08,
-7.152557373046875e-09,
-0.0,
-0.0,
-0.0,
-0.0,
-0.0,
0.0,
-0.0,
0.0, ..., 0.0
],
"cur_sim": 0.0
}
]
]
}
or
{
"subject_consistency": [
0.0,
[
{
"video_path": "subject_consistency/a person drinking coffee in a cafe-2.mp4",
"video_results": 0.0
},
{
"video_path": "subject_consistency/a motorcycle cruising along a coastal highway-2.mp4",
"video_results": 0.0
},
{
"video_path": "subject_consistency/a dog playing in park-3.mp4",
"video_results": 0.0
},
{
"video_path": "subject_consistency/an elephant spraying itself with water using its trunk to cool down-1.mp4",
"video_results": 0.0
},
{
"video_path": "subject_consistency/an elephant running to join a herd of its kind-1.mp4",
"video_results": 0.0
},
{
"video_path": "subject_consistency/a bear climbing a tree-2.mp4",
"video_results": 0.0
}
]
]
}
The numbers are fine on my end:
{
"subject_consistency": [
0.9419299250653854,
[
{
"video_path": "../Mochi/subject_consistency/a person swimming in ocean-0.mp4",
"video_results": 0.9451899988415801
},
{
"video_path": "../Mochi/subject_consistency/a person swimming in ocean-1.mp4",
"video_results": 0.9257821295364403
}, ...
{
"overall_consistency": [
0.25248718899877176,
[
{
"video_path": "../Mochi/overall_consistency/Close up of grapes on a rotating table.-0.mp4",
"video_results": 0.25615277886390686
},
{
"video_path": "../Mochi/overall_consistency/Close up of grapes on a rotating table.-1.mp4",
"video_results": 0.22180302441120148
},
Can you check whether the mp4 files are corrupted? Can you also run the same setup with our sample videos?
I downloaded the videos from the Google Drive uploaded by the VBench team. Do you have these 4 videos above somewhere so that I can quickly check?
The videos I tested are actually from the Mochi Google Drive you mentioned, so they should be the same videos you tested. Any mp4 video should work for testing, and it should not produce those strange numbers.
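To rule out container corruption without installing anything extra, one quick check is to walk the file's top-level MP4 box structure: a playable file should at least contain `ftyp`, `moov`, and `mdat` boxes. This is a minimal stdlib-only sketch (the function name and the synthetic sample bytes are illustrations, not part of VBench; it ignores 64-bit extended box sizes):

```python
import struct

def list_mp4_boxes(data: bytes):
    """Return the top-level box types found in raw MP4 bytes."""
    boxes, offset = [], 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        if size < 8:  # malformed header -> likely truncated/corrupt file
            break
        boxes.append(box_type.decode("ascii", "replace"))
        offset += size
    return boxes

# Synthetic example: a 16-byte 'ftyp' box followed by an 8-byte 'moov' stub.
sample = struct.pack(">I4s8s", 16, b"ftyp", b"isom\x00\x00\x00\x01")
sample += struct.pack(">I4s", 8, b"moov")
print(list_mp4_boxes(sample))  # ['ftyp', 'moov']
```

For a real file you would pass `open("d0.mp4", "rb").read()`; a missing `moov` box or an early malformed header would suggest the download itself is broken rather than the evaluation setup.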
You can also use this video: https://cdn.openai.com/tmp/s/interp/d0.mp4.
Then run:
python VBench/evaluate.py --videos_path 'd0.mp4' --dimension subject_consistency --mode=custom_input
I just want to know whether the problem is with the setup or the video itself.
full_info.json
[
{
"prompt_en": "d0",
"dimension": [
"subject_consistency"
],
"video_list": [
"sampled_videos/d0.mp4"
]
}
]
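For a larger batch of videos, an entry like the one above can be generated per file. This helper is a hypothetical sketch (the function name is my own; it only assumes the `full_info.json` layout shown above):

```python
import json
from pathlib import Path

def build_full_info(video_dir: str, dimension: str):
    """Build one full_info.json entry per .mp4 file in video_dir,
    using the filename stem as prompt_en, as in the example above."""
    entries = []
    for video in sorted(Path(video_dir).glob("*.mp4")):
        entries.append({
            "prompt_en": video.stem,
            "dimension": [dimension],
            "video_list": [str(video)],
        })
    return entries

# With a directory containing d0.mp4 this reproduces the file shown above.
print(json.dumps(build_full_info("sampled_videos", "subject_consistency"), indent=2))
```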
eval_results.json
{
"subject_consistency": [
0.0,
[
{
"video_path": "sampled_videos/d0.mp4",
"video_results": 0.0
}
]
]
}
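Since dimensions like subject_consistency are similarity scores, I'd expect per-video values roughly in (0, 1]; a flat 0.0 everywhere, or values like 1970.4, point at a setup problem (e.g. weights not loaded or frames not decoded) rather than the videos. A small sanity check over `eval_results.json` could look like this (the range bounds and helper are my own assumptions, only the JSON layout comes from the outputs above):

```python
import json

# Inline copy of the eval_results.json shown above, for illustration.
results_text = """
{
  "subject_consistency": [
    0.0,
    [
      {"video_path": "sampled_videos/d0.mp4", "video_results": 0.0}
    ]
  ]
}
"""

def suspicious_videos(results: dict, lo: float = 0.0, hi: float = 1.0):
    """Flag per-video scores outside (lo, hi]; exactly 0.0 is treated
    as suspicious since a degenerate setup often yields all-zero scores."""
    flagged = []
    for dimension, (overall, per_video) in results.items():
        for entry in per_video:
            score = entry["video_results"]
            if not (lo < score <= hi):
                flagged.append((dimension, entry["video_path"], score))
    return flagged

print(suspicious_videos(json.loads(results_text)))
# [('subject_consistency', 'sampled_videos/d0.mp4', 0.0)]
```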
I also tried your two videos from Mochi above for overall_consistency. Can you test these two on your end? I am getting different numbers than yours.
python evaluate.py --dimension 'overall_consistency' --videos_path Mochi_test2/overall_consistency