CLOVER Vision encoder output dimension does not match

Hi, thanks for your excellent work! I'm trying to run `bash eval_calvin.sh`. In FeedbackPolicy/models/policy.py, the vision_x input to vision_encoder has shape 192 * 192, which does not match the model's input size of 224 * 224. I therefore interpolated vision_x to 224 * 224, but the output of vision_encoder then has shape 8 * 768, which does not match the dimensions expected by the rearrange operation: `vision_x = rearrange(vision_x, "(b T) d h w -> b T (h w) d", b=b, T=T)`

Oct 18 '24 07:10 yuaoze

Thanks for your interest in our work! We modified the default input size of the VC1-Base model (from 224 to 192) in its corresponding config file. A small tweak to that config should let you use our evaluation scripts.

Feel free to follow up if this doesn't solve your issue. 😃

Oct 18 '24 10:10 retsuh-bqw

Hi, I followed your advice and modified the config file of the VC1-Base model, but the error still occurs. Here are the details.

Oct 21 '24 01:10 yuaoze

I solved this issue by specifying output_size: 192 under "transform" in the config file. However, the output of vision_encoder now has shape 8 * 768, which does not match the dimensions expected by the rearrange operation: `vision_x = rearrange(vision_x, "(b T) d h w -> b T (h w) d", b=b, T=T)`. Can you give me some advice?

Oct 21 '24 03:10 yuaoze

My bad. You should also set use_cls to False in the config file. Then the encoder will return all feature tokens.
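
For anyone following along, here is a minimal shape-arithmetic sketch of why this setting matters. It loads no model. It assumes VC1-Base uses a patch size of 16 and an embedding width of 768, and that with use_cls set to False the encoder returns a `(N, d, h, w)` feature map, which is what the rearrange pattern in policy.py expects. The helper names and the values `b=1`, `T=8` are hypothetical, chosen only to reproduce the 8 * 768 shape reported above.

```python
def vit_output_shape(img_size, patch_size, embed_dim, batch, use_cls):
    """Expected encoder output shape (shape arithmetic only, no model)."""
    if img_size % patch_size != 0:
        raise ValueError("img_size must be divisible by patch_size")
    side = img_size // patch_size  # patches per image side
    if use_cls:
        # Only the CLS embedding is returned: one vector per image.
        return (batch, embed_dim)
    # All patch tokens, laid out as a feature map.
    return (batch, embed_dim, side, side)


def rearranged_shape(shape, b, T):
    """Result shape of rearrange(x, "(b T) d h w -> b T (h w) d", b=b, T=T)."""
    if len(shape) != 4:
        raise ValueError(f"pattern expects 4 dimensions, got {len(shape)}: {shape}")
    n, d, h, w = shape
    if n != b * T:
        raise ValueError(f"leading dimension {n} is not b * T = {b * T}")
    return (b, T, h * w, d)


# Hypothetical values: b=1, T=8, patch size 16 (assumed for VC1-Base).
cls_shape = vit_output_shape(192, 16, 768, batch=8, use_cls=True)
map_shape = vit_output_shape(192, 16, 768, batch=8, use_cls=False)
print(cls_shape)                             # 2-D output, cannot be rearranged
print(rearranged_shape(map_shape, b=1, T=8)) # 12 * 12 = 144 tokens per frame
```

With use_cls left at True the output has only two dimensions, so the four-dimensional pattern cannot match it, which is consistent with the error reported above.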

Oct 21 '24 04:10 retsuh-bqw

Hello! I ran into the same problem. After I set img_size to 192 and use_cls to False, the error still occurs: AssertionError("Input image height (224) doesn't match model (192)."). Can you give me more advice?

Oct 28 '24 11:10 hkz103

Is it because of the sanity check in the load_model function (lines 26-29) of VC-1? You could change the function as follows:

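```python
def load_model(
    model,
    transform,
    metadata=None,
    checkpoint_dict=None,
):
    # Load the checkpoint weights if provided; `log` is the module-level logger.
    if checkpoint_dict is not None:
        msg = model.load_state_dict(checkpoint_dict)
        log.warning(msg)

    # The sanity check (lines 26-29 of the original function) is removed.
    return model
```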
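
To illustrate the kind of mismatch that assertion reports, here is a small sketch. It assumes the failing check compares the size of the image fed to the model (still 224 if the transform was not updated) with the model's configured img_size (192). `check_input_size` is a hypothetical stand-in written for this example, not the actual VC-1 code.

```python
def check_input_size(image_hw, model_img_size):
    """Hypothetical stand-in for an input-size check on a ViT-style model."""
    height, width = image_hw
    assert height == model_img_size, (
        f"Input image height ({height}) doesn't match model ({model_img_size})."
    )
    assert width == model_img_size, (
        f"Input image width ({width}) doesn't match model ({model_img_size})."
    )


# A 192 x 192 image passes for a model configured with img_size=192.
check_input_size((192, 192), 192)

# A 224 x 224 image reproduces the message reported in this thread.
try:
    check_input_size((224, 224), 192)
except AssertionError as err:
    print(err)
```

If this is indeed the cause, either removing the check as suggested above or making the transform emit 192 * 192 images (for example via output_size: 192 under "transform", as mentioned earlier in the thread) should avoid the assertion.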

Oct 29 '24 04:10 retsuh-bqw

It works! But I ran into a new problem: [screenshot: bug]

Oct 30 '24 07:10 hkz103

It seems to be an issue within CALVIN. Is your CALVIN env properly installed?

Oct 30 '24 07:10 retsuh-bqw

[Screenshot: 微信图片_20241030164904] Hi, I ran `bash eval_calvin.sh`, but it fails with `failed to EGL with glad.` Do you know how to solve this?

Oct 30 '24 08:10 gouyinghong

You are right, I hadn't installed CALVIN properly. However, the packages used by CALVIN and CLOVER seem to conflict. Can you provide a requirements.txt?

Oct 30 '24 10:10 hkz103

A requirements.txt is provided at visual_planner/requirements.txt. Which package conflicts are you getting exactly?

Oct 30 '24 10:10 retsuh-bqw

Now I get the "Cannot load URDF file" error again, and the package conflicts are listed below. Can you give me more advice? Thanks for your help! [screenshot: problem]

Oct 31 '24 05:10 hkz103

You can try to downgrade your networkx to 2.2. I think the other packages are fine.

Oct 31 '24 05:10 retsuh-bqw

With networkx 2.2, AttributeError: "module 'numpy' has no attribute 'int'." is reported, because recent versions of numpy no longer provide the int alias and networkx 2.2 may still use it.

Oct 31 '24 06:10 hkz103

You may try downgrading numpy as well. I'll add the relevant information to a new Troubleshooting section.
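
As a rough guide to which numpy versions are affected: the `np.int` alias was deprecated in NumPy 1.20 and removed in NumPy 1.24, so code that still calls `np.int` needs a numpy release below 1.24. The helper below is a hypothetical sketch that works on plain version strings; the exact numpy version to pin is not stated in this thread.

```python
def parse_major_minor(version):
    """Return (major, minor) from a version string such as '1.23.5'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def numpy_has_int_alias(numpy_version):
    """True if this numpy release still ships the np.int alias."""
    # np.int was removed in NumPy 1.24 (it was deprecated since 1.20).
    return parse_major_minor(numpy_version) < (1, 24)


for candidate in ("1.23.5", "1.24.0", "2.1.0"):
    status = "still has np.int" if numpy_has_int_alias(candidate) else "np.int removed"
    print(f"numpy {candidate}: {status}")
```

Under that assumption, a constraint such as `numpy<1.24` alongside networkx 2.2 would be one way to express the downgrade.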

Oct 31 '24 12:10 retsuh-bqw