Yixuan Wang

Results 29 comments of Yixuan Wang

Another suggestion: I think recent versions of cilantro are now supported. You may want to update this in cilandro_build.md

Thank you for your reply! Just to double-check, `encoded_text` is the text feature used, right?

I followed the description from the paper and tested the example elephant images from the paper. The following code can segment out the foreground. It is kinda able to learn...

> > I haven't gotten to this point yet, but let me see if I can help. So assuming you have 16 x 16 patches, and an embedding dimension...

> Progress! ![pil_image](https://user-images.githubusercontent.com/40346852/235347098-eb3f816a-5c8d-4e5a-900f-026d01e324b9.png) ![vit_large_desc](https://user-images.githubusercontent.com/40346852/235347100-085f0fef-9aee-45af-8068-603cf6165587.png)

Could you please share more details on how you obtained this result?

Hi, thank you for your interest in our work! Yes, it is possible to optimize without dynamic models. If the action is only pick and place, you can assume the...

I may consider that if many people want it, but it may take a while. (Please leave a thumbs-up if you want this feature.)

I will leave this issue open so that people can comment if they want this feature.

Currently, we are manually setting the reference camera pose. But it is possible to do it automatically.