ViT-Lens issues

SUN RGB-D is not in millimeters

4

I was trying to apply this model to my own data and not getting good results. I ran the NYUv2 dataset through my code, and the results seem to be...

jbrownkramer

InstructBLIP and SEED Implementation

2

Hi, I checked the CLIP vision embedding (last hidden state) of BLIP-2 and InstructBLIP on Hugging Face (instructblip-vicuna-7b); its dimension is 257x1408. However, the multi-modal matching space of ViT-Lens uses a 1x768 dimension. I...

MichaelMaiii

Alternate depth normalization

4

The justification in the paper for using disparity is "scale normalization". I know that this comes from OmniVore and ImageBind. However, this does not actually achieve scale normalization. What could...

jbrownkramer
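
To illustrate the point raised in this issue, here is a minimal sketch, assuming the standard stereo relation disparity = focal length × baseline / depth. The function name `depth_to_disparity` and all numeric values are hypothetical and are not taken from the ViT-Lens code. It only shows that uniformly rescaling depth rescales disparity by the inverse factor, so converting to disparity does not by itself remove scale.

```python
def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Convert metric depth to disparity in pixels via d = f * b / z."""
    return [focal_px * baseline_m / z for z in depth_m]


# Hypothetical values, chosen only for illustration.
depth = [1.0, 2.0, 4.0]          # depths in meters
focal_px, baseline_m = 500.0, 0.075

disp = depth_to_disparity(depth, focal_px, baseline_m)

# The same scene with every depth doubled (e.g. a unit or scale mismatch).
disp_scaled = depth_to_disparity([2.0 * z for z in depth], focal_px, baseline_m)

# The ratio between the two disparity maps is a constant factor of 2,
# so the scale difference survives the depth-to-disparity conversion.
ratios = [a / b for a, b in zip(disp, disp_scaled)]
print(ratios)
```

A per-image normalization (for example, dividing by a robust statistic of the map) would be one way to remove that constant factor, but whether that suits this model is exactly what the issue asks.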

combining modalities

Hi, thanks for this amazing work. Could you share demo code showing how to combine different modalities into a single image, as mentioned in the paper: _Moreover, the model...

bakachan19
