Thermal x Vision Support
Following issue 14, I created a small example for thermal embeddings. While Vision x Text and Thermal x Text work as expected, Vision x Thermal does not yield the correct result.
```python
def load_and_transform_thermal_data(thermal_paths, device):
    if image_paths is None:
        return None
    thermal_ouputs = []
    for thermal_path in thermal_paths:
        data_transform = transforms.Compose(
            [
                transforms.Resize(
                    224, interpolation=transforms.InterpolationMode.BICUBIC
                ),
                transforms.CenterCrop(224),
                transforms.ToTensor(),
                # transforms.Normalize(
                #     mean=(0.5),
                #     std=(0.5),
                # ),
            ]
        )
        with open(thermal_path, "rb") as fopen:
            thermal = Image.open(fopen).convert("L")
            thermal = data_transform(thermal).to(device)
        thermal_ouputs.append(thermal)
    return torch.stack(thermal_ouputs, dim=0)
```
And the results are:

Vision x Text:

```
[[9.9997604e-01 2.3943641e-05]
 [6.0792509e-06 9.9999392e-01]]
```

Thermal x Text:

```
[[1.0000000e+00 1.2433221e-11]
 [2.8220674e-02 9.7177935e-01]]
```

Vision x Thermal Cosine:

```
[[0.1554441  0.02945926]
 [0.16725276 0.03671783]]
```

Vision x Thermal Softmax:

```
[[0.7789999  0.22100005]
 [0.7867338  0.21326624]]
```
What dataset did you use for your thermal data? Did you use LLVIP in the paper?
Done?
Where exactly did you add the function `load_and_transform_thermal_data`? I am facing a different issue, though this might help. My error is:

```
Given groups=1, weight of size [768, 1, 16, 16], expected input[3, 3, 224, 224] to have 1 channels, but got 3 channels instead
```
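That error usually means the thermal branch's patch-embedding conv expects a single input channel while the images were loaded as RGB; loading with `.convert("L")` as in the snippet above avoids it. A minimal reproduction with a stand-in conv (assumes PyTorch; this conv is not the actual model layer, just one with the same weight shape):

```python
import torch
import torch.nn as nn

# Stand-in for the thermal patch embedding: weight shape [768, 1, 16, 16]
patch_embed = nn.Conv2d(1, 768, kernel_size=16, stride=16)

rgb_batch = torch.randn(3, 3, 224, 224)   # images loaded with .convert("RGB")
try:
    patch_embed(rgb_batch)
except RuntimeError as e:
    print(e)                              # the "expected input ... to have 1 channels" error

gray_batch = torch.randn(3, 1, 224, 224)  # images loaded with .convert("L")
print(patch_embed(gray_batch).shape)      # torch.Size([3, 768, 14, 14])
```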
Thanks in advance!
Also, I think there is a typo on line 2: replace `image_paths` with `thermal_paths`.
Hi, I'd like to recommend our work, LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment. We open-source all training and validation code.
LanguageBind can be disassembled into different branches to handle different tasks.
```python
print("Video x Audio: \n", torch.softmax(embeddings['video'] @ embeddings['audio'].T, dim=-1).detach().cpu().numpy())
print("Video x Thermal: \n", torch.softmax(embeddings['video'] @ embeddings['thermal'].T, dim=-1).detach().cpu().numpy())
print("Image x Thermal: \n", torch.softmax(embeddings['image'] @ embeddings['thermal'].T, dim=-1).detach().cpu().numpy())
print("Image x Depth: \n", torch.softmax(embeddings['image'] @ embeddings['depth'].T, dim=-1).detach().cpu().numpy())
```
@LinB203 I have tried your work, but running inference.py multiple times produces inconsistent outputs each time, so I suspect there is an error somewhere. Please verify this issue.
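Whether this explains the inconsistent inference.py outputs is only a guess, but the most common cause is a module (e.g. dropout) left in train mode or unseeded randomness. A minimal sketch of the usual determinism checklist (the model here is a toy stand-in):

```python
import torch

torch.manual_seed(0)              # seed any remaining random ops
model = torch.nn.Sequential(
    torch.nn.Linear(8, 64),
    torch.nn.Dropout(p=0.5),      # stand-in for dropout inside a real model
)

x = torch.ones(1, 8)

model.train()                     # default state after construction
a, b = model(x), model(x)
print(torch.equal(a, b))          # False: dropout re-samples a mask on every call

model.eval()                      # turn off dropout for inference
c, d = model(x), model(x)
print(torch.equal(c, d))          # True: repeated runs now match
```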