Why do the CLIP model's outputs change with batch size?
Hi, first of all, thank you for sharing the CLIP model code and the brilliant idea! :) During my experiments I found that the CLIP model's outputs change with the batch size, but I couldn't figure out why these values change. Could you tell me why this is happening?
Below is the code to reproduce:
```python
import torch

# The length of token_labels : 11
with torch.no_grad():
    dummy = torch.rand((64, 3, 224, 224)).cuda()
    # Run the same image once alone and once as part of a batch of 64
    image_logit, text_logit = clip_model.forward(dummy[0].unsqueeze(0), token_labels)
    batch_image_logit, batch_text_logit = clip_model.forward(dummy, token_labels)

print(image_logit.squeeze().tolist())
print(batch_image_logit[0].tolist())
print()
print(text_logit.squeeze().tolist())
print(batch_text_logit.T[0].tolist())
```
Output:
```
[23.125, 22.1875, 21.375, 21.84375, 22.578125, 22.4375, 20.625, 23.203125, 20.5, 24.140625, 19.453125]
[23.125, 22.171875, 21.375, 21.828125, 22.5625, 22.4375, 20.625, 23.1875, 20.484375, 24.140625, 19.453125]
[23.125, 22.1875, 21.375, 21.84375, 22.578125, 22.4375, 20.625, 23.203125, 20.5, 24.140625, 19.453125]
[23.125, 22.171875, 21.375, 21.828125, 22.5625, 22.4375, 20.625, 23.1875, 20.484375, 24.140625, 19.453125]
```
This is unfortunately true for most CUDA programs. The runtime is free to select different implementations (for matrix multiplication, etc.) depending on the batch size, even between subsequent calls, and these can introduce slight numerical differences like the ones you're seeing. Running the model in fp32 will likely reduce the discrepancy, and the documentation on determinism from NVIDIA and PyTorch may also help.
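As a rough sketch of those mitigations (reusing `clip_model` and `token_labels` from your snippet, and assuming the model currently runs in half precision): the fp32 cast and PyTorch's determinism flags below are standard settings, but note they guarantee reproducibility across runs of the same input rather than bit-identical results across different batch sizes, so they may only shrink the gap.

```python
import os
import torch

# cuBLAS requires this workspace setting for deterministic kernels;
# ideally set it before any CUDA work happens in the process.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Prefer deterministic kernels and disable the cuDNN autotuner,
# which can otherwise pick a different algorithm per input shape.
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False

# Cast the model to fp32 to reduce rounding differences.
clip_model = clip_model.float()

with torch.no_grad():
    dummy = torch.rand((64, 3, 224, 224)).cuda()
    single_logit, _ = clip_model(dummy[0].unsqueeze(0), token_labels)
    batch_logit, _ = clip_model(dummy, token_labels)

# Compare with a tolerance rather than expecting exact equality.
print(torch.allclose(single_logit.squeeze(), batch_logit[0], atol=1e-4))
```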