🐛 [Bug] timm_efficientnet non-pretrained+fp32 returns incorrect result
Bug Description
Torch-TensorRT returns incorrect results for a non-pretrained timm efficientnet_b0 model with fp32 inputs.
I use torch.nn.CosineSimilarity to compare the outputs of eager mode and Torch-TensorRT.
To Reproduce
Steps to reproduce the behaviour:
- Run the following script:
import torch
import torch_tensorrt
import timm
import time
import numpy as np
import torch.backends.cudnn as cudnn
torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

# Non-pretrained (randomly initialized) EfficientNet-B0
model = timm.create_model('efficientnet_b0', pretrained=False)
model = model.eval().to("cuda")

# Sanity check: one eager forward pass on a batch of 128
detections_batch = model(torch.randn(128, 3, 224, 224).to("cuda"))
print(detections_batch.shape)

cudnn.benchmark = True
def benchmark(model, input_shape=(1024, 3, 512, 512), dtype='fp32', nwarmup=50, nruns=1000):
    # Note: a fresh random input is generated on every call to benchmark()
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype == 'fp16':
        input_data = input_data.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    torch.cuda.synchronize()

    print("Start timing ...")
    timings = []
    pred_loc = None
    with torch.no_grad():
        for i in range(1, nruns + 1):
            start_time = time.time()
            pred_loc = model(input_data)
            torch.cuda.synchronize()  # wait for async GPU work before stopping the clock
            end_time = time.time()
            timings.append(end_time - start_time)
            if i % 10 == 0:
                print('Iteration %d/%d, avg batch time %.2f ms' % (i, nruns, np.mean(timings) * 1000))

    print("Input shape:", input_data.size())
    print('Average throughput: %.2f images/second' % (input_shape[0] / np.mean(timings)))
    return pred_loc
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch_tensorrt.dtype.float},  # Run with FP32
)
eager_out = benchmark(model, input_shape=(1, 3, 224, 224), nruns=100, dtype="fp32")
trt_out = benchmark(trt_model, input_shape=(1, 3, 224, 224), nruns=100, dtype="fp32")
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-4)
result = cos(eager_out.flatten().float(), trt_out.flatten().float())
print(f"Cosine similarity between eager and trt output: {float(result)}")
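benchmark() reports average throughput as batch size divided by mean per-iteration latency. The following is a stdlib-only sketch of that arithmetic. `summarize` and `time_call` are hypothetical helper names, and it uses time.perf_counter (a monotonic high-resolution clock, generally preferred over time.time for measuring intervals). When timing GPU work, the timed callable would still need to call torch.cuda.synchronize() before returning, as the script does, because CUDA kernels launch asynchronously.

```python
import statistics
import time


def summarize(timings_s, batch_size):
    """Return (mean latency in ms, throughput in images/s) for per-iteration timings in seconds."""
    mean_s = statistics.mean(timings_s)
    return mean_s * 1000.0, batch_size / mean_s


def time_call(fn, nruns=10):
    """Call fn() nruns times and return the elapsed seconds for each call (monotonic clock)."""
    timings = []
    for _ in range(nruns):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Example: a batch of 1 image taking 2 ms per iteration corresponds to 500 images/s.
mean_ms, throughput = summarize([0.002, 0.002, 0.002], batch_size=1)
print(f"avg batch time {mean_ms:.2f} ms, throughput {throughput:.2f} images/second")
```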
Expected behavior
Cosine similarity >= 0.99
Actual behavior
Cosine similarity < 0.002
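Because benchmark() draws a fresh torch.randn tensor on every call, eager_out and trt_out in the script above are computed from different random inputs. The stdlib sketch below (plain Python, not PyTorch) computes cosine similarity of two flattened vectors and shows that a vector compared with a slightly perturbed copy of itself scores close to 1, while two independent zero-mean random vectors typically score close to 0. This is only an illustration of the metric; it does not by itself explain the report, since the same script reportedly passes with fp16 or pretrained=True. The eps clamp here is an assumption modeled loosely on torch.nn.CosineSimilarity, whose exact eps placement may differ between PyTorch versions.

```python
import math
import random


def cosine_similarity(a, b, eps=1e-4):
    """Cosine similarity of two equal-length vectors, with the denominator clamped to eps."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


rng = random.Random(0)  # fixed seed so the demo is repeatable
n = 1000  # efficientnet_b0 in timm has 1000 output logits by default

# A reference output and a copy with tiny numerical noise (same input, different backend).
ref = [rng.gauss(0.0, 1.0) for _ in range(n)]
noisy = [x + rng.gauss(0.0, 1e-3) for x in ref]

# An output produced from an unrelated input, modeled as an independent random vector.
unrelated = [rng.gauss(0.0, 1.0) for _ in range(n)]

print(f"same input, small noise: {cosine_similarity(ref, noisy):.4f}")
print(f"independent inputs:      {cosine_similarity(ref, unrelated):.4f}")
```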
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): git commit 3b7cd2a1d
- PyTorch Version (e.g. 1.0): 1.10.0+cu113
- CPU Architecture: AWS p3d.24xlarge instance
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, libtorch, source): pip
- Build command you used (if compiling from source): python setup.py bdist_wheel
- Are you using local sources or building from archives: Using local sources
- Python version: Python 3.8
- CUDA version: CUDA 11.1
- GPU models and configuration: Nvidia A100
- Any other relevant information:
Additional Information
The problem does not occur when using fp16 or when creating the model with pretrained=True.
@bowang007 can you take a look?
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
Bo needs to ask the user whether they still experience the issue with the latest codebase.