🐛 [Bug] timm_efficientnet non-pretrained+fp32 returns incorrect result

Status: Open · xuzhao9 opened this issue · 2 comments

Bug Description

Torch-TensorRT returns an incorrect result for a non-pretrained timm EfficientNet model (efficientnet_b0) with FP32 inputs. I use torch.nn.CosineSimilarity to compare the eager-mode output with the Torch-TensorRT output.
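A minimal sketch of the comparison described above: both outputs are flattened to 1-D float tensors and compared with torch.nn.CosineSimilarity(dim=0, eps=1e-4), the same settings the reproduction script uses. The helper name `output_similarity` is hypothetical and not part of the original report.

```python
import torch


def output_similarity(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine similarity between two model outputs treated as flat vectors.

    Hypothetical helper mirroring the script's comparison: flatten, cast to
    float32, then compare along the single remaining dimension.
    """
    cos = torch.nn.CosineSimilarity(dim=0, eps=1e-4)
    return float(cos(a.flatten().float(), b.flatten().float()))


if __name__ == "__main__":
    logits = torch.randn(1, 1000)
    print(output_similarity(logits, logits))        # identical outputs: close to 1
    print(output_similarity(logits, -logits))       # opposite outputs: close to -1
    print(output_similarity(logits, logits * 3.0))  # a uniform scale factor is ignored
```

Because the metric is scale-invariant, it detects outputs that point in a different direction, not outputs that differ only by a constant factor.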

To Reproduce

Steps to reproduce the behaviour:

  1. Run the following script:
import time

import numpy as np
import timm
import torch
import torch.backends.cudnn as cudnn
import torch_tensorrt

# torch.hub workaround carried over from the original script; it only matters
# for torch.hub entry points such as torch.hub.load().
torch.hub._validate_not_a_forked_repo = lambda a, b, c: True

# Randomly initialised EfficientNet-B0 (the problem is not seen with pretrained=True).
effnet = timm.create_model('efficientnet_b0', pretrained=False)

model = effnet.eval().to("cuda")
# Sanity check: the eager model runs on a batch of 128 images.
with torch.no_grad():
    detections_batch = model(torch.randn(128, 3, 224, 224).to("cuda"))
print(detections_batch.shape)
cudnn.benchmark = True

def benchmark(model, input_shape=(1024, 3, 512, 512), dtype='fp32', nwarmup=50, nruns=1000):
    """Time `model` and return its output from the last timed run.

    A fresh random input of `input_shape` is drawn on every call.
    """
    input_data = torch.randn(input_shape).to("cuda")
    if dtype == 'fp16':
        input_data = input_data.half()

    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            model(input_data)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    pred_loc = None
    with torch.no_grad():
        for i in range(1, nruns + 1):
            start_time = time.perf_counter()
            pred_loc = model(input_data)
            torch.cuda.synchronize()  # wait for the GPU before stopping the clock
            end_time = time.perf_counter()
            timings.append(end_time - start_time)
            if i % 10 == 0:
                print('Iteration %d/%d, avg batch time %.2f ms' % (i, nruns, np.mean(timings) * 1000))

    print("Input shape:", input_data.size())
    print('Average throughput: %.2f images/second' % (input_shape[0] / np.mean(timings)))
    return pred_loc

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch_tensorrt.dtype.float},  # Run with FP32
)
eager_out = benchmark(model, input_shape=(1, 3, 224, 224), nruns=100, dtype="fp32")
trt_out = benchmark(trt_model, input_shape=(1, 3, 224, 224), nruns=100, dtype="fp32")
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-4)
result = cos(eager_out.flatten().float(), trt_out.flatten().float())
print(f"Cosine similarity between eager and trt output: {float(result)}")
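benchmark() creates its input with torch.randn inside the function, so eager_out and trt_out in the script above are computed from two different random inputs. A minimal sketch of a controlled check, assuming only that both models are callables that take one tensor: draw one input and feed the same tensor to both. The helper name `compare_on_same_input` is hypothetical, and the demo uses a random-weight nn.Linear on CPU as a stand-in, not efficientnet_b0 or a TensorRT module.

```python
import torch


@torch.no_grad()
def compare_on_same_input(model_a, model_b, x: torch.Tensor) -> float:
    """Run both models on the *same* tensor and return their cosine similarity.

    Hypothetical helper: model_a and model_b are any callables taking one
    tensor (for example the eager model and a compiled module).
    """
    out_a = model_a(x)
    out_b = model_b(x)
    cos = torch.nn.CosineSimilarity(dim=0, eps=1e-4)
    return float(cos(out_a.flatten().float(), out_b.flatten().float()))


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in for a randomly initialised network (no pretrained weights).
    net = torch.nn.Linear(512, 1000, bias=False).eval()
    x1 = torch.randn(1, 512)
    x2 = torch.randn(1, 512)

    # Same model, same input: similarity is essentially 1.
    print("same input:", compare_on_same_input(net, net, x1))

    # Same model, but each side gets its own random input (what happens when
    # the input is regenerated per call): similarity is typically near 0.
    cos = torch.nn.CosineSimilarity(dim=0, eps=1e-4)
    with torch.no_grad():
        print("different inputs:", float(cos(net(x1).flatten(), net(x2).flatten())))
```

With the script's objects, the call would be along the lines of `x = torch.randn(1, 3, 224, 224, device="cuda")` followed by `compare_on_same_input(model, trt_model, x)`.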

Expected behavior

Cosine similarity between the eager and Torch-TensorRT outputs >= 0.99

Actual behavior

Cosine similarity < 0.002
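Cosine similarity ignores scale, so an output that is off by a constant factor would still pass the >= 0.99 threshold. A minimal sketch of a complementary element-wise check using torch.allclose, plus the maximum and mean absolute differences for the report. The helper name and the rtol/atol defaults are assumptions, not tolerances documented by Torch-TensorRT for FP32.

```python
import torch


def elementwise_report(ref: torch.Tensor, test: torch.Tensor,
                       rtol: float = 1e-3, atol: float = 1e-3) -> dict:
    """Element-wise comparison of two outputs with the same shape.

    Hypothetical helper; rtol/atol are illustrative defaults only.
    """
    ref = ref.float()
    test = test.float()
    diff = (ref - test).abs()
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "allclose": bool(torch.allclose(ref, test, rtol=rtol, atol=atol)),
    }


if __name__ == "__main__":
    ref = torch.randn(1, 1000)
    print(elementwise_report(ref, ref.clone()))  # identical tensors
    print(elementwise_report(ref, ref * 2.0))    # cosine similarity 1, but not close
```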

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): git commit 3b7cd2a1d
  • PyTorch Version (e.g. 1.0): 1.10.0+cu113
  • CPU Architecture: AWS p3d.24xlarge instance
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source): python setup.py bdist_wheel
  • Are you using local sources or building from archives: Using local sources
  • Python version: Python 3.8
  • CUDA version: CUDA 11.1
  • GPU models and configuration: Nvidia A100
  • Any other relevant information:
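Most of the fields above can be filled in programmatically. A minimal sketch, assuming a standard PyTorch install; torch_tensorrt is imported only if it is available, and the helper name `collect_env` is hypothetical. Note that torch.version.cuda reports the CUDA version PyTorch was built against, which can differ from the system CUDA toolkit. PyTorch also ships `python -m torch.utils.collect_env` for a fuller report.

```python
import platform

import torch


def collect_env() -> dict:
    """Gather the version fields requested by the issue template (hypothetical helper)."""
    info = {
        "PyTorch Version": torch.__version__,
        "CPU Architecture": platform.machine(),              # e.g. x86_64
        "OS": platform.system(),
        "Python version": platform.python_version(),
        "CUDA version (PyTorch build)": torch.version.cuda,  # None for CPU-only builds
        "cuDNN version": torch.backends.cudnn.version(),     # None if cuDNN is unavailable
        "GPU models": [torch.cuda.get_device_name(i)
                       for i in range(torch.cuda.device_count())],
    }
    try:
        import torch_tensorrt  # optional; may not be installed
        info["Torch-TensorRT Version"] = torch_tensorrt.__version__
    except (ImportError, OSError):
        info["Torch-TensorRT Version"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```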

Additional Information

The problem does not occur with FP16 precision or when the model is created with pretrained=True.

xuzhao9, Feb 23, 2022

@bowang007, can you take a look?

narendasan, May 18, 2022

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

github-actions[bot], Aug 22, 2022

Bo needs to ask the user whether they still see this issue with the latest codebase.

Christina-Young-NVIDIA, Dec 20, 2022