
How to force layernorm to run in FP32 precision when using "config->setFlag(BuilderFlag::kFP16);"

Open w1005444804 opened this issue 1 year ago • 0 comments

I want to run a ViT model with FP16 precision, but the builder warns: "[W] Running layernorm after self-attention in FP16 may cause overflow." I would like all other layers to run in FP16 while the layernorm layers run in FP32. Can you provide a C++ example of manually forcing the layernorm layers to FP32 while still building with "config->setFlag(BuilderFlag::kFP16);"? A rough sketch of what I have in mind is included below the log.

ONNX opset == 17; TensorRT == 8.6

[W] [TRT] Detected layernorm nodes in FP16: /neck/neck.1/ReduceMean_1, /neck/neck.3/ReduceMean_1, /neck/neck.1/Sqrt, /neck/neck.3/Sqrt, /neck/neck.1/Pow, /neck/neck.3/Add, /neck/neck.1/Add_1, /neck/neck.3/Sub, /neck/neck.3/Div, /neck/neck.1/Mul, /neck/neck.1/Div, /neck/neck.3/Add_1, /neck/neck.3/Pow, /neck/neck.1/Add, /neck/neck.3/Mul, /neck/neck.1/Sub
[03/03/2024-21:53:14] [W] [TRT] Running layernorm after self-attention in FP16 may cause overflow. Exporting the model to the latest available ONNX opset (later than opset 17) to use the INormalizationLayer, or forcing layernorm layers to run in FP32 precision can help with preserving accuracy.
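
For reference, this is the kind of thing I have in mind: a rough, untested sketch that walks the parsed network, matches the decomposed layernorm node names from the warning above, and pins those layers to FP32 with the standard TensorRT C++ API. The name-matching heuristic (the "/neck/" prefix and the op-name fragments) is just my guess based on the log, not something TensorRT provides.

```cpp
#include <string>
#include "NvInfer.h"

// Force layers whose names look like decomposed layernorm ops to run in FP32,
// while the rest of the network is still allowed to run in FP16.
void forceLayerNormFP32(nvinfer1::INetworkDefinition* network,
                        nvinfer1::IBuilderConfig* config)
{
    // Keep FP16 enabled for the rest of the network.
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    // Make TensorRT honor the per-layer precisions set below.
    config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

    // Node-name fragments taken from the warning message (assumption: these
    // cover all decomposed layernorm ops in my model).
    const char* lnOps[] = {"ReduceMean", "Sqrt", "Pow", "Sub", "Div", "Add", "Mul"};

    for (int i = 0; i < network->getNbLayers(); ++i)
    {
        nvinfer1::ILayer* layer = network->getLayer(i);
        std::string name = layer->getName();

        // Heuristic: only touch the layernorm blocks reported in the log.
        if (name.find("/neck/") == std::string::npos)
            continue;

        for (const char* op : lnOps)
        {
            if (name.find(op) != std::string::npos)
            {
                layer->setPrecision(nvinfer1::DataType::kFLOAT);
                for (int j = 0; j < layer->getNbOutputs(); ++j)
                    layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
                break;
            }
        }
    }
}
```

I would call this after parsing the ONNX model and before building the engine with the config. Is this the recommended way to force layernorm to FP32, or is there a better way to target only the layernorm layers?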

