NaN on ARM with FP16, but correct results on x86_64 Linux
The model consists of operations like ['ExpandDims', 'Split', 'Convolution1D', 'Permute', 'UnaryOp', 'BinaryOp', 'MemoryData', 'MatMul', 'Clip', 'Reduction', 'Reshape', 'Gemm', 'Concat', 'Slice', 'Softmax', 'LayerNorm', 'GELU', 'InnerProduct'].
On x86_64 Linux, ncnn works perfectly with fp16. However, on ARM the final results are NaN. When I disable fp16 storage with net.opt.use_fp16_storage = false, the results are correct, but inference is slower.
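One common cause of NaN that appears only in fp16 mode (and only on backends that actually compute in fp16, like ARM NEON) is an intermediate activation overflowing float16's range: the largest finite float16 value is 65504, which ops like MatMul, LayerNorm, or Softmax logits can easily exceed. A minimal numpy sketch of the failure mode (illustrative values, not taken from this model):

```python
import numpy as np

# float16 has a far smaller range than float32: max finite value is 65504.
x = np.float32(70000.0)          # fine in float32
h = np.float16(x)                # overflows to inf when stored as float16
print(h)                         # inf

# Once an inf appears, operations like (inf - inf) or (inf / inf) --
# common inside LayerNorm/Softmax -- produce NaN, which then propagates
# through every downstream layer to the final output.
print(h - h)                     # nan
```

If this is the cause, keeping only the sensitive layers in fp32 (or rescaling inputs/weights so intermediates stay below 65504) can restore correctness without giving up fp16 everywhere.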
As an alternative, I tested onnxruntime on ARM; it runs as fast as ncnn with fp16 and produces correct results.
Does onnxruntime optimize the model better, or could there be another reason? I'm aiming to make ncnn work with fp16 but am unsure how to debug the issue. Any suggestions would be greatly appreciated.
Thank you for your assistance.
Hi, please provide the problematic model files (param and bin).
You can also extract the intermediate blobs and observe which operator first produces the NaN result.
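Once each intermediate blob has been dumped (for example via ncnn::Extractor::extract on each blob name, saved out as .npy files; the filenames below are hypothetical), a small script can locate the first blob containing NaN or Inf, which pinpoints the offending operator:

```python
import numpy as np

def first_bad_blob(blob_files):
    """Return the first dumped blob file containing NaN/Inf, else None.

    blob_files should be ordered by network execution order, so the
    first hit corresponds to the first operator that went wrong.
    """
    for path in blob_files:
        arr = np.load(path)
        if not np.all(np.isfinite(arr)):
            return path
    return None
```

Comparing that blob against the fp32 run of the same layer usually shows whether the problem is an fp16 range overflow or a genuine kernel bug worth reporting.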
For the various problems that arise in onnx model conversion, it is recommended to use the latest pnnx tool to convert your model to ncnn:
pip install pnnx
pnnx model.onnx inputshape=[1,3,224,224]
Detailed reference documentation: https://github.com/pnnx/pnnx and https://github.com/Tencent/ncnn/wiki/use-ncnn-with-pytorch-or-onnx#how-to-use-pnnx