
[Performance] Python and C++ inference results differ for audio processing

Open lzlwakeup opened this issue 1 year ago • 6 comments

Describe the issue

Hi, I want to run an ONNX model in a C++ environment (code attached). The results are shown in the attached Cpp_rlt and python_rlt images: the C++ result has more background noise, while the Python result is fine. I use the same onnxruntime version (1.14) for inference in both cases, yet the results differ. If the versions are exactly the same, why is there a difference between C++ and Python? Is there a problem with my code setup? c_infer.zip
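
The attached c_infer.zip is not reproduced inline, so the following is only a rough sketch of a single-frame inference call with the ONNX Runtime C++ API for readers who cannot open the attachment; the model path, input/output names, and tensor shape are placeholders rather than values taken from the actual code.

```cpp
// Minimal sketch of a single-frame inference call with the ONNX Runtime C++ API.
// Model path, tensor names, and shape are placeholders for illustration only.
#include <onnxruntime_cxx_api.h>
#include <cstdint>
#include <vector>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "audio_infer");
    Ort::SessionOptions session_options;
    session_options.SetIntraOpNumThreads(1);  // as mentioned later in this thread

    Ort::Session session(env, L"model.onnx", session_options);  // placeholder path

    // Placeholder: one frame of features, shape [1, 257].
    std::vector<float> input_data(257, 0.0f);
    std::vector<int64_t> input_shape{1, 257};

    Ort::MemoryInfo mem_info =
        Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
        mem_info, input_data.data(), input_data.size(),
        input_shape.data(), input_shape.size());

    const char* input_names[] = {"input"};    // placeholder names
    const char* output_names[] = {"output"};
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               input_names, &input_tensor, 1,
                               output_names, 1);

    float* out = outputs[0].GetTensorMutableData<float>();
    (void)out;  // post-processing / file output would go here in the real code
    return 0;
}
```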

To reproduce

See the attachment.

Urgency

No response

Platform

Windows

OS Version

windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

onnxruntime 1.14

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

No

lzlwakeup avatar Apr 08 '24 02:04 lzlwakeup

In C++, you run fftwf_execute at every iteration; in Python, it is done only once at the end. I suspect session_options.SetIntraOpNumThreads(1); is the cause.

xadupre avatar Apr 08 '24 14:04 xadupre
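
A minimal way to test the thread-count hypothesis would be to build two sessions that differ only in the intra-op setting, feed the identical frame to both, and compare the outputs element-wise. The sketch below uses a placeholder model path, placeholder I/O names, and a placeholder input shape.

```cpp
#include <onnxruntime_cxx_api.h>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Runs one placeholder frame through the model, either with a single intra-op
// thread or with ONNX Runtime's default thread pool, and returns the output.
static std::vector<float> run_once(Ort::Env& env, const wchar_t* model_path,
                                   const std::vector<float>& frame,
                                   bool single_thread) {
    Ort::SessionOptions opts;
    if (single_thread) opts.SetIntraOpNumThreads(1);
    Ort::Session session(env, model_path, opts);

    std::vector<int64_t> shape{1, static_cast<int64_t>(frame.size())};
    Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
    Ort::Value input = Ort::Value::CreateTensor<float>(
        mem, const_cast<float*>(frame.data()), frame.size(),
        shape.data(), shape.size());

    const char* input_names[] = {"input"};    // placeholder
    const char* output_names[] = {"output"};  // placeholder
    auto outputs = session.Run(Ort::RunOptions{nullptr},
                               input_names, &input, 1, output_names, 1);

    const float* p = outputs[0].GetTensorMutableData<float>();
    size_t n = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
    return std::vector<float>(p, p + n);
}

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "thread_check");
    std::vector<float> frame(257, 0.1f);  // placeholder input frame

    auto single = run_once(env, L"model.onnx", frame, true);
    auto pooled = run_once(env, L"model.onnx", frame, false);

    float max_diff = 0.0f;
    for (size_t i = 0; i < single.size(); ++i) {
        float d = std::fabs(single[i] - pooled[i]);
        if (d > max_diff) max_diff = d;
    }
    std::printf("max |single-thread - default| = %g\n", max_diff);
    return 0;
}
```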

Speech signal processing requires overlapping frames: each processing step combines a new frame with the previous one. In Python, librosa.stft performs this whole process in one pass over a simulated WAV file, whereas real-time processing indeed has to run the FFT in every cycle/iteration. As for the input data, I compared the first frame and the fftwf output is identical to librosa.stft. If the C++ and Python onnxruntime paths were logically identical and the input data is the same, the difference in results cannot be explained.

lzlwakeup avatar Apr 09 '24 06:04 lzlwakeup
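
For reference, a sketch of the per-frame overlap + FFT pattern described in the comment above, using FFTW's single-precision interface. The 512-sample window, 256-sample hop, and Hann window are assumptions for illustration; to match librosa.stft frame by frame, the same window, hop, and padding convention must be used on both sides.

```cpp
#include <fftw3.h>
#include <algorithm>
#include <cmath>
#include <vector>

int main() {
    const int win_len = 512;              // assumed analysis window length
    const int hop     = 256;              // assumed hop size (50% overlap)
    const int n_bins  = win_len / 2 + 1;
    const float kPi   = 3.14159265358979f;

    std::vector<float> frame(win_len, 0.0f);     // rolling buffer: previous + new samples
    std::vector<float> windowed(win_len, 0.0f);
    fftwf_complex* spectrum = fftwf_alloc_complex(n_bins);

    // Periodic Hann window (the form librosa's default 'hann' window uses).
    std::vector<float> hann(win_len);
    for (int n = 0; n < win_len; ++n)
        hann[n] = 0.5f - 0.5f * std::cos(2.0f * kPi * n / win_len);

    // Plan once; executing the plan in every iteration is the normal real-time
    // pattern and does not by itself change the numerical result.
    fftwf_plan plan = fftwf_plan_dft_r2c_1d(win_len, windowed.data(),
                                            spectrum, FFTW_ESTIMATE);

    std::vector<float> new_samples(hop, 0.0f);   // placeholder for incoming audio
    for (int iter = 0; iter < 10; ++iter) {      // placeholder frame loop
        // Shift the buffer left by one hop and append the new block (overlap).
        std::copy(frame.begin() + hop, frame.end(), frame.begin());
        std::copy(new_samples.begin(), new_samples.end(), frame.end() - hop);

        for (int n = 0; n < win_len; ++n)
            windowed[n] = frame[n] * hann[n];
        fftwf_execute(plan);
        // spectrum[0 .. n_bins-1] is this frame's input to the ONNX model.
    }

    fftwf_destroy_plan(plan);
    fftwf_free(spectrum);
    return 0;
}
```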

A correction to an earlier mistake of mine: it is not accurate that the ORT versions are consistent. I should also add that the Python side runs on Windows 11 with torch CUDA, while the C++ environment is Windows 10 on CPU. A small difference would be acceptable in my opinion, but the difference in results is far too large, and I don't know the reason. The FFT data is consistent, confirmed by frame-by-frame printing.

lzlwakeup avatar Apr 10 '24 10:04 lzlwakeup
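
Beyond frame-by-frame printing, one option is to dump the raw float output of every frame from the C++ side to a binary file and compare it offline against a CPU-only Python run (for example loaded with numpy.fromfile). The file name and output size below are placeholders.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Appends one frame's float output to a raw binary file so it can be diffed
// offline against the Python run, e.g.
// numpy.fromfile("cpp_out.bin", dtype=numpy.float32).
static void dump_frame(const char* path, const float* data, size_t count) {
    std::FILE* f = std::fopen(path, "ab");
    if (f != nullptr) {
        std::fwrite(data, sizeof(float), count, f);
        std::fclose(f);
    }
}

int main() {
    // Placeholder: in the real code this would be the tensor returned by
    // session.Run for each frame.
    std::vector<float> model_output(257, 0.0f);
    dump_frame("cpp_out.bin", model_output.data(), model_output.size());
    return 0;
}
```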

Are there any suggestions or debugging methods I could try?

lzlwakeup avatar Apr 17 '24 06:04 lzlwakeup

You may try profiling (https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html) to see if some operator behaves differently. When you write that the Python side is win11 + torchcuda and the C++ environment is the win10 cpu, it is not clear to me that both are running on CPU.

xadupre avatar Apr 17 '24 08:04 xadupre
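
As a concrete starting point for the profiling suggestion, the C++ API can write a per-operator JSON trace that can be compared with one captured from the Python session (sess_options.enable_profiling = True). A sketch follows, with a placeholder model path and file prefix.

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "profiling");

    Ort::SessionOptions opts;
    opts.SetIntraOpNumThreads(1);
    // The profiler writes a JSON trace named after this prefix, containing
    // per-operator timings.
    opts.EnableProfiling(L"cpp_profile");

    Ort::Session session(env, L"model.onnx", opts);  // placeholder path
    // ... run inference as usual ...

    // EndProfilingAllocated returns the path of the JSON trace, which can be
    // opened in chrome://tracing or Perfetto.
    Ort::AllocatorWithDefaultOptions allocator;
    auto profile_path = session.EndProfilingAllocated(allocator);
    (void)profile_path;
    return 0;
}
```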

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions[bot] avatar May 17 '24 15:05 github-actions[bot]