Pythonwrapper sometimes crashes with arm_fir_decimate_f32
Hello
The code below mostly runs fine when running it as a script. But if running it in interactive mode or trying to debug it, it crashes without any clear error at the arm_fir_decimate_f32 call.
Edit: This is on Windows
Maybe it's a segmentation fault because of some error in reference counting in the C code, and hence, too early garbage collection? (I'm not an expert on the python <-> C interface)
Code:
# %%
import cmsisdsp as dsp
import matplotlib.pyplot as plt
from scipy import signal
import numpy as np
import sys
# %%
test_length_seconds = 0.1
frequency = 87
us_length_buffers = int(np.ceil(test_length_seconds * (12000/32)))
us_length_samples = int(us_length_buffers * 32)
ds_length_buffers = int(np.ceil(test_length_seconds * (16000/32)))
ds_length_samples = int(ds_length_buffers * 32)
ds_source = np.sin(2*np.pi*frequency*np.linspace(0,test_length_seconds,ds_length_samples))
us_source = np.sin(2*np.pi*frequency*np.linspace(0,test_length_seconds,us_length_samples))
# %% Downsampling
ds_zero_stuffed = np.zeros(ds_source.size*3)
ds_zero_stuffed[::3] = ds_source
# %%
ds_filter = signal.firwin(9,0.25)
ds_state = np.zeros(12)
decimator = dsp.arm_fir_decimate_instance_f32()
dsp.arm_fir_decimate_init_f32(decimator, 9,4, ds_filter, ds_state)
output = dsp.arm_fir_decimate_f32(decimator,ds_zero_stuffed)
output = output[:ds_zero_stuffed.size//4]*3
print(output)
# %%
@Chreutz Are you running on Linux ? Mac ? I have another function which is crashing only on Linux/Mac but working on Windows. It may be the same root cause (I have not yet had time to analyze the problem).
I'll try to reproduce your issue.
I saw the one with Mac and Linux and considered just commenting there, since it could sound like the same root cause. But this is Windows. So the plot thickens, I suppose...
@Chreutz I have finally found some time to look at the code.
The error is in:
output = dsp.arm_fir_decimate_f32(decimator,ds_zero_stuffed)
You are using the full input buffer ds_zero_stuffed and your state is defined with:
ds_state = np.zeros(12)
The CMSIS-DSP API is working by blocks. Here you should probably use a small block and iterate the calls to the decimate function. For instance, you could use a block of 160 samples and then call the decimate function several times to process the full ds_zero_stuffed by block of 160 samples.
The state should have a length coherent with the block size you use as detailed here:
https://arm-software.github.io/CMSIS-DSP/latest/group__FIR__decimate.html#ga67c80582fc296552fef2bd1f208d853b
So, for instance, the Python code could look like:
block_size = 160
ds_state = np.zeros(block_size + len(ds_filter)-1)
Then to process the signal by blocks:
output = []
for blockNb in range(len(ds_zero_stuffed) // block_size):
s = blockNb * block_size
e = s + block_size
output.append(dsp.arm_fir_decimate_f32(decimator,ds_zero_stuffed[s:e]))
output = np.hstack(output)
or any other cleaner way to process by block.
Using very small block is bad for performance reason. Using big block (like the full signal) is bad for memory reason. So, you need to select something in between like 160 in this example.
The crash was due to the fact that your state was too small. The C code called by Python was writing outside the state (because the block passed to the function was the full signal). Then, depending on your memory layout it may crash or work.
I hope it helps.
I have not checked the output of the Python is correct. I have just checked it is no more crashing.
Thank you! That was very helpful!
A comment: Is there a way to add docstrings to the python wrapper? This would likely have been avoided with a little bit of guidance.
@Chreutz Other comment : arm_decimate_f32 is doing the decimation. So no need of
ds_zero_stuffed = np.zeros(ds_source.size*3)
ds_zero_stuffed[::3] = ds_source
I was plotting the output and it does not look right so the Python code with my changes is still not ok.
So perhaps it is the reason. You work directly with the signal to downsample and the function will both do the filtering and the downsampling.
The Python wrapper is a layer on top of CMSIS-DSP and with an API very close to the C one.
I have no easy way to copy all of the C documentation to Python docstrings.
I should perhaps make it clearer that the C documentation is in some way the reference for the Python.
@Chreutz There may be a remaining issue with this arm_fir_decimate_f32 function. I am getting wrong results. So, I continue the investigation ...
Thank you. And I'm trying to downsample from what is in essence 16 kHz to 12 kHz, so that's why I'm upsampling by 3 and downsampling by 4. Is this not correct?
And alright about the documentation :-)
It is correct. I was not seeing arm_fir_interpolate_f32 that's why I was confused.
@Chreutz So the remaining problem with arm_fir_decimate_f32 in the Python wrapper is that it is returning a too big buffer. It is using the size of the original block and not the decimated size. For my example, it is returning 160 samples instead of 40. The 40 first samples are correct.
So, I suggest you just keep the meaningful samples from the output. If I reuse the example I had written above, it should now be:
output = []
for blockNb in range(len(ds_zero_stuffed) // block_size):
s = blockNb * block_size
e = s + block_size
r=dsp.arm_fir_decimate_f32(decimator,ds_zero_stuffed[s:e])
output.append(r[0:40])
output = np.hstack(output)
The change is the new r[0:40].
I'll correct this in a future update of the Python wrapper.
Yeah, I already noticed this earlier. Sorry that I didn't mention it. There are afaik several of the python wrapper functions that do this. I have previously used the RFFT, and that also had some problems with the returned array size. I believe it returned an array equivalent to the whole spectrum as if it was a "full" (complex) FFT, but there was only meaningful data in the first half.
Yeah, so this issue above is a huge problem in the interpolate function. It only allocates blockSize for the output, when it needs blockSize * L . So it doesn't work at all.
@Chreutz All the problems should be solved in the latest commit (including the RFFT). I won't regenerate the Python package on PyPI now because I have other fixes I want to include in the next update.
But you can probably rebuild it.
py -m build --sdist from the CMSIS-DSP folder should generate a source distribution that you can then install and compile with pip.
The distribution will be generated in a folder dist.
When the package is finally pushed to PyPI, you'll need to remove this one first because the version number will be the same (1.9.3).