[Feature] Support Short-Time Fourier Transforms
Short-Time Fourier Transforms (STFT) and its inverse are used extensively in signal processing.
Function definition will match scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.istft.html
You can assign this to me.
Cool.. if we include this, I wonder where it would go? I'm not opposed to including it in core or a sub-package but I think there is a decision to be made there.
I see jax implemented it in a scipy.signal subpackage https://jax.readthedocs.io/en/latest/_autosummary/jax.scipy.signal.stft.html, and PyTorch implemented it at the top level https://pytorch.org/docs/stable/generated/torch.stft.html
Also our Whisper implementation already has an STFT in MLX you might find useful: https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/audio.py#L104-L127
I think adding them in core is best - they’re fundamental algorithms and used frequently. A separate signal sub-package makes sense if you anticipate adding more functions from scipy.signal in he near future. (https://jax.readthedocs.io/en/latest/jax.scipy.html#module-jax.scipy.signal)
Thanks for pointing me to the Whisper implementation, I haven't seen in yet
One tiny issue about stft in mlx-example/whisper, the noverlap parameter does not actually represent the overlap of frames; instead, it is semantically equivalent to hop_length in torch.stft.
I found it a little bit misleading while implementing istft.
Is it not the same as noverlap in the Scipy implementation? https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.html
I think that's a good reference to follow.
Still working on this CL, shouldn't be much longer.
t = (x.size - nperseg + noverlap) // noverlap does look wrong though, maybe that's where the confusion is. The denominator should be the number of steps.
Is it not the same as
noverlapin the Scipy implementation? https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.stft.htmlI think that's a good reference to follow.
No.
According to this hop_length = nperseg - noverlap
In stft in mlx-example/whisper noverlap is being used as hop_length. It appears to be a naming mistake, the logic looks good to me.
Still working on this CL, shouldn't be much longer.
Cool.
I was implementing a istft specific to my problem, not a general one.