mlx_whisper: add support for audio input from stdin
Problem
I wanted to pipe an audio file to `mlx_whisper`, but found it only accepted file paths. This PR allows `mlx_whisper` to accept audio on stdin, pass it to `ffmpeg` accordingly, and then continue with the rest of the workflow as usual.
Changes
- `load_audio` helper adjusts `ffmpeg` flags based on file-path vs. stdin mode
- CLI parser will gracefully omit the otherwise-required positional `audio` arg if stdin is determined to be active
- optionally, an `--input-name` arg is supported to help users name the otherwise anonymous stdin content (it cannot be guessed from a file path)
- added tests in a standard macOS `zsh` file to drive and test the changes from the CLI
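A minimal sketch of how a `load_audio`-style helper might switch `ffmpeg` flags between the two modes, assuming stdin is signalled by a path of `-` (the convention adopted later in this thread). The names `build_ffmpeg_cmd` and `load_audio_sketch` are hypothetical, and the decode flags (mono, 16 kHz, `s16le`) are an assumed Whisper-style setup rather than code copied from the PR. The key difference is that path mode passes `-nostdin` so `ffmpeg` cannot swallow the parent's stdin, while stdin mode must drop it and read from `pipe:0`.

```python
import subprocess
import sys
from array import array
from typing import List


def build_ffmpeg_cmd(file: str, sr: int = 16000) -> List[str]:
    """Hypothetical helper: decode `file` to mono s16le PCM at `sr` Hz on stdout.

    A `file` of "-" is treated as stdin mode (assumed convention).
    """
    cmd = ["ffmpeg"]
    if file == "-":
        # ffmpeg must read the audio from its own stdin, so -nostdin is
        # omitted and the input points at the pipe.
        input_spec = "pipe:0"
    else:
        # Path mode: stop ffmpeg from reading (and swallowing) our stdin.
        cmd.append("-nostdin")
        input_spec = file
    cmd += [
        "-threads", "0",
        "-i", input_spec,
        "-f", "s16le",           # raw signed 16-bit little-endian samples
        "-ac", "1",              # downmix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sr),          # resample to `sr` Hz
        "-",                     # write decoded samples to stdout
    ]
    return cmd


def load_audio_sketch(file: str, sr: int = 16000) -> List[float]:
    """Hypothetical loader: run ffmpeg and return samples scaled to [-1.0, 1.0)."""
    # In stdin mode the child inherits our stdin, so piped bytes flow
    # straight into ffmpeg without being buffered in Python first.
    out = subprocess.run(
        build_ffmpeg_cmd(file, sr), capture_output=True, check=True
    ).stdout
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(out)
    if sys.byteorder == "big":
        samples.byteswap()  # s16le output is little-endian
    return [s / 32768.0 for s in samples]
```

Letting the child inherit stdin keeps memory flat for long recordings; the tradeoff is that stdin can be consumed only once per process.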
Process
- ran `black` and `pre-commit` on the changes prior to opening the PR
- `python test.py` shows 4 errors, some involving floating-point comparisons. They look unrelated to my change and may be known issues.
Thanks for the addition. What do you think about a couple modifications:
- For piping from stdin use `-` as in `mlx_whisper -`. That is what we do in MLX LM so it is more consistent.
- The argument `--input-name` is confusing to me. I understand it now but I think it will in general be confusing. It might be more clear to allow an optional `--output-names` argument with appropriate defaults (basename when available or `output` when not).
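The suggested default can be sketched as a small resolver: an explicit name wins, otherwise the input's basename is used, falling back to `output` for stdin. The helper `default_output_name` is hypothetical, and stripping the extension (via `pathlib.Path.stem`) is an assumption about what "basename" should mean for output files.

```python
import pathlib
from typing import Optional


def default_output_name(audio_path: str, output_name: Optional[str] = None) -> str:
    """Hypothetical helper: pick the base name for output files.

    An explicit name wins; otherwise use the input file's basename without
    its extension, or "output" when reading from stdin ("-").
    """
    if output_name:
        return output_name
    if audio_path == "-":
        # stdin has no path to derive a name from
        return "output"
    return pathlib.Path(audio_path).stem
```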
> For piping from stdin use `-` as in `mlx_whisper -`. That is what we do in MLX LM so it is more consistent.
Done. I agree that self-consistency between related projects is worth more than aesthetic preference. This also has the nice effect of eliminating test cases. The only tradeoff is that users who reflexively expect to pipe anything into a tool's bare name will have to read the docs.
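With the `-` convention, the CLI no longer needs to detect stdin on its own: `argparse` already treats a lone `-` as a positional value rather than an option. The sketch below assumes the positional `audio` argument accepts one or more paths (`nargs="+"`); `parse_args` and the duplicate-`-` check are hypothetical, the latter added because stdin can be read only once.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical CLI parser where "-" in `audio` means "read from stdin"."""
    parser = argparse.ArgumentParser(prog="mlx_whisper")
    # argparse passes a bare "-" through as a positional value.
    parser.add_argument("audio", nargs="+", help="audio file(s), or - for stdin")
    args = parser.parse_args(argv)
    # stdin is a single stream, so it cannot be transcribed twice.
    if args.audio.count("-") > 1:
        parser.error("stdin ('-') can only be given once")
    return args
```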
> The argument `--input-name` is confusing to me. I understand it now but I think it will in general be confusing. It might be more clear to allow an optional `--output-names` argument with appropriate defaults (basename when available or `output` when not).
I've come around to `--output-name` and have proposed a "template" solution that preserves existing behavior but also leaves room for future improvements, such as renaming strategies based on the transcribed audio content, or letting power users produce different output names when they run the same `audio_path` with different parameters.
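The proposed template syntax isn't shown in this thread, so the following is only a sketch under stated assumptions: it uses Python `str.format`-style placeholders with a single hypothetical `{basename}` field, where a default template of `"{basename}"` reproduces the existing naming and a plain string with no fields is used verbatim. `render_output_name` is a hypothetical name, and unknown fields are rejected up front so that future fields can be added deliberately.

```python
import pathlib
import string


def render_output_name(template: str, audio_path: str) -> str:
    """Hypothetical expander for an --output-name template like "{basename}_v2"."""
    basename = "output" if audio_path == "-" else pathlib.Path(audio_path).stem
    # Collect every replacement field in the template ("{}" yields "").
    fields = {
        name
        for _, name, _, _ in string.Formatter().parse(template)
        if name is not None
    }
    unknown = fields - {"basename"}
    if unknown:
        raise ValueError(f"unknown template field(s): {sorted(unknown)}")
    return template.format(basename=basename)
```

For example, running the same file twice with `"{basename}_small"` and `"{basename}_large"` would keep both transcripts side by side.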