Add ptime cmdline arg
Currently, the example code sends audio in 64k chunks once per second. In real-time audio processing, however, audio is read at different intervals, e.g. every 20 ms in VoIP. As a user, I would like to use the code example to experiment with the speech-to-text feature under conditions close to those of my real-time audio pipeline (a particular sampling rate and ptime).
To provide additional context: I work on text/speech processing in VoIP, where the packetization interval is dictated by the packetization time setting (ptime). It is most often set to 20 ms, so audio on a call is processed in 20 ms packets. The example code speech/api/streaming_transcribe.cc on GoogleCloudPlatform sends audio at fixed 1 second intervals. I need to know whether the speech-to-text example will work when I send packets as they arrive on my infrastructure, with a different ptime and packet size, or whether I need to implement buffering so that they are sent in exactly 1 second, 64k chunks as the example does. I understand that the speech-to-text result is driven mostly by the accuracy of the underlying method (model/AI) applied to the speech, and that ideally it is not affected by audio packetization. As a code integrator, though, I need to verify my own case, and it would be great if the code example let me mirror the audio processing in my environment as closely as possible.
This PR adds support for a ptime command line argument, so that users can experiment with real-time audio at various settings. When ptime is set for a file in RAW or ULAW encoding, packets are now sent with a size and time interval that reflect the ptime and the sampling rate. I did not apply this to AMR, FLAC and AMR-WB, because the number of bytes to send per ptime with those codecs depends on additional settings (the encoding mode for AMR/AMR-WB and the compression ratio for FLAC).
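The relation between ptime, sampling rate and packet size can be sketched as follows. This is only an illustration and not the code from this PR: the helper names `BytesPerPacket` and `PacketSizes` are hypothetical, and the sketch assumes 2 bytes per sample for RAW (16-bit linear PCM, which is consistent with the 640 bytes per 20 ms at 16000 Hz shown in the example output) and 1 byte per sample for ULAW. It also assumes that a trailing partial packet is sent shorter rather than padded.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the PR): number of bytes in one packet of
// uncompressed audio, given the sampling rate, the ptime in milliseconds,
// and the sample width in bytes (assumed 2 for RAW, 1 for ULAW).
std::size_t BytesPerPacket(int sample_rate_hz, int ptime_ms,
                           int bytes_per_sample) {
  if (sample_rate_hz <= 0 || ptime_ms <= 0 || bytes_per_sample <= 0) {
    throw std::invalid_argument("all arguments must be positive");
  }
  // Samples covered by one ptime interval (integer division truncates).
  long long samples =
      static_cast<long long>(sample_rate_hz) * ptime_ms / 1000;
  return static_cast<std::size_t>(samples * bytes_per_sample);
}

// Hypothetical helper (not from the PR): sizes of the successive packets
// needed to send total_bytes of audio. The last packet may be shorter;
// that behavior is an assumption of this sketch.
std::vector<std::size_t> PacketSizes(std::size_t total_bytes,
                                     std::size_t packet_bytes) {
  if (packet_bytes == 0) {
    throw std::invalid_argument("packet_bytes must be positive");
  }
  std::vector<std::size_t> sizes;
  for (std::size_t offset = 0; offset < total_bytes; offset += packet_bytes) {
    sizes.push_back(std::min(packet_bytes, total_bytes - offset));
  }
  return sizes;
}
```

With these assumptions, `BytesPerPacket(16000, 20, 2)` gives 640 and `BytesPerPacket(16000, 200, 2)` gives 6400, matching the two examples below. A sender that mimics real-time traffic would additionally wait one ptime interval between consecutive packets.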
% .build/streaming_transcribe --help
Standard C++ exception thrown: the option '--path' is required but missing
Usage:
streaming_transcribe [--bitrate N] [--ptime N] audio.(raw|ulaw|flac|amr|awb)
Example 1. Using ptime 20 ms:
% .build/streaming_transcribe --bitrate 16000 --ptime 20 resources/audio2.raw
Sending 640 bytes.
Sending 640 bytes.
Sending 640 bytes.
(...)
Sending 640 bytes.
Sending 640 bytes.
Sending 640 bytes.
Result stability: 0
0.986006 the rain in Spain stays mainly on the plain
Example 2. Using ptime 200 ms:
% .build/streaming_transcribe --bitrate 16000 --ptime 200 resources/audio2.raw
Sending 6400 bytes.
Sending 6400 bytes.
Sending 6400 bytes.
(...)
Sending 6400 bytes.
Sending 6400 bytes.
Sending 6400 bytes.
Result stability: 0
0.986006 the rain in Spain stays mainly on the plain
/gcbrun