How to do real-time inference?
Thanks for this awesome repo! I want to test LipNet on real-time video, i.e., given a stream of image frames from a video (at 25 fps, using the provided model), output results in real time. However, I am not sure how to do it.
I thought about splitting the incoming frames into mutually exclusive chunks and generating an output from each chunk. However, this approach may suffer when a chunk ends in the middle of an utterance. Any suggestions?
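To make the chunking idea concrete, here is a rough sketch of what I have in mind. It is not taken from this repo: `chunk_stream`, `transcribe_stream`, and `predict_fn` are hypothetical names, `predict_fn` stands in for whatever call runs the model on a list of frames, and the default chunk length of 75 frames (3 s at 25 fps) is my assumption. With `stride == chunk_size` the chunks are mutually exclusive; a smaller stride makes them overlap, which might soften the cut-off-utterance problem at the cost of more model calls.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def chunk_stream(
    frames: Iterable[Frame],
    chunk_size: int = 75,  # assumption: 3 s of video at 25 fps
    stride: int = 75,      # stride == chunk_size gives mutually exclusive chunks
) -> Iterator[List[Frame]]:
    """Yield fixed-size chunks from a frame stream, advancing by `stride` frames."""
    if chunk_size <= 0 or not 0 < stride <= chunk_size:
        raise ValueError("need chunk_size > 0 and 0 < stride <= chunk_size")

    buffer = deque(maxlen=chunk_size)  # keeps only the most recent frames
    since_last = 0                     # frames received since the last emitted chunk

    for frame in frames:
        buffer.append(frame)
        since_last += 1
        # Emit once the buffer is full and we have advanced by `stride` frames.
        if len(buffer) == chunk_size and since_last >= stride:
            yield list(buffer)
            since_last = 0
    # Trailing frames that do not fill a whole chunk are dropped in this sketch.


def transcribe_stream(
    frames: Iterable[Frame],
    predict_fn: Callable[[List[Frame]], Result],  # hypothetical model wrapper
    chunk_size: int = 75,
    stride: int = 75,
) -> Iterator[Result]:
    """Run `predict_fn` on each chunk as soon as it becomes available."""
    for chunk in chunk_stream(frames, chunk_size, stride):
        yield predict_fn(chunk)
```

With overlapping chunks, the per-chunk outputs would still need to be merged somehow, and that merging step is exactly the part I am unsure about.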
me too
me too
me too
Did you manage to get this working?