speech-recognition-polyfill Mic doesn't turn off if applyPolyfill is set

Describe the bug

I'm trying to set up the polyfill using react-speech-recognition and this library. Without the polyfill, my mic behaves as expected.

To Reproduce

My code is almost the same as in the example. I have only changed the continuous property.

const appId = process.env.REACT_APP_SPEECHLY_ID;
const SpeechlySpeechRecognition = createSpeechlySpeechRecognition(appId);
SpeechRecognition.applyPolyfill(SpeechlySpeechRecognition);

const SomeComponent = () => {

  const {
    transcript,
    resetTranscript,
    listening,
    browserSupportsSpeechRecognition,
  } = useSpeechRecognition();

  useEffect(() => {
    if (!listening) {
      SpeechRecognition.stopListening();
      setTitleState(prevState);
    }
 
  }, [listening]);

  useEffect(() => {
    if (transcript) {
      setValue('do something here');
    }
  }, [transcript]);

  const handleOnClikOrTapStart = () => {
    setPrevState(titleState);
    if (browserSupportsSpeechRecognition) {
      resetTranscript();
      setTitleState(STATE_VOICE);
      SpeechRecognition.startListening({
        language: "en-US",
      });
    }
  };
  return (
    <Button
      className={classnames("voice-btn", {
        active: listening,
      })}
      btnType="icon"
      onMouseDown={handleOnClikOrTapStart}
      onTouchStart={handleOnClikOrTapStart}
      disabled={!browserSupportsSpeechRecognition}
    >
      <MicIcon />
    </Button>
  );
}

Expected behavior

The mic should turn off once listening stops. Instead, it stays on continuously after I click the button.

Environment

Platform: Desktop
OS: macOS (latest)
Browser: Chrome, Safari
Version: latest
Package version: latest

Sep 25 '22 12:09 newdisease

Hi @newdisease thanks for raising this issue!

You've encountered an interesting difference between the browser's native speech recognition and the Speechly polyfill. When SpeechRecognition.stop is called in the native version, the browser will stop transcribing and turn off the microphone. In the polyfill, voice data will stop being streamed to the Speechly API and transcription will end (i.e. the listening state will become false) but the microphone will remain active. However, the functionality should be identical between the two as no microphone data is being processed while transcription is turned off - the only difference is that the user will see the recording icon on their browser tab. Keeping the microphone on probably has a slight efficiency advantage in the case where transcription is needed again later. That said, I could understand if this affected user trust if they asked for the microphone to be turned off but could still see the recording icon.
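
For context, here is a rough sketch of what "turning the microphone off" means at the browser level: the recording indicator generally goes away once every track on the captured MediaStream has been stopped. This is not the polyfill's code. `releaseMicrophone`, `StoppableTrack`, and `StoppableStream` are hypothetical names used only for illustration; the sketch relies only on the standard `getTracks()`, `readyState`, and `stop()` members of MediaStream and MediaStreamTrack.

```typescript
// Minimal structural types so the sketch compiles outside a browser.
// A real MediaStream returned by getUserMedia() satisfies them.
interface StoppableTrack {
  readyState: string
  stop(): void
}

interface StoppableStream {
  getTracks(): StoppableTrack[]
}

// Hypothetical helper: stop every live track so the browser can release
// the capture device. Returns how many tracks were live before stopping.
function releaseMicrophone(stream: StoppableStream): number {
  let stopped = 0
  for (const track of stream.getTracks()) {
    if (track.readyState === 'live') {
      track.stop()
      stopped += 1
    }
  }
  return stopped
}
```

In a browser, the argument would be the stream obtained from `navigator.mediaDevices.getUserMedia({ audio: true })`. A stopped track cannot be restarted, so listening again means requesting a new stream.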

Besides the microphone staying on with the polyfill, have you seen any unexpected behaviour differences that affect how your web app works?

Oct 08 '22 09:10 JamesBrill

Hi. Thanks for your answer!

After recording, the microphone is no longer capturing anything. However, the recording icon still appears, which confuses my users.

Is there any way to fix that?

Oct 09 '22 08:10 newdisease

@newdisease I totally understand, and have raised an issue on the Speechly Browser Client library for you to follow. If we're able to work out a suitable change to the Browser Client, I'll update it in the polyfill.

In the meantime, the best I can suggest is to make it as clear as possible in your UI when no voice data is being processed (i.e. when the web app is not "listening"). In most cases, users will pay more attention to the UI than the recording icon in the browser tab, so they shouldn't notice the discrepancy too often. But I do understand your concern about any confusion or trust/privacy issues that could cause.

Oct 09 '22 09:10 JamesBrill

Hi @JamesBrill, I investigated a solution that allows closing the microphone after each utterance without @speechly/browser-client API changes. If you find this approach workable, we could work towards creating a PR out of it (currently many unit tests fail because the internal initialise function was retired).

Here's a summary of changes:

start() always waits for the microphone to be initialized and reattached. This works well on modern browsers, but at least the old iOS 12 suffers from considerable lag.
stop() always detaches and closes the mic.
There is a new listenPromise that orchestrates start and stop calls into a queue so they behave predictably under stress testing. The transcribing state also changes immediately, which ensures that duplicate start/stop calls are not made.
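
The queueing idea in the last point can be sketched in isolation as follows. This is a simplified illustration rather than the patch itself: `ToggleQueue`, `open`, and `close` are hypothetical names standing in for the polyfill class and its microphone/client calls. Unlike the patch, this sketch keeps running queued tasks after an earlier one fails, which is a deliberate simplification.

```typescript
// Hypothetical sketch: serialise async start/stop calls through one promise chain.
class ToggleQueue {
  private active = false
  private chain: Promise<void> = Promise.resolve()
  private readonly open: () => Promise<void>
  private readonly close: () => Promise<void>

  constructor(open: () => Promise<void>, close: () => Promise<void>) {
    this.open = open
    this.close = close
  }

  start(): Promise<void> {
    // Duplicate start: nothing new to queue, just wait for pending work.
    if (this.active) return this.chain
    // Flip the flag synchronously, before any await, so a second call
    // made in the same tick already sees the new state.
    this.active = true
    return this.enqueue(() => this.open())
  }

  stop(): Promise<void> {
    if (!this.active) return this.chain
    this.active = false
    return this.enqueue(() => this.close())
  }

  private enqueue(task: () => Promise<void>): Promise<void> {
    // Run the task only after everything queued earlier has settled,
    // whether it resolved or rejected.
    this.chain = this.chain.then(() => task(), () => task())
    return this.chain
  }
}
```

Calling `start()`, `start()`, `stop()`, `start()` back to back on such a queue would run open, close, open in that order, with the duplicate `start()` ignored.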

And here's the diff for git apply createSpeechRecognition.ts.patch:

diff --git a/src/createSpeechRecognition.ts b/src/createSpeechRecognition.ts
index 4828eec..2e0258d 100644
--- a/src/createSpeechRecognition.ts
+++ b/src/createSpeechRecognition.ts
@@ -32,11 +32,11 @@ export const createSpeechlySpeechRecognition = (appId: string): SpeechRecognitio
 
   return class SpeechlySpeechRecognition implements SpeechRecognition {
     static readonly hasBrowserSupport: boolean = browserSupportsAudioApis
-
     private readonly client: BrowserClient
-    private clientInitialised = false
+    private readonly microphone: BrowserMicrophone
     private aborted = false
     private transcribing = false
+    private listenPromise: Promise<void> | null = null
 
     continuous = false
     interimResults = false
@@ -46,15 +46,14 @@ export const createSpeechlySpeechRecognition = (appId: string): SpeechRecognitio
 
     constructor() {
       this.client = new BrowserClient({ appId })
+      this.microphone = new BrowserMicrophone()
       this.client.onSegmentChange(this.handleResult)
     }
 
     public start = async (): Promise<void> => {
       try {
         this.aborted = false
-        await this.initialise()
-        await this.client.start()
-        this.transcribing = true
+        await this._start()
       } catch (e) {
         if (e === ErrNoAudioConsent) {
           this.onerror(MicrophoneNotAllowedError)
@@ -73,31 +72,49 @@ export const createSpeechlySpeechRecognition = (appId: string): SpeechRecognitio
       await this._stop()
     }
 
-    private readonly initialise = async (): Promise<void> => {
-      if (!this.clientInitialised) {
-        const microphone = new BrowserMicrophone()
-        await microphone.initialize()
-        const { mediaStream } = microphone
+    private readonly _start = async (): Promise<void> => {
+      if (this.transcribing) {
+        return
+      }
+
+      this.transcribing = true
+
+      this.listenPromise = (async () => {
+        // Wait for earlier task(s) to complete, effectively adding to a task queue
+        await this.listenPromise
+        await this.microphone.initialize()
+        const { mediaStream } = this.microphone
         if (mediaStream === null || mediaStream === undefined) {
           throw ErrDeviceNotSupported
         }
         await this.client.attach(mediaStream)
-        this.clientInitialised = true
-      }
+        await this.client.start()
+      })()
+
+      await this.listenPromise
     }
 
     private readonly _stop = async (): Promise<void> => {
       if (!this.transcribing) {
         return
       }
-      await this.initialise()
-      try {
-        await this.client.stop()
-        this.transcribing = false
-        this.onend()
-      } catch (e) {
-        // swallow errors
-      }
+
+      this.transcribing = false
+
+      this.listenPromise = (async () => {
+        // Wait for earlier task(s) to complete, effectively adding to a task queue
+        await this.listenPromise
+        try {
+          await this.client.stop()
+          await this.client.detach();
+          await this.microphone.close();
+          this.onend()
+        } catch (e) {
+          // swallow errors
+        }
+      })()
+
+      await this.listenPromise
     }
 
     private readonly handleResult = (segment: Segment): void => {

Oct 24 '22 08:10 arzga

Hi! Any updates on this? I'm having the same issue with my app. Although I'm not using the mic anymore, the phone's "mic in use" indicator is still active, and users think they are still being heard. It's a big "security" problem for me.

Nov 08 '22 12:11 JesusADS

@JesusADS Don't worry, I haven't forgotten about this issue - I've just lacked free time for open source recently. I've applied @arzga 's patch in a Pull Request to get the ball rolling. 😃

Nov 08 '22 13:11 JamesBrill

I would appreciate it very much if this PR could be applied.

Dec 01 '22 09:12 kla-ko

Thanks

Dec 01 '22 10:12 newdisease

The PR that fixes this is now ready for review - apologies for the delay. https://github.com/speechly/speech-recognition-polyfill/pull/31 @arzga @bigdatabaracus

Dec 01 '22 12:12 JamesBrill

@newdisease @JesusADS @kla-ko A fix for this has been released in version 1.3.0 - let us know if this resolves the issue for you.

Dec 04 '22 11:12 JamesBrill

Works on Chrome/Desktop, Opera/Desktop, Safari/Mobile, Samsung/Mobile

Thanks

Dec 04 '22 15:12 kla-ko