
Direct Line Speech adapters do not connect

MunozVictor opened this issue 11 months ago · 1 comment

Description

I'm having an issue when using the Direct Line Speech adapter in my Web Chat integration. I adapted a template that previously worked with the standard Direct Line (text) channel and configured the WebSocket connection on my App Service. Speech-to-text appears to work, but the bot integration does not behave as expected. Specifically:

  • The client sends the webchat/join event (which my C# bot correctly receives and processes when using the Direct Line text channel).
  • The bot responds with a welcome message as a message activity.
  • However, when I connect with Direct Line Speech, the welcome message and any subsequent activities from the bot are not rendered in the Web Chat UI.

Code to Reproduce

<script>
    (async function () {
      async function fetchCredentials() {
        const res = await fetch("https://northeurope.api.cognitive.microsoft.com/sts/v1.0/issuetoken", {
          method: "POST",
          headers: {
            "Content-Type": "application/x-www-form-urlencoded",
            "Content-Length": "0",
            "Ocp-Apim-Subscription-Key": "SECRET"
          }
        });
        if (!res.ok) {
          throw new Error("Error getting the authorization token.");
        }
        console.log("Token obtained successfully.");
        return { authorizationToken: await res.text(), region: "northeurope" };
      }
      const adapters = await window.WebChat.createDirectLineSpeechAdapters({
        fetchCredentials
      });
      const store = window.WebChat.createStore({}, ({ dispatch }) => next => action => {
        if (action.type === 'DIRECT_LINE/CONNECT_FULFILLED') {
          console.log("Connection established with Direct Line Speech, sending 'webchat/join' event");
          dispatch({
            type: 'WEB_CHAT/SEND_EVENT',
            payload: {
              name: 'webchat/join',
              value: {
                language: "es-ES",
                mail: "[email protected]",
                client: "Cliente123",
                centro: "CentroX",
                ambito: "app",
                traces: "logs"
              }
            }
          });
        }
        return next(action);
      });

      window.WebChat.renderWebChat({
        ...adapters,
        store,
        webSpeechPonyfillFactory: window.WebChat.createBrowserWebSpeechPonyfillFactory()
      }, document.getElementById('webchat'));
      document.querySelector("#webchat > *").focus();
    })().catch(err => console.error("Error initializing Web Chat:", err));
  </script>

Expected Behavior

  • Once the connection is established and the webchat/join event is sent, the bot should respond with a welcome message.
  • The welcome message (and any subsequent bot activities) should render in the Web Chat UI.

Actual Behavior

  • Although the event is sent and received by the bot, no welcome message or other bot activities are displayed in the Web Chat interface when using the DirectLine Speech adapter.

Additional Context

  • This integration works as expected with the standard DirectLine channel.
  • I suspect the issue might be related to how the DirectLine Speech adapter handles or renders text-based activities, or any error creating the store.
  • Any suggestions or known workarounds for ensuring that message activities are properly rendered in DirectLine Speech would be appreciated.

Please let me know if you need any further information or logs. Thank you!

MunozVictor avatar Feb 25 '25 18:02 MunozVictor

Now I have managed to make it work, but not through Direct Line Speech:

  • In "hybrid" mode, speech-to-text (STT) uses the browser and text-to-speech (TTS) uses the Cognitive Service.
  • In "azure" mode, both STT and TTS use the Cognitive Service.
  • In "browser" mode, both STT and TTS use the browser's speech services.

Code that Works Without Direct Line Speech

This is my working implementation without Direct Line Speech, only using Web Speech and Azure Speech Services:

(async function () {
  try {
    // 🔵 1️⃣ IN BACKEND — the Direct Line secret must never be exposed client-side
    const directLineRes = await fetch('https://directline.botframework.com/v3/directline/tokens/generate', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${SECRET_DE_DIRECTLINE}`,
        'Content-Type': 'application/json'
      }
    });
    if (!directLineRes.ok) {
      throw new Error("❌ Error generating the Direct Line token.");
    }
    const { token } = await directLineRes.json();
    console.log("✅ Direct Line token generated:", token);


	// IN BACKEND — the Speech subscription key must also stay server-side
	async function fetchSpeechCredentials() {
	  const res = await fetch("https://northeurope.api.cognitive.microsoft.com/sts/v1.0/issuetoken", {
		method: "POST",
		headers: {
		  "Content-Type": "application/x-www-form-urlencoded",
		  "Content-Length": "0",
		  "Ocp-Apim-Subscription-Key": CLAVE_DE_SPEECH
		}
	  });

	  if (!res.ok) {
		throw new Error("❌ Error getting the Speech token.");
	  }

	  const authorizationToken = await res.text();
	  console.log("✅ Speech token obtained successfully.");
	  console.log("✅ Speech token: " + authorizationToken);

	  return { authorizationToken, region: "northeurope" };
	}
		
		
	async function createPonyfillFactory({ credentials, mode = "hybrid" }) {

    const speechServicesPonyfillFactory = await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory({
        credentials
    });

    const webSpeechPonyfillFactory = await window.WebChat.createBrowserWebSpeechPonyfillFactory();

    return options => {
        // Get the ponyfills from each service
        const speechServicesPonyfill = speechServicesPonyfillFactory(options);
        const webSpeechPonyfill = webSpeechPonyfillFactory(options);

        if (mode === "azure") {
            console.log("🟢 Azure mode: using Cognitive Services Speech");
            return {
                SpeechGrammarList: speechServicesPonyfill.SpeechGrammarList,
                SpeechRecognition: speechServicesPonyfill.SpeechRecognition,
                speechSynthesis: speechServicesPonyfill.speechSynthesis, // ENABLE reading bot messages aloud
              //speechSynthesis: null, // DISABLE reading bot messages aloud
                SpeechSynthesisUtterance: speechServicesPonyfill.SpeechSynthesisUtterance
            };
            };
        } else if (mode === "browser") {
            console.log("🟠 Browser mode: using the Web Speech API");
            return {
                SpeechGrammarList: webSpeechPonyfill.SpeechGrammarList,
                SpeechRecognition: webSpeechPonyfill.SpeechRecognition,
                speechSynthesis: webSpeechPonyfill.speechSynthesis,
                SpeechSynthesisUtterance: webSpeechPonyfill.SpeechSynthesisUtterance
            };
        } else {
            console.log("🔵 Hybrid mode: browser speech recognition + Azure speech synthesis");
            return {
                SpeechGrammarList: webSpeechPonyfill.SpeechGrammarList,
                SpeechRecognition: webSpeechPonyfill.SpeechRecognition,
                speechSynthesis: speechServicesPonyfill.speechSynthesis, // Uses Azure Cognitive Services synthesis
                SpeechSynthesisUtterance: speechServicesPonyfill.SpeechSynthesisUtterance
            };
        }
    };
	}

    // Note: this factory is not used below (createPonyfillFactory is used instead)
    const webSpeechPonyfillFactory = await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory({
      credentials: await fetchSpeechCredentials()
    });

    const directLine = window.WebChat.createDirectLine({ token });

    const store = window.WebChat.createStore({}, ({ dispatch }) => next => action => {
      console.log("➡ Action received:", action.type);

      if (action.type === 'DIRECT_LINE/CONNECT_FULFILLED') {
        dispatch({
          type: 'WEB_CHAT/SEND_EVENT',
          payload: {
            name: 'webchat/join',
            value: {
              language: "es-ES",
              mail: "[email protected]",
              client: "web",
              centro: "ss",
              ambito: "logado",
              traces: "no"
            }
          }
        });
      }
      if (action.type === 'DIRECT_LINE/INCOMING_ACTIVITY') {
        const activity = action.payload.activity;
        // Check whether the activity is a bot message with text to synthesize
        if (activity.from.role === 'bot' && activity.type === 'message') {
          console.log("📢 Bot message received:", activity.text);
        }
      }
      return next(action);
    });


    window.WebChat.renderWebChat({
      directLine,
      //webSpeechPonyfillFactory,  // Use the Cognitive Services Speech factory directly
      webSpeechPonyfillFactory: await createPonyfillFactory({ credentials: await fetchSpeechCredentials(), mode: "azure" }),
      store
    }, document.getElementById('webchat'));

    console.log("✅ Web Chat rendered successfully.");
    document.querySelector('#webchat > *').focus();
  } catch (error) {
    console.error("❌ Initialization error:", error);
  }
})();

Issue with Direct Line Speech

This behavior might seem correct at first glance, but you can run this configuration without enabling Direct Line Speech in the Azure channel and it will still work in "azure" mode. In other words, Direct Line Speech is doing nothing.

I found this documentation that seems to properly configure Direct Line Speech: [Direct Line Speech Setup](https://github.com/microsoft/BotFramework-WebChat/blob/main/docs/DIRECT_LINE_SPEECH.md#render-web-chat-using-direct-line-speech-adapters)

However, it does not explicitly mention how to retrieve the credentials. My understanding is that we should generate a token for Cognitive Services Speech, just as in my previous implementation, and then use that token to establish the Direct Line Speech connection.
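For reference, here is a sketch of how the token retrieval could be wrapped with caching, since tokens from the issueToken endpoint expire after roughly 10 minutes and fetchCredentials may be invoked again on reconnect. The `issueToken` callback and the 5-minute TTL are my own placeholders, not part of the Web Chat API:

```javascript
// Sketch: wrap an STS token call in a cache so fetchCredentials does not hit
// the issueToken endpoint on every invocation. `issueToken` is a stand-in for
// the real STS fetch; the 5-minute TTL is an assumption, not a documented value.
function createCachedFetchCredentials(issueToken, region, ttlMs = 5 * 60 * 1000) {
  let cachedToken = null;
  let fetchedAt = 0;

  return async function fetchCredentials() {
    const now = Date.now();

    // Re-issue the token only when the cache is empty or past its TTL
    if (!cachedToken || now - fetchedAt > ttlMs) {
      cachedToken = await issueToken(); // resolves to the raw token string
      fetchedAt = now;
    }

    return { authorizationToken: cachedToken, region };
  };
}
```

The returned function can then be passed as `fetchCredentials` to `createDirectLineSpeechAdapters`.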

Following the example from the official documentation (using Direct Line Speech adapters), I cannot get it to work.


Attempt to Use Direct Line Speech (Not Working)

Following the documentation's example, I tried implementing Direct Line Speech Adapters like this:

(async function () {
  try {
    async function fetchCredentials() {
      const res = await fetch("https://northeurope.api.cognitive.microsoft.com/sts/v1.0/issuetoken", {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded",
          "Content-Length": "0",
          "Ocp-Apim-Subscription-Key": CLAVE_DE_SPEECH
        }
      });

      if (!res.ok) {
        throw new Error("❌ Error getting the Speech token.");
      }

      const authorizationToken = await res.text();

      return { authorizationToken, region: "northeurope" };
    }

    const adapters = await window.WebChat.createDirectLineSpeechAdapters({
      fetchCredentials
    });

    // Create the store to handle events and send 'webchat/join'
    const store = window.WebChat.createStore({}, ({ dispatch }) => next => action => {
      console.log("➡ Action received:", action.type);
  
      if (action.type === 'DIRECT_LINE/CONNECT_FULFILLED') {
        console.log("✅ Connected to Direct Line. Sending 'webchat/join' event...");
        dispatch({
          type: 'WEB_CHAT/SEND_EVENT',
          payload: {
            name: 'webchat/join',
            value: {
              language: "es-ES",
              mail: "[email protected]",
              client: "web",
              centro: "ss",
              ambito: "logado",
              traces: "no"
            }
          }
        });
      }
  
      if (action.type === 'DIRECT_LINE/INCOMING_ACTIVITY') {
        const activity = action.payload.activity;
        if (activity.from.role === 'bot' && activity.type === 'message') {
          console.log("📢 Bot message received:", activity.text);
        }
      }
      return next(action);
    });
  
    window.WebChat.renderWebChat(
      {
        ...adapters,  
        store
      },
      document.getElementById('webchat')
    );
  
    console.log("✅ Web Chat rendered successfully.");
    document.querySelector('#webchat > *').focus();
  } catch (error) {
    console.error("❌ Initialization error:", error);
  }
})();

Configuration Checked

I have ensured that:

  • WebSockets are enabled in the App Service and the bot's code.

Image

  • Streaming is enabled in the Direct Line Speech channel.

Image

  • The Cognitive Service matches the one used in the working implementation.
  • WebSocket code configuration:

Image

  • Controller config:

Image

Despite this, the Direct Line Speech adapters do not work.
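To get more detail than the generic browser error, a small diagnostic middleware can be added to the existing `createStore` call to log connection failures. `DIRECT_LINE/CONNECT_REJECTED` is one of the action types dispatched by Web Chat's store, but the payload shape may differ between versions, so treat this as a debugging sketch only:

```javascript
// Diagnostic middleware sketch: log connection outcomes and pass every
// action through unchanged. Plug it in as the second argument of
// window.WebChat.createStore({}, diagnosticMiddleware).
const diagnosticMiddleware = () => next => action => {
  if (action.type === 'DIRECT_LINE/CONNECT_REJECTED') {
    console.error('❌ Direct Line Speech connection rejected:', action);
  }

  if (action.type === 'DIRECT_LINE/CONNECT_FULFILLED') {
    console.log('✅ Direct Line Speech connection fulfilled');
  }

  return next(action); // always forward the action to the store
};
```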

Error in browser:

Image

Does anyone know what I might be missing? Do I need a different token for Direct Line Speech? Should I be making a different API call?
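To rule out a problem with the STS token exchange itself, my reading of the DIRECT_LINE_SPEECH.md page is that `fetchCredentials` can also return the subscription key directly instead of a token. A minimal sketch for local debugging only, with a placeholder key (the real key must never ship to the browser in production):

```javascript
// Debugging sketch only: the Direct Line Speech adapters accept either
// { authorizationToken, region } or { subscriptionKey, region } from
// fetchCredentials. "SECRET" is a placeholder value.
async function fetchCredentialsForLocalTest() {
  return {
    region: "northeurope",
    subscriptionKey: "SECRET" // placeholder — never expose in production
  };
}
```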

Any help is appreciated!

MunozVictor avatar Feb 28 '25 13:02 MunozVictor