Microphone / wave-recorder button does not work, no voice chat as described
Describe the bug
I have been using Gradio to build chat apps. I recently found Chainlit and was immediately attracted to it: the chat UI looks very professional and it is simple to pick up. However, I noticed a major bug: the microphone / voice chat does not work. The mic is disabled by default; when I enabled it and clicked the mic (wave-recorder) button, nothing happened. The browser never showed the "allow access to microphone?" prompt, and the app never connected to the mic. In fact, the process seems to hang forever (see the attached pics). I consider it a major bug that a framework specialized in chat cannot talk to or connect to the microphone. This feature has been available for quite a while in frameworks like Gradio, which are less specialized in chat. I noticed this problem was reported a while ago (e.g. https://github.com/Chainlit/chainlit/issues/626) but has not been addressed. I have tried the different solutions suggested by users, e.g. deploying over https instead of http, and nothing has worked so far. Please treat this as the major bug it is and address it. Thank you!
Expected behavior
- simple ways to turn the mic on and off
- the mic / wave-recorder button works
- voice chat is supported in general (both the user and the AI can talk)
Screenshots
Hi @bigmw,
Can you share the code you currently have in your @cl.on_audio_chunk and @cl.on_audio_end function decorators? Voice chat is supported in Chainlit, but it gives you full control over how you handle the audio chunks. This lets you pass the audio to OpenAI's Realtime API, send it to a Whisper model for transcription, etc.
They have an example for setting up Chainlit with realtime audio in their Cookbook, and it worked well for me with some small modifications to fit my use case. Are you able to try that code in your app?
Aidan, here is the code from my @cl.on_audio_chunk and @cl.on_audio_end function decorators. As you suggested, I also checked the realtime audio example in their Cookbook, as well as the Quivr Chatbot Example. Let me know if you spot any problem here. Thank you!
import os
import speech_recognition as sr
from io import BytesIO

import chainlit as cl
from chainlit.element import Element


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.InputAudioChunk):
    if chunk.isStart:
        buffer = BytesIO()
        # This is required for whisper to recognize the file type
        buffer.name = f"input_audio.{chunk.mimeType.split('/')[1]}"
        # Initialize the session for a new audio stream
        cl.user_session.set("audio_buffer", buffer)
        cl.user_session.set("audio_mime_type", chunk.mimeType)

    # Write the chunks to a buffer and transcribe the whole audio at the end
    cl.user_session.get("audio_buffer").write(chunk.data)


@cl.on_audio_end
async def on_audio_end(elements: list[Element]):
    # Get the audio buffer from the session
    audio_buffer: BytesIO = cl.user_session.get("audio_buffer")
    audio_buffer.seek(0)  # Move the file pointer to the beginning
    audio_file = audio_buffer.read()
    audio_mime_type: str = cl.user_session.get("audio_mime_type")

    input_audio_el = cl.Audio(
        mime=audio_mime_type, content=audio_file, name=audio_buffer.name
    )
    await cl.Message(
        author="You",
        type="user_message",
        content="",
        elements=[input_audio_el, *elements],
    ).send()

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_buffer.name) as source:
        audio_data = recognizer.record(source)
    try:
        text = recognizer.recognize_google(audio_data)
    except sr.UnknownValueError:
        await cl.Message(content="Sorry, I couldn't understand that audio.").send()
        return
    except sr.RequestError:
        await cl.Message(content="Could not request results, please try again.").send()
        return

    # on_message is the app's regular text handler (not shown in this snippet)
    msg = cl.Message(author="You", content=text, elements=elements)
    await on_message(message=msg)
Hi @bigmw,
Your code looks a lot like the Quivr Chatbot Example you linked, but unfortunately this method of handling audio was changed in Chainlit v2.0.0 to add support for realtime conversations such as OpenAI's Realtime API.
You can still get this code to work, but you would need to follow the migration guide from the prerelease notes. The reason you are seeing the permanently spinning icon is that you do not have a function decorated with @cl.on_audio_start, which is required to begin an audio conversation. You could then run your own voice activity detection (VAD) in on_audio_chunk(), and when the user stops talking, run the code you currently have in on_audio_end() from within on_audio_chunk(). Additionally, you would need to remove the elements input argument from on_audio_end(), as that is no longer present post-v2.0.0.
Alternatively, you could use a Chainlit version pre-v2.0.0 and your code would potentially work.
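For reference, here is a rough sketch of the post-v2.0.0 hook layout I'm describing (structure only; VAD, transcription, and error handling are omitted, and the names are simplified), so treat it as an illustration rather than a drop-in implementation:

```python
import chainlit as cl


@cl.on_audio_start
async def on_audio_start():
    # Returning True tells the UI the audio connection is accepted;
    # without this hook the mic button just spins.
    cl.user_session.set("audio_chunks", [])
    return True


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.InputAudioChunk):
    # Collect raw chunks; your own VAD would decide here when the user
    # has stopped talking and then trigger transcription / a response.
    audio_chunks = cl.user_session.get("audio_chunks")
    audio_chunks.append(chunk.data)


@cl.on_audio_end
async def on_audio_end():
    # Note: no `elements` argument any more post-v2.0.0.
    pass
```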
Hi Aidan @AidanShipperley, thanks for the input and the detailed suggestions. They make a lot of sense. However, I still see the same problem after updating my script accordingly.
I also followed the second Multi-Modality example in the Chainlit documentation, which includes both text-to-speech and speech-to-text and is very similar to my app. Note that the example was released/updated recently, after the Chainlit 2.0 release, and is consistent with what you suggested. I still got the same problem.
For demo purposes, I simplified my app script and removed the LLM calls and the STT/TTS parts, as shown below. I do see the app prompt me for mic access now, but otherwise it is still the same: the mic does not work and the connection attempt (spinning icon) lasts forever. In the demo script below, I inserted a few "await cl.Message().send()" lines for debugging. They show that on_audio_start() does run, but on_audio_chunk() never does; in fact it never even starts, because the first cl.Message().send() line in on_audio_chunk() never fires. I hope this gives you a better idea of the bug. Note that both on_audio_start() and on_audio_chunk() are copied from the official openai-whisper example, except that process_audio() is not called, to keep the demo simple. The problem/bug can be replicated by running the demo app. Let me know if you have further thoughts/suggestions. Thank you!
import io
import os
import wave
import numpy as np
import audioop
import chainlit as cl

# Define a threshold for detecting silence and a timeout for ending a turn
SILENCE_THRESHOLD = 3500  # Adjust based on your audio level (e.g., lower for quieter audio)
SILENCE_TIMEOUT = 1300.0  # Milliseconds of silence to consider the turn finished


@cl.on_chat_start
async def start_chat():
    msg0 = "Hello! How can I help you?"
    await cl.Message(content=msg0).send()


@cl.on_audio_start
async def on_audio_start():
    cl.user_session.set("silent_duration_ms", 0)
    cl.user_session.set("is_speaking", False)
    cl.user_session.set("audio_chunks", [])
    # await cl.Message(content="audio starts.").send()
    return True


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.InputAudioChunk):
    # await cl.Message(content="On audio chunk now").send()
    audio_chunks = cl.user_session.get("audio_chunks")

    if audio_chunks is not None:
        await cl.Message(content="adding audio chunk..").send()
        audio_chunk = np.frombuffer(chunk.data, dtype=np.int16)
        audio_chunks.append(audio_chunk)
        cl.user_session.set("audio_chunks", audio_chunks)

    # If this is the first chunk, initialize timers and state
    if chunk.isStart:
        await cl.Message(content="first audio chunk..").send()
        cl.user_session.set("last_elapsed_time", chunk.elapsedTime)
        cl.user_session.set("is_speaking", True)
        return

    audio_chunks = cl.user_session.get("audio_chunks")
    last_elapsed_time = cl.user_session.get("last_elapsed_time")
    silent_duration_ms = cl.user_session.get("silent_duration_ms")
    is_speaking = cl.user_session.get("is_speaking")

    # Calculate the time difference between this chunk and the previous one
    time_diff_ms = chunk.elapsedTime - last_elapsed_time
    cl.user_session.set("last_elapsed_time", chunk.elapsedTime)

    # Compute the RMS (root mean square) energy of the audio chunk
    audio_energy = audioop.rms(chunk.data, 2)  # Assumes 16-bit audio (2 bytes per sample)

    if audio_energy < SILENCE_THRESHOLD:
        # Audio is considered silent
        silent_duration_ms += time_diff_ms
        cl.user_session.set("silent_duration_ms", silent_duration_ms)
        if silent_duration_ms >= SILENCE_TIMEOUT and is_speaking:
            cl.user_session.set("is_speaking", False)
            # await process_audio()
            await cl.Message(content="This is an audio response.").send()
    else:
        # Audio is not silent, reset silence timer and mark as speaking
        cl.user_session.set("silent_duration_ms", 0)
        if not is_speaking:
            cl.user_session.set("is_speaking", True)


# @cl.on_audio_end
# async def on_audio_end():
#     pass


@cl.on_message
async def on_message(message: cl.Message):
    await cl.Message(content="This is a response.").send()
I have not directly tested your code yet; first, could you give me a few things so I can help narrow down where this is happening?

- Could you share your `.chainlit/config.toml` file? Just to ensure that you've set your sample rate to `24000` and everything else is in order.
- Can you share your OS, what browser you are using, and what the browser's version is?
- I happened to, just by pure chance, be testing my own audio code, and I noticed that Chainlit's current implementation of the realtime assistant doesn't seem to work in Firefox: a custom sample rate is set for one AudioContext (or another node supplying the microphone stream) while the microphone data comes in at the device's default rate. Edge and Chrome often handle this discrepancy automatically by resampling or allowing inter-context connections, but Firefox enforces stricter rules.
- Could you try your code in another browser, just in case?
- After you click the audio button, do any errors print out in either your terminal or in the web browser's developer console (right click -> Inspect -> click the `Console` tab at the top)?
- Instead of sending messages to the chat for debugging, which are quite slow to send (compared to how fast `on_audio_chunk()` will be called), can you try print statements instead and see which functions get called? I usually add print statements at the top of each function; see the sketch after this list.
- Was there a reason you commented out `on_audio_end()`? You may need all three functions defined for it to work, but this is just a guess.
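For instance, throwaway instrumentation along these lines (nothing but prints, using the same hook names as your script) is usually enough to see which hooks actually fire:

```python
import chainlit as cl


@cl.on_audio_start
async def on_audio_start():
    print("on_audio_start called", flush=True)
    return True


@cl.on_audio_chunk
async def on_audio_chunk(chunk: cl.InputAudioChunk):
    print(f"on_audio_chunk called, isStart={chunk.isStart}, bytes={len(chunk.data)}", flush=True)


@cl.on_audio_end
async def on_audio_end():
    print("on_audio_end called", flush=True)
```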
I think with these we can narrow down where the issue is arising from.
Hi @bigmw, can you please update us? Did you solve your problem? If so, how? Sharing your solution might help us all 😄
All the best.
I have the same problem, but with a slight difference: I can run my Chainlit app correctly on my notebook PC, but I cannot run it on my mobile phone. The AI app is https://github.com/monuminu/AOAI_Samples/tree/main/realtime-assistant-support. Waiting for a solution. Thanks.
Same problem here. I have followed the up-to-date documentation and it still hangs, even if I just put a `pass` in the methods.
https://docs.chainlit.io/api-reference/lifecycle-hooks/on-audio-chunk
Also, the cookbook example for audio has been removed...
bump!
bump, too
It does seem like this issue is related to the sample rate difference between the device and what Chainlit expects, as @AidanShipperley hinted above. At first glance, this looks like something Chainlit needs to handle on the client side.
This only seems to be an issue in Firefox. I tested in Chrome (desktop and mobile) and Safari (desktop macOS) and it works fine. On Firefox it fails on both desktop and mobile.
Here is the console output:
Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported. [index.mjs:217:4079](https://127.0.0.1:52087/libs/react-client/dist/index.mjs)
Uncaught (in promise) DOMException: AudioContext.createMediaStreamSource: Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported.
begin index.mjs:217
pe index.mjs:308
emit index.mjs:136
emitEvent socket.js:498
onevent socket.js:485
onpacket socket.js:455
emit index.mjs:136
ondecoded manager.js:204
promise callback*Dae< websocket-constructor.browser.js:5
ondecoded manager.js:203
emit index.mjs:136
add index.js:146
ondata manager.js:190
emit index.mjs:136
onPacket socket.js:341
emit index.mjs:136
onPacket transport.js:98
onData transport.js:90
onmessage websocket.js:68
[index.mjs:10:3795](https://127.0.0.1:52087/libs/react-client/dist/index.mjs)
@cl.on_audio_start
async def on_audio_start():
    cl.user_session.set("silent_duration_ms", 0)
    cl.user_session.set("is_speaking", False)
    cl.user_session.set("audio_chunks", [])
    # await cl.Message(content="audio starts.").send()
    return True
It was already explained very clearly above: you need to add the on_audio_start method. That fixed it.
Please add some more documentation on this functionality... 😭
- I don't see any listing at all for `on_audio_start()` today in the API reference
- The Advanced Features > Multi-Modality page just mentions `on_audio_chunk` being required, and doesn't really give any implementation guidance
- The "migration guide" referenced above is buried in the `2.0rc0` release note (not even the actual 2.0 release)
It took me ages to figure out why my audio button was spinning forever, as I was starting from quite a heavily modified internal version of the samples that I guess must've been created for v1.
I have the same sample rate issue in Firefox (but works fine in Edge, Chrome, and Safari). A crude kludge that works for me is to inject the following JavaScript (specified in .chainlit/config.toml):
const RealContext = window.AudioContext || window.webkitAudioContext;
window.AudioContext = function (opts = {}) {
  delete opts.sampleRate;
  return new RealContext(opts);
};
Then the microphone works fine in Firefox as well, but at the cost of losing control over the sampling rate. Does anyone know how to send the sampling rate from the frontend back to the Python Chainlit backend (or, more generally, how to send any information from the browser frontend to the backend programmatically)?
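I haven't verified this end to end, but one possibility is to register a custom endpoint on Chainlit's underlying FastAPI app (recent versions expose it as `chainlit.server.app`, if I recall correctly) and have the injected script POST the actual rate there, e.g. `fetch("/audio-sample-rate", {method: "POST", ...})` with the value of `new RealContext().sampleRate`. A rough sketch of the backend side, where the endpoint name and the module-level store are my own invention, and mapping the value to a specific user session would need extra plumbing:

```python
from fastapi import Request
from chainlit.server import app  # assumes Chainlit exposes its FastAPI app here


# Hypothetical module-level store; only reasonable for a single-user/local setup.
REPORTED_SAMPLE_RATES: dict[str, int] = {}


@app.post("/audio-sample-rate")  # hypothetical endpoint, called by the injected JS
async def report_sample_rate(request: Request):
    payload = await request.json()
    # Expecting something like {"client_id": "...", "sample_rate": 48000} from the frontend.
    REPORTED_SAMPLE_RATES[payload.get("client_id", "default")] = int(payload["sample_rate"])
    return {"ok": True}
```

In `on_audio_chunk()` you could then look the reported rate up and resample with something like `audioop.ratecv()` if it differs from what you expect, but again, this is just a sketch of one possible route.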
This issue is stale because it has been open for 14 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
My case: the server is hosted on a local network.
Solved for me: I used an SSL certificate generator to create a cert.pem and a key.pem file, added them to the .env file:
CHAINLIT_SSL_CERT=C:\Users.....\cert.pem
CHAINLIT_SSL_KEY=C:\Users.....\key.pem
and then opened the app at https://<server ip>:<server port>.