[firebase_ai]: None of the documentation for the Live API works. Are we in a simulation?
### Is there an existing issue for this?
- [x] I have searched the existing issues.
### Which plugins are affected?
Other
### Which platforms are affected?
No response
### Description
The docs don't contain a working example. Most of the method names don't exist, and there's no example of passing an audio stream into the model. What is going on here, guys?
```dart
late LiveModelSession _session; // should be LiveSession
final _audioRecorder = YourAudioRecorder();

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel( // should be liveGenerationModel
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]), // should be ResponseModalities
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();

// Map the Uint8List stream to an InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});

await _session.startMediaStream(mediaChunkStream); // no method called startMediaStream exists

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
  // HOW??? HOW DO WE PROCESS THE MESSAGE??????
}
```
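For reference, the capture side can be done today with the `record` package; a minimal sketch is below (the package choice, the 16 kHz mono PCM config, and the `startMicStream` helper name are assumptions, not anything from the docs). It is the Live API calls above whose names don't match the shipped SDK.

```dart
import 'package:firebase_ai/firebase_ai.dart'; // for InlineDataPart
import 'package:record/record.dart';

/// Hypothetical helper: starts microphone capture and maps the raw PCM
/// chunks to the InlineDataPart stream the Live API docs describe.
Future<Stream<InlineDataPart>> startMicStream() async {
  final recorder = AudioRecorder();

  if (!await recorder.hasPermission()) {
    throw StateError('Microphone permission not granted');
  }

  // 16 kHz, 16-bit mono PCM chunks as a Stream<Uint8List>.
  final audioRecordStream = await recorder.startStream(
    RecordConfig(
      encoder: AudioEncoder.pcm16bits,
      sampleRate: 16000,
      numChannels: 1,
    ),
  );

  // Wrap each chunk in an InlineDataPart, as in the documented example.
  return audioRecordStream.map((data) => InlineDataPart('audio/pcm', data));
}
```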
### Reproducing the issue
Documentation link: https://firebase.google.com/docs/ai-logic/live-api#audio-in-audio-out
### Firebase Core version
3.13.0
### Flutter Version
3.32
### Relevant Log Output
No response
### Additional context and comments
No response
Hi @colbymaloy, could you link me the documentation you're referencing? In the meantime, you can have a look at this example which captures how you pass audio to a model.
Hi @SelaseKay, the example you shared is different from what we want to achieve.
https://firebase.google.com/docs/ai-logic/live-api#audio-in-audio-out
We want to have a realtime conversation with Gemini, not record audio and send it as Content.
Yes, here is the link to the page that references methods that don't exist: https://firebase.google.com/docs/ai-logic/live-api#audio-in-audio-out
@kevinthecheung Could we review and possibly update this documentation in light of the concerns raised above? It appears some of the referenced methods do not exist.
Yep, struggling a bit here as well without examples. It's technically working for me, but it seems to just speak gibberish back to me (switching languages, many back-to-back AI utterances with no user speech, etc.).
I was able to solve all the issues and ended up writing a blog post about it on Medium. You can take a look there if you’re interested.
Thank you @ilham-asgarli!
I have read that Firebase AI support is still not production-ready on iOS for some reason (something about sandboxing or noise cancellation).
Do you know if your solution performs well on iOS?
I'm also curious whether Google AI bidirectional voice is expected to move to native WebRTC soon. That would make things much easier; OpenAI Realtime, for example, is fairly easy to set up right now.
@willsmanley Our solution simply takes the audio stream, sends it to the AI backend, and waits for the response—so there shouldn’t be any major platform-specific changes for iOS. We handle recording using a package that already includes noise cancellation and related features. That said, I haven’t personally tested it on iOS yet, so if you do, I’d appreciate any feedback you can share.
As for Google’s AI bidirectional voice and native WebRTC support, I haven’t seen any official updates. If Google moves to direct WebRTC integration, it would certainly simplify things, much like OpenAI’s current realtime solution. But honestly, I hope they prioritize proper multi-language support before anything else.
Yep, agreed with you there.
I used your code on iOS and it seems the echo cancellation does not work: whenever the AI talks, it then hears itself and continues to talk to itself in a loop.
Back when OpenAI Realtime only had WebSockets enabled, I solved this by using a WebRTC relay, but it was a DevOps nightmare because it required a cluster of LiveKit relay workers, and it was not the most reliable solution.
I was hoping something else had been solved, but I think this echo cancellation issue might be the same reason the Firebase library doesn't fully support it yet.
It would be nice for a firebase_ai maintainer to chime in here.
I experienced the same issue on Android—it’s not traditional echo, but rather the fact that recording continues while the AI is responding, so the AI ends up hearing itself. To address this, I added logic to stop recording when the AI starts responding, and that resolved it on Android.
I'm surprised it doesn't work the same way on iOS. I haven't had the chance to test on iOS yet; access to macOS might take a while for me. If you want to try tackling it yourself, check whether you can pause and resume the recorder at the right moments in your code:
```dart
await _audioRecorder.pause();
await _audioRecorder.resume();
```
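A minimal sketch of that gating, assuming the `record` package's `AudioRecorder` (the `HalfDuplexMicGate` wrapper and its method names are hypothetical; call them from wherever you handle the live session's messages):

```dart
import 'package:record/record.dart';

/// Gates the microphone so the model never hears its own audio:
/// pause capture while the model is speaking, resume when its turn ends.
class HalfDuplexMicGate {
  HalfDuplexMicGate(this._recorder);

  final AudioRecorder _recorder; // from the `record` package
  bool _paused = false;

  /// Call when the model starts producing audio.
  Future<void> onModelSpeaking() async {
    if (!_paused) {
      await _recorder.pause();
      _paused = true;
    }
  }

  /// Call when the model signals the end of its turn
  /// (e.g. a LiveServerContent message with turnComplete == true).
  Future<void> onModelTurnComplete() async {
    if (_paused) {
      await _recorder.resume();
      _paused = false;
    }
  }
}
```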
That makes more sense now that I'm looking at the code more closely.
The problem is that the recorder is resumed exactly when another LiveServerContent arrives with turnComplete == true. Unfortunately, that happens as soon as the entire response has been received on the device, even if the full thing hasn't been played yet.
As a result, the recorder always resumes too early, so the AI always hears the last 50% or 75% of its own speech.
I did make a slight improvement that uses better (but not perfect) logic to estimate when the utterance will finish: https://gist.github.com/willsmanley/7b942ae817ccb4a49ef27b33fec30c27
This seems to work better.
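For anyone adapting that idea, the estimate reduces to simple PCM arithmetic: playback time = bytes / (sampleRate * bytesPerSample * channels). A minimal sketch, assuming 24 kHz, 16-bit mono output (check what your model actually returns):

```dart
/// Estimates how long a chunk of raw PCM audio will take to play back.
/// Assumes 24 kHz, 16-bit (2 bytes/sample), mono output by default.
Duration estimatePcmDuration(
  int byteCount, {
  int sampleRate = 24000,
  int bytesPerSample = 2,
  int channels = 1,
}) {
  final bytesPerSecond = sampleRate * bytesPerSample * channels;
  final micros =
      (byteCount * Duration.microsecondsPerSecond) ~/ bytesPerSecond;
  return Duration(microseconds: micros);
}

// Usage sketch: accumulate the byte count of the audio chunks in the
// current model turn, then delay resuming the recorder until playback
// should have finished.
// final resumeDelay = estimatePcmDuration(totalAudioBytesThisTurn);
// await Future<void>.delayed(resumeDelay);
// await _audioRecorder.resume();
```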
Thank you for the detailed explanation and for sharing your improvement. I’ll review your approach, test it on my end, and update both the code and the blog once I’ve verified the results.
Also, I think we will want to update it to use the new Gemini API instead of Vertex: https://firebase.google.com/docs/ai-logic/migrate-to-latest-sdk https://firebase.google.com/docs/ai-logic/live-api
I'm working on that now.
If you mean switching from firebase_vertexai to firebase_ai, I’ve already made that change—both the code and the blog use firebase_ai.
However, if you’re referring to using Google AI directly within that library, it’s not currently possible. According to the Firebase documentation, bidirectional live streaming is only supported when the Vertex AI Gemini API is set as the API provider.
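For anyone following along, the backend choice is just the entry point on FirebaseAI; a minimal sketch, assuming the current firebase_ai package (Firebase must be initialized before either is used):

```dart
import 'package:firebase_ai/firebase_ai.dart';

// Vertex AI Gemini API backend: per the docs, the only provider that
// currently supports bidirectional live streaming.
final vertexBackend = FirebaseAI.vertexAI();

// Gemini Developer API backend: available in firebase_ai, but not usable
// for the Live API's bidirectional streaming at the moment.
final googleBackend = FirebaseAI.googleAI();
```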
Would you be interested in working together to try to get 2.5 Pro native working in Flutter? I see it working in their playground, and porting it to Flutter looks possible with raw WebSockets if you generate an ephemeral token on the server.
I'm open to giving it a try. If you’d like to get in touch, my email is in my profile. We can connect on whichever platform works best for you.
After researching it more, I found out that Google's official Gemini app actually uses a WebRTC relay over Pipecat and Daily's infrastructure. See: https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/#:~:text=We%27ve%20also%20partnered%20with%20Daily%2C,guide%20to%20get%20started%20building
It seems this will be the preferred solution, since we will always be fighting echo cancellation otherwise. In fact, with my code change above it works pretty well, but we obviously have to sacrifice interruption capabilities.
I will be working on a WebRTC integration between Gemini 2.5 Live and Flutter, and I will let you know how it goes. There isn't a complete Flutter library available for this yet, but some initial work (iOS only) has been done here: https://pub.dev/packages/google_multimodal_assistant_real_time_with_daily
It would be good to hear from the Firebase developers whether the Firebase API will support WebRTC or just WebSockets. It kind of seems like a waste of time to bother supporting WebSockets at the client level, given the limitations.
By the way, the example code for firebase_ai has been updated. I don't know if the problem has been solved, but I'm planning to try it soon.
FWIW, I switched to the LiveKit SDK for Flutter, which proxies everything through WebRTC before relaying it to Gemini over WebSockets. This ended up working much better, with proper echo cancellation, interruptions, etc., like in the official Gemini app. The only downside is that I have to host a separate worker service to act as the relay, but that is pretty much necessary until Google adds direct WebRTC support as OpenAI did. Hope this helps. https://docs.livekit.io/agents/integrations/llm/gemini/
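In case it saves someone time, here is a minimal sketch of the client side with the `livekit_client` package; the server URL, the token endpoint, and the relay agent that bridges to Gemini all live in your own backend, so those parts are assumptions here.

```dart
import 'package:livekit_client/livekit_client.dart';

/// Hypothetical helper: joins a LiveKit room that a Gemini relay agent
/// (running in your backend) also joins.
Future<Room> joinVoiceRoom({
  required String serverUrl, // e.g. 'wss://your-livekit-host' (assumption)
  required String token,     // minted by your own token server (assumption)
}) async {
  final room = Room();

  // Connect over WebRTC; echo cancellation comes from the native stack.
  await room.connect(serverUrl, token);

  // Publish the microphone track.
  await room.localParticipant?.setMicrophoneEnabled(true);

  // Remote (agent) audio tracks typically start playing once subscribed on
  // mobile; listen to room events if you need finer control over playback.
  return room;
}
```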
I concur: the example that is shared simply does not work and is horrendously misleading. Please, at the very minimum, add a note on the page that it does not work. @ilham-asgarli Many thanks for the example. I gave it to my good friend Claude and asked nicely to use Riverpod for state management, and now I have a working version; very impressive. A really great help, thanks. I will try to create a hello-world example and make it available.
You're very welcome @chrisn-au. I'm glad it helped in some way, and great to hear you got it working with Riverpod.