
[firebase_ai]: None of the documentation for the Live API works?? Are we in a simulation

Open colbymaloy opened this issue 7 months ago • 19 comments

Is there an existing issue for this?

  • [x] I have searched the existing issues.

Which plugins are affected?

Other

Which platforms are affected?

No response

Description

The docs don't contain a working example? Most of the method names don't exist, and there's no example of passing an audio stream into the model. What the hell is going on here, guys?

```dart
late LiveModelSession _session; // should be LiveSession
final _audioRecorder = YourAudioRecorder();

// Initialize the Vertex AI Gemini API backend service
// Create a `LiveModel` instance with the model that supports the Live API
final model = FirebaseAI.vertexAI().liveModel( // should be liveGenerationModel
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]), // should be ResponseModalities
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream); // no method called startMediaStream exists

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message

  /// HOW??? HOW DO WE PROCESS THE MESSAGE??????
}
```
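
For what it's worth, this is roughly what I'd expect the processing loop to look like, guessing at type names (`LiveServerContent`, `modelTurn`, `InlineDataPart`) from the docs. None of this is verified against the actual SDK, and `playPcmChunk` is just a placeholder for whatever audio player you use:

```dart
// Guess at the processing loop; type names are unverified against firebase_ai.
// `playPcmChunk` is a hypothetical helper that feeds raw PCM to an audio player.
await for (final message in _session.receive()) {
  if (message is LiveServerContent) {
    // Play any audio chunks from the model's turn.
    for (final part in message.modelTurn?.parts ?? const <Part>[]) {
      if (part is InlineDataPart && part.mimeType.startsWith('audio/')) {
        await playPcmChunk(part.bytes);
      }
    }
    if (message.turnComplete == true) {
      // Model finished its turn; hand the mic back to the user.
    }
  }
}
```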


### Reproducing the issue

Doc link: https://firebase.google.com/docs/ai-logic/live-api#audio-in-audio-out

### Firebase Core version

3.13.0

### Flutter Version

3.32

### Relevant Log Output

No response

Flutter dependencies

No response

Additional context and comments

No response

colbymaloy avatar Jun 06 '25 08:06 colbymaloy

Hi @colbymaloy, could you link the documentation you're referencing? In the meantime, you can have a look at this example, which shows how to pass audio to a model.

SelaseKay avatar Jun 06 '25 10:06 SelaseKay

Hi @SelaseKay, the example you shared is something different from what we want to achieve.

https://firebase.google.com/docs/ai-logic/live-api#audio-in-audio-out

We want to have a realtime conversation with Gemini, not record audio and send it as Content.

AliKales avatar Jun 06 '25 15:06 AliKales


Yes, here is the link to the page that references methods that don't exist: https://firebase.google.com/docs/ai-logic/live-api#audio-in-audio-out

colbymaloy avatar Jun 07 '25 06:06 colbymaloy

@kevinthecheung Could we review and possibly update this documentation in light of the concerns raised above? It appears some of the referenced methods do not exist.

SelaseKay avatar Jun 09 '25 11:06 SelaseKay

Yep, struggling a bit here as well without examples. It's technically working for me, but it seems to just speak gibberish back to me (switching languages, many back-to-back AI responses with no user speech, etc.).

willsmanley avatar Jun 11 '25 03:06 willsmanley

I was able to solve all the issues and ended up writing a blog post about it on Medium. You can take a look there if you’re interested.

ilham-asgarli avatar Jun 11 '25 18:06 ilham-asgarli

Thank you @ilham-asgarli!

I have read that the Firebase AI support is still not production-ready for iOS for some reason (sandbox or noise cancellation or something).

Do you know if your solution performs well on iOS?

Also curious whether Google AI bidirectional voice is expected to support WebRTC natively soon. That would make things much easier; OpenAI Realtime is fairly easy to set up right now.

willsmanley avatar Jun 11 '25 18:06 willsmanley

@willsmanley Our solution simply takes the audio stream, sends it to the AI backend, and waits for the response—so there shouldn’t be any major platform-specific changes for iOS. We handle recording using a package that already includes noise cancellation and related features. That said, I haven’t personally tested it on iOS yet, so if you do, I’d appreciate any feedback you can share.

As for Google’s AI bidirectional voice and native WebRTC support, I haven’t seen any official updates. If Google moves to direct WebRTC integration, it would certainly simplify things, much like OpenAI’s current realtime solution. But honestly, I hope they prioritize proper multi-language support before anything else.

ilham-asgarli avatar Jun 11 '25 19:06 ilham-asgarli

Yep, agreed with you there.

I used your code on iOS, and it seems the echo cancellation does not work. Whenever the AI talks, it then hears itself and continues to talk to itself in a loop.

Back when OpenAI Realtime only had WebSockets enabled, I had solved this by using a WebRTC relay, but it was a devops nightmare because it required a cluster of LiveKit relay workers, and it was not the most reliable solution.

I was hoping something else had been solved, but I think this echo cancellation might be the same reason the Firebase library doesn't have it fully supported yet.

It would be nice for a Firebase AI maintainer to chime in here.

willsmanley avatar Jun 11 '25 19:06 willsmanley

I experienced the same issue on Android—it’s not traditional echo, but rather the fact that recording continues while the AI is responding, so the AI ends up hearing itself. To address this, I added logic to stop recording when the AI starts responding, and that resolved it on Android.

I'm surprised it doesn't work the same way on iOS. I haven't had the chance to test on iOS yet; access to macOS might take a while for me. If you want to try tackling it yourself, check whether you can pause and resume the recorder at the right moments in your code:

await _audioRecorder.pause(); await _audioRecorder.resume();
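
As a rough sketch only (type names like `LiveServerContent` and `turnComplete` are the ones referenced in this thread, not verified against the SDK, and the recorder API will depend on your package), the idea is:

```dart
// Sketch: pause the mic while the model is speaking so it cannot hear itself,
// and resume once the model signals that its turn is complete.
await for (final message in _session.receive()) {
  if (message is LiveServerContent) {
    if (message.modelTurn != null) {
      await _audioRecorder.pause(); // model audio is arriving
    }
    if (message.turnComplete == true) {
      await _audioRecorder.resume(); // model's turn is over
    }
  }
}
```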

ilham-asgarli avatar Jun 11 '25 20:06 ilham-asgarli

That makes more sense now that I'm looking at the code more closely.

The problem is that the recorder is resumed exactly when a LiveServerContent with turnComplete == true arrives. Unfortunately, this happens as soon as the entire response has been received on the device, even if the full thing hasn't been played.

As a result, the recorder always resumes too early, so the AI always hears the last 50% or 75% of its own speech.

I did make a slight improvement which uses better (but not perfect) logic to estimate when the utterance will finish: https://gist.github.com/willsmanley/7b942ae817ccb4a49ef27b33fec30c27

This seems to work better.
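
The core of the improvement is just estimating how long the model's reply will take to play out from the number of PCM bytes received, and delaying the resume by that amount. A rough sketch of the idea, not the gist's actual code, assuming 24 kHz, 16-bit mono output:

```dart
// Rough sketch: estimate playback time from the PCM byte count,
// assuming 24 kHz, 16-bit (2-byte) mono output. Adjust to your actual format.
const sampleRate = 24000;
const bytesPerSample = 2;

Duration estimatePlayback(int pcmByteCount) {
  final seconds = pcmByteCount / (sampleRate * bytesPerSample);
  return Duration(milliseconds: (seconds * 1000).round());
}

// On turnComplete: wait out the estimated playback time before resuming the mic.
Future<void> resumeAfterPlayback(int bytesReceivedThisTurn) async {
  await Future.delayed(estimatePlayback(bytesReceivedThisTurn));
  await _audioRecorder.resume();
}
```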

willsmanley avatar Jun 11 '25 20:06 willsmanley

Thank you for the detailed explanation and for sharing your improvement. I’ll review your approach, test it on my end, and update both the code and the blog once I’ve verified the results.

ilham-asgarli avatar Jun 11 '25 20:06 ilham-asgarli

Also, I think we will want to update it to use the new Gemini API instead of Vertex: https://firebase.google.com/docs/ai-logic/migrate-to-latest-sdk https://firebase.google.com/docs/ai-logic/live-api

I'm working on that now.

willsmanley avatar Jun 11 '25 21:06 willsmanley

If you mean switching from firebase_vertexai to firebase_ai, I’ve already made that change—both the code and the blog use firebase_ai.

However, if you’re referring to using Google AI directly within that library, it’s not currently possible. According to the Firebase documentation, bidirectional live streaming is only supported when the Vertex AI Gemini API is set as the API provider.

ilham-asgarli avatar Jun 11 '25 21:06 ilham-asgarli

Would you be interested in working together to try to get 2.5 Pro native working in Flutter? I see it working in their playground, and porting it to Flutter looks like it will be possible with raw WebSockets if you generate an ephemeral token on the server.

willsmanley avatar Jun 11 '25 22:06 willsmanley

I'm open to giving it a try. If you’d like to get in touch, my email is in my profile. We can connect on whichever platform works best for you.

ilham-asgarli avatar Jun 12 '25 01:06 ilham-asgarli

After researching it more, I found out that Google's official Gemini app actually uses a WebRTC relay over Pipecat and Daily's infra. See: https://developers.googleblog.com/en/gemini-2-0-level-up-your-apps-with-real-time-multimodal-interactions/#:~:text=We%27ve%20also%20partnered%20with%20Daily%2C,guide%20to%20get%20started%20building

It seems that this will be the preferred solution, since we will always be fighting echo cancellation. In fact, with my code change above it works pretty well, but we obviously have to sacrifice interruption capabilities.

I will be working on a WebRTC integration between Gemini 2.5 Live and Flutter; I will let you know how it goes. It seems there isn't a complete Flutter library available for this yet, but some initial work (iOS only) has been done here: https://pub.dev/packages/google_multimodal_assistant_real_time_with_daily

It would be good to hear from Firebase developers whether the Firebase API will support WebRTC or just WebSockets. It kind of seems like a waste of time to bother supporting WebSockets at the client level, given the limitations.

willsmanley avatar Jun 12 '25 17:06 willsmanley

By the way, the example code for firebase_ai has been updated. I don't know if the problem has been solved, but I'm planning to try it soon.

ilham-asgarli avatar Jun 12 '25 21:06 ilham-asgarli

FWIW, I switched to the LiveKit SDK for Flutter, which proxies everything through WebRTC before relaying it to Gemini over WebSockets. This ended up working much better, with proper echo cancellation, interruptions, etc., like in the official Gemini app. The only downside is that I have to host a separate worker service to act as the relay, but that is pretty much necessary until Google adds direct WebRTC support as OpenAI did. Hope this helps. https://docs.livekit.io/agents/integrations/llm/gemini/

willsmanley avatar Jun 13 '25 13:06 willsmanley

I concur: the example that is shared simply does not work and is horrendously misleading. Please, at the very minimum, add a note on the page saying it does not work. @ilham-asgarli Many thanks for the example. I gave it to my good friend Claude and asked nicely to use Riverpod for state management, and now I have a working version; very impressive. A really great help, thanks. I will try to create a hello world and make it available.

chrisn-au avatar Jun 22 '25 05:06 chrisn-au

You're very welcome @chrisn-au. I'm glad it helped in some way, and great to hear you got it working with Riverpod.

ilham-asgarli avatar Jun 22 '25 07:06 ilham-asgarli