[Feat]: Integrate vision model in chat.

Open yatendra2001 opened this issue 2 years ago • 9 comments

Description

At present, Gemini-pro-vision doesn't support multi-turn text-based conversations. However, there is a need for multi-turn multi-modal chat capabilities.

One option is to await the implementation of multi-turn multi-modal functionality in Gemini. Alternatively, as a temporary solution, we can consider attaching images as discrete features to enrich the conversation experience.

What do you think @samyakkkk ?

[Image: Screenshot 2023-12-23 at 2 07 51 PM]

yatendra2001 avatar Dec 23 '23 08:12 yatendra2001

@superiorsd10 would you like to take this up next?

samyakkkk avatar Dec 31 '23 11:12 samyakkkk

Let's keep it simple. Multimodal chat works the same way as normal chat.

  1. We will just add a square image-selection icon to the left of the text field. Users can use it to attach one or more images and send a message to Gemini.
  2. Since the multimodal models don't support multi-turn chats, if a user sends a follow-up, we show an error snackbar saying: "Follow up not allowed with images. Please clear existing chat to send a new message."
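
A minimal sketch of the follow-up rule in point 2, assuming the chat history is kept as a plain array. The names ChatMessage, SendCheck, and canSend are hypothetical and not taken from the codebase; only the snackbar text comes from the comment above.

```typescript
// Hypothetical message shape; the real chat model in the extension may differ.
interface ChatMessage {
  role: "user" | "model";
  text: string;
  imageCount: number; // number of images attached to this message
}

// Snackbar text as proposed in the comment above.
const FOLLOW_UP_ERROR =
  "Follow up not allowed with images. Please clear existing chat to send a new message.";

type SendCheck = { ok: true } | { ok: false; error: string };

// Allow sending only when no earlier message in the chat carried an image.
function canSend(history: ChatMessage[]): SendCheck {
  const hasImage = history.some((message) => message.imageCount > 0);
  if (hasImage) {
    return { ok: false, error: FOLLOW_UP_ERROR };
  }
  return { ok: true };
}
```

The UI would call canSend before dispatching a message and show the error in a snackbar when ok is false. Clearing the chat empties the history, so the next message goes through.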

samyakkkk avatar Dec 31 '23 11:12 samyakkkk

> @superiorsd10 would you like to take this up next?

Yes, I would like to work on this. But before this, I'll have to work on #148. So that the users can clear the existing chat.

superiorsd10 avatar Dec 31 '23 13:12 superiorsd10

Just looping in. @superiorsd10, check out generateTextFromImage in gemini-repository.ts. It can particularly help in this case.

yatendra2001 avatar Dec 31 '23 14:12 yatendra2001

> Just looping in. @superiorsd10, check out generateTextFromImage in gemini-repository.ts. It can particularly help in this case.

Sure, thanks for the help :)

superiorsd10 avatar Dec 31 '23 15:12 superiorsd10

Hello @samyakkkk and @yatendra2001 👋

I wanted to share the approach to integrating this feature into the extension. Please review the plan, and if any changes or adjustments are needed, your feedback would be greatly appreciated. If everything looks good, I'm excited to start working on the implementation.

Here's the proposed approach (as understood by me from the above conversation):

  • Introduce an image selection box positioned to the left of the prompt text field.
  • Users can tap the image selection box to choose a single image (accepted formats: 'png', 'jpg', 'jpeg', 'gif', 'bmp').
  • Allow users to enter a prompt text alongside the selected image.
  • Call the generateTextFromImage function, passing the prompt, selected image, and image type for processing.
  • Display the result obtained from generateTextFromImage in the chat container as a message generated by the model.
  • Implement error handling for attempting to add another image in the same conversation without clearing the chat history.
  • In case of a follow-up attempt with an image present, show an error snack bar instructing the user to clear the chat history before sending a new message.
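
A rough sketch of the validation part of this plan, under stated assumptions: the request shape, the helper names (mimeTypeFor, buildImageRequest), and the error strings are hypothetical, and the actual parameters of generateTextFromImage in gemini-repository.ts are not known here. The "image type" is assumed to be a MIME type derived from the file extension.

```typescript
// Accepted formats from the plan above, mapped to their standard MIME types.
const MIME_TYPES: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
  gif: "image/gif",
  bmp: "image/bmp",
};

// Returns the MIME type for an accepted image file name, or undefined otherwise.
function mimeTypeFor(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = fileName.slice(dot + 1).toLowerCase();
  return Object.prototype.hasOwnProperty.call(MIME_TYPES, ext)
    ? MIME_TYPES[ext]
    : undefined;
}

// Hypothetical result type: either a request ready to send, or a snackbar message.
type ImageRequestResult =
  | { kind: "request"; prompt: string; imagePath: string; mimeType: string }
  | { kind: "error"; message: string };

// Validates the selection before anything is sent to the model.
function buildImageRequest(
  prompt: string,
  imagePath: string,
  chatAlreadyHasImage: boolean
): ImageRequestResult {
  if (chatAlreadyHasImage) {
    return {
      kind: "error",
      message: "Please clear the chat history before sending a new message.",
    };
  }
  const mimeType = mimeTypeFor(imagePath);
  if (mimeType === undefined) {
    return { kind: "error", message: "Unsupported image format." };
  }
  return { kind: "request", prompt, imagePath, mimeType };
}
```

When the result has kind "request", its fields would be handed to generateTextFromImage and the returned text shown in the chat container as the model's message. When it has kind "error", the message goes to the snackbar instead.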

Thank you,

superiorsd10 avatar Jan 03 '24 16:01 superiorsd10

@superiorsd10 thanks for the clear plan of action. lgtm! let's execute.

samyakkkk avatar Jan 04 '24 06:01 samyakkkk

Hello @yatendra2001 👋

I want to show the selected image in the user's message (in the chat-container) along with the prompt, but it's not working: only the alt text is shown.

For debugging purposes, I hardcoded the image path, but the image still isn't shown.

Can you please help me with this? Am I missing something here?

Thank you,

superiorsd10 avatar Jan 20 '24 20:01 superiorsd10

Hi @superiorsd10, can we connect in our community channel? https://join.slack.com/t/welltested-ai/shared_invite/zt-25u09fty8-gaggH9HbmopB~4tialTrlA

We will be able to work closely with you there. Please send me a 👋.

samyakkkk avatar Jan 21 '24 04:01 samyakkkk