Implement multimodal request support for Gemini API (#2)
This pull request introduces multimodal support for the Google Gemini API within the ChatAIze.GenerativeCS library, addressing issue #2. Users can now send requests combining text with various file types (PDF, DOC, TXT, images, audio, video).
Key Changes Implemented:
- Gemini File Service (`FileService.cs`, `IFileService.cs`):
  - Added a new service to handle interactions with the Gemini Files API.
  - Supports uploading files (via path or stream), retrieving file metadata, listing uploaded files, and deleting files.
  - Follows the resumable upload protocol as per Gemini API documentation.
  - File service and interface names (`FileService`, `IFileService`) are aligned with existing provider naming conventions (e.g., `ChatCompletion.cs`).
- Enhanced Chat Message Structure (`ChatMessage.cs`, `ChatContentPart.cs`):
  - `ChatMessage.cs` now uses an `ICollection<IChatContentPart> Parts` property to hold different content types within a single message.
  - Introduced `IChatContentPart` interface and concrete `TextPart` and `FileDataPart` classes.
  - `FileDataPart` encapsulates `FileDataSource` (MIME type and file URI) for referencing uploaded files.
  - The existing `ChatMessage.Content` property has been marked `[Obsolete]` and now acts as a getter/setter for the first `TextPart` in the `Parts` collection to maintain backward compatibility.
- Updated Gemini Chat Provider (`ChatCompletion.cs`):
  - The `CreateChatCompletionRequest` method now iterates through `message.Parts`.
  - Correctly serializes `TextPart` and `FileDataPart` (including `mime_type` and `file_uri`) into the JSON payload for the Gemini API's `generateContent` endpoint.
  - Ensures an empty text part is added if a message has no other content parts, as required by the Gemini API.
  - Obsolete warnings for `ChatMessage.Content` usage (for backward compatibility fallback) have been suppressed with `#pragma`.
- Client and DI Integration (`GeminiClient.cs`, `GeminiClientExtension.cs`):
  - `GeminiClient.cs` now instantiates and exposes an `IFileService` through a public `Files` property.
  - Dependency Injection in `GeminiClientExtension.cs` has been updated to register `IFileService` as a singleton, resolving its instance from the `GeminiClient.Files` property. This ensures a consistent `IFileService` instance is used.
- Model Updates (`Models/Gemini/`):
  - Added `GeminiFile.cs`, `GeminiFileUploadRequest.cs`, `GeminiListFilesResponse.cs` to represent data structures for the Gemini Files API.
  - Addressed nullable warnings (CS8618) in these models by using the `required` modifier for non-nullable properties expected from the API and initializing collections.
- Documentation & Packaging:
  - Updated `README.md` with a new section explaining how to use the multimodal features, including accessing `IFileService`, uploading files, and sending chat messages with file references.
  - Incremented the library version in `ChatAIze.GenerativeCS.csproj` to `0.15.0`.
  - Updated package description and tags in the `.csproj` file to reflect the new multimodal capabilities.
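For reviewers unfamiliar with the resumable upload flow referenced under Gemini File Service, here is a minimal sketch of the two requests involved. It is not the code in `FileService.cs`: the `ResumableUpload` class and its method names are hypothetical, and the endpoint and `X-Goog-Upload-*` header names are taken from the public Gemini Files API documentation as I understand it, so treat them as assumptions to verify.

```csharp
using System;
using System.Globalization;
using System.Net.Http;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical helper, for illustration only; not part of the library.
public static class ResumableUpload
{
    // Assumed upload endpoint from the public Gemini Files API documentation.
    private const string UploadUrl =
        "https://generativelanguage.googleapis.com/upload/v1beta/files";

    // Step 1: announce the upload. The response is expected to carry the
    // session URL in its "X-Goog-Upload-URL" header.
    public static HttpRequestMessage CreateStartRequest(
        string apiKey, string displayName, string mimeType, long sizeBytes)
    {
        var body = new JsonObject
        {
            ["file"] = new JsonObject { ["display_name"] = displayName }
        };

        var request = new HttpRequestMessage(HttpMethod.Post, UploadUrl)
        {
            Content = new StringContent(body.ToJsonString(), Encoding.UTF8, "application/json")
        };
        request.Headers.Add("x-goog-api-key", apiKey);
        request.Headers.Add("X-Goog-Upload-Protocol", "resumable");
        request.Headers.Add("X-Goog-Upload-Command", "start");
        request.Headers.Add("X-Goog-Upload-Header-Content-Length",
            sizeBytes.ToString(CultureInfo.InvariantCulture));
        request.Headers.Add("X-Goog-Upload-Header-Content-Type", mimeType);
        return request;
    }

    // Step 2: send all bytes to the session URL and finalize in one request.
    public static HttpRequestMessage CreateUploadRequest(Uri sessionUrl, byte[] bytes)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, sessionUrl)
        {
            Content = new ByteArrayContent(bytes)
        };
        request.Headers.Add("X-Goog-Upload-Offset", "0");
        request.Headers.Add("X-Goog-Upload-Command", "upload, finalize");
        return request;
    }
}
```

Building the requests separately from sending them keeps the protocol details testable without network access; the caller would pass each message to `HttpClient.SendAsync`.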
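To make the message-structure and serialization changes above easier to picture, the following self-contained sketch models the `Parts` collection, the obsolete `Content` shim, and the mapping to a `generateContent` content entry. The type names mirror those listed in this PR, but their members (`Text`, `FileData`, `FileUri`) and the `GeminiPayload.ToContent` helper are assumptions made for illustration; the actual classes in the diff may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Simplified stand-ins for the types named in this PR.
public interface IChatContentPart { }

public sealed class TextPart : IChatContentPart
{
    public string Text { get; set; } = string.Empty;
}

public sealed class FileDataSource
{
    public required string MimeType { get; init; }
    public required string FileUri { get; init; }
}

public sealed class FileDataPart : IChatContentPart
{
    public required FileDataSource FileData { get; init; }
}

public sealed class ChatMessage
{
    public ICollection<IChatContentPart> Parts { get; } = new List<IChatContentPart>();

    // Backward-compatibility shim: reads and writes the first TextPart.
    [Obsolete("Use Parts instead.")]
    public string? Content
    {
        get => Parts.OfType<TextPart>().FirstOrDefault()?.Text;
        set
        {
            var first = Parts.OfType<TextPart>().FirstOrDefault();
            if (first is null) Parts.Add(new TextPart { Text = value ?? string.Empty });
            else first.Text = value ?? string.Empty;
        }
    }
}

// Hypothetical helper showing one way to build a content entry.
public static class GeminiPayload
{
    public static JsonObject ToContent(ChatMessage message, string role = "user")
    {
        var parts = new JsonArray();
        foreach (var part in message.Parts)
        {
            switch (part)
            {
                case TextPart text:
                    parts.Add(new JsonObject { ["text"] = text.Text });
                    break;
                case FileDataPart file:
                    parts.Add(new JsonObject
                    {
                        ["file_data"] = new JsonObject
                        {
                            ["mime_type"] = file.FileData.MimeType,
                            ["file_uri"] = file.FileData.FileUri
                        }
                    });
                    break;
            }
        }

        // Fallback described in this PR: never send an empty parts array.
        if (parts.Count == 0) parts.Add(new JsonObject { ["text"] = string.Empty });

        return new JsonObject { ["role"] = role, ["parts"] = parts };
    }
}
```

A message holding one `TextPart` and one `FileDataPart` produces a `parts` array with a `text` entry followed by a `file_data` entry, while a message with no parts produces a single empty `text` entry.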
How to Test:
- Obtain an instance of `GeminiClient`.
- Access the file service via `geminiClient.Files`.
- Upload a supported file (e.g., PDF, PNG) using `fileService.UploadFileAsync(...)`.
- Create a `Chat` object and add a `ChatMessage`.
- To the `ChatMessage.Parts` collection, add a `TextPart` and a `FileDataPart` using the `MimeType` and `Uri` from the uploaded file.
- Call `geminiClient.CompleteAsync(chat)` and observe the model's response, which should consider the content of the uploaded file.
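The steps above might look roughly like this in code. This is a sketch, not the snippet from the updated `README.md`: the namespaces, the `UploadFileAsync` argument, the `Messages` collection on `Chat`, and the property names on `TextPart`, `FileDataPart`, and `FileDataSource` are all assumptions, so check them against the diff before running. It also needs a valid API key and a local `report.pdf`.

```csharp
// Assumed namespaces; adjust to wherever the new types actually live.
using ChatAIze.GenerativeCS.Clients;
using ChatAIze.GenerativeCS.Models;

var geminiClient = new GeminiClient("<GEMINI API KEY>");
var fileService = geminiClient.Files;

// Assumption: a path-based overload that returns an object
// exposing MimeType and Uri, as described in the steps above.
var file = await fileService.UploadFileAsync("report.pdf");

// Setting the message role is omitted; use whatever ChatMessage exposes for it.
var message = new ChatMessage();
message.Parts.Add(new TextPart { Text = "Summarize the attached document." });
message.Parts.Add(new FileDataPart
{
    // Assumed member names on FileDataPart and FileDataSource.
    FileData = new FileDataSource { MimeType = file.MimeType, FileUri = file.Uri }
});

var chat = new Chat();
chat.Messages.Add(message); // Assumption: Chat exposes a Messages collection.

string response = await geminiClient.CompleteAsync(chat);
Console.WriteLine(response);
```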
Future Considerations (Not in this PR):
- Adding higher-level convenience wrappers in `GeminiClient.cs` to simplify the process of sending a message with a local file (e.g., a method that handles both upload and message creation).
This implementation adheres to the existing coding patterns and architectural style of the library.
Fixes #2