No examples documented for processing image/PDF files in multi-agent workflows

Open nikogamulin opened this issue 9 months ago • 13 comments

I'm building a multi-agent system that needs to process files provided as either images or PDFs. However, I couldn't find any examples or documentation that show:

  • How to read and load an image or PDF file as input
  • How to pass the file through multiple agents for processing
  • How to handle different file types within the workflow

Most of the current documentation and examples focus on text inputs. A minimal example for handling and processing image/PDF files in a multi-agent setup would be very helpful.

Thank you!

nikogamulin avatar Apr 15 '25 19:04 nikogamulin

This repository contains the examples you are looking for (they may be scattered across multiple agents): https://github.com/google/adk-samples. See, for example, the FOMC agent and the RAG agent.

hangfei avatar Apr 15 '25 19:04 hangfei

Thanks @hangfei. I agree with @nikogamulin: there is no concrete example available in the docs or in the adk-samples repo. Please check.

QAInsights avatar Apr 15 '25 22:04 QAInsights

@polong-lin @turanbulmus I would like to hear your thoughts on this.

hangfei avatar Apr 16 '25 00:04 hangfei

In theory you could pass any file blob data to the model. If you are curious, you can try it out in adk web by pressing the attachment button and sending files/PDFs to your agent!

Image

wyf7107 avatar Apr 17 '25 06:04 wyf7107

I agree with Hangfei. Both examples:

  • https://github.com/google/adk-samples/tree/main/agents/RAG
  • https://github.com/google/adk-samples/tree/main/agents/fomc-research

provide samples on how to process documents, including PDFs.

turanbulmus avatar Apr 17 '25 07:04 turanbulmus

Let's talk about handling files in the documentation.

boyangsvl avatar Apr 19 '25 01:04 boyangsvl

Both examples:

  • https://github.com/google/adk-samples/tree/main/agents/RAG
  • https://github.com/google/adk-samples/tree/main/agents/fomc-research

provide samples on how to process documents by downloading them from a URL, but they don't show how to receive files uploaded via adk web. The agent can still see the file, but I don't know how to make it save that file as an artifact for other agents to process later.

I tried this, but no luck:

def save_uploaded_image_to_artifact(callback_context: CallbackContext, llm_response: LlmResponse) -> Optional[LlmResponse]:
    image_bytes = callback_context.state["image_bytes"]
    print('image_bytes = ' + image_bytes)
    image_artifact = adk_types.Part(
        inline_data=adk_types.Blob(
            mime_type="image/png",
            data=image_bytes
        )
    )
    callback_context.save_artifact("test", image_artifact)
    return None

test_agent = Agent(
    name="TranslateAgent",
    model=GEMINI_AGENT_MODEL,
    instruction="""
    Your task is to receive uploaded image, creates a types.Part from its bytes and MIME type and save to state use the key "image_bytes"
    """,
    output_key="image_bytes",
    after_model_callback=save_uploaded_image_to_artifact
)

Trace log

ERROR - fast_api.py:616 - Error in event_generator: 'image_bytes'
Traceback (most recent call last):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\cli\fast_api.py", line 605, in event_generator
    async for event in runner.run_async(
  File "<Project Path>\.venv\Lib\site-packages\google\adk\runners.py", line 197, in run_async
    async for event in invocation_context.agent.run_async(invocation_context):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\base_agent.py", line 141, in run_async
    async for event in self._run_async_impl(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\sequential_agent.py", line 36, in _run_async_impl
    async for event in sub_agent.run_async(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\base_agent.py", line 141, in run_async
    async for event in self._run_async_impl(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\llm_agent.py", line 232, in _run_async_impl
    async for event in self._llm_flow.run_async(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 231, in run_async
    async for event in self._run_one_step_async(invocation_context):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 257, in _run_one_step_async
    async for llm_response in self._call_llm_async(
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 482, in _call_llm_async
    if altered_llm_response := self._handle_after_model_callback(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 529, in _handle_after_model_callback
    return agent.after_model_callback(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<Project Path>\agent.py", line 124, in save_uploaded_image_to_artifact
    image_bytes = callback_context.state["image_bytes"]
                  ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "<Project Path>\.venv\Lib\site-packages\google\adk\sessions\state.py", line 38, in __getitem__
    return self._value[key]
           ~~~~~~~~~~~^^^^^
KeyError: 'image_bytes'
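
The traceback shows that "image_bytes" was not present in session state when the callback ran. One way around this is to skip session state and read the uploaded bytes directly from the parts of the user message. The following is only a sketch under assumptions: extract_inline_files is a hypothetical name, it relies solely on the inline_data, mime_type, and data attributes of a part, and SimpleNamespace objects stand in for types.Part so that it runs without ADK installed.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def extract_inline_files(parts: Iterable[object]) -> List[Tuple[str, bytes]]:
    """Hypothetical helper: collect (mime_type, data) for every part with inline file data."""
    files = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        # Text-only parts have no inline_data, so they are skipped.
        if blob is not None and getattr(blob, "data", None):
            files.append((blob.mime_type, blob.data))
    return files


# Stand-ins for a text part and an uploaded PNG part (not real ADK objects).
text_part = SimpleNamespace(text="please translate this", inline_data=None)
image_part = SimpleNamespace(
    text=None,
    inline_data=SimpleNamespace(mime_type="image/png", data=b"\x89PNG..."),
)
uploaded = extract_inline_files([text_part, image_part])
```

In a callback, the parts would presumably come from callback_context.user_content.parts, and each collected blob could then be wrapped in a Part and handed to save_artifact.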

haitub4 avatar Apr 19 '25 02:04 haitub4

Reading the file as binary, wrapping it in a Part, and sending that Part in the message content works in my case:

pdf_part = types.Part.from_bytes(
    # Raw string, so that "\1103..." in the Windows path is not read as an escape sequence.
    data=file_as_binary(r"files\1103842827 - Note 3.pdf"),
    mime_type="application/pdf",
)
prompt = types.Part.from_text(text="this is the Note document to analyze")
content = types.Content(role='user', parts=[prompt, pdf_part])
await call_agent_async(content)
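
The file_as_binary helper is not defined in the snippet above. A minimal sketch of what it might look like, assuming it simply returns the file's bytes. guess_mime_type is a hypothetical addition for workflows that accept more than one file type; it uses only the standard mimetypes module.

```python
import mimetypes
from pathlib import Path


def file_as_binary(path: str) -> bytes:
    """Return the raw bytes of the file (assumed behavior of the helper)."""
    return Path(path).read_bytes()


def guess_mime_type(path: str, default: str = "application/octet-stream") -> str:
    """Hypothetical helper: guess the MIME type from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(path)
    return mime_type or default
```

The two results can then be passed as data= and mime_type= to types.Part.from_bytes, as in the snippet above.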

ramanujan-dev avatar Apr 23 '25 06:04 ramanujan-dev

I still can't receive the file through the adk web interface. Instead, I created a Flask REST endpoint, upload the file as form data, read it as binary, and pass it to the agent as described in https://github.com/google/adk-docs/issues/355. That works.

haitub4 avatar Apr 28 '25 03:04 haitub4

Can you share the code snippet that shows the image in the adk web UI? Do we just have to save it to artifact storage, or does it involve something extra?

Antriksh-Narang avatar May 08 '25 05:05 Antriksh-Narang

Here is the code I used to modify an .eml (email) file with attachments:

root_agent = Agent(
    name="weather_time_agent",
    model="gemini-2.0-flash",
    description=(
        "Agent to answer questions about the time and weather in a city, and handle .json file uploads."
    ),
    instruction=(
        "You are a helpful agent who can answer user questions about the time and weather in a city, and can accept .json file uploads."
    ),
    tools=[get_weather, get_current_time],
    before_agent_callback=modify_attachment
)

import json
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.genai import types


def modify_attachment(callback_context: CallbackContext) -> Optional[types.Content]:
    """
    Rewrites the user message before the agent runs. An uploaded .eml part is
    replaced by a text part (headers and body) plus one part per PDF attachment.
    Returns None implicitly, so the agent executes as usual.
    """
    agent_name = callback_context.agent_name
    invocation_id = callback_context.invocation_id
    current_state = callback_context.state.to_dict()

    parts_reformatted = []
    for part in callback_context.user_content.parts:
        if hasattr(part, 'inline_data') and part.inline_data and hasattr(part.inline_data, 'data'):
            mime_type = part.inline_data.mime_type
            if mime_type == "message/rfc822":
                email_content_bytes = part.inline_data.data
                # process_email_content and _extract_text_from_html are my own helpers (not shown).
                processed_email = process_email_content(email_content_bytes)
                email_body = _extract_text_from_html(processed_email['body_html'])
                email_headers = processed_email['headers']
                str_email_headers = json.dumps(email_headers, indent=4)
                str_email_content = f"""Headers:
                {str_email_headers}

                Body:
                {email_body}
                """
                # Attach the email itself as text first.
                parts_reformatted.append(types.Part(text=str_email_content))
                attached_pdfs = [attachment for attachment in processed_email['attachments'] if attachment['content_type'].startswith('application/pdf')]
                pdf_mime_type = "application/pdf"

                # Then attach each PDF from the email as its own inline part.
                for pdf in attached_pdfs:
                    pdf_artifact = types.Part(inline_data=types.Blob(data=pdf['data'], mime_type=pdf_mime_type))
                    parts_reformatted.append(pdf_artifact)
            else:
                # Keep any other uploaded file unchanged.
                parts_reformatted.append(part)
        else:
            parts_reformatted.append(part)

    callback_context.user_content.parts = parts_reformatted

nikogamulin avatar May 08 '25 12:05 nikogamulin

Can agents communicate with each other using artifacts? Suppose agent1 creates an image: can agent2 retrieve this generated artifact and analyze it?

NevinMath avatar Jun 17 '25 21:06 NevinMath

In theory you could pass any file blob data to model,

This issue still features prominently in search results, and it took me a while to turn this comment into code. In hindsight, https://github.com/google/adk-docs/issues/355#issuecomment-2925528347 already tried to point this out, but the formatting is broken, and the code was too convoluted for me to extract the essence.

Others might find this useful:

from pathlib import Path
from google.genai import types

pdf_path = Path("/tmp/some.pdf")

# Attach file to user prompt.
user_content = types.Content(
    role="user",
    parts=[
        # TODO: Add other types.Part(...) as needed.
        types.Part(
            inline_data=types.Blob(
                display_name=pdf_path.name,
                data=pdf_path.read_bytes(),   # <-- Attach the file
                mime_type="application/pdf",
            )
        )
    ],
)

Because this reads the entire file into memory, it's a good idea to check for a maximum acceptable file size.
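
A minimal sketch of such a size check, using only the standard library. The 10 MiB limit and the name read_file_with_limit are assumptions for illustration, not values prescribed by ADK or the model.

```python
from pathlib import Path

# Assumed limit; pick one that suits your model and deployment.
MAX_FILE_BYTES = 10 * 1024 * 1024


def read_file_with_limit(path: Path, max_bytes: int = MAX_FILE_BYTES) -> bytes:
    """Hypothetical helper: read a file, refusing anything larger than max_bytes."""
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path.name} is {size} bytes; the limit is {max_bytes} bytes")
    return path.read_bytes()
```

With this in place, data=pdf_path.read_bytes() above could become data=read_file_with_limit(pdf_path).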

roskakori avatar Oct 30 '25 12:10 roskakori