No examples documented for processing image/PDF files in multi-agent workflows
I'm building a multi-agent system that needs to process files provided as either images or PDFs. However, I couldn't find any examples or documentation that show:
- How to read and load an image or PDF file as input
- How to pass the file through multiple agents for processing
- How to handle different file types within the workflow
Most of the current documentation and examples focus on text inputs. A minimal example for handling and processing image/PDF files in a multi-agent setup would be very helpful.
Thank you!
This repository contains the example you are looking for (it may be scattered across multiple agents): https://github.com/google/adk-samples. For example, the FOMC agent and the RAG agent.
Thanks @hangfei. I agree with @nikogamulin: there is no concrete example available in the docs or in the adk-samples repo. Please check.
@polong-lin @turanbulmus would like to see your thoughts on this.
In theory you could pass any file blob data to the model. If you are curious, you can try it out in adk web by pressing the attachment button and sending files/PDFs to your agent!
I agree with Hangfei. Both examples:
- https://github.com/google/adk-samples/tree/main/agents/RAG
- https://github.com/google/adk-samples/tree/main/agents/fomc-research

provide samples on how to process documents, including PDFs.
Let's talk about handling files in the documentation.
Both examples:
- https://github.com/google/adk-samples/tree/main/agents/RAG
- https://github.com/google/adk-samples/tree/main/agents/fomc-research
provide samples on how to process documents by downloading them from a URL, but they don't show how to receive files uploaded via adk web. The agent can still see the file, but I don't know how to make it save that file to an artifact for another agent to process later.
I tried this, but no luck:
```python
from typing import Optional

from google.adk.agents import Agent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmResponse
from google.genai import types as adk_types


def save_uploaded_image_to_artifact(
    callback_context: CallbackContext, llm_response: LlmResponse
) -> Optional[LlmResponse]:
    image_bytes = callback_context.state["image_bytes"]
    print('image_bytes = ' + image_bytes)
    image_artifact = adk_types.Part(
        inline_data=adk_types.Blob(
            mime_type="image/png",
            data=image_bytes,
        )
    )
    callback_context.save_artifact("test", image_artifact)
    return None


test_agent = Agent(
    name="TranslateAgent",
    model=GEMINI_AGENT_MODEL,
    instruction="""
    Your task is to receive an uploaded image, create a types.Part from its
    bytes and MIME type, and save it to state under the key "image_bytes".
    """,
    output_key="image_bytes",
    after_model_callback=save_uploaded_image_to_artifact,
)
```
Trace log:

```
ERROR - fast_api.py:616 - Error in event_generator: 'image_bytes'
Traceback (most recent call last):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\cli\fast_api.py", line 605, in event_generator
    async for event in runner.run_async(
  File "<Project Path>\.venv\Lib\site-packages\google\adk\runners.py", line 197, in run_async
    async for event in invocation_context.agent.run_async(invocation_context):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\base_agent.py", line 141, in run_async
    async for event in self._run_async_impl(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\sequential_agent.py", line 36, in _run_async_impl
    async for event in sub_agent.run_async(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\base_agent.py", line 141, in run_async
    async for event in self._run_async_impl(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\agents\llm_agent.py", line 232, in _run_async_impl
    async for event in self._llm_flow.run_async(ctx):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 231, in run_async
    async for event in self._run_one_step_async(invocation_context):
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 257, in _run_one_step_async
    async for llm_response in self._call_llm_async(
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 482, in _call_llm_async
    if altered_llm_response := self._handle_after_model_callback(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<Project Path>\.venv\Lib\site-packages\google\adk\flows\llm_flows\base_llm_flow.py", line 529, in _handle_after_model_callback
    return agent.after_model_callback(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<Project Path>\agent.py", line 124, in save_uploaded_image_to_artifact
    image_bytes = callback_context.state["image_bytes"]
                  ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "<Project Path>\.venv\Lib\site-packages\google\adk\sessions\state.py", line 38, in __getitem__
    return self._value[key]
           ~~~~~~~~~~~^^^^^
KeyError: 'image_bytes'
```
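For what it's worth, the `KeyError` seems to happen because `output_key` only writes the model's *text* reply into state after the turn completes, while the uploaded image arrives as an `inline_data` part on the user message and is never placed in state. A minimal sketch of a workaround, reading the part from `user_content` instead (type hints to the google.adk classes omitted; the callback name and artifact filename are my own, and I haven't verified this against every ADK version):

```python
# Hypothetical before_agent_callback: the uploaded file arrives as an
# inline_data part on the user message, not in session state.
def save_uploaded_image_to_artifact(callback_context):
    for part in callback_context.user_content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.mime_type == "image/png":
            # Persist the uploaded part so later agents can load it by name.
            callback_context.save_artifact("uploaded_image.png", part)
    return None  # returning None lets the agent run as normal
```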
Converting the file to binary and then sending it to the agent as a part is working for my case:
```python
# Note: use a raw string (or forward slashes) so "\1" in the path is not
# interpreted as an escape sequence.
pdf_part = types.Part.from_bytes(
    data=file_as_binary(r"files\1103842827 - Note 3.pdf"),
    mime_type="application/pdf",
)
prompt = types.Part.from_text(text="this is the Note document to analyze")
content = types.Content(role='user', parts=[prompt, pdf_part])
await call_agent_async(content)
```
I still can't use the adk web interface to receive the file, so I switched to a Flask REST endpoint: upload the file through form data, read it in binary, and pass it to the agent as in https://github.com/google/adk-docs/issues/355, and it works.
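In case it helps others, here is a minimal sketch of such an endpoint (the field name, route, and response shape are my own choices; the hand-off to the agent is only indicated in a comment):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/upload", methods=["POST"])
def upload():
    uploaded = request.files["file"]  # multipart form field named "file"
    data = uploaded.read()            # raw file bytes
    # From here, build types.Part.from_bytes(data=data,
    # mime_type=uploaded.mimetype) and pass it to the agent as above.
    return jsonify({"filename": uploaded.filename, "size": len(data)})
```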
Can you share the code snippet to show the image in the adk web UI? Do we just have to save it to artifact storage, or does it involve something extra?
Here is the code I used to modify an .eml (email) file with attachments:
```python
root_agent = Agent(
    name="weather_time_agent",
    model="gemini-2.0-flash",
    description=(
        "Agent to answer questions about the time and weather in a city,"
        " and handle .json file uploads."
    ),
    instruction=(
        "You are a helpful agent who can answer user questions about the time"
        " and weather in a city, and can accept .json file uploads."
    ),
    tools=[get_weather, get_current_time],
    before_agent_callback=modify_attachment,
)
```
```python
def modify_attachment(callback_context: CallbackContext) -> Optional[types.Content]:
    """Rewrites uploaded .eml parts: extracts the headers and body as text,
    then re-attaches any PDF attachments as inline parts."""
    parts_reformatted = []
    for part in callback_context.user_content.parts:
        if hasattr(part, 'inline_data') and part.inline_data and hasattr(part.inline_data, 'data'):
            mime_type = part.inline_data.mime_type
            if mime_type == "message/rfc822":
                email_content_bytes = part.inline_data.data
                processed_email = process_email_content(email_content_bytes)
                email_body = _extract_text_from_html(processed_email['body_html'])
                email_headers = processed_email['headers']
                str_email_headers = json.dumps(email_headers, indent=4)
                str_email_content = f"""Headers:
{str_email_headers}
Body:
{email_body}
"""
                # Attach the email itself as text...
                parts_reformatted.append(types.Part(text=str_email_content))
                # ...then re-attach each PDF attachment as an inline part.
                attached_pdfs = [
                    attachment
                    for attachment in processed_email['attachments']
                    if attachment['content_type'].startswith('application/pdf')
                ]
                pdf_mime_type = "application/pdf"
                for pdf in attached_pdfs:
                    pdf_artifact = types.Part(
                        inline_data=types.Blob(data=pdf['data'], mime_type=pdf_mime_type)
                    )
                    parts_reformatted.append(pdf_artifact)
        else:
            parts_reformatted.append(part)
    callback_context.user_content.parts = parts_reformatted
    return None
```
Can agents communicate with each other using artifacts? Suppose agent1 creates an image and agent2 gets this generated artifact and does analysis on it?
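That pattern should work: agent1 saves the artifact (e.g. via `callback_context.save_artifact`, as in the snippets above), and agent2 reads it back, e.g. from a tool via the tool context's `load_artifact`. A rough sketch of agent2's side (the tool name and artifact filename are made up; type hints to the ADK classes omitted):

```python
def analyze_generated_image(tool_context):
    """Hypothetical tool for agent2: analyzes the image agent1 saved."""
    image_part = tool_context.load_artifact("generated_image.png")
    if image_part is None:
        return {"status": "error", "message": "artifact not found"}
    # Real analysis would go here; we just report the size.
    return {"status": "ok", "size_bytes": len(image_part.inline_data.data)}
```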
> In theory you could pass any file blob data to model,
This issue still features prominently in search results, and it took me a while to turn this comment into code. In hindsight, https://github.com/google/adk-docs/issues/355#issuecomment-2925528347 already tried to point this out, but the formatting is broken, and the code was too convoluted for me to extract the essence.
Others might find this useful:
```python
from pathlib import Path

from google.genai import types

pdf_path = Path("/tmp/some.pdf")

# Attach file to user prompt.
user_content = types.Content(
    role="user",
    parts=[
        # TODO: Add other types.Part(...) as needed.
        types.Part(
            inline_data=types.Blob(
                display_name=pdf_path.name,
                data=pdf_path.read_bytes(),  # <-- Attach the file
                mime_type="application/pdf",
            )
        )
    ],
)
```
Because this reads the entire file into memory, it's a good idea to check for a maximum acceptable file size.
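One way to do that check (the 20 MB limit here is an arbitrary example, not an official ADK or Gemini limit; adjust it to whatever your model/provider accepts):

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # example limit, not an official one


def read_bytes_checked(path: Path, max_bytes: int = MAX_UPLOAD_BYTES) -> bytes:
    """Read a file fully into memory, refusing anything over max_bytes."""
    size = path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{path.name} is {size} bytes, over the {max_bytes}-byte limit")
    return path.read_bytes()
```

Then `data=read_bytes_checked(pdf_path)` drops in where `pdf_path.read_bytes()` was used.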