Fix/bedrock cache point messages
Description
This PR fixes the cachePoint formatting issue in BedrockModel that was causing ParamValidationError when using prompt caching with system prompts.
Problem: When using cachePoint in system prompts, Bedrock's API returned a validation error because the system content blocks were not being formatted correctly. Bedrock requires cachePoint to be a separate content block (tagged union), not merged with other fields.
Solution: Added _format_bedrock_system_blocks() method that formats system content blocks using the same logic as message content blocks, ensuring cachePoint blocks remain as separate content blocks.
Verified: Cache metrics now show cacheWriteInputTokens on first request and cacheReadInputTokens on subsequent requests, confirming prompt caching works correctly.
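For illustration, here is a minimal sketch of the formatting rule (not the exact implementation; the function name and surrounding code are assumptions, only the cachePoint block shape comes from the Bedrock Converse API): each system content block must stay a separate tagged-union entry, and `cachePoint` is never merged into a `text` block.

```python
from typing import Any


def format_system_blocks(system_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Illustrative sketch: keep every Bedrock system content block as its own
    tagged-union entry (text | guardContent | cachePoint)."""
    formatted: list[dict[str, Any]] = []
    for block in system_blocks:
        if "cachePoint" in block:
            # Must remain its own block, e.g. {"cachePoint": {"type": "default"}}
            formatted.append({"cachePoint": block["cachePoint"]})
        elif "text" in block:
            formatted.append({"text": block["text"]})
        else:
            # e.g. guardContent blocks pass through unchanged
            formatted.append(block)
    return formatted


# Correctly formatted `system` field for the Converse API:
# [{"text": "<large static system prompt>"}, {"cachePoint": {"type": "default"}}]
```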
Related Issues
Fixes #1219 Fixes #1015
Documentation PR
N/A - No documentation changes needed
Type of Change
Bug fix
Testing
How have you tested the change?
- All 92 bedrock unit tests pass (`hatch test -- tests/strands/models/test_bedrock.py`)
- All 1608 unit tests pass (`hatch test`)
- All 19 bedrock integration tests pass (`hatch run test-integ -- tests_integ/models/test_model_bedrock.py`)
- Added 5 new tests specifically for cachePoint formatting
- Verified cache metrics show actual cache hits with real Bedrock API calls
- [x] I ran `hatch run prepare`
Checklist
- [x] I have read the CONTRIBUTING document
- [x] I have added any necessary tests that prove my fix is effective or my feature works
- [x] I have updated the documentation accordingly
- [x] I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
- [x] My changes generate no new warnings
- [x] Any dependent changes have been merged and published
Output
```
================================================================================
CACHE PERFORMANCE SUMMARY
================================================================================
Request 1 (First - cache CREATED):
- Cache Write Input Tokens: 1761 tokens <- system prompt written to cache
- Cache Read Input Tokens: 0 tokens
- Input Tokens: 14 tokens <- user query only
Request 2 (Second - cache HIT):
- Cache Write Input Tokens: 0 tokens
- Cache Read Input Tokens: 1761 tokens <- reused from cache!
- Input Tokens: 197 tokens <- user query + conversation
Request 3 (Third - cache HIT):
- Cache Write Input Tokens: 0 tokens
- Cache Read Input Tokens: 1761 tokens <- reused from cache!
- Input Tokens: 442 tokens <- user query + conversation
================================================================================
SUCCESS: Prompt caching is working!
Total Cache Write: 1761 tokens (charged at write rate)
Total Cache Read: 3522 tokens (charged at ~90% discount)
Total Input: 653 tokens (charged at standard rate)
Without caching, requests 2 & 3 would have cost 3522 more input tokens!
================================================================================
```
Notes
NOTE: Bedrock Prompt Caching Minimum Token Requirements (5 minute cache)

| Model              | Min Tokens per Cache Checkpoint |
|--------------------|---------------------------------|
| Claude Sonnet 4    | 1,024 tokens                    |
| Claude 3.7 Sonnet  | 1,024 tokens                    |
| Claude 3.5 Haiku   | 2,048 tokens                    |
| Claude Opus 4.5    | 4,096 tokens                    |
| Amazon Nova (all)  | 1,000 tokens                    |
Reference: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
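A checkpoint below the model's minimum simply doesn't create a cache entry (no error is raised), so it can help to sanity-check the prompt size up front. A rough, hypothetical pre-flight check, assuming a ~4 characters-per-token heuristic; the authoritative numbers are the cache metrics returned in the response:

```python
MIN_CACHE_TOKENS = 1024  # e.g. Claude Sonnet 4 / Claude 3.7 Sonnet


def estimated_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English prose.
    return len(text) // 4


def cache_point_worthwhile(system_prompt: str, minimum: int = MIN_CACHE_TOKENS) -> bool:
    # Only add a cachePoint if the static prefix is likely above the minimum.
    return estimated_tokens(system_prompt) >= minimum
```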
Here is the error you get:

```
raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Unknown parameter in system[1]: "cachePoint", must be one of: text, guardContent
```
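For context, the Converse API models `system` as a list of tagged-union content blocks, so a cached system prompt should reach the service looking roughly like the sketch below (an illustrative raw boto3 call, not the strands internals; `cachePoint` in `system` also requires a sufficiently recent botocore):

```python
import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # example model ID
    system=[
        {"text": "<large static system prompt>"},
        {"cachePoint": {"type": "default"}},  # its own block, not a field on the text block
    ],
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
# usage includes cacheReadInputTokens / cacheWriteInputTokens when caching engages
print(response["usage"])
```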
This is a result of attempting to use both methods for Bedrock System Prompt Caching. For example:

```python
bedrock_model = BedrockModel(
    boto_session=boto_session,
    boto_client_config=bedrock_config,
    model_id=model,
    temperature=self.parameters.temperature,
    streaming=streaming,
    cache_prompt=cache_prompt,  # True
)
```
As a result, no cacheRead/cacheWrite tokens are being used and `input_tokens` remains high. Our system prompts are static and make up the majority of the tokens we are being charged for:
```
{ # First run
    ...
    "input_tokens": 4489,
    "output_tokens": 39,
    "total_tokens": 4528,
    "latency_ms": 2764,
    "cycle_count": 1,
    "total_duration_s": 3.0020978450775146,
    "cache_read_tokens": 0,
    "cache_write_tokens": 0,
    "time_to_first_byte_ms": null,
    "tool_calls": 1
},
{ # Second run
    ...
    "input_tokens": 4494,
    "output_tokens": 40,
    "total_tokens": 4534,
    "latency_ms": 3257,
    "cycle_count": 1,
    "total_duration_s": 3.5222342014312744,
    "cache_read_tokens": 0,
    "cache_write_tokens": 0,
    "time_to_first_byte_ms": null,
    "tool_calls": 1
}
```
The file was last modified in #1112. That PR introduced the feature but may have included a formatting bug, which would make this a regression. I verified the issue impacts v1.15-v1.19; I did not test v1.14 because of the code refactoring that would be required.
Here is a temporary fix for anyone who runs into this. Instead of using `BedrockModel`, use the `CacheEnabledBedrockModel` from the gist below and override the problematic functions:
https://gist.github.com/af001/09acaeb4360cb4537d10bb6b096de765
Hi @af001, when you say

> This is a result of attempting to use both methods for Bedrock System Prompt Caching.

do you mean both `cache_prompt` and `SystemContentBlock` cache blocks?

If we regressed, I want to resolve this quickly, thanks!
Hey @dbschmigelski! No regression. This is essentially a hack for users pinned to older botocore versions :). I am going to close this. I have confirmed my issue is resolved by updating to `botocore>=1.41.5`.
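For anyone else hitting the same `ParamValidationError`, upgrading the pinned SDK is the simpler remedy than the subclass workaround, e.g.:

```
pip install --upgrade boto3 "botocore>=1.41.5"
```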