Claude CLI not working with Sonnet 4 on Bedrock [BUG]
Environment
- Platform (select one):
- [ ] Anthropic API
- [x] AWS Bedrock
- [ ] Google Vertex AI
- [ ] Other:
- Claude CLI version: 1.0.3 (Claude Code)
- Operating System: MacOS 15.5
- Terminal: iTerm, Warp
Bug Description
When using the Claude CLI with a Sonnet 4 Bedrock model, it receives an `API Error (429 Too many tokens, please wait before trying again.)` message and never gets a response.
Steps to Reproduce
1. Run:
   ```
   DISABLE_TELEMETRY=1 ANTHROPIC_SMALL_FAST_MODEL=us.anthropic.claude-3-5-haiku-20241022-v1:0 AWS_REGION=us-east-1 CLAUDE_CODE_USE_BEDROCK=1 DISABLE_PROMPT_CACHING=1 claude --model us.anthropic.claude-sonnet-4-20250514-v1:0
   ```
2. Type in a prompt as simple as 'hi'
3. A 429 comes back
Expected Behavior
Expect a response from the sonnet 4 model
Actual Behavior
A `429 Too many tokens` API error
Additional Context
If I run the exact same command but use sonnet 3-7, no problems. I have access to the sonnet 4 in AWS in the regions I use in the CLI command. I have used that model with other tools without a problem.
Copying comment from closed duplicate #1439:
I had the same issue when trying Bedrock for Claude Code yesterday. I can even send requests using the SAME IAM credentials to InvokeModel at the same time while the errors are happening in Claude Code with other tools and they work completely fine. I had a similar issue while trying Bedrock with Aider and was only able to use it there with the Converse API, not with the default InvokeModel API. Something must be up with how the Bedrock API is called or some weird issue on the Bedrock platform.
I can write requests that use all Claude models over Bedrock in normal Python and bash scripts, but in Claude Code it does not work and I have gotten the same 429 from the beginning. It is not related to the quotas, and it is not necessary to request a quota increase from the AWS team; that can be seen under https://us-east-1.console.aws.amazon.com/servicequotas/home/services/bedrock/quotas It is something funky about Claude Code + Bedrock 🤷🏼♂️
I found a similar issue in the cline repo: https://github.com/cline/cline/issues/3954
This really seems to be some weird Bedrock issue. I'll gladly spend some time debugging this if someone can point me in the right direction.
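Since the Converse API reportedly worked (via Aider) where InvokeModel did not, a minimal boto3 Converse sketch may be a useful thing to test against the same credentials. This is an assumption-laden sketch, not a confirmed fix; the model ID and region are taken from this thread, and `build_converse_request` is a helper name of my own:

```python
# Inference-profile model ID used elsewhere in this thread; adjust to your account.
MODEL_ID = "us.anthropic.claude-sonnet-4-20250514-v1:0"


def build_converse_request(model_id, prompt, max_tokens=1024):
    """Build the keyword arguments for bedrock-runtime's Converse API call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }


# Sending the request (requires boto3 and AWS credentials; commented out here):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# response = client.converse(**build_converse_request(MODEL_ID, "hi"))
# print(response["output"]["message"]["content"][0]["text"])
```

If the Converse path succeeds while InvokeModel 429s with the same IAM identity, that would narrow the problem down to the InvokeModel code path.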
Having the same issue as @strayer ;(
Same here.
I am having the same issue with Claude Sonnet 4. I can invoke Claude Sonnet 4 using the AWS CLI and it works fine:
```
aws bedrock-runtime invoke-model \
  --model-id arn:aws:bedrock:us-west-2:XXXX:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --body '{
    "messages": [
      {
        "role": "user",
        "content": "Tell me a joke."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.8,
    "anthropic_version": "bedrock-2023-05-31"
  }' \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --region us-west-2 \
  out.txt
```
But for some reason I cannot invoke it using the Bedrock client. Here is the code snippet that I have used:
```js
const {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} = require("@aws-sdk/client-bedrock-runtime");
// ...
this.bedrock = new BedrockRuntimeClient({ region: region });

async invoke(messages) {
  const preparedBody = this.constructBody(messages);
  const params = {
    modelId: this.modelRef,
    contentType: "application/json",
    accept: "*/*",
    body: JSON.stringify(preparedBody),
  };
  const command = new InvokeModelWithResponseStreamCommand(params);
  const response = await this.bedrock.send(command);
  return response.body;
}
// ...
```
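For reference when debugging the streaming path: the JS snippet above returns the raw stream body without decoding it. A hedged Python sketch of how the equivalent `invoke_model_with_response_stream` chunks can be assembled into text — the helper name `decode_chunks` is mine, and the event shapes follow the boto3 response format for Anthropic streaming:

```python
import json


def decode_chunks(events):
    """Assemble text deltas from a Bedrock invoke_model_with_response_stream
    event iterator. Each event carries a 'chunk' whose bytes are JSON."""
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if not chunk:
            continue
        payload = json.loads(chunk["bytes"].decode("utf-8"))
        # Anthropic streaming emits content_block_delta events with text deltas.
        if payload.get("type") == "content_block_delta":
            parts.append(payload["delta"].get("text", ""))
    return "".join(parts)
```

Logging the decoded events (rather than the raw body) would also show any error payloads Bedrock interleaves into the stream.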
Is this relevant? https://community.aws/content/2xVZmCM5E7XXw0yqTEGgXYxRowk/bedrock-claude-4-burndown-rates
Hi everyone!
I am having the same issue when using the CLI and also the console, even for a simple "hello" prompt. As the previous users mentioned, it shows me the "too many tokens" error:
```
aws bedrock-runtime invoke-model \
  --model-id arn:aws:bedrock:us-east-1:xxxxxx:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --body fileb://input.json \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  out.txt
```

```
An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 2): Too many tokens, please wait before trying again.
```
I also tried creating an inference profile on AWS for Claude 4, because on-demand invocation does not work and shows:

```
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID anthropic.claude-sonnet-4-20250514-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model.
```
So now I have the following code:
```python
import argparse
import json

import botocore.auth
import botocore.session
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# --- Configuration ---
REGION = 'us-east-1'
inference_profile_id = 'us.anthropic.claude-sonnet-4-20250514-v1:0'
INFERENCE_PROFILE_ARN = f'arn:aws:bedrock:{REGION}:xxxxxx:application-inference-profile/{inference_profile_id}'
STYLE_PROMPT_PATH = "CONF_TEXT_ICONIP2025.txt"
ENDPOINT = f"https://bedrock-runtime.{REGION}.amazonaws.com/application-inference-profile/{inference_profile_id}/invoke"

# --- Build payload for Claude prompt-style API ---
def load_style_prompt(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read().strip()

def build_payload(style_prompt, user_message):
    return {
        "input": f"{style_prompt}\n\n{user_message}"
    }

# --- Sign and send request ---
def invoke_claude(payload):
    session = botocore.session.get_session()
    credentials = session.get_credentials().get_frozen_credentials()
    aws_request = AWSRequest(
        method="POST",
        url=ENDPOINT,
        data=json.dumps(payload),
        headers={
            "content-type": "application/json",
            "accept": "application/json"
        }
    )
    sigv4 = botocore.auth.SigV4Auth(credentials, "bedrock", REGION)
    sigv4.add_auth(aws_request)
    prepared = aws_request.prepare()
    print("\nSigned Headers Sent to AWS:")
    for k, v in dict(prepared.headers).items():
        print(f"{k}: {v}")
    http_session = URLLib3Session()
    response = http_session.send(prepared)
    print("\nPayload Sent to AWS:")
    print(prepared.body)
    return response

# --- CLI Entrypoint ---
def main():
    parser = argparse.ArgumentParser(description="Call Claude with a styled message.")
    parser.add_argument("--message", required=True, help="The user message to send.")
    args = parser.parse_args()
    style_prompt = load_style_prompt(STYLE_PROMPT_PATH)
    payload = build_payload(style_prompt, args.message)
    response = invoke_claude(payload)
    print(f"\nStatus Code: {response.status_code}")
    try:
        parsed = json.loads(response.text)
        print(json.dumps(parsed, indent=2))
    except Exception:
        print("Could not parse response:")
        print(response.text)

if __name__ == "__main__":
    main()
```
and the output is:

```
Status Code: 200
{
  "Output": {
    "__type": "com.amazon.coral.service#UnknownOperationException"
  },
  "Version": "1.0"
}
```
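One hedged reading of the `UnknownOperationException` above: the URL path `/application-inference-profile/{id}/invoke` is not an operation the bedrock-runtime service recognizes. The REST form of InvokeModel is `POST /model/{modelId}/invoke`, and the inference-profile ARN can be passed as the model ID provided it is URL-encoded. A sketch under those assumptions (the helper name is mine; the placeholder account ID mirrors the post's redaction):

```python
from urllib.parse import quote

REGION = "us-east-1"
# Placeholder ARN mirroring the post; substitute your account and profile IDs.
PROFILE_ARN = ("arn:aws:bedrock:us-east-1:xxxxxx:"
               "application-inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0")


def invoke_model_url(region, model_id):
    """Build the bedrock-runtime InvokeModel URL. The model ID (or ARN)
    must be percent-encoded, since ARNs contain ':' and '/'."""
    return (f"https://bedrock-runtime.{region}.amazonaws.com"
            f"/model/{quote(model_id, safe='')}/invoke")
```

Note also that the body above uses `{"input": ...}`, whereas Anthropic models on Bedrock expect the `anthropic_version`/`messages`/`max_tokens` schema shown in the earlier CLI example.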
I would be very grateful if someone has resolved this and has any idea on what to do next :)
Having the same issue: I can call the Sonnet 4 model in the AWS console from us-west-2, but I keep receiving 429 errors from the Bedrock client. The same code works for Sonnet 3.5.
This is not an issue with the Quotas either, they are not 0 from what I can see.
node_modules/@anthropic-ai/claude-code/cli.js L1614
```js
function gI5(A){if(A.includes("3-5"))return 8192;if(A.includes("haiku"))return 8192;let B=process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS;if(B){let Q=parseInt(B,10);if(!isNaN(Q)&&Q>0)return Q}return 32000}
```
Is this relevant? https://community.aws/content/2xVZmCM5E7XXw0yqTEGgXYxRowk/bedrock-claude-4-burndown-rates
According to this post, max_tokens 32000 may be too big. When I use bedrock-runtime directly, the max_tokens limit is between 35,000 and 40,000 (sending 40,000 gets a ThrottlingException; I didn't encounter this with 3.7 Sonnet). The quota settings only allow one simultaneous request at 32,000 tokens.
Also, joannakarayianni's post seems to specify too large a max_tokens in input.json.
Workaround

node_modules/@anthropic-ai/claude-code/cli.js L1614

Before:

```js
function gI5(A){if(A.includes("3-5"))return 8192;if(A.includes("haiku"))return 8192;let B=process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS;if(B){let Q=parseInt(B,10);if(!isNaN(Q)&&Q>0)return Q}return 32000}
```

After:

```js
function gI5(A){if(A.includes("3-5"))return 1024;if(A.includes("haiku"))return 1024;let B=process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS;if(B){let Q=parseInt(B,10);if(!isNaN(Q)&&Q>0)return 1024}return 1024}
```

Changing this function works fine for my env.
EDIT

I now understand how CLAUDE_CODE_MAX_OUTPUT_TOKENS is used in the function. We don't need to modify cli.js; we can just add a CLAUDE_CODE_MAX_OUTPUT_TOKENS entry to the "env" block in ~/.claude/settings.json.
~/.claude/settings.json

```json
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "apac.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "1024"
  }
}
```
@s-aita that eliminates the 429s for me. I tested it up to 16384 and it worked for a while, but it ended up crapping out with 429s after a while of use. I'll investigate later whether there's some metric in AWS we can increase to make it work like it does with Sonnet 3.7. I recall from the article @muraoka-delta posted that Sonnet 4's output tokens burn down quota at 5x the rate, so maybe there's some config somewhere in AWS to account for that increase. TBD.
Setting my .claude/settings.json file like this resolved the issue for me:

```json
{
  "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
  "provider": "bedrock",
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "1024"
  },
  "permissions": {
    "allow": [
      "Bash(aws configure:*)"
    ],
    "deny": []
  }
}
```
@cachatj thanks that helps! with opus it works for a bit, and then I get the error " ⎿ API Error (503 Model is getting throttled. Try your request again.) · Retrying in 1 seconds… (attempt 1/10)..."
Any idea why that can be and how to fix it?
@jralduaveuthey - Unfortunately, those token limits are set by Anthropic, and everyone hates them. I couldn't agree more; frankly, the most significant promise of using Claude Code was "full repo/codebase awareness," which it doesn't deliver on at all.
In fact, with some of the pipeline jNBs I work on day to day, it cannot even troubleshoot or productively assist with a single file, let alone the whole codebase. It's a true disappointment, and it pushed me back to Copilot / Roocode in VS Code and the Claude desktop app on its own.
This has been a huge community issue for so long that it's hard to remain optimistic Anthropic will ever deliver a solution. Oh, and using your own instance of Claude (served from GCP or AWS) doesn't make it any better. 😢
These env settings work for me.

I tried this and my experience got better, but after a while (probably 20 mins of use - just hitting yes on every change lol) the 429 errors came back.
Yes, bedrock quota issue
Here's what I found during my testing, which worked for me:
I tested the Bedrock client using the following models:
us.anthropic.claude-sonnet-4-20250514-v1:0, us.anthropic.claude-opus-4-20250514-v1:0
Note: Make sure to set region = us-west-2. Other regions either have lower rate limits or do not support these models.
Claude Sonnet 4 Officially, the maximum max_tokens is documented as 64,000. However, in practice, requests with max_tokens above 39,200 fail. This issue is also reproducible via the Bedrock Chat UI. To avoid errors, I’ve been setting max_tokens below 39,200. If you don't explicitly set it, it defaults to 64,000 and results in an error.
Claude Opus 4 According to the Bedrock Chat UI, the maximum max_tokens is 32,000 — and it does accept that value. However, the strange part is that even though the response is returned successfully, an error is also thrown. This means you'll need special error handling to manage the response properly. I haven’t implemented that yet, so I can’t confirm whether handling it that way would fully resolve the issue.
Conclusion: for Sonnet 4, max_tokens=39200 worked for me.
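The finding above can be applied defensively by clamping `max_tokens` before building the InvokeModel body. A sketch, with the caveat that 39,200 is this thread's empirical ceiling for Sonnet 4 on Bedrock, not a documented limit, and the helper names are mine:

```python
# Empirical ceiling reported in this thread for Sonnet 4 on Bedrock;
# not an official limit -- adjust if AWS changes the quota behavior.
SONNET_4_MAX_TOKENS = 39_200


def clamp_max_tokens(requested, ceiling=SONNET_4_MAX_TOKENS):
    """Clamp a requested max_tokens value into the working range."""
    return max(1, min(requested, ceiling))


def build_invoke_body(prompt, max_tokens=64_000):
    """Build an Anthropic-on-Bedrock InvokeModel body with a safe max_tokens."""
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": clamp_max_tokens(max_tokens),
        "messages": [{"role": "user", "content": prompt}],
    }
```

With the documented default of 64,000, `build_invoke_body("hi")` would clamp down to 39,200 instead of failing.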
Where are you setting this max_token value? Is it an environment variable?
@Thomas-McKanna Some in this thread found CLAUDE_CODE_MAX_OUTPUT_TOKENS as a config for claude code. So use that for now. You'll still get rate limits, but the lower you set it the lower throttling. It's been a trade off for me between rate limits and output tokens. Honestly, I've just been using 3-7 models without any problem, but do still want the v4 to work sooner than later.
Hey folks, this is Ara from the Cline team. I found the answer for this in the AWS Service Quotas console (make sure to select the right region).
Basically, there is an account-level service quota for the Claude 4 family of models (Sonnet 4 and Opus 4).
I do not believe this quota is super strict, but I can totally see how it's very easy to hit this value (for output tokens). I encourage you to go to the account and request an increase for all the related quotas for the models that are hitting these errors for you.
Alternatively, you can lower the max output tokens (CLAUDE_CODE_MAX_OUTPUT_TOKENS), but that will only work until it doesn't.
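To see which quotas apply, you can list the Bedrock service quotas and filter for the Claude 4 models. A sketch assuming your credentials can call the Service Quotas API; the filtering helper and its keyword list are mine:

```python
def claude4_quotas(quotas):
    """Filter Service Quotas entries whose QuotaName mentions a Claude 4 model."""
    keywords = ("Sonnet 4", "Opus 4")
    return [q for q in quotas if any(k in q.get("QuotaName", "") for k in keywords)]


# Fetching the live quotas (requires boto3 and AWS credentials; commented out here):
# import boto3
# client = boto3.client("service-quotas", region_name="us-west-2")
# quotas = []
# for page in client.get_paginator("list_service_quotas").paginate(ServiceCode="bedrock"):
#     quotas.extend(page["Quotas"])
# for q in claude4_quotas(quotas):
#     print(q["QuotaName"], q["Value"])
```

Quota increase requests can then be filed against the specific `QuotaCode` values this returns.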
```json
{
  "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
  "provider": "bedrock",
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "1024"
  },
  "permissions": {
    "allow": [
      "Bash(aws configure:*)"
    ],
    "deny": []
  }
}
```
I was experiencing lots of 429 and 503's by simply saying "hello". I switched from Opus 4 to Sonnet 4 and it started working again.
https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html for better throughput. ARN can be found here https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/inference-profiles
My setup:

```json
"env": {
  "CLAUDE_CODE_ENABLE_TELEMETRY": "0",
  "OTEL_METRICS_EXPORTER": "otlp",
  "AWS_REGION": "us-west-2",
  "ANTHROPIC_SMALL_FAST_MODEL": "arn:aws:bedrock:us-west-2:PROFILE_ID:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0",
  "ANTHROPIC_MODEL": "arn:aws:bedrock:us-west-2:PROFILE_ID:inference-profile/us.anthropic.claude-opus-4-20250514-v1:0",
  "CLAUDE_CODE_USE_BEDROCK": "1"
}
```
It seems that Anthropic has increased the API rate limits. Has anyone tried if this helps solving the problem?
For Claude Opus 4, the new rate limits are: 2,000,000 input tokens per minute 400,000 output tokens per minute
I've noticed fewer Bedrock API rate limits with larger token counts. I just tried claude with the default CLAUDE_CODE_MAX_OUTPUT_TOKENS and it worked. I haven't stress-tested it, but it looks ok.
Closing this as it's not a Claude Code issue. Thanks for all the debugging and suggestions here! Pls open a new issue if there's something we can fix in CC.
This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.