
Claude CLI not working with sonnet 4 in bedrock [BUG]

Open marklaczynski opened this issue 8 months ago • 24 comments

Environment

  • Platform (select one):
    • [ ] Anthropic API
    • [x] AWS Bedrock
    • [ ] Google Vertex AI
    • [ ] Other:
  • Claude CLI version: 1.0.3 (Claude Code)
  • Operating System: MacOS 15.5
  • Terminal: iTerm, Warp

Bug Description

When using the Claude CLI with a Sonnet 4 Bedrock model, it receives an API Error (429 Too many tokens, please wait before trying again.) and never gets a response.

Steps to Reproduce

  1. Run DISABLE_TELEMETRY=1 ANTHROPIC_SMALL_FAST_MODEL=us.anthropic.claude-3-5-haiku-20241022-v1:0 AWS_REGION=us-east-1 CLAUDE_CODE_USE_BEDROCK=1 DISABLE_PROMPT_CACHING=1 claude --model us.anthropic.claude-sonnet-4-20250514-v1:0

  2. Type in a prompt as simple as 'hi'

  3. A 429 error comes back

Expected Behavior

Expect a response from the sonnet 4 model

Actual Behavior

A 429 "Too many tokens" API error

Additional Context

If I run the exact same command but use Sonnet 3.7, there are no problems. I have access to Sonnet 4 in AWS in the regions I use in the CLI command, and I have used that model with other tools without a problem.

marklaczynski avatar May 24 '25 15:05 marklaczynski

Copying comment from closed duplicate #1439:

I had the same issue when trying Bedrock for Claude Code yesterday. I can even send requests using the SAME IAM credentials to InvokeModel at the same time while the errors are happening in Claude Code with other tools and they work completely fine. I had a similar issue while trying Bedrock with Aider and was only able to use it there with the Converse API, not with the default InvokeModel API. Something must be up with how the Bedrock API is called or some weird issue on the Bedrock platform.

strayer avatar May 31 '25 09:05 strayer

I can write requests that use all Claude models over Bedrock in normal Python and bash scripts. But in Claude Code it does not work, and since the beginning I get the same 429. It is not related to the quotas, and it is not necessary to request a quota increase from the AWS team; that can be seen under https://us-east-1.console.aws.amazon.com/servicequotas/home/services/bedrock/quotas It is something funky with Claude Code + Bedrock 🤷🏼‍♂️

jralduaveuthey avatar May 31 '25 11:05 jralduaveuthey

I found a similar issue in the cline repo: https://github.com/cline/cline/issues/3954

This really seems to be some weird Bedrock issue. I'll gladly spend some time debugging this if someone can point me in the right direction.

strayer avatar May 31 '25 16:05 strayer

Having the same issue as @strayer ;(

neo-picasso-2112 avatar May 31 '25 18:05 neo-picasso-2112

Same here

danieljqs1 avatar May 31 '25 18:05 danieljqs1

I am having the same issue with Claude Sonnet 4. I can invoke it using the AWS CLI and it works fine:

aws bedrock-runtime invoke-model \
--model-id arn:aws:bedrock:us-west-2:XXXX:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0 \
--body '{
  "messages": [
    {
      "role": "user",
      "content": "Tell me a joke."
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.8,
  "anthropic_version": "bedrock-2023-05-31"
}' \
--content-type application/json \
--accept application/json \
--cli-binary-format raw-in-base64-out \
--region us-west-2 \
out.txt

But for some reason I cannot invoke it using the Bedrock client. Here is the code snippet I used:

const {
  BedrockRuntimeClient,
  InvokeModelWithResponseStreamCommand,
} = require("@aws-sdk/client-bedrock-runtime");
....
this.bedrock = new BedrockRuntimeClient({ region: region });
async invoke(messages) {
    const preparedBody = this.constructBody(messages);
    const params = {
      modelId: this.modelRef,
      contentType: "application/json",
      accept: "*/*",
      body: JSON.stringify(preparedBody),
    };
    const command = new InvokeModelWithResponseStreamCommand(params);
    const response = await this.bedrock.send(command);
    return response.body;
  }
...

somesh-scoville avatar Jun 02 '25 07:06 somesh-scoville

Is this relevant? https://community.aws/content/2xVZmCM5E7XXw0yqTEGgXYxRowk/bedrock-claude-4-burndown-rates

muraoka-delta avatar Jun 02 '25 12:06 muraoka-delta

Hi everyone!

I am having the same issue when using the CLI and also the console, even for a simple "hello" in a prompt. As previous users mentioned, it shows me the "too many tokens" error:

aws bedrock-runtime invoke-model \
  --model-id arn:aws:bedrock:us-east-1:xxxxxx:inference-profile/us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --body fileb://input.json \
  --content-type application/json \
  --accept application/json \
  --cli-binary-format raw-in-base64-out \
  --region us-east-1 \
  out.txt

An error occurred (ThrottlingException) when calling the InvokeModel operation (reached max retries: 2): Too many tokens, please wait before trying again.

I also tried creating an inference profile on AWS for using Claude 4, because it does not work on demand. That attempt shows me:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the InvokeModel operation: Invocation of model ID anthropic.claude-sonnet-4-20250514-v1:0 with on-demand throughput isn’t supported. Retry your request with the ID or ARN of an inference profile that contains this model.

So now I have the following code:

import argparse
import json
import botocore.auth
import botocore.session
from botocore.awsrequest import AWSRequest
from botocore.httpsession import URLLib3Session

# --- Configuration ---
REGION = 'us-east-1'
inference_profile_id = 'us.anthropic.claude-sonnet-4-20250514-v1:0'
INFERENCE_PROFILE_ARN = f'arn:aws:bedrock:{REGION}:xxxxxx:application-inference-profile/{inference_profile_id}'
STYLE_PROMPT_PATH = "CONF_TEXT_ICONIP2025.txt"
ENDPOINT = f"https://bedrock-runtime.{REGION}.amazonaws.com/application-inference-profile/{inference_profile_id}/invoke"

# --- Build payload for Claude prompt-style API ---
def load_style_prompt(path):
    with open(path, "r", encoding="utf-8") as f:
        return f.read().strip()

def build_payload(style_prompt, user_message):
    return {
        "input": f"{style_prompt}\n\n{user_message}"
    }


# --- Sign and send request ---
def invoke_claude(payload):
    session = botocore.session.get_session()
    credentials = session.get_credentials().get_frozen_credentials()

    aws_request = AWSRequest(
        method="POST",
        url=ENDPOINT,
        data=json.dumps(payload),
        headers={
            "content-type": "application/json",
            "accept": "application/json"
        }
    )

    sigv4 = botocore.auth.SigV4Auth(credentials, "bedrock", REGION)
    sigv4.add_auth(aws_request)

    prepared = aws_request.prepare()
    print("\nSigned Headers Sent to AWS:")
    for k, v in dict(prepared.headers).items():
        print(f"{k}: {v}")

    http_session = URLLib3Session()
    response = http_session.send(prepared)
    print("\nPayload Sent to AWS:")
    print(prepared.body)

    return response

# --- CLI Entrypoint ---
def main():
    parser = argparse.ArgumentParser(description="Call Claude with a styled message.")
    parser.add_argument("--message", required=True, help="The user message to send.")
    args = parser.parse_args()

    style_prompt = load_style_prompt(STYLE_PROMPT_PATH)
    payload = build_payload(style_prompt, args.message)
    response = invoke_claude(payload)

    print(f"\nStatus Code: {response.status_code}")
    try:
        parsed = json.loads(response.text)
        print(json.dumps(parsed, indent=2))
    except Exception:
        print("Could not parse response:")
        print(response.text)

if __name__ == "__main__":
    main()

and the output is:

Status Code: 200
{
  "Output": {
    "__type": "com.amazon.coral.service#UnknownOperationException"
  },
  "Version": "1.0"
}

I would be very grateful if someone has resolved this and has any idea on what to do next :)

joannakarayianni avatar Jun 03 '25 10:06 joannakarayianni

Having the same issue, I can call the Sonnet 4 model on the AWS console from us-west-2, however I keep receiving 429 errors on the Bedrock client, the same code works for Sonnet 3.5.

This is not an issue with the quotas either; they are not 0 from what I can see. [screenshot of the quota values]

samuelchen-ofx avatar Jun 04 '25 03:06 samuelchen-ofx

node_modules/@anthropic-ai/claude-code/cli.js L1614

function gI5(A){if(A.includes("3-5"))return 8192;if(A.includes("haiku"))return 8192;let B=process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS;if(B){let Q=parseInt(B,10);if(!isNaN(Q)&&Q>0)return Q}return 32000}

Is this relevant? https://community.aws/content/2xVZmCM5E7XXw0yqTEGgXYxRowk/bedrock-claude-4-burndown-rates

According to this post, max_tokens 32000 may be too big. When I use bedrock-runtime directly, the max_tokens limit is between 35000 and 40000 (sending 40000 gets a ThrottlingException; I didn't encounter this situation using 3.7 Sonnet). The quota settings only allow one simultaneous request at 32000 tokens.

Also, joannakarayianni's post seems to specify too large a max_tokens value in input.json.
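For readers of the minified snippet quoted above, here is a rough Python translation of the default-output-token logic (a reconstruction for illustration only; the actual implementation is the minified JavaScript in cli.js):

```python
import os

def default_max_output_tokens(model_id: str) -> int:
    # 3.5-generation and Haiku models are capped at 8192 tokens.
    if "3-5" in model_id or "haiku" in model_id:
        return 8192
    # An explicit override via CLAUDE_CODE_MAX_OUTPUT_TOKENS wins.
    override = os.environ.get("CLAUDE_CODE_MAX_OUTPUT_TOKENS")
    if override:
        try:
            value = int(override)
            if value > 0:
                return value
        except ValueError:
            pass
    # Everything else (including Sonnet 4) defaults to 32,000, which
    # alone can exhaust a per-minute output-token quota on Bedrock.
    return 32000
```

This makes the failure mode visible: with no override set, every Sonnet 4 request asks Bedrock for up to 32,000 output tokens.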

s-aita avatar Jun 04 '25 04:06 s-aita

workaround

node_modules/@anthropic-ai/claude-code/cli.js L1614

* before
function gI5(A){if(A.includes("3-5"))return 8192;if(A.includes("haiku"))return 8192;let B=process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS;if(B){let Q=parseInt(B,10);if(!isNaN(Q)&&Q>0)return Q}return 32000}

* after
function gI5(A){if(A.includes("3-5"))return 1024;if(A.includes("haiku"))return 1024;let B=process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS;if(B){let Q=parseInt(B,10);if(!isNaN(Q)&&Q>0)return 1024}return 1024}

Changing this function works fine for my env

EDIT

I understood about CLAUDE_CODE_MAX_OUTPUT_TOKENS in the function. We don't need cli.js modification. We can just add "CLAUDE_CODE_MAX_OUTPUT_TOKENS" env into ~/.claude/settings.json.

~/.claude/settings.json

{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "apac.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024
  }
}

s-aita avatar Jun 04 '25 04:06 s-aita

@s-aita that eliminates the 429s for me. I tested it up to 16384 and it worked for a while, but it eventually started returning 429s again after extended use. I'll investigate later whether there's some metric in AWS we can increase to make it work like it does with Sonnet 3.7. I recall from @muraoka-delta's linked article that the output-token burndown rate for Sonnet 4 is 5x. So maybe there's some config somewhere in AWS to account for that increase. TBD.
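If the 5x output-token burndown described in the linked community post applies, and if Bedrock counts the reserved max_tokens against the per-minute quota (both assumptions, consistent with the immediate throttling reported above), a back-of-the-envelope sketch shows why the 32,000-token default exhausts a quota. The quota value below is a hypothetical example, not a documented number:

```python
# Hypothetical illustration: assume a per-minute token quota of 200,000
# and a 5x output-token burndown rate for Claude 4 models on Bedrock.
def quota_cost(input_tokens: int, max_output_tokens: int,
               output_burndown: int = 5) -> int:
    """Estimated quota consumption for one request, counting the
    reserved max_tokens at the output burndown multiplier."""
    return input_tokens + max_output_tokens * output_burndown

TPM_QUOTA = 200_000  # hypothetical per-minute token quota

# With the 32,000-token default, output alone reserves 160,000 tokens:
cost_default = quota_cost(input_tokens=2_000, max_output_tokens=32_000)
# Capping output at 4,096 leaves room for several requests per minute:
cost_capped = quota_cost(input_tokens=2_000, max_output_tokens=4_096)

print(cost_default, TPM_QUOTA // cost_default)  # 162000 tokens -> 1 request/min
print(cost_capped, TPM_QUOTA // cost_capped)    # 22480 tokens -> 8 requests/min
```

Under these assumed numbers, a single default-sized request nearly fills the quota, which matches the observation that lowering CLAUDE_CODE_MAX_OUTPUT_TOKENS reduces throttling.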

marklaczynski avatar Jun 04 '25 20:06 marklaczynski

setting my .claude/settings.json file like this resolved the issue for me.

{
  "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
  "provider": "bedrock",
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024
  },
  "permissions": {
    "allow": [
      "Bash(aws configure:*)"
    ],
    "deny": []
  }
}

cachatj avatar Jun 04 '25 22:06 cachatj

@cachatj thanks, that helps! With Opus it works for a bit and then I get the error " ⎿ API Error (503 Model is getting throttled. Try your request again.) · Retrying in 1 seconds… (attempt 1/10)..."

Any idea why that can be and how to fix it?

jralduaveuthey avatar Jun 05 '25 13:06 jralduaveuthey

[screenshot of env settings] These env settings work for me

lroolle avatar Jun 05 '25 16:06 lroolle

@cachatj thanks, that helps! With Opus it works for a bit and then I get the error " ⎿ API Error (503 Model is getting throttled. Try your request again.) · Retrying in 1 seconds… (attempt 1/10)..."

Any idea why that can be and how to fix it?


@jralduaveuthey - Unfortunately, those token limits are set by Anthropic, and everyone hates them. I couldn't agree more; frankly, the most significant promise of using Claude Code was "full repo/codebase awareness," which it doesn't deliver on at all.

In fact, with some of the pipeline jNBs I work on day to day, it cannot even troubleshoot or productively assist with a single file, let alone the whole codebase. It's a true disappointment, and it pushed me back to Copilot / Roo Code in VS Code and the Claude desktop app on its own.

This has been a huge community issue for so long that it's hard to remain optimistic that Anthropic will ever deliver a solution. Oh, and using your own instance of Claude (served from GCP or AWS) doesn't make it any better. 😢

cachatj avatar Jun 05 '25 19:06 cachatj

[screenshot of env settings] These env settings work for me

I tried this and my experience got better, but after a while (probably 20 mins of use - just hitting yes on every change lol) it started throwing 429 errors again

itsjustmeemman avatar Jun 06 '25 00:06 itsjustmeemman

[screenshot of env settings] These env settings work for me

I tried this and my experience got better, but after a while (probably 20 mins of use - just hitting yes on every change lol) it started throwing 429 errors again

Yes, it's a Bedrock quota issue

lroolle avatar Jun 06 '25 09:06 lroolle

Here's what I found during my testing, which worked for me:

I tested the Bedrock client using the following models:

us.anthropic.claude-sonnet-4-20250514-v1:0, us.anthropic.claude-opus-4-20250514-v1:0

Note: Make sure to set region = us-west-2. Other regions either have lower rate limits or do not support these models.

Claude Sonnet 4 Officially, the maximum max_tokens is documented as 64,000. However, in practice, requests with max_tokens above 39,200 fail. This issue is also reproducible via the Bedrock Chat UI. To avoid errors, I’ve been setting max_tokens below 39,200. If you don't explicitly set it, it defaults to 64,000 and results in an error.

Claude Opus 4 According to the Bedrock Chat UI, the maximum max_tokens is 32,000 — and it does accept that value. However, the strange part is that even though the response is returned successfully, an error is also thrown. This means you'll need special error handling to manage the response properly. I haven’t implemented that yet, so I can’t confirm whether handling it that way would fully resolve the issue.

Conclusion: for Sonnet 4, max_tokens=39200 worked for me.
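The clamping described above can be sketched as a small helper that caps max_tokens below the observed ceiling before building the InvokeModel request body. The 39,200 figure is an empirical observation from this thread, not a documented limit:

```python
import json

# Empirical ceiling observed in this thread for Sonnet 4 on Bedrock;
# not an officially documented limit.
SONNET4_OBSERVED_MAX_TOKENS = 39_200

def build_invoke_body(user_message: str, max_tokens: int = 64_000) -> str:
    """Build an Anthropic-format InvokeModel body, clamping max_tokens
    to the observed ceiling so the request is not throttled outright."""
    clamped = min(max_tokens, SONNET4_OBSERVED_MAX_TOKENS)
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": clamped,
        "messages": [{"role": "user", "content": user_message}],
    })

body = build_invoke_body("Tell me a joke.")
# The resulting body can be passed to a boto3 bedrock-runtime client,
# e.g. client.invoke_model(modelId=..., body=body)
```

Passing the documented 64,000 maximum straight through is what triggers the ThrottlingException; clamping trades some output headroom for requests that actually complete.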

somesh-scoville avatar Jun 06 '25 10:06 somesh-scoville

Here's what I found during my testing, which worked for me:

I tested the Bedrock client using the following models:

us.anthropic.claude-sonnet-4-20250514-v1:0, us.anthropic.claude-opus-4-20250514-v1:0

Note: Make sure to set region = us-west-2. Other regions either have lower rate limits or do not support these models.

Claude Sonnet 4 Officially, the maximum max_tokens is documented as 64,000. However, in practice, requests with max_tokens above 39,200 fail. This issue is also reproducible via the Bedrock Chat UI. To avoid errors, I’ve been setting max_tokens below 39,200. If you don't explicitly set it, it defaults to 64,000 and results in an error.

Claude Opus 4 According to the Bedrock Chat UI, the maximum max_tokens is 32,000 — and it does accept that value. However, the strange part is that even though the response is returned successfully, an error is also thrown. This means you'll need special error handling to manage the response properly. I haven’t implemented that yet, so I can’t confirm whether handling it that way would fully resolve the issue.

Conclusion: for Sonnet 4, max_tokens=39200 worked for me.

Where are you setting this max_token value? Is it an environment variable?

Thomas-McKanna avatar Jun 10 '25 15:06 Thomas-McKanna

@Thomas-McKanna Some in this thread found CLAUDE_CODE_MAX_OUTPUT_TOKENS as a config for Claude Code, so use that for now. You'll still get rate limits, but the lower you set it, the less throttling. It's been a trade-off for me between rate limits and output tokens. Honestly, I've just been using the 3.7 models without any problem, but I do still want v4 to work sooner rather than later.

marklaczynski avatar Jun 11 '25 19:06 marklaczynski

Hey folks, this is Ara from the Cline team. I found the answer for this in the AWS Service Quotas console (make sure to select the right region).

Basically, the account-level service quota for the Claude 4 family of models (Sonnet 4 and Opus 4) is this:

[screenshot of the Bedrock service quota]

I do not believe this quota is super strict, but I can totally see how it's very easy to hit this value (for output tokens). I encourage you to go to the account and request an increase for all the related quotas for the models that are hitting these errors for you.

Alternatively, you can lower the max output tokens (CLAUDE_CODE_MAX_OUTPUT_TOKENS), but that will only work until it doesn't.



{
  "model": "us.anthropic.claude-sonnet-4-20250514-v1:0",
  "provider": "bedrock",
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "true",
    "ANTHROPIC_MODEL": "us.anthropic.claude-sonnet-4-20250514-v1:0",
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": 1024
  },
  "permissions": {
    "allow": [
      "Bash(aws configure:*)"
    ],
    "deny": []
  }
}



arafatkatze avatar Jun 14 '25 09:06 arafatkatze

I was experiencing lots of 429s and 503s simply by saying "hello". I switched from Opus 4 to Sonnet 4 and it started working again.

PhillipNinan avatar Jun 17 '25 20:06 PhillipNinan

See https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html for better throughput. The ARN can be found here: https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/inference-profiles

[screenshot of the inference profiles console]

my setup:

  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "0",
    "OTEL_METRICS_EXPORTER": "otlp",
    "AWS_REGION":"us-west-2",
    "ANTHROPIC_SMALL_FAST_MODEL": "arn:aws:bedrock:us-west-2:PROFILE_ID:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0",
    "ANTHROPIC_MODEL":"arn:aws:bedrock:us-west-2:PROFILE_ID:inference-profile/us.anthropic.claude-opus-4-20250514-v1:0",
    "CLAUDE_CODE_USE_BEDROCK": "1"
  }

noma4i avatar Jun 20 '25 05:06 noma4i

It seems that Anthropic has increased the API rate limits. Has anyone checked whether this helps solve the problem?

For Claude Opus 4, the new rate limits are 2,000,000 input tokens per minute and 400,000 output tokens per minute.

jralduaveuthey avatar Jul 27 '25 13:07 jralduaveuthey

It seems that Anthropic has increased the API rate limits. Has anyone checked whether this helps solve the problem?

For Claude Opus 4, the new rate limits are 2,000,000 input tokens per minute and 400,000 output tokens per minute.

I've noticed fewer Bedrock API rate limits with larger token counts. I just tried Claude with the default CLAUDE_CODE_MAX_OUTPUT_TOKENS and it worked. I haven't stress-tested it, but it looks OK.

marklaczynski avatar Aug 01 '25 01:08 marklaczynski

Closing this as it's not a Claude Code issue. Thanks for all the debugging and suggestions here! Please open a new issue if there's something we can fix in CC.

igorkofman avatar Aug 22 '25 18:08 igorkofman

This issue has been automatically locked since it was closed and has not had any activity for 7 days. If you're experiencing a similar issue, please file a new issue and reference this one if it's relevant.

github-actions[bot] avatar Aug 31 '25 14:08 github-actions[bot]