`failed to refresh cached credentials` when using aws provider to connect to object storage
Hey,
I'm having issues using the aws provider to connect to object storage. I don't know if it's a bug or user error. I'm trying something like in the example, and I'm getting this error during tofu init:
```
╷
│ Error: No valid credential sources found
│
│ with provider["registry.opentofu.org/hashicorp/aws"],
│ on providers.tf line 28, in provider "aws":
│ 28: provider "aws" {
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│
│ Error: failed to refresh cached credentials, no EC2 IMDS role found,
│ operation error ec2imds: GetMetadata, failed to get API token, operation
│ error ec2imds: getToken, http response error StatusCode: 400, request to
│ EC2 IMDS failed
│
╵
```
Here is my opentofu code snippet:
```hcl
terraform {
  required_version = "~> 1.10.0"

  required_providers {
    # for creating STACKIT resources
    stackit = {
      source  = "stackitcloud/stackit"
      version = "~> 0.58.0"
    }
    # for writing the ACL policy on the STACKIT Object Storage Bucket
    aws = {
      source  = "hashicorp/aws"
      version = "6.4.0"
    }
  }
}

provider "stackit" {
  default_region        = var.stackit_default_region
  enable_beta_resources = true
  # authentication via environment variable
}

provider "aws" {
  region                      = var.stackit_default_region
  skip_credentials_validation = true
  skip_region_validation      = true
  skip_requesting_account_id  = true
  # skip_metadata_api_check   = true
  access_key                  = stackit_objectstorage_credential.usage_credential.access_key
  secret_key                  = stackit_objectstorage_credential.usage_credential.secret_access_key

  endpoints {
    s3 = "https://object.storage.${var.stackit_default_region}.onstackit.cloud"
  }
}

resource "stackit_objectstorage_bucket" "usage-bucket" {
  project_id = var.stackit_project_id
  name       = local.obj_str_bucket_name
}

resource "stackit_objectstorage_credentials_group" "usage-group" {
  project_id = var.stackit_project_id
  name       = local.obj_str_creds_grp_name
}

resource "stackit_objectstorage_credential" "usage_credential" {
  project_id           = var.stackit_project_id
  credentials_group_id = stackit_objectstorage_credentials_group.usage-group.credentials_group_id

  lifecycle {
    create_before_destroy = true
    replace_triggered_by  = [null_resource.credential_rotation_trigger]
  }
}

# ACLs on Object Storage need to be set up with the aws provider
resource "aws_s3_bucket_policy" "acl_policy" {
  bucket = stackit_objectstorage_bucket.usage-bucket.name
  policy = <<EOF
{
  "Statement": [
    {
      "Sid": "Restrict-IP-Range",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::${stackit_objectstorage_bucket.usage-bucket.name}/*",
        "arn:aws:s3:::${stackit_objectstorage_bucket.usage-bucket.name}"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": ${local.acls}
        }
      }
    }
  ]
}
EOF
}
```
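For completeness: the snippet references a few definitions that aren't shown. A minimal sketch of what they might look like (names are taken from the references above; the values and the `credential_rotation_id` variable are placeholders of mine, not my actual code):

```hcl
# Hypothetical sketch -- these are referenced above but not shown.
# Names come from the snippet; values are made-up placeholders.
locals {
  obj_str_bucket_name    = "usage-bucket"
  obj_str_creds_grp_name = "usage-credentials-group"
  # JSON-encoded list of allowed CIDR ranges for the bucket policy
  acls = jsonencode(["203.0.113.0/24"])
}

# Dummy resource (hashicorp/null provider) used only to force credential
# rotation via replace_triggered_by on the credential resource.
resource "null_resource" "credential_rotation_trigger" {
  triggers = {
    rotation = var.credential_rotation_id
  }
}
```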
When setting skip_metadata_api_check = true, I get:
```
╷
│ Error: No valid credential sources found
│
│ with provider["registry.opentofu.org/hashicorp/aws"],
│ on providers.tf line 28, in provider "aws":
│ 28: provider "aws" {
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│
│ Error: failed to refresh cached credentials, no EC2 IMDS role found,
│ operation error ec2imds: GetMetadata, access disabled to EC2 IMDS via
│ client option, or "AWS_EC2_METADATA_DISABLED" environment variable
│
╵
```
Currently don't see what I'm doing wrong here. Any advice?
Also, I just noticed that the object storage aws provider example has a formatting issue.
You're telling Terraform not to attempt using EC2 Instance Metadata (IMDS), but it still can't use your dynamic credentials because they are not known yet.
The aws provider is only used after the credentials are created, so no IMDS fallback or failure happens.
You need to alias the AWS provider and use it only after credentials are available. Create the AWS provider with alias:
```hcl
provider "aws" {
  alias                       = "stackit"
  region                      = var.stackit_default_region
  skip_credentials_validation = true
  skip_region_validation      = true
  skip_requesting_account_id  = true
  access_key                  = stackit_objectstorage_credential.usage_credential.access_key
  secret_key                  = stackit_objectstorage_credential.usage_credential.secret_access_key

  endpoints {
    s3 = "https://object.storage.${var.stackit_default_region}.onstackit.cloud"
  }
}
```
Reference the aliased provider in the ACL policy:
```hcl
resource "aws_s3_bucket_policy" "acl_policy" {
  provider = aws.stackit
  bucket   = stackit_objectstorage_bucket.usage-bucket.name
  policy   = <<EOF
{
  "Statement": [
    {
      "Sid": "Restrict-IP-Range",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::${stackit_objectstorage_bucket.usage-bucket.name}/*",
        "arn:aws:s3:::${stackit_objectstorage_bucket.usage-bucket.name}"
      ],
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": ${local.acls}
        }
      }
    }
  ]
}
EOF
}
```
Run it.
I'm getting the exact same errors. Also, why would I need the workaround with the provider alias? The vault example doesn't need this, and it works as expected. Provider configurations are evaluated lazily; creating a secrets manager and its secrets in the same apply is possible.
Are there any updates on this? I have the same issue.
@mardonner did you find a solution by any chance?
Hi @lupa95, I happen to be working on this again in the next few days. I'll test whether this issue still exists with the current versions of tofu and the providers, and maybe try to provide more logs with TF_LOG=trace. If it still doesn't work, I guess I'll have to use a local-exec provisioner and call some CLI tool for interacting with S3 to place my policy...
There has not been a response from STACKIT on this issue as far as I can tell. @h3adex, I see you initially added the documentation for interacting with object storage in #583. Do you have any more insights or advice for us?
Mhh, I wrote the guide using terraform v1.5.7. What terraform/opentofu version are you using?
Just checked, back then it was opentofu v1.10.1.
Do you mind retrying it using the terraform version I used?
I'm not done with testing, but I have some new ideas that could explain this:
```hcl
access_key = stackit_objectstorage_credential.usage_credential.access_key
secret_key = stackit_objectstorage_credential.usage_credential.secret_access_key
```
This is the problem. Even though I have

```hcl
skip_credentials_validation = true
skip_region_validation      = true
skip_requesting_account_id  = true
skip_metadata_api_check     = true
```

set in the provider config, the aws provider will not properly skip these checks during tofu plan. Hardcoding the access_key and secret_key lets me at least pass the plan phase.
This means dynamically setting these details doesn't work, which is pretty inconvenient, and it leads me to this assumption: this thread hints that at least some kind of authentication detail needs to be available early. I was expecting the same behaviour as with the vault provider, where dynamic auth configuration actually works properly.
I have not yet tested whether I can set a dummy profile like the thread originally suggested while also setting values for access_key and secret_key dynamically, as in the example. I'd hope it then uses the dummy profile during plan and the real keys during apply.
I hope this works, because I imagine it'd take some time to get the expected behaviour merged into the aws provider 🫠
Ok, I'm beginning to understand.
The aws provider docs show in which order the configuration sources are evaluated.
Configuration for the AWS Provider can be derived from several sources, which are applied in the following order:

1. Parameters in the provider configuration
2. Environment variables
3. Shared credentials files
4. Shared configuration files
5. Container credentials
6. Instance profile credentials and Region
I believe the original error occurred not because the skip_xyz options weren't respected, but because I was falling through to the last configuration method (which produced this error): somehow, resource attribute references in the access_key and secret_key definitions are not recognised. I don't know if that's an issue with tofu or with the aws provider.
I now have this stupid workaround:
```hcl
provider "aws" {
  region                      = var.stackit_default_region
  skip_credentials_validation = true
  skip_region_validation      = true
  skip_requesting_account_id  = true
  skip_metadata_api_check     = true
  access_key                  = stackit_objectstorage_credential.usage_credential.access_key
  secret_key                  = stackit_objectstorage_credential.usage_credential.secret_access_key
  shared_credentials_files    = ["${path.module}/aws.profile"]

  endpoints {
    s3 = "https://object.storage.${var.stackit_default_region}.onstackit.cloud"
  }
}
```
and in my pipeline, right before the plan command:
```shell
cat << EOF > aws.profile
[default]
aws_access_key_id = xyz
aws_secret_access_key = xyz
EOF
```
xyz literally, just to have some dummy values. I explicitly reference this file via shared_credentials_files to get through the plan stage.
Strangely, this issue only happens during the plan phase. During apply, the order is evaluated correctly and the actual values from the resource attributes are used instead of the file.
With this I get:
```
module.s3_cfg[0].aws_s3_bucket_policy.acl_policy: Creating...
module.s3_cfg[0].aws_s3_bucket_policy.acl_policy: Still creating... [10s elapsed]
module.s3_cfg[0].aws_s3_bucket_policy.acl_policy: Creation complete after 11s [id=xyz]
```
This cannot be the best solution for this, right?! If anyone knows a better way, please let me know.
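For what it's worth, a possibly lighter variant of the same trick (untested on my end; based on the source order quoted earlier, where environment variables rank above shared credentials files) would be exporting dummy values in the pipeline instead of writing a profile file:

```shell
# Untested sketch: dummy static credentials via environment variables
# (rank 2 in the provider's credential source order) instead of a
# shared credentials file, set right before `tofu plan`.
export AWS_ACCESS_KEY_ID="xyz"
export AWS_SECRET_ACCESS_KEY="xyz"
```

Whether the provider picks these up at plan time the same way it picks up the dummy file is an assumption I haven't verified.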
Nevertheless, the stackit provider docs should later be updated to whatever turns out to be the best solution.
(@h3adex @lupa95 tagging because I don't know if you subscribed to notifications on this issue)
edit: I'm on tofu v1.10.7 with aws provider v6.22.1