duckdb fails to load AWS SSO credentials to connect to AWS S3
What happens?
When creating an S3 secret with PROVIDER credential_chain, duckdb is unable to load valid credentials for a profile configured with the AWS SSO mechanism.
Reading from S3 then fails with an HTTP 403 error.
A workaround in the python API for duckdb is to call:
aws_session = boto3.Session()
creds = aws_session.get_credentials().get_frozen_credentials()
and specify the KEY_ID, SECRET and SESSION_TOKEN directly.
Similarly I can manually paste these values into the duckdb cli when creating a secret, but duckdb is unable to load these automatically.
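The Python workaround above can be sketched end to end as follows. This is only a minimal sketch, not an official duckdb-aws recipe: the helper names (sql_quote, build_s3_secret_sql, create_secret_from_boto3) are hypothetical, and the secret name and region are copied from the reproduction steps. It assumes boto3 can already resolve the SSO profile, for example after running aws sso login.

```python
from typing import Optional


def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_s3_secret_sql(name: str, key_id: str, secret: str,
                        session_token: Optional[str], region: str) -> str:
    """Build a CREATE SECRET statement that passes the credentials explicitly."""
    options = [
        "TYPE s3",
        f"KEY_ID {sql_quote(key_id)}",
        f"SECRET {sql_quote(secret)}",
    ]
    if session_token:
        options.append(f"SESSION_TOKEN {sql_quote(session_token)}")
    options.append(f"REGION {sql_quote(region)}")
    return f"CREATE OR REPLACE SECRET {name} (\n    " + ",\n    ".join(options) + "\n);"


def create_secret_from_boto3(con, profile_name: Optional[str] = None,
                             region: str = "af-south-1") -> None:
    """Resolve credentials with boto3 and hand them to a duckdb connection."""
    import boto3  # imported lazily so the helpers above work without boto3

    session = boto3.Session(profile_name=profile_name)
    credentials = session.get_credentials()
    if credentials is None:
        raise RuntimeError("boto3 found no credentials; run `aws sso login` first")
    frozen = credentials.get_frozen_credentials()
    con.execute(build_s3_secret_sql("secret2", frozen.access_key,
                                    frozen.secret_key, frozen.token, region))
```

With a connection from duckdb.connect(), calling create_secret_from_boto3(con, profile_name="myprofile") before the read_parquet query should mirror the manual workaround. SSO credentials are temporary, so the secret has to be recreated once they expire.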
To Reproduce
set
$env:AWS_PROFILE="myprofile"
run the duckdb cli
CREATE SECRET secret2 (
TYPE s3,
PROVIDER credential_chain,
REGION 'af-south-1',
ENDPOINT 's3.af-south-1.amazonaws.com'
);
SELECT * FROM read_parquet('s3://mybucket/mypath/**/*parquet', HIVE_PARTITIONING=1)
OS:
windows x86_64
DuckDB Version:
v1.2.0 5f5512b827
DuckDB Client:
cli
Hardware:
No response
Full Name:
John Leuner
Affiliation:
Absa bank
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
No - Other reason (please specify in the issue body)
Did you include all code required to reproduce the issue?
- [x] Yes, I have
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?
- [x] Yes, I have
Is this the same as https://github.com/duckdb/duckdb-aws/issues/31?
Yes I think it's very similar to
https://github.com/duckdb/duckdb-aws/issues/31
and probably also a regression of
https://github.com/duckdb/duckdb-aws/issues/14
My testing also seemed to show that this happens not only on Windows x86-64 but also on Amazon Linux 2023.
I checked that this still fails in duckdb 1.2.1.
Another workaround for duckdb-cli is to first login with aws sso login and then export your credentials as environment variables:
aws configure export-credentials --format powershell
$Env:AWS_ACCESS_KEY_ID="ASXYZZZ"
$Env:AWS_SECRET_ACCESS_KEY="oXVAAAABBBBBCCCC"
$Env:AWS_SESSION_TOKEN="XXXYYYZZZ"
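For the Python client, the same export step could be scripted roughly as below. This is a sketch under assumptions: it asks the AWS CLI for its JSON output (--format process, which follows the credential_process layout) instead of the powershell format, and the function names credentials_to_env and export_sso_credentials are hypothetical. It assumes aws sso login has already been run.

```python
import json
import os
import subprocess
from typing import Dict, Optional


def credentials_to_env(process_json: str) -> Dict[str, str]:
    """Map the JSON printed by `--format process` to AWS environment variable names."""
    data = json.loads(process_json)
    env = {
        "AWS_ACCESS_KEY_ID": data["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": data["SecretAccessKey"],
    }
    # Temporary (SSO) credentials also carry a session token.
    if data.get("SessionToken"):
        env["AWS_SESSION_TOKEN"] = data["SessionToken"]
    return env


def export_sso_credentials(profile: Optional[str] = None) -> Dict[str, str]:
    """Run the AWS CLI and copy the credentials into this process's environment."""
    command = ["aws", "configure", "export-credentials", "--format", "process"]
    if profile:
        command += ["--profile", profile]
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    env = credentials_to_env(result.stdout)
    os.environ.update(env)
    return env
```

Calling export_sso_credentials("myprofile") before creating the secret sets the variables for the current process only. The exported credentials are temporary and need to be refreshed once they expire.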
I found that the trailing hash character was being treated differently by the AWS SDK library linked by duckdb and by the AWS cli tool used to perform aws sso login.
By removing the hash from this url I was able to use an SSO profile
[profile myprofile]
sso_start_url = https://orgname.awsapps.com/start#
Also, I had to set the AWS_PROFILE env var to myprofile before starting duckdb.
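To check whether a config file is affected by the trailing hash described above, something like the following could be used. It is a rough sketch: the function name find_trailing_hash_urls is hypothetical, and it assumes the AWS config file is simple enough for Python's configparser to read, which may not hold for every file.

```python
import configparser
from typing import List, Tuple


def find_trailing_hash_urls(config_text: str) -> List[Tuple[str, str]]:
    """Return (section, url) pairs whose sso_start_url ends with '#'."""
    # interpolation=None keeps '%' characters literal; strict=False tolerates
    # duplicate sections. Inline comments are off by default, so a trailing
    # '#' stays part of the value.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    hits = []
    for section in parser.sections():
        url = parser.get(section, "sso_start_url", fallback="")
        if url.endswith("#"):
            hits.append((section, url))
    return hits
```

The text to check would typically come from ~/.aws/config, for example via pathlib.Path.home().joinpath(".aws", "config").read_text(). As the next comment shows, removing the hash did not fix the problem for everyone.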
We are facing the same issues when using AWS SSO.
Removing the anchor (#) within the sso_start_url did not fix the issue for us.
The http log tells us that there is not even an Authorization header attached to the request.
The workaround using aws configure export-credentials --format env works.
Also see my comments on the other issue
https://github.com/duckdb/duckdb-aws/issues/62#issuecomment-2839235394
Another workaround for duckdb-cli is to first login with aws sso login and then export your credentials as environment variables:
aws configure export-credentials --format powershell
$Env:AWS_ACCESS_KEY_ID="ASXYZZZ"
$Env:AWS_SECRET_ACCESS_KEY="oXVAAAABBBBBCCCC"
$Env:AWS_SESSION_TOKEN="XXXYYYZZZ"
I am facing the same problems, and this is the only thing that worked for me: defining the envvars. Nothing else actually worked.
In my case, I am using Linux, Ubuntu 24.04 LTS.
I no longer have any issues using an SSO profile. I tested with duckdb 1.4.1 on Windows x64:
set AWS_PROFILE=myprofile
CREATE SECRET secret2 (
TYPE s3,
PROVIDER credential_chain,
REGION 'af-south-1',
ENDPOINT 's3.af-south-1.amazonaws.com'
);
SELECT * FROM read_parquet('s3://mybucket/mypath/**/*parquet', HIVE_PARTITIONING=1)
Does it succeed with a secret that has CHAIN set explicitly to 'sso' when reading from an attached database backed by the Glue Iceberg REST API? I'm referring to SELECT syntax that references a fully qualified table name, as opposed to reading an S3 URI directly. I don't think this issue is completely resolved, because SSO credentials still don't work in that case as of 1.4.1. The only way such a query can work is by setting environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN), which defeats the purpose of using the credential chain in DuckDB Secrets.
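For reference, a secret with the chain pinned to 'sso' as described in this comment could be generated along these lines. This is a sketch only: the helper name build_credential_chain_secret_sql is hypothetical, and the secret name, profile, and region are placeholders taken from earlier in the thread. It covers secret creation only; whether an attached Glue Iceberg catalog picks the secret up is exactly what the comment questions.

```python
from typing import Optional


def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_credential_chain_secret_sql(name: str, region: str,
                                      chain: str = "sso",
                                      profile: Optional[str] = None) -> str:
    """Build a CREATE SECRET statement that restricts the credential chain."""
    options = [
        "TYPE s3",
        "PROVIDER credential_chain",
        f"CHAIN {sql_quote(chain)}",
    ]
    if profile:
        options.append(f"PROFILE {sql_quote(profile)}")
    options.append(f"REGION {sql_quote(region)}")
    return f"CREATE OR REPLACE SECRET {name} (\n    " + ",\n    ".join(options) + "\n);"


if __name__ == "__main__":
    # Print the statement so it can be pasted into the duckdb cli.
    print(build_credential_chain_secret_sql("secret2", "af-south-1", profile="myprofile"))
```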