Providing a virtual hosted style s3 bucket endpoint causes the healthcheck to fail
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
When providing the more modern, virtual-hosted-style bucket endpoint (https://bucket-name.s3.region-code.amazonaws.com) together with the bucket name to an aws_s3 sink, the healthcheck fails, indicating that the bucket cannot be found:
ERROR vector::topology::builder: Healthcheck: Failed Reason. error=Unknown bucket: "redacted-bucket-name"
The bucket name is required. However, this results in the following (incorrect) URL being generated for listing the bucket: https://bucket-name.s3.region-code.amazonaws.com/bucket-name.
Vector can still put objects in this bucket, but they are all placed inside a directory named after the bucket, because the upload URL is generated as https://bucket-name.s3.region-code.amazonaws.com/bucket-name/key-prefix/key.
Given that path-style URLs are being deprecated, it would be good to support the new virtual-hosted-style URLs.
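To make the difference concrete, here is a minimal sketch of how the two addressing styles build a URL. The helper names `path_style_url` and `virtual_hosted_url` are made up for illustration and are not Vector or AWS SDK functions; the sketch only mirrors the URLs described above.

```rust
/// Path-style: the bucket is appended to the endpoint as the first path segment.
/// (Hypothetical helper, for illustration only.)
fn path_style_url(endpoint: &str, bucket: &str, key: &str) -> String {
    format!("{}/{}/{}", endpoint.trim_end_matches('/'), bucket, key)
}

/// Virtual-hosted-style: the bucket is part of the host name, not the path.
/// (Hypothetical helper, for illustration only.)
fn virtual_hosted_url(region: &str, bucket: &str, key: &str) -> String {
    format!("https://{}.s3.{}.amazonaws.com/{}", bucket, region, key)
}

fn main() {
    let bucket = "my-logs";
    let key = "key-prefix/key";

    // Path-style against the regional endpoint: bucket appears once, in the path.
    println!("{}", path_style_url("https://s3.eu-west-1.amazonaws.com", bucket, key));

    // Path-style against an endpoint that already carries the bucket in the host:
    // the bucket name is repeated in the path, which is the behaviour reported here.
    println!(
        "{}",
        path_style_url("https://my-logs.s3.eu-west-1.amazonaws.com", bucket, key)
    );

    // Virtual-hosted-style: bucket only in the host name.
    println!("{}", virtual_hosted_url("eu-west-1", bucket, key));
}
```

The second call reproduces the doubled bucket name: path-style addressing combined with a virtual-hosted endpoint puts the bucket in both the host and the path.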
Configuration
[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
[sinks.my_bucket]
type = "aws_s3"
inputs = [ "my_source_id" ]
bucket = "my-logs"
endpoint = "https://my-logs.s3.eu-west-1.amazonaws.com"
Version
vector 0.11.1 (v0.11.1 x86_64-unknown-linux-musl 2020-12-17)
Debug Output
No response
Example Data
No response
Additional Context
Virtual-hosted-style URLs are useful when you want to restrict egress traffic to specific domains.
References
No response
I definitely recommend updating to the latest Vector version (0.29.1 right now) and testing your case on it. If the issue remains, please let us know. Unfortunately, Vector versions that old are not supported.
Thanks, I will update and report back 👍
Sorry for the delay, I have just tried this on vector 0.29.1 (x86_64-unknown-linux-musl 74ae15e 2023-04-20 14:50:42.739094536) and I am getting the same behaviour as originally described. I had to specify the bucket region because otherwise the region header wasn't matching the bucket region, so the config looks like this:
[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
[sinks.my_bucket]
type = "aws_s3"
inputs = [ "my_source_id" ]
bucket = "my-logs"
region = "eu-west-1"
endpoint = "https://my-logs.s3.eu-west-1.amazonaws.com"
It looks like support for virtual-hosted-style URLs was added to the aws rust sdk a few months back, so it might just be a question of updating the sdk? https://github.com/awslabs/aws-sdk-rust/releases/tag/release-2023-01-13
I believe that's the case. Unfortunately, we've been blocked from upgrading by a regression that was introduced a few versions ago. Working through that is on my todo list for the next few weeks, IIRC.
Excellent, thank you! I will wait for this to be completed then.
Do you happen to have a link to the issue so we can track it? 👀
I didn't see an issue so I opened https://github.com/vectordotdev/vector/issues/17728
Just tested this again on 0.31.0 and I am still seeing the same problem. It's odd, because 0.31.0 supposedly contains this change: https://github.com/vectordotdev/vector/pull/17731. When I look at the forked AWS SDK that the hash points to, it appears to contain the change that should have fixed this: https://github.com/vectordotdev/aws-sdk-rust/blob/3d6aefb7fcfced5fc2a7e761a87e4ddbda1ee670/CHANGELOG.md#january-13th-2023. So it seems that there might be something else going on here.
Just tested on 0.33.0. The issue is still present.
I believe this is happening because of the force_path_style(true) configuration here: https://github.com/vectordotdev/vector/blob/master/src/common/s3.rs#L11
Virtual-hosted-style bucket endpoints are still not supported in v0.41.1.
Has anyone tried flipping force_path_style(true) to false and building a custom image? Or are more changes required for it to work? The AWS S3 sink was updated around v0.31, so it should theoretically be possible to make it work.
+1 for this request, as I have an object storage service that only supports vhost-based access, so in its current state the s3 sink is unusable for me.
FWIW I built a custom vector binary (0.39.0) and tried removing force_path_style(true). It switched over to DNS-style bucket names seamlessly. However, it did not respect the following setting and still used DNS style (I might be missing some configuration; my guess is that the Rust SDK doesn't pick this config up):
# cat ~/.aws/config
[default]
s3 =
addressing_style = path
So it seems a little more work is needed to add an option for switching back to path-based addressing.
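As a rough sketch of what such an option could look like: the `AddressingStyle` enum and its `force_path_style` method below are hypothetical and not part of Vector's configuration. The accepted strings simply mirror the `addressing_style = path` setting shown above, and the resulting boolean is what would be handed to the SDK builder's force_path_style call.

```rust
use std::str::FromStr;

/// Hypothetical sink option choosing how the bucket is addressed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressingStyle {
    /// Bucket in the URL path (the current hard-coded behaviour).
    Path,
    /// Bucket in the host name (virtual-hosted / DNS style).
    Virtual,
}

impl FromStr for AddressingStyle {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "path" => Ok(AddressingStyle::Path),
            "virtual" => Ok(AddressingStyle::Virtual),
            other => Err(format!("unknown addressing style: {}", other)),
        }
    }
}

impl AddressingStyle {
    /// Value that would be passed to the S3 config builder's force_path_style.
    fn force_path_style(self) -> bool {
        matches!(self, AddressingStyle::Path)
    }
}

fn main() {
    for raw in ["path", "virtual"] {
        let style: AddressingStyle = raw.parse().expect("valid addressing style");
        println!("{} -> force_path_style({})", raw, style.force_path_style());
    }
}
```

Defaulting such an option to path style would preserve today's behaviour, while setups that need virtual-hosted-style endpoints could opt in.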
Did you just flip true to false in the following line to build it?
let config = config::Builder::from(config).force_path_style(true).build();
I'm getting errors when building, but maybe these are just compilation problems on my side.
I removed the call to force_path_style(true) entirely, to just let it use the default.
Closed by https://github.com/vectordotdev/vector/pull/21999