Providing a virtual hosted style s3 bucket endpoint causes the healthcheck to fail

Open bkaznowski opened this issue 2 years ago • 12 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When the more modern virtual-hosted-style bucket endpoint (https://bucket-name.s3.region-code.amazonaws.com) and the bucket name are both provided to an aws_s3 sink, the healthcheck fails, indicating that the bucket cannot be found:

ERROR vector::topology::builder: Healthcheck: Failed Reason. error=Unknown bucket: "redacted-bucket-name"

The bucket name is required. However, this results in the following (incorrect) URL being generated for listing the bucket: https://bucket-name.s3.region-code.amazonaws.com/bucket-name.

Vector can still put objects in this bucket, but they are all placed inside a directory with the same name as the bucket, because the upload URL is generated as https://bucket-name.s3.region-code.amazonaws.com/bucket-name/key-prefix/key.
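The difference between the two addressing styles can be sketched roughly as follows. This is only an illustration of how the URLs are composed, not Vector's or the AWS SDK's actual code; the function names path_style_url and virtual_hosted_url are made up for this example.

```rust
/// Path-style addressing: the bucket becomes the first path segment
/// appended to whatever endpoint was configured.
/// (Hypothetical helper, for illustration only.)
fn path_style_url(endpoint: &str, bucket: &str, key: &str) -> String {
    format!("{}/{}/{}", endpoint.trim_end_matches('/'), bucket, key)
}

/// Virtual-hosted-style addressing: the bucket is part of the host name,
/// so the path contains only the object key.
/// (Hypothetical helper, for illustration only.)
fn virtual_hosted_url(region: &str, bucket: &str, key: &str) -> String {
    format!("https://{}.s3.{}.amazonaws.com/{}", bucket, region, key)
}
```

With the endpoint from the configuration in this issue, path_style_url("https://my-logs.s3.eu-west-1.amazonaws.com", "my-logs", "key-prefix/key") repeats the bucket name in the path, which matches the behaviour described above, while virtual_hosted_url("eu-west-1", "my-logs", "key-prefix/key") does not.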

Given that path-style URLs are being deprecated, it would be good to support the new virtual-hosted-style URLs.

Configuration

[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
[sinks.my_bucket]
type = "aws_s3"
inputs = [ "my_source_id" ]
bucket = "my-logs"
endpoint = "https://my-logs.s3.eu-west-1.amazonaws.com"

Version

vector 0.11.1 (v0.11.1 x86_64-unknown-linux-musl 2020-12-17)

Debug Output

No response

Example Data

No response

Additional Context

Virtual-hosted-style URLs are useful when you want to restrict egress traffic to specific domains.

References

No response

bkaznowski avatar May 09 '23 14:05 bkaznowski

> vector 0.11.1 (v0.11.1 x86_64-unknown-linux-musl 2020-12-17)

I definitely recommend that you update to the latest Vector version (0.29.1 right now) and test your case on it. If the issue remains, please let us know. Unfortunately, Vector versions that old are not supported.

zamazan4ik avatar May 09 '23 14:05 zamazan4ik

Thanks, I will update and report back 👍

bkaznowski avatar May 09 '23 14:05 bkaznowski

Sorry for the delay, I have just tried this on vector 0.29.1 (x86_64-unknown-linux-musl 74ae15e 2023-04-20 14:50:42.739094536) and I am getting the same behaviour as originally described. I had to specify the bucket region because otherwise the region header wasn't matching the bucket region, so the config looks like this:

[sources.my_source_id]
type = "file"
include = [ "/var/log/**/*.log" ]
[sinks.my_bucket]
type = "aws_s3"
inputs = [ "my_source_id" ]
bucket = "my-logs"
region = "eu-west-1"
endpoint = "https://my-logs.s3.eu-west-1.amazonaws.com"

bkaznowski avatar May 16 '23 12:05 bkaznowski

It looks like support for virtual-hosted-style URLs was added to the aws rust sdk a few months back, so it might just be a question of updating the sdk? https://github.com/awslabs/aws-sdk-rust/releases/tag/release-2023-01-13

bkaznowski avatar May 17 '23 09:05 bkaznowski

> It looks like support for virtual-hosted-style URLs was added to the aws rust sdk a few months back, so it might just be a question of updating the sdk? https://github.com/awslabs/aws-sdk-rust/releases/tag/release-2023-01-13

I believe that's the case. Unfortunately, we've been blocked from upgrading by a regression that was introduced a few versions ago. Working through that is on my todo list for the next few weeks, IIRC.

spencergilbert avatar May 17 '23 11:05 spencergilbert

Excellent, thank you! I will wait for this to be completed then.

bkaznowski avatar May 18 '23 09:05 bkaznowski

Do you happen to have a link to the issue so we can track it? 👀

bkaznowski avatar Jun 22 '23 13:06 bkaznowski

> Do you happen to have a link to the issue so we can track it? 👀

I didn't see an issue so I opened https://github.com/vectordotdev/vector/issues/17728

spencergilbert avatar Jun 22 '23 16:06 spencergilbert

Just tested this again on 0.31.0 and I am still seeing the same problem. It's odd, because 0.31.0 supposedly contains this change (https://github.com/vectordotdev/vector/pull/17731), and the forked AWS SDK that the hash points to appears to contain the change that should have fixed this (https://github.com/vectordotdev/aws-sdk-rust/blob/3d6aefb7fcfced5fc2a7e761a87e4ddbda1ee670/CHANGELOG.md#january-13th-2023). So it seems that something else is going on here.

bkaznowski avatar Aug 14 '23 16:08 bkaznowski

Just tested on 0.33.0. The issue is still present.

bkaznowski avatar Oct 10 '23 13:10 bkaznowski

I believe this is happening because of the force_path_style(true) configuration here: https://github.com/vectordotdev/vector/blob/master/src/common/s3.rs#L11

ashrayjain avatar Mar 04 '24 19:03 ashrayjain

The virtual-hosted-style bucket endpoint is still not supported in v0.41.1.

Has anyone tried flipping force_path_style(true) to false and building a custom image? Or are more changes required for it to work? The AWS S3 sink was updated around v0.31, so it should theoretically be possible to make it work.

Babbadger avatar Oct 17 '24 20:10 Babbadger

+1 for this request. I have an object storage service that only supports vhost-based access, so in its current state the s3 sink is unusable for me.

sam6258 avatar Nov 17 '24 10:11 sam6258

FWIW I built a custom vector binary (0.39.0) and tried removing force_path_style(true). It switched over to DNS-style bucket names seamlessly. However, it did not respect the following setting and still used DNS style (I might be missing some configuration, but my guess is that the rust sdk doesn't pick up this config):

# cat ~/.aws/config 
[default]
s3 =
    addressing_style = path

So it seems there is also a little bit of work needed to add an option to go back to path-based addressing if needed.
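Purely as a sketch of what such an option could look like: the AddressingStyle enum and both functions below are hypothetical and are not an existing Vector or AWS SDK API. The accepted values mirror the addressing_style setting of the AWS CLI config shown above (auto, path, virtual).

```rust
/// Hypothetical user-facing setting, modelled on the AWS CLI's
/// `addressing_style` values. Not an existing Vector option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AddressingStyle {
    Auto,
    Path,
    Virtual,
}

/// Parse a config value, ignoring case and surrounding whitespace.
/// Returns None for unknown values so the caller can report an error.
fn parse_addressing_style(value: &str) -> Option<AddressingStyle> {
    match value.trim().to_ascii_lowercase().as_str() {
        "auto" => Some(AddressingStyle::Auto),
        "path" => Some(AddressingStyle::Path),
        "virtual" => Some(AddressingStyle::Virtual),
        _ => None,
    }
}

/// Only an explicit `path` setting asks for path-style requests;
/// `auto` and `virtual` leave the client on its default behaviour.
fn wants_force_path_style(style: AddressingStyle) -> bool {
    matches!(style, AddressingStyle::Path)
}
```

The boolean returned by wants_force_path_style is the kind of value that could then be passed to the force_path_style call on the S3 config builder quoted in this thread, instead of the hard-coded true.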

sam6258 avatar Nov 18 '24 09:11 sam6258

> FWIW I built a custom vector binary (0.39.0) and tried removing force_path_style(true). It switched over to DNS-style bucket names seamlessly. However, it did not respect the following setting and still used DNS style (I might be missing some configuration, but my guess is that the rust sdk doesn't pick up this config):
>
> # cat ~/.aws/config
> [default]
> s3 =
>     addressing_style = path
>
> So it seems there is also a little bit of work needed to add an option to go back to path-based addressing if needed.

Did you just flip true to false in this line to build it?

    let config = config::Builder::from(config).force_path_style(true).build();

I'm getting errors when building, but maybe that is just a compilation problem on my side.

Babbadger avatar Nov 20 '24 14:11 Babbadger

I removed the call to force_path_style(true) entirely, so that it just uses the default.

sam6258 avatar Nov 20 '24 14:11 sam6258

#21999

sam6258 avatar Dec 10 '24 06:12 sam6258

Closed by https://github.com/vectordotdev/vector/pull/21999

jszwedko avatar Dec 20 '24 01:12 jszwedko