
Content-Type for files uploaded via S3 automatically set to application/xml

Open westonpace opened this issue 4 years ago • 2 comments

Describe the bug

When I upload a file to S3 using a multipart upload request, the file's content-type is set to application/xml unless I specify otherwise. This seems incorrect: a content-type should be omitted if unknown or, at worst, default to application/octet-stream. Per RFC 7231, Section 3.1.1.5:

A sender that generates a message containing a payload body SHOULD generate a Content-Type header field in that message unless the intended media type of the enclosed representation is unknown to the sender. If a Content-Type header field is not present, the recipient MAY either assume a media type of "application/octet-stream" ([RFC2046], Section 4.5.1) or examine the data to determine its type.

This ended up causing a bit of confusion here (https://github.com/apache/arrow/issues/11934). An S3 client was trying to be intelligent and inspect the contents of any file whose content-type marked it as XML, so this default caused the client to inspect files it shouldn't have.

Expected behavior

If the content type of a file is not set, the file should either have no content-type or have its content-type set to application/octet-stream.

Current behavior

The file's content-type is set to application/xml.

Steps to Reproduce

Reproducible Gist: https://gist.github.com/westonpace/9c3a0baa48083f33aa4880c0cb6a602b

Possible Solution

When the user does not specify a content-type, either leave it unset or default to application/octet-stream.

AWS CPP SDK version used

1.8.185

Compiler and Version used

GCC 9.3.0

Operating System and version

Ubuntu 20.04.3

westonpace • Jan 12 '22 19:01

Hi @westonpace, quick question before I dig too deep into this: have you tried the TransferManager for multipart uploads, or is there a reason why you can't? I just tried it and didn't get the same behavior, so it might be a good workaround to get you unblocked.

KaibaLopez • Jan 19 '22 23:01

@KaibaLopez Thanks for the suggestion. I was working on the Apache Arrow S3 filesystem adapter, which currently does not use the transfer manager (https://github.com/apache/arrow/blob/master/cpp/src/arrow/filesystem/s3fs.cc). Although that may be an interesting experiment someday, it would add an extra dependency and be a somewhat larger change.

I'm not really blocked by this. It was simple enough to ensure we always specify the content type. Perhaps the main issue is that this default isn't documented anywhere, so it came as a surprise and took a little while to trace to the root cause.

westonpace • Jan 20 '22 00:01
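
For readers hitting the same surprise, the workaround mentioned in the last comment (always specifying the content type) might be sketched roughly as follows. This is a minimal illustration, not the code used in Apache Arrow or in the SDK: `ResolveContentType` is a hypothetical helper name, and falling back to application/octet-stream is an assumption based on the RFC 7231 passage quoted in the issue. The resolved value would then be set explicitly on the upload request (for example via `SetContentType` on `Aws::S3::Model::CreateMultipartUploadRequest`) before the upload is started.

```cpp
#include <string>

// Hypothetical helper (not part of aws-sdk-cpp): pick the content type to
// send with an upload. If the caller supplied none, fall back to a generic
// binary type instead of relying on whatever default would otherwise be sent.
std::string ResolveContentType(const std::string& requested) {
    // Assumed fallback, following the RFC 7231 text quoted in the issue.
    static const char kDefaultContentType[] = "application/octet-stream";
    return requested.empty() ? std::string(kDefaultContentType) : requested;
}

// Sketch of intended use (requires aws-sdk-cpp, so shown only as a comment):
//   Aws::S3::Model::CreateMultipartUploadRequest request;
//   request.SetContentType(ResolveContentType(user_content_type).c_str());
```

Keeping the fallback in one small function means every upload path applies the same explicit default, so the stored object's content-type no longer depends on undocumented behavior.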