
Should mime just use the MIME sniffing algorithm?

Open seanmonstar opened this issue 7 years ago • 8 comments

The target domain of the mime crate is webdev. Instead of following the original RFCs (as is done now), perhaps it's best to just use the sniffing algorithm that is now used by web browsers.

seanmonstar avatar Jan 15 '19 02:01 seanmonstar

cc @nox @SimonSapin @rustonaut

seanmonstar avatar Jan 15 '19 02:01 seanmonstar

https://mimesniff.spec.whatwg.org/ is called "MIME Sniffing" and contains a "parse a MIME type" algorithm that is relevant.

But "sniffing" refers to looking at the contents of a file or the body of an HTTP response (in addition to other signals) to make a guess at the actual file format, in case the Content-Type header is missing or unspecific or inaccurate. For example, if the first 6 bytes of a file are GIF89a in ASCII it’s very probably a GIF, especially if it’s used in <img>. That spec also has algorithms for this.
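To make the GIF example concrete, here is a minimal sketch of content sniffing in that sense. `sniff_image` is a hypothetical helper, not part of the mime crate, and it checks only the two GIF signatures; the spec's matching tables cover many more formats and also take other signals into account.

```rust
/// Guesses a MIME type from the leading bytes of a resource.
/// Hypothetical helper for illustration: only the GIF signatures are checked.
fn sniff_image(bytes: &[u8]) -> Option<&'static str> {
    // A GIF file begins with the ASCII bytes "GIF87a" or "GIF89a".
    if bytes.starts_with(b"GIF87a") || bytes.starts_with(b"GIF89a") {
        Some("image/gif")
    } else {
        // Unknown signature: a real sniffer would try further patterns here.
        None
    }
}
```

Calling `sniff_image` on a body that starts with `GIF89a` yields `Some("image/gif")`; anything else yields `None`, and the caller would fall back to the Content-Type header.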

This kind of sniffing can be useful, but I don’t know if it should be in scope for this crate.

SimonSapin avatar Jan 15 '19 02:01 SimonSapin

Sorry, I don't mean sniffing the body bytes, just using the parse algorithm mentioned in that document.

seanmonstar avatar Jan 15 '19 05:01 seanmonstar

So, looking through the test cases, I noticed this as a valid MIME type:

!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz/!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz;!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz=!#$%&'*+-.^_`|~0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
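For reference, a rough sketch of why a string like that passes: the parse algorithm only requires the type and subtype to be non-empty runs of HTTP token characters. The function names below are made up for illustration, and this simplified version skips parameter parsing entirely (the full algorithm also handles quoted parameter values).

```rust
/// True for HTTP token characters: ASCII letters, digits, and a fixed set of symbols.
fn is_token_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~".contains(c)
}

/// True if `s` is non-empty and made up only of token characters.
fn is_token(s: &str) -> bool {
    !s.is_empty() && s.chars().all(is_token_char)
}

/// HTTP whitespace: space, tab, line feed, carriage return.
fn is_http_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\r')
}

/// Simplified sketch: returns the lowercased (type, subtype) and ignores parameters.
fn parse_type_and_subtype(input: &str) -> Option<(String, String)> {
    // Strip leading and trailing HTTP whitespace from the whole input.
    let input = input.trim_matches(is_http_whitespace);
    // The type is everything before the first '/'; a missing '/' is a failure.
    let (ty, rest) = input.split_once('/')?;
    if !is_token(ty) {
        return None;
    }
    // The subtype runs up to the first ';', minus trailing whitespace.
    let subtype = rest
        .split(';')
        .next()
        .unwrap_or("")
        .trim_end_matches(is_http_whitespace);
    if !is_token(subtype) {
        return None;
    }
    Some((ty.to_ascii_lowercase(), subtype.to_ascii_lowercase()))
}
```

Under these rules `text/*` also parses, because `*` is a token character; nothing in the algorithm treats wildcards specially.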

Something I appreciate in the API in mime/master is the difference between MediaType and MediaRange. They allow things like text/* to be a MediaRange, but not MediaType. That combined with headers::ContentType would help prevent setting a frankly bogus content-type header (even though mimesniff says to parse it).
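To illustrate what the split buys, here is a stripped-down sketch. These are not the actual types from mime/master: the `parse` methods, the `split_parts` helper, and the minimal validation are assumptions made up for this example.

```rust
/// Splits "type/subtype" with a deliberately minimal check (hypothetical helper).
fn split_parts(s: &str) -> Option<(&str, &str)> {
    let (ty, sub) = s.split_once('/')?;
    // Each part must be non-empty, printable ASCII, and free of '/' and ';'.
    let ok = |p: &str| {
        !p.is_empty() && p.chars().all(|c| c.is_ascii_graphic() && c != '/' && c != ';')
    };
    if ok(ty) && ok(sub) {
        Some((ty, sub))
    } else {
        None
    }
}

/// May contain wildcards, e.g. in an Accept header.
#[derive(Debug, PartialEq)]
struct MediaRange(String);

/// A concrete type with no wildcards, e.g. for a Content-Type header.
#[derive(Debug, PartialEq)]
struct MediaType(String);

impl MediaRange {
    fn parse(s: &str) -> Option<MediaRange> {
        split_parts(s).map(|_| MediaRange(s.to_ascii_lowercase()))
    }
}

impl MediaType {
    fn parse(s: &str) -> Option<MediaType> {
        let (ty, sub) = split_parts(s)?;
        // Reject wildcards so a bogus concrete type cannot be constructed.
        if ty == "*" || sub == "*" {
            return None;
        }
        Some(MediaType(s.to_ascii_lowercase()))
    }
}
```

With this shape, `MediaRange::parse("text/*")` succeeds while `MediaType::parse("text/*")` returns `None`, so an API that takes only a `MediaType` cannot be handed a wildcard.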

So I'm torn.

seanmonstar avatar Jan 15 '19 18:01 seanmonstar

After some more thought, the advantages of just following what the Fetch spec wants outweigh having the MediaType and MediaRange split.

So, the new plan is to remove the split, only having Mime again, and only supporting the mimesniff parsing algorithm.

seanmonstar avatar Jan 22 '19 17:01 seanmonstar

The closer it is to the mimesniff algorithm, the more we can make use of it.

nox avatar Jan 30 '19 12:01 nox

What would be useful too is a way to represent just the essence of a mime type, because many specs have prose about that.
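As a rough sketch of what "essence" means here: the type and subtype joined by a slash, with any parameters dropped. The `essence` function below is hypothetical, works on a raw string rather than a parsed value, and does no token validation.

```rust
/// Returns the lowercased "type/subtype" of a MIME type string, without parameters.
/// Hypothetical helper for illustration; it does not validate token characters.
fn essence(mime: &str) -> Option<String> {
    // Everything before the first ';' is the type/subtype portion.
    let without_params = mime.split(';').next()?.trim();
    let (ty, sub) = without_params.split_once('/')?;
    if ty.is_empty() || sub.is_empty() {
        return None;
    }
    Some(format!(
        "{}/{}",
        ty.to_ascii_lowercase(),
        sub.to_ascii_lowercase()
    ))
}
```

For example, `essence("Text/HTML; charset=UTF-8")` gives `Some("text/html")`, which is the form spec prose usually compares against.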

nox avatar Jan 30 '19 12:01 nox

Hi,

Is there a way to expose both parsers (RFC and mimesniff)? I'd like to make some Servo tests pass, so I need to follow the mimesniff algorithm. @SimonSapin has already implemented it in rust-url (though the crate does not officially expose it). Should I duplicate the code in Servo, or can I help here?

Regards

ghostd avatar Oct 28 '20 07:10 ghostd