
mini_mime vs marcel

Open pjmartorell opened this issue 4 years ago • 4 comments

Hi, I don't know if this is the right place to post this, but I'm trying to compare mini_mime and marcel for lookup by extension, because I think both gems cover the same space. I wanted to compare the number of registered extensions, the performance, and the memory consumption of each gem.

Number of registered extensions:
  • mini_mime: File.readlines(MiniMime::Configuration.ext_db_path).count => 1196
  • marcel: Marcel::EXTENSIONS.count => 1243

Regarding memory, mini_mime keeps a hash cache of 200 rows and, on a cache miss, binary-searches a file on disk, while marcel loads every record into an in-memory hash. Isn't reading from a file slower than keeping everything in memory? Loading everything obviously uses more memory, but in my opinion the performance gain outweighs the memory cost.
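To make the trade-off concrete, here is a minimal sketch of the "small cache in front of a binary-searched file" approach. `FileBackedLookup`, the row width, and the sample rows are all hypothetical; this is not mini_mime's actual code, only an illustration assuming rows are sorted by extension and padded to a fixed byte width so any row can be reached with one `seek`.

```ruby
require "tempfile"

# Hypothetical class illustrating a bounded in-memory cache backed by a
# sorted, fixed-width file that is binary-searched on a cache miss.
class FileBackedLookup
  def initialize(path, row_width:, cache_size: 200)
    @path = path
    @row_width = row_width                 # bytes per row, including "\n"
    @row_count = File.size(path) / row_width
    @cache_size = cache_size
    @cache = {}
  end

  def lookup(ext)
    return @cache[ext] if @cache.key?(ext) # hit: no disk access

    type = binary_search(ext)              # miss: O(log n) seeks
    @cache.shift if @cache.size >= @cache_size # evict the oldest insertion
    @cache[ext] = type                     # misses (nil) are cached too
  end

  private

  def binary_search(ext)
    File.open(@path, "rb") do |f|
      lo = 0
      hi = @row_count - 1
      while lo <= hi
        mid = (lo + hi) / 2
        f.seek(mid * @row_width)           # fixed width => direct offset
        key, type = f.read(@row_width).split
        case ext <=> key
        when 0 then return type
        when -1 then hi = mid - 1
        else lo = mid + 1
        end
      end
    end
    nil
  end
end

# Build a tiny stand-in DB: rows sorted by extension, padded to 32 bytes.
rows = { "gif" => "image/gif", "html" => "text/html",
         "pdf" => "application/pdf", "png" => "image/png" }
width = 32
db = Tempfile.new("ext_db")
rows.sort.each { |ext, type| db.write("#{ext.ljust(8)}#{type}".ljust(width - 1) + "\n") }
db.flush

lookup = FileBackedLookup.new(db.path, row_width: width, cache_size: 2)
lookup.lookup("pdf") # read from disk
lookup.lookup("pdf") # served from the cache
lookup.lookup("exe") # not in the DB, returns nil
```

The design choice is visible in the sketch: memory use is bounded by `cache_size` no matter how large the DB is, and a process that never calls `lookup` never touches the file. The price is a few seeks per miss, which matters only if misses are frequent.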

I also noticed that the two DBs in mini_mime contain similar data. Is there a reason they aren't merged into a single DB with duplicates removed? When I merge both files I get 1210 unique rows, but I'm not completely sure whether that number is right or the result of a mistake in how I removed duplicates:

irb(main)> require "mini_mime"
irb(main)> require "set"
irb(main)> s = Set.new  # a Set silently drops duplicate rows
irb(main)> File.readlines(MiniMime::Configuration.ext_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> File.readlines(MiniMime::Configuration.content_type_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> s.length
=> 1210
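One plausible reason for keeping two files is that a binary search needs rows sorted on the lookup key, so one file would be sorted by extension and the other by content type. On the deduplication question: `line.strip` only trims the ends of a line, so if the two files happened to pad their columns to different widths, identical records would still count as distinct. The sketch below (with a hypothetical `unique_rows` helper and made-up sample rows) compares rows field by field instead; I haven't checked whether the real files differ in padding or whether this changes the 1210 figure.

```ruby
require "set"
require "tempfile"

# Hypothetical helper: distinct rows across several DB files. Each row is
# split on whitespace, so rows holding the same fields but padded to
# different column widths are treated as the same row.
def unique_rows(*paths)
  rows = Set.new
  paths.each do |path|
    File.foreach(path) do |line|
      fields = line.split
      rows << fields unless fields.empty? # skip blank lines
    end
  end
  rows
end

# Two tiny stand-in files: the same record with different padding,
# plus one row that only exists in the second file.
a = Tempfile.new("a")
a.write("png   image/png   base64\n")
a.flush
b = Tempfile.new("b")
b.write("png      image/png      base64\n")
b.write("txt      text/plain     quoted-printable\n")
b.flush

unique_rows(a.path, b.path).size
```

Against the real gem, the call would be `unique_rows(MiniMime::Configuration.ext_db_path, MiniMime::Configuration.content_type_db_path).size`.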

pjmartorell, Mar 27 '21 10:03

Unlike mini_mime, which is just a simple extension -> content type table, marcel and mimemagic also allow lookup by file signature (magic numbers, see https://en.wikipedia.org/wiki/List_of_file_signatures). This is considered a security feature, which is why Rails uses it.
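Why content sniffing counts as a security feature is easiest to see side by side: the extension is whatever the uploader chose, while the leading bytes describe what the file actually is. The following is a deliberately tiny sketch; the signature table and both helper functions are hypothetical and much simpler than what marcel or mimemagic actually ship.

```ruby
# Hypothetical, heavily simplified signature table (leading bytes => type).
MAGIC_SIGNATURES = [
  ["\x89PNG\r\n\x1A\n".b, "image/png"],
  ["%PDF-".b,             "application/pdf"],
  ["GIF87a".b,            "image/gif"],
  ["GIF89a".b,            "image/gif"],
  ["PK\x03\x04".b,        "application/zip"]
].freeze

# Hypothetical extension table, standing in for an extension-only DB.
EXTENSION_TABLE = { "png" => "image/png", "pdf" => "application/pdf",
                    "gif" => "image/gif" }.freeze

# Trusts the filename: cheap, but fully controlled by the uploader.
def type_by_extension(filename)
  EXTENSION_TABLE[File.extname(filename).delete_prefix(".").downcase]
end

# Inspects the content: compares the first bytes against known signatures.
def type_by_content(bytes)
  head = bytes.b # compare as raw bytes, independent of string encoding
  MAGIC_SIGNATURES.each { |sig, type| return type if head.start_with?(sig) }
  nil
end

# An upload that claims to be an image but starts with a PDF header.
upload = "%PDF-1.7\n...".b
type_by_extension("avatar.png") # "image/png"
type_by_content(upload)         # "application/pdf"
```

When the two answers disagree, as they do here, an application can reject the file or prefer the sniffed type instead of trusting the name.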

https://github.com/mime-types/ruby-mime-types has a much more complex API. mini_mime uses the same DB, but simplified for performance reasons (one extension maps to exactly one MIME type).
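The "one extension = one MIME type" simplification can be pictured as collapsing a one-to-many table into a flat hash. The rows and their order below are made up for illustration and are not taken from mime-types-data; the sketch only shows the shape of the two models.

```ruby
# Hypothetical rows in priority order; an extension may appear more than
# once, as it can in a full registry.
ROWS = [
  ["js",  "text/javascript"],
  ["js",  "application/javascript"],
  ["xml", "application/xml"],
  ["xml", "text/xml"],
  ["png", "image/png"]
].freeze

# Rich model: every candidate type for an extension (a linear scan here).
def types_for(ext)
  ROWS.select { |e, _| e == ext }.map { |_, type| type }
end

# Simplified model: keep only the first type seen per extension, giving a
# flat hash with a single answer and a constant-time lookup.
ONE_TO_ONE = ROWS.each_with_object({}) { |(ext, type), h| h[ext] ||= type }.freeze

types_for("js")  # ["text/javascript", "application/javascript"]
ONE_TO_ONE["js"] # "text/javascript"
```

The flat form is smaller and faster to query, at the cost of dropping the alternatives a richer API can still report.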

By the way, Rack also has its own DB: https://github.com/rack/rack/blob/master/lib/rack/mime.rb#L51

Sometimes it's hard to persuade maintainers to make a change (see https://github.com/rest-client/rest-client/pull/557), and it would be even harder to get a much more breaking change into marcel just to save a few KB of memory. Yes, it would be nice and I'm 100% for it, but I don't think it's realistic :)

ahorek, Apr 01 '21 17:04

@ahorek thanks for the reference to https://github.com/rest-client/rest-client/pull/557, it's exactly what I wanted to understand.

pjmartorell, Apr 01 '21 18:04

I've started discussing this with @georgeclaghorn.

My long-term thinking here:

  • Move discourse/mini_mime to rails/mini_mime ...
  • Merge marcel into mini_mime so mini_mime can also do lookup by file content
  • Keep parity with mime-types (which was an underlying goal), so if marcel has 1243 extensions and mime-types only has 1196, we need to upstream the missing entries into mime-types-data
  • Keep the performance characteristics of mini_mime (aim to be the fastest implementation for cached lookups, with a reasonable default cache size). The majority of processes do very few MIME lookups, so cache misses are unlikely. Many processes do no MIME type lookups at all, so there's no point keeping the whole table in memory.
  • Keep a very small public interface: lookup by extension, type, filename, and content.
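The last two points (nothing loaded until it's needed, and a tiny surface of four lookups) can be sketched together. `TinyMime` and its inline data are hypothetical; the method names are loosely modeled on mini_mime's `lookup_by_*` methods, and `lookup_by_content` in particular is an assumption about what a merged API might look like, not an existing one.

```ruby
# Hypothetical module: a tiny lookup facade whose table is built lazily.
module TinyMime
  # Stand-in for the on-disk DB that a real library would read from a file.
  RAW_ROWS = "png image/png\npdf application/pdf\ntxt text/plain\n"
  SIGNATURES = { "\x89PNG".b => "image/png", "%PDF-".b => "application/pdf" }.freeze

  class << self
    def lookup_by_extension(ext)
      by_ext[ext.to_s.downcase.delete_prefix(".")]
    end

    def lookup_by_filename(name)
      lookup_by_extension(File.extname(name))
    end

    # Returns an extension for the given content type.
    def lookup_by_content_type(type)
      by_type[type]
    end

    # Sniffs leading bytes; needs no table at all.
    def lookup_by_content(bytes)
      head = bytes.b
      SIGNATURES.each { |sig, type| return type if head.start_with?(sig) }
      nil
    end

    def loaded?
      !@by_ext.nil?
    end

    private

    # Parsed on first use, so a process that never looks anything up
    # never pays for the table.
    def by_ext
      @by_ext ||= RAW_ROWS.each_line.to_h { |line| line.split }
    end

    def by_type
      @by_type ||= by_ext.invert
    end
  end
end

TinyMime.loaded?                          # false: nothing parsed yet
TinyMime.lookup_by_filename("report.PDF") # "application/pdf"
TinyMime.loaded?                          # true
```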

SamSaffron, Apr 05 '21 01:04

@SamSaffron I am all in favour of adding more data to mime-types-data.

halostatue, Nov 08 '21 04:11