mini_mime vs marcel #34

Open
pjmartorell opened this issue Mar 27, 2021 · 4 comments
Comments

@pjmartorell

pjmartorell commented Mar 27, 2021

Hi, I don't know if this is the right place to post this, but I'm trying to compare mini_mime and marcel for lookups by extension, because I think both gems cover the same space. I wanted to compare the number of registered extensions, the performance, and the memory consumption of each gem.

|             | mini_mime | marcel |
|-------------|-----------|--------|
| #extensions | File.open(MiniMime::Configuration.ext_db_path).readlines.count => 1196 | Marcel::EXTENSIONS.count => 1243 |

Regarding memory handling, mini_mime keeps a hash cache of 200 rows and binary-searches a file on cache misses, while marcel loads all records into an in-memory hash. Isn't reading from a file less performant than loading everything into memory? Loading everything obviously consumes more memory, but in my opinion the gain in performance outweighs the memory cost.
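To make the trade-off concrete, here is a minimal sketch of the "small bounded cache in front of a binary search over a sorted file" idea. It is not mini_mime's actual code: the fixed-width row layout, `ROW_WIDTH`, `build_db`, and `TinyLookup` are all hypothetical names invented for illustration, and a `StringIO` stands in for the on-disk DB.

```ruby
require "stringio"

# Hypothetical fixed-width row: 10-char extension, 40-char content type, newline.
EXT_WIDTH  = 10
TYPE_WIDTH = 40
ROW_WIDTH  = EXT_WIDTH + TYPE_WIDTH + 1

# Build a sorted, fixed-width "file" (a StringIO stands in for the real DB file).
def build_db(pairs)
  rows = pairs.sort_by(&:first).map do |ext, type|
    ext.ljust(EXT_WIDTH) + type.ljust(TYPE_WIDTH) + "\n"
  end
  StringIO.new(rows.join)
end

class TinyLookup
  def initialize(io, cache_size: 200)
    @io = io
    @rows = io.size / ROW_WIDTH
    @cache = {}
    @cache_size = cache_size
  end

  # Cached lookups are a plain hash hit; misses fall back to the file.
  def lookup(ext)
    return @cache[ext] if @cache.key?(ext)

    result = binary_search(ext)
    @cache.shift if @cache.size >= @cache_size # evict the oldest inserted entry
    @cache[ext] = result
  end

  private

  # Seek to the middle row, compare keys, and halve the range each step.
  def binary_search(ext)
    low = 0
    high = @rows - 1
    while low <= high
      mid = (low + high) / 2
      @io.seek(mid * ROW_WIDTH)
      row = @io.read(ROW_WIDTH)
      key = row[0, EXT_WIDTH].strip
      case key <=> ext
      when 0  then return row[EXT_WIDTH, TYPE_WIDTH].strip
      when -1 then low = mid + 1
      else high = mid - 1
      end
    end
    nil
  end
end
```

With roughly 1,200 rows, a miss costs about 11 seeks and reads, and every later lookup of the same extension is served from the hash, so the file is only touched for extensions not seen recently.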

Also, I noticed that both DBs in mini_mime contain similar data. Is there any reason why the two DBs are not merged, with duplicates removed? When I merge both files, the number of rows/extensions is 1210, but I'm not completely sure whether that is due to an error in removing duplicates:

irb(main)> require "set"
irb(main)> s = Set.new  # assumed: `s` was a Set created earlier, so duplicate lines collapse
irb(main)> File.readlines(MiniMime::Configuration.ext_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> File.readlines(MiniMime::Configuration.content_type_db_path).each do |line|
irb(main)*     s << line.strip
irb(main)> end
irb(main)> s.length
=> 1210
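One possible source of the discrepancy is that a set of whole lines counts unique rows, not unique extensions. The sketch below illustrates the difference on made-up data; the rows and the assumed "extension, content type, encoding" column order are hypothetical and are not taken from the real DB files, and `unique_counts` is a helper invented for this example.

```ruby
require "set"

# Returns [unique normalised rows, unique extensions] for whitespace-separated rows.
# Assumes (hypothetically) that the first field of every row is the extension.
def unique_counts(rows)
  by_line = Set.new
  by_ext = Set.new
  rows.each do |line|
    fields = line.split
    by_line << fields.join(" ") # normalise padding so equal rows compare equal
    by_ext << fields.first
  end
  [by_line.size, by_ext.size]
end

# Made-up sample rows, not the real DB contents.
ext_rows  = ["png   image/png   base64", "txt   text/plain   quoted-printable"]
type_rows = ["png image/png base64", "txt  text/x-example  quoted-printable"]

lines, exts = unique_counts(ext_rows + type_rows)
puts "unique rows: #{lines}, unique extensions: #{exts}"
```

If the two files ever map the same extension to different content types, the row count will exceed the extension count even though deduplication worked correctly.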
@ahorek

ahorek commented Apr 1, 2021

Unlike mini_mime, which is just a simple table of extension -> content type, marcel and mime_magic also allow lookup by file signature (magic numbers; see https://en.wikipedia.org/wiki/List_of_file_signatures). This is considered a security feature, which is why Rails uses it.
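As a rough illustration of lookup by file signature, the sketch below checks the leading bytes of a stream against a handful of well-known magic numbers. The `SIGNATURES` table and the `sniff` helper are hypothetical and deliberately tiny; this is not how marcel implements detection, only the general idea.

```ruby
require "stringio"

# Hypothetical, deliberately small signature table (leading bytes => content type).
SIGNATURES = {
  "\x89PNG\r\n\x1A\n".b => "image/png",
  "%PDF-".b             => "application/pdf",
  "GIF87a".b            => "image/gif",
  "GIF89a".b            => "image/gif",
  "\xFF\xD8\xFF".b      => "image/jpeg"
}.freeze

# Read the first few bytes and return the first matching content type, or nil.
def sniff(io)
  head = io.read(16).to_s.b
  SIGNATURES.each do |magic, type|
    return type if head.start_with?(magic)
  end
  nil
end

# An upload named "avatar.png" whose bytes are actually a PDF:
upload = StringIO.new("%PDF-1.7 ...")
puts sniff(upload).inspect
```

An extension table would trust the `.png` in the filename, whereas the content check reports what the bytes actually look like, which is the security argument made above.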

https://github.com/mime-types/ruby-mime-types has a much more complex API. mini_mime uses the same DB, but it's simplified for performance reasons (1 extension = 1 mime type).

btw Rack also has its own DB
https://github.com/rack/rack/blob/master/lib/rack/mime.rb#L51

Sometimes it's hard to persuade maintainers to make a change (see rest-client/rest-client#557), and it would be even harder to make a much more breaking change in marcel just to save a few KB of memory. Yes, it would be nice and I'm 100% for it, but I also don't think it's realistic :)

@pjmartorell
Author

@ahorek thanks for the reference to rest-client/rest-client#557, it's exactly what I wanted to know/understand.

@SamSaffron
Member

I started discussing this with @georgeclaghorn.

My long-term thinking here:

  • Move discourse/mini_mime to rails/mini_mime ...
  • Merge marcel into mini_mime so mini_mime can also do lookup by file content
  • Keep parity with mime-types (which was an underlying goal) ... so if Marcel has 1243 and mime-types only has 1196, we've got to upstream the missing entries into mime-types-data
  • Keep the perf characteristics of mini_mime (aims to be the fastest implementation for cached lookups, with a reasonable default cache size). The majority of processes do a very small number of mime lookups, so misses are unlikely. Many processes do no mime type lookups at all, so there is no point keeping the data in memory
  • Keep to a very tiny public interface, lookup by extension / type / filename / content.

@halostatue

@SamSaffron I am all in favour of adding more data to mime-types-data.
