-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the library fork safe and drop the mutex #50
Conversation
@@ -1,2 +1,2 @@ | |||
liquid application/x-liquid 8bit | |||
liquid application/x-liquid 8bit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This padding was missing.
When forking, file descriptors are inherited and their state shared. In the context of MiniMime this means that the offset of the file opened by RandomAccessDb is shared across processes, so the `seek + read` combo is subject to inter-process race conditions. Of course that file is lazily opened, so assuming most applications don't query MiniMime before fork, it's not a big problem. However when reforking post boot (e.g. https://github.com/Shopify/pitchfork) this becomes an issue. Additionally, even if the file descriptor isn't shared across processes, the file position is still process global requiring a Mutex. By using `pread` instead of `seek + read` we can both make the library fork safe and get rid of the need to synchronize accesses. This also happens to fix an outstanding JRuby issue. Fix: discourse#37 Fix: discourse#38
85db05a
to
08004a1
Compare
@db.lookup_by_extension(extension) || | ||
@db.lookup_by_extension(extension.downcase) | ||
end | ||
@db ||= new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a small race condition here, that could lead to more than one DB to be instantiated, but only one will be kept so not sure if it's a big deal.
We could eagerly instantiate the db, and reset it when the paths are changed, but that would change the semantic a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me, but I guess there could be a performance impact if a lot of threads try to init the DB at the same time (e.g. because they're all executing the same startup code in lockstep). A possible alternative to avoid multiple initialization could be the DCL-like idiom @db || MUTEX.synchronize { @db ||= new }
. (I can submit a PR if you think that would be a good idea.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good idea. Not sure why I didn't think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@casperisfine PR here: #56
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, note that I'm not maintainer though, it's up to @SamSaffron to merge this or not.
I like this change! the concurrency issue I guess is not ideal, but the complexity of resolving it is high and impact extremely low. Perf wise I assume pread will be faster anyway cause there is one less syscall I am good to merge this! |
@@ -146,8 +140,7 @@ def lookup_uncached(val) | |||
end | |||
|
|||
def resolve(row) | |||
@file.seek(row * @row_length) | |||
Info.new(@file.readline) | |||
Info.new(@file.pread(@row_length, row * @row_length).force_encoding(Encoding::UTF_8)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, I really sorry, I realized that some platforms (e.g. Windows) don't have pread
.
I'll submit another PR to shim it for these.
When forking, file descriptors are inherited and their state shared.
In the context of MiniMime this means that the offset of the file opened by RandomAccessDb is shared across processes, so the
seek + read
combo is subject to inter-process race conditions.Of course that file is lazily opened, so assuming most applications don't query MiniMime before fork, it's not a big problem.
However when reforking post boot (e.g. https://github.com/Shopify/pitchfork) this becomes an issue.
Additionally, even if the file descriptor isn't shared across processes, the file position is still process global requiring a Mutex.
By using
pread
instead ofseek + read
we can both make the library fork safe and get rid of the need to synchronize accesses.This also happens to fix an outstanding JRuby issue.
Fix: #37
Fix: #38
cc @SamSaffron