-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parseIt is not threadsafe (was: Probabilities printed out of order for multi-threaded LM parser) #16
Comments
Thanks for the report! That's certainly an issue, but I'm afraid parseIt may have much larger issues. At least in parsing mode, I don't think that parseIt is actually thread safe -- I've gotten different outputs (or crashes) when using multithreaded vs. single threaded so I don't recommend it. This is a long time overdue, so I'm finally updating the README and help menu to discourage people from using multiple threads with parseIt for now (b050590) Of course, if anyone wants to fix this printing issue or the larger multithreading problems, I wouldn't object :) |
Dear dmcc, I don't know if there is any progress on the bllip parser since 2013, but currently I am using it for a large project -- where a lot of sentences are needed to be processed in parallel -- and I was wondering if the multi-threading issue is resolved. Have there been some solutions to allow the bllip-parser to run multi-threaded and being thread-safe? Thanks, |
Unfortunately, I'm not aware of any progress on BLLIP in this area. In the past, I've run multiple instances of the parser as a workaround. I still do not recommend using the |
Hello dmcc, Thank you for the quick response. That's pity. I have built a server/wrapper around the parser and it works great on a single core. However, it is rather slow when comparing to , e.g., Stanford which runs multiple threads. On a single thread the BLLIP is actually faster and give a better accuracy. I am willing to take a look at the code. Do you have any suggestions what needs to be fixed? Cheers, |
Hi Dimitar, That would be great if you're willing to take a look at it. It's been a while so my memory is pretty fuzzy on this, but I believe the issue is a race condition in the first stage parser which we never really narrowed down. If you repeatedly parse a small set of long enough sentences with Good luck! |
Hi David, So, I did some look into the multithreading today. Something is definitely fishy. So what I did is just run some tests and some diagnostics. Here is a summary. I will continue digging and will keep posting.
Cheers, |
Thanks for the update -- please keep them coming! Yeah, that sounds (vaguely) familiar. I think I ran |
OK, So, two things. First, in the parseIt/mainLoop there is a print loop:
which is outside of the reading/parsing loop. This loop and the sleep delay the printing process, even if the parse is finished. To be honest, I don't get why is this loop there since there is a print Second, I haven't looked more in details today. If I have time next week I will dig more into it. Cheers, |
Hi David, I did some look around and to be honest didn't figure out what exactly causes helgrade to complain. I modified some things in the parse method. First, switched the global sentence count from Second, the printing function I took out of the main loop - this way they should work on parallel (added one more semaphore). If you want, I will send the code for review. But I wanted to ask - how do I set the option for multithreading (the Thanks, |
switched the global sentence count from int to unsigned long long int
Yup, this makes sense. Please feel free to submit this as a separate patch.
how do I set the option for multithreading (the -t option) with the
python wrapper
There used to be an option to pass a thread ID to the first-stage parser in
the Python wrapper, but I removed it (
1858d75)
since it didn't seem like threads were going to get fixed. The main pieces
were "ThreadManager" which let you sign out a ThreadSlot (thread ID) for
exclusive use and a way to pass ThreadSlots to parse (
1858d75#diff-491cfc4441b3f94d768196cd1252263cL201
).
From your previous message:
Second, helgrind detects a racing condition somewhere in
MeChart::bestParse. It complains about complains about functions from
fnSubFns,C which make access to the FullHist*. I assume that the issue
arrises there - reading and writing something to FullHist in an
uncontrolled way.
This could definitely be an issue and explain why parsing was
non-deterministic with multiple threads -- would be interested to hear if
you were able to dig any deeper.
Hope this helps!
…On Tue, Jun 13, 2017 at 7:51 AM, KantanLabs ***@***.***> wrote:
Hi David,
I did some look around and to be honest didn't figure out what exactly
causes helgrade to complain. I modified some things in the parse method.
First, switched the global sentence count from int to unsigned long long
int. If I am not wrong, for more than 32,767 sentences there would be
errors.
Second, the printing function I took out of the main loop - this way they
should work on parallel (added one more semaphore). If you want, I will
send the code for review.
But I wanted to ask - how do I set the option for multithreading (the -t
option) with the python wrapper?
Thanks,
Dimitar.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#16 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAm5ZXatlUQ_XLMZ7b_3s3TExK_N329bks5sDnergaJpZM4BOS-a>
.
|
Hello David, Sorry for my late reply - being busy with lots of stuff. Once I have a bit of free time I will submit the change for the long long int. Regarding the race condition, it is possible that the history (FullHist) is not thread safe... but not yet verified. I have, however, two questions:
Thanks, |
Hi Dimitar,
Sure, I'd be happy to review changes.
If you're planning to make a new REST server frontend, I'd recommend making
it a separate file from parseIt.C (maybe parseREST.C?). Ideally, it would
only need to call things in first-stage/PARSE/SimpleAPI.h though you will
need to extend the parse() calls to use thread IDs again.
Hope this helps,
David
…On Tue, Jun 20, 2017 at 4:49 PM, KantanLabs ***@***.***> wrote:
Hello David,
Sorry for my late reply - being busy with lots of stuff.
Once I have a bit of free time I will submit the change for the long long
int.
Regarding the race condition, it is possible that the history (FullHist)
is not thread safe... but not yet verified.
I have, however, two questions:
1. If I submit more code changes, will you review it? (maybe that
sounds like a silly question, but to be honest I haven't worked on a github
project like this before.)
2. The whole idea for fixing the multithreading arose from the fact
that I have a python rest-server which I want to connect to the parser and
have it running as fast and as distributed as possible. This, however, will
not be solved with having the parser multithreaded, because the python
wrapper by itself parses the sentences one-by-one (sending and waiting for
a response). So, having a python rest server will not help. So, my second
question - can parseIt.C be extended with REST capabilities? to listen to a
port, receive a request and then send the parse back (in, e.g., json
format). I have been looking into pistache for such a solution, but still
need to work on it. If you have any idea/suggestion/comment on this, that
would be great.
Thanks,
Kind regards,
Dimitar.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#16 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAm5ZRdWED0fjdNQehnM7lu2levb3PfAks5sGDBFgaJpZM4BOS-a>
.
|
Hi David, After long search through the BLLIP parser code I am sort of giving up. I reached to a point where I think (one) problem comes from the "edgeSubFns". There is this array of functions that is accessed by each thread. I tried to work on this, but couldn't make it work. Anyway, I tried a lot of other things (following gdb and valgrind's info) but coudln't make it work properly for multiple threads. I would like to share my last version and maybe you or somebody can take over. Maybe I am on the right track and just missing something small (or maybe I am shoooting tooo far away). Cheers, |
Hi Dimitar,
Thanks for the report and your work on this problem. It is definitely a
challenging codebase. If you have time, it would be awesome if you checked
in your changes to your fork of the parser for others who might want to
take this on (I'm afraid I don't have the bandwidth at this point).
Take care,
David
…On Wed, Aug 2, 2017 at 3:13 PM, KantanLabs ***@***.***> wrote:
Hi David,
After long search through the BLLIP parser code I am sort of giving up. I
reached to a point where I think (one) problem comes from the "edgeSubFns".
There is this array of functions that is accessed by each thread. I tried
to work on this, but couldn't make it work.
Anyway, I tried a lot of other things (following gdb and valgrind's info)
but coudln't make it work properly for multiple threads.
I would like to share my last version and maybe you or somebody can take
over. Maybe I am on the right track and just missing something small (or
maybe I am shoooting tooo far away).
Cheers,
Dimitar.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#16 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAm5ZbFPT5sCTnwKZ3-bmB6AEYYz6_M5ks5sUPRzgaJpZM4BOS-a>
.
|
in first-stage/PARSE/parseIt.C , the probabilities are written out directly, instead of being pushed to the print stack. This means that they appear out of order in multi-threaded LM parsing.
The text was updated successfully, but these errors were encountered: