parseIt is not threadsafe (was: Probabilities printed out of order for multi-threaded LM parser) #16

syllog1sm · 2013-11-19T10:04:58Z

in first-stage/PARSE/parseIt.C , the probabilities are written out directly, instead of being pushed to the print stack. This means that they appear out of order in multi-threaded LM parsing.

dmcc · 2013-11-19T21:50:52Z

Thanks for the report! That's certainly an issue, but I'm afraid parseIt may have much larger issues.

At least in parsing mode, I don't think that parseIt is actually thread safe -- I've gotten different outputs (or crashes) when using multithreaded vs. single threaded so I don't recommend it. This is a long time overdue, so I'm finally updating the README and help menu to discourage people from using multiple threads with parseIt for now (b050590)

Of course, if anyone wants to fix this printing issue or the larger multithreading problems, I wouldn't object :)

KantanLabs · 2017-05-23T15:09:43Z

Dear dmcc,

I don't know if there is any progress on the bllip parser since 2013, but currently I am using it for a large project -- where a lot of sentences are needed to be processed in parallel -- and I was wondering if the multi-threading issue is resolved.

Have there been some solutions to allow the bllip-parser to run multi-threaded and being thread-safe?

Thanks,
Dimitar.

dmcc · 2017-05-23T16:21:06Z

Unfortunately, I'm not aware of any progress on BLLIP in this area. In the past, I've run multiple instances of the parser as a workaround. I still do not recommend using the -t option.

KantanLabs · 2017-05-23T17:10:29Z

Hello dmcc,

Thank you for the quick response. That's pity. I have built a server/wrapper around the parser and it works great on a single core. However, it is rather slow when comparing to , e.g., Stanford which runs multiple threads. On a single thread the BLLIP is actually faster and give a better accuracy.

I am willing to take a look at the code. Do you have any suggestions what needs to be fixed?

Cheers,
Dimitar.

dmcc · 2017-05-24T00:56:46Z

Hi Dimitar,

That would be great if you're willing to take a look at it. It's been a while so my memory is pretty fuzzy on this, but I believe the issue is a race condition in the first stage parser which we never really narrowed down. If you repeatedly parse a small set of long enough sentences with -t > 1, you should be able to see some cases where you get different parses for the same sentence. Most of the code uses arrays to separate structures used in different threads (https://github.com/BLLIP/bllip-parser/search?utf8=%E2%9C%93&q=MAXNUMTHREADS&type=), but my guess is that there's some global structure that should have got this treatment, but didn't.

Good luck!
David

KantanLabs · 2017-05-25T16:50:02Z

Hi David,

So, I did some look into the multithreading today. Something is definitely fishy. So what I did is just run some tests and some diagnostics. Here is a summary. I will continue digging and will keep posting.

Upgraded to gcc 7 and compiled with -fsanitize=thread -fPIE -pie -g -> no problems
after running parseIt the tool hanged and couldn't be terminated (except with a kill -s 9)
Clean compile and run parseIt with -t64, -t1, -t4, -t2, -t24 for different datasets
-t2 is faster than -t1; the rest are slower.
Ran valgrind with helgrind.
For t2:
==23834== ERROR SUMMARY: 51313 errors from 23 contexts (suppressed: 919 from 32)
For single thread
==23867== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)

Cheers,
Dimitar.

dmcc · 2017-05-26T14:57:58Z

Thanks for the update -- please keep them coming! Yeah, that sounds (vaguely) familiar. I think I ran helgrind and it found issues, but I forget how much deeper I dug than that.

KantanLabs · 2017-05-26T15:28:42Z

OK, So, two things.

First, in the parseIt/mainLoop there is a print loop:

	while(!printStack.empty())
	{
		sleep(SLEEPTIME);
		workOnPrintStack(&printStack);
	}

which is outside of the reading/parsing loop. This loop and the sleep delay the printing process, even if the parse is finished. To be honest, I don't get why is this loop there since there is a print workOnPrintStack(&printStack); call inside the main loop (see line 289 in parseIt.C).

Second, helgrind detects a racing condition somewhere in MeChart::bestParse. It complains about complains about functions from fnSubFns,C which make access to the FullHist*. I assume that the issue arrises there - reading and writing something to FullHist in an uncontrolled way.

I haven't looked more in details today. If I have time next week I will dig more into it.

Cheers,
Dimitar.

KantanLabs · 2017-06-13T11:51:06Z

Hi David,

I did some look around and to be honest didn't figure out what exactly causes helgrade to complain. I modified some things in the parse method.

First, switched the global sentence count from int to unsigned long long int. If I am not wrong, for more than 32,767 sentences there would be errors.

Second, the printing function I took out of the main loop - this way they should work on parallel (added one more semaphore). If you want, I will send the code for review.

But I wanted to ask - how do I set the option for multithreading (the -t option) with the python wrapper?

Thanks,
Dimitar.

dmcc · 2017-06-18T02:27:44Z

switched the global sentence count from int to unsigned long long int

Yup, this makes sense. Please feel free to submit this as a separate patch.

how do I set the option for multithreading (the -t option) with the

python wrapper There used to be an option to pass a thread ID to the first-stage parser in the Python wrapper, but I removed it ( 1858d75) since it didn't seem like threads were going to get fixed. The main pieces were "ThreadManager" which let you sign out a ThreadSlot (thread ID) for exclusive use and a way to pass ThreadSlots to parse ( 1858d75#diff-491cfc4441b3f94d768196cd1252263cL201 ). From your previous message:

Second, helgrind detects a racing condition somewhere in

MeChart::bestParse. It complains about complains about functions from fnSubFns,C which make access to the FullHist*. I assume that the issue arrises there - reading and writing something to FullHist in an uncontrolled way. This could definitely be an issue and explain why parsing was non-deterministic with multiple threads -- would be interested to hear if you were able to dig any deeper. Hope this helps!

…

On Tue, Jun 13, 2017 at 7:51 AM, KantanLabs ***@***.***> wrote: Hi David, I did some look around and to be honest didn't figure out what exactly causes helgrade to complain. I modified some things in the parse method. First, switched the global sentence count from int to unsigned long long int. If I am not wrong, for more than 32,767 sentences there would be errors. Second, the printing function I took out of the main loop - this way they should work on parallel (added one more semaphore). If you want, I will send the code for review. But I wanted to ask - how do I set the option for multithreading (the -t option) with the python wrapper? Thanks, Dimitar. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#16 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAm5ZXatlUQ_XLMZ7b_3s3TExK_N329bks5sDnergaJpZM4BOS-a> .

KantanLabs · 2017-06-20T20:49:08Z

Hello David,

Sorry for my late reply - being busy with lots of stuff.

Once I have a bit of free time I will submit the change for the long long int.

Regarding the race condition, it is possible that the history (FullHist) is not thread safe... but not yet verified.

I have, however, two questions:

If I submit more code changes, will you review it? (maybe that sounds like a silly question, but to be honest I haven't worked on a github project like this before.)
The whole idea for fixing the multithreading arose from the fact that I have a python rest-server which I want to connect to the parser and have it running as fast and as distributed as possible. This, however, will not be solved with having the parser multithreaded, because the python wrapper by itself parses the sentences one-by-one (sending and waiting for a response). So, having a python rest server will not help. So, my second question - can parseIt.C be extended with REST capabilities? to listen to a port, receive a request and then send the parse back (in, e.g., json format). I have been looking into pistache for such a solution, but still need to work on it. If you have any idea/suggestion/comment on this, that would be great.

Thanks,
Kind regards,
Dimitar.

dmcc · 2017-06-21T19:12:08Z

Hi Dimitar, Sure, I'd be happy to review changes. If you're planning to make a new REST server frontend, I'd recommend making it a separate file from parseIt.C (maybe parseREST.C?). Ideally, it would only need to call things in first-stage/PARSE/SimpleAPI.h though you will need to extend the parse() calls to use thread IDs again. Hope this helps, David

…

On Tue, Jun 20, 2017 at 4:49 PM, KantanLabs ***@***.***> wrote: Hello David, Sorry for my late reply - being busy with lots of stuff. Once I have a bit of free time I will submit the change for the long long int. Regarding the race condition, it is possible that the history (FullHist) is not thread safe... but not yet verified. I have, however, two questions: 1. If I submit more code changes, will you review it? (maybe that sounds like a silly question, but to be honest I haven't worked on a github project like this before.) 2. The whole idea for fixing the multithreading arose from the fact that I have a python rest-server which I want to connect to the parser and have it running as fast and as distributed as possible. This, however, will not be solved with having the parser multithreaded, because the python wrapper by itself parses the sentences one-by-one (sending and waiting for a response). So, having a python rest server will not help. So, my second question - can parseIt.C be extended with REST capabilities? to listen to a port, receive a request and then send the parse back (in, e.g., json format). I have been looking into pistache for such a solution, but still need to work on it. If you have any idea/suggestion/comment on this, that would be great. Thanks, Kind regards, Dimitar. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#16 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAm5ZRdWED0fjdNQehnM7lu2levb3PfAks5sGDBFgaJpZM4BOS-a> .

KantanLabs · 2017-08-02T22:13:06Z

Hi David,

After long search through the BLLIP parser code I am sort of giving up. I reached to a point where I think (one) problem comes from the "edgeSubFns". There is this array of functions that is accessed by each thread. I tried to work on this, but couldn't make it work.

Anyway, I tried a lot of other things (following gdb and valgrind's info) but coudln't make it work properly for multiple threads.

I would like to share my last version and maybe you or somebody can take over. Maybe I am on the right track and just missing something small (or maybe I am shoooting tooo far away).

Cheers,
Dimitar.

dmcc · 2017-08-05T15:36:43Z

Hi Dimitar, Thanks for the report and your work on this problem. It is definitely a challenging codebase. If you have time, it would be awesome if you checked in your changes to your fork of the parser for others who might want to take this on (I'm afraid I don't have the bandwidth at this point). Take care, David

…

On Wed, Aug 2, 2017 at 3:13 PM, KantanLabs ***@***.***> wrote: Hi David, After long search through the BLLIP parser code I am sort of giving up. I reached to a point where I think (one) problem comes from the "edgeSubFns". There is this array of functions that is accessed by each thread. I tried to work on this, but couldn't make it work. Anyway, I tried a lot of other things (following gdb and valgrind's info) but coudln't make it work properly for multiple threads. I would like to share my last version and maybe you or somebody can take over. Maybe I am on the right track and just missing something small (or maybe I am shoooting tooo far away). Cheers, Dimitar. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#16 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAm5ZbFPT5sCTnwKZ3-bmB6AEYYz6_M5ks5sUPRzgaJpZM4BOS-a> .

dmcc mentioned this issue Jan 20, 2014

Parser does not remain silent when it can not parse a sentence #17

Closed

dmcc mentioned this issue Jul 6, 2017

bllip parser in Server #57

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parseIt is not threadsafe (was: Probabilities printed out of order for multi-threaded LM parser) #16

parseIt is not threadsafe (was: Probabilities printed out of order for multi-threaded LM parser) #16

syllog1sm commented Nov 19, 2013

dmcc commented Nov 19, 2013

KantanLabs commented May 23, 2017

dmcc commented May 23, 2017

KantanLabs commented May 23, 2017

dmcc commented May 24, 2017

KantanLabs commented May 25, 2017

dmcc commented May 26, 2017

KantanLabs commented May 26, 2017 •

edited

Loading

KantanLabs commented Jun 13, 2017

dmcc commented Jun 18, 2017 via email

KantanLabs commented Jun 20, 2017

dmcc commented Jun 21, 2017 via email

KantanLabs commented Aug 2, 2017

dmcc commented Aug 5, 2017 via email

parseIt is not threadsafe (was: Probabilities printed out of order for multi-threaded LM parser) #16

parseIt is not threadsafe (was: Probabilities printed out of order for multi-threaded LM parser) #16

Comments

syllog1sm commented Nov 19, 2013

dmcc commented Nov 19, 2013

KantanLabs commented May 23, 2017

dmcc commented May 23, 2017

KantanLabs commented May 23, 2017

dmcc commented May 24, 2017

KantanLabs commented May 25, 2017

dmcc commented May 26, 2017

KantanLabs commented May 26, 2017 • edited Loading

KantanLabs commented Jun 13, 2017

dmcc commented Jun 18, 2017 via email

KantanLabs commented Jun 20, 2017

dmcc commented Jun 21, 2017 via email

KantanLabs commented Aug 2, 2017

dmcc commented Aug 5, 2017 via email

KantanLabs commented May 26, 2017 •

edited

Loading