Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid holding batchCond while dispatching updateInterrupt #164

Merged
merged 3 commits into from
Oct 11, 2021

Conversation

peterbroadhurst
Copy link
Contributor

@peterbroadhurst peterbroadhurst commented Oct 11, 2021

In #162 there is the following deadlock:

Deadlock investigation

Batch processor trying to notify eventPoller to stop using updateInterrupt:

goroutine 547 [chan receive]:
github.com/hyperledger/firefly-ethconnect/internal/events.(*eventStream).batchProcessor(0xc0001be160)
	/ethconnect/internal/events/eventstream.go:544 +0x11b
created by github.com/hyperledger/firefly-ethconnect/internal/events.(*eventStream).startEventHandlers
	/ethconnect/internal/events/eventstream.go:209 +0x10c

Event poller trying to obtain the batchCond:

goroutine 546 [semacquire]:
sync.runtime_SemacquireMutex(0xc000028334, 0xbc8800, 0x1)
	/usr/local/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc000028330)
	/usr/local/go/src/sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(0xc000028330)
	/usr/local/go/src/sync/mutex.go:81 +0x47
github.com/hyperledger/firefly-ethconnect/internal/events.(*eventStream).isBlocked(0xc0001be160, 0xc000292030)
	/ethconnect/internal/events/eventstream.go:356 +0x5c
github.com/hyperledger/firefly-ethconnect/internal/events.(*eventStream).eventPoller(0xc0001be160)
	/ethconnect/internal/events/eventstream.go:393 +0xd85
created by github.com/hyperledger/firefly-ethconnect/internal/events.(*eventStream).startEventHandlers
	/ethconnect/internal/events/eventstream.go:207 +0xcc


... called from this loop:
if err == nil && !a.isBlocked() {

... which is where we would pull from the updateInterrupt

Proposed fix

Release the batchCond lock, before consuming from the updateInterrupt channel

@@ -541,11 +541,11 @@ func (a *eventStream) batchProcessor() {
a.batchCond.L.Lock()
for !a.suspendOrStop() && a.batchQueue.Len() == 0 {
if a.updateInProgress {
a.batchCond.L.Unlock()
<-a.updateInterrupt
// we were notified by the caller about an ongoing update, return
log.Infof("%s: Notified of an ongoing stream update, exiting batch processor", a.spec.ID)
a.updateWG.Done() //Not moving this to a 'defer' since we need to unlock after calling Done()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
a.updateWG.Done() //Not moving this to a 'defer' since we need to unlock after calling Done()
a.updateWG.Done()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the stale comment

Signed-off-by: Peter Broadhurst <[email protected]>
@peterbroadhurst
Copy link
Contributor Author

Build hit the same issue from #166, so pulled in the verbose test running from that change to see if I can catch it here.

@peterbroadhurst peterbroadhurst merged commit e80c6b7 into hyperledger:main Oct 11, 2021
@peterbroadhurst peterbroadhurst deleted the fix-162 branch October 11, 2021 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants