This repository has been archived by the owner on Jun 15, 2024. It is now read-only.

Lots of step-results can still cause deadlocks #144

Closed
flosell opened this issue Dec 6, 2016 · 3 comments

@flosell
Owner

flosell commented Dec 6, 2016

Problem

PR #143 shows that writing a lot of step results (i.e. writing more events than the event-listeners can process) can still cause deadlocks, so #135 and #140 clearly have not entirely solved the problem.

Contributing Factors

  • Reading and writing on the same event-bus from the same go-loop: All events are written to and read from the same event-bus; if an event-listener gets stuck writing to the event-bus, it also can't read from (and thereby unblock) the event-bus, causing a deadlock
  • One shared event-bus: All steps and all builds share the same event-bus, so lots of steps/builds active at the same time increase the likelihood of the problem appearing
  • Inheritance: Step-result updates in a child can trigger a step-result update in a parent (e.g. chaining), multiplying the number of update events
  • Slow/Blocking listeners

Work so far

Potential Workarounds

Solutions

  • Decouple topics from each other so a full topic does not affect our ability to write to another topic (see comments below)

flosell added the bug label Dec 6, 2016
flosell added a commit that referenced this issue Dec 11, 2016
…load (#144)

In the old (core.async pub/sub-based) event-bus, all events were written to the same channel and only distributed to mult-channels afterwards, so when the mult-channels blocked, all topics on the event-bus blocked. This change removes the single chan and works directly with mults per topic. This way, topics are more independent of each other: for example, a persistence mechanism that subscribes to step updates and writes consumed-events afterwards no longer blocks when writing the consumed-event because additional step-updates clog the event-bus, which previously deadlocked the whole system.
@flosell
Owner Author

flosell commented Dec 11, 2016

After further investigation, reading/writing on the same event-bus seems to be the root cause, specifically the publisher-channel that is shared by all topics and pipelines. With a shared channel, the following could occur:

  1. Lots of :step-result-updated events clog the event-bus (i.e. the publisher-channel)
  2. The pipeline-state-updater now tries to process such an event
  3. After processing an event, the pipeline-state-updater tries to send a :step-result-update-consumed event but is blocked because the event-publisher channel is full
  4. Since the pipeline-state-updater is now blocked, no updates are processed anymore, resulting in a deadlock
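
The sequence above can be sketched in a few lines. This is only an illustration in Python, not the actual core.async code: a bounded `queue.Queue` stands in for the shared publisher-channel, the capacity of 3 and the `publish` helper are made up for the example, and a non-blocking put is used so the script reports the stall instead of hanging.

```python
import queue

# Hypothetical stand-in for the shared publisher-channel; capacity 3 is arbitrary.
shared_bus = queue.Queue(maxsize=3)


def publish(bus, event):
    """Return True if the event fit, False if a blocking put would have stalled."""
    try:
        bus.put_nowait(event)
        return True
    except queue.Full:
        return False


# Step 1: step-result updates clog the bus.
for build in range(3):
    publish(shared_bus, ("step-result-updated", build))

# Step 2: the updater takes one event off the bus ...
event = shared_bus.get_nowait()
# ... but producers immediately refill the slot it freed.
publish(shared_bus, ("step-result-updated", 3))

# Step 3: the updater's follow-up event no longer fits on the same bus.
consumed_ok = publish(shared_bus, ("step-result-update-consumed", event[1]))

# Step 4: with a blocking put, the updater would wait here and stop reading,
# so nothing would ever drain the bus again.
print(consumed_ok)
```

With a blocking write in place of `put_nowait`, the reader in this sketch is stuck on its own output, which is the deadlock described in step 4.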

I now rewrote the event-bus to use separate publisher-channels per topic. This way, components that subscribe to a topic and write to another should no longer deadlock.

Components that write to the same topic they read from (e.g. step result inheritance) could still be an issue. However, the fixes for #135 and #140 seem to have worked for now.
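
A rough sketch of the per-topic idea, again in Python rather than core.async and with made-up names (`TopicBus`, `publish`, `take`, capacity 2): each topic gets its own bounded queue, so a full topic does not stop a write to a different topic, while a write back to the same full topic would still stall.

```python
import queue
from collections import defaultdict


class TopicBus:
    """Hypothetical per-topic bus: every topic has its own bounded queue."""

    def __init__(self, capacity):
        self._queues = defaultdict(lambda: queue.Queue(maxsize=capacity))

    def publish(self, topic, event):
        """Return True if the event fit, False if a blocking put would stall."""
        try:
            self._queues[topic].put_nowait(event)
            return True
        except queue.Full:
            return False

    def take(self, topic):
        return self._queues[topic].get_nowait()


bus = TopicBus(capacity=2)

# Fill the step-result topic and keep it full while the updater works.
bus.publish("step-result-updated", 0)
bus.publish("step-result-updated", 1)
event = bus.take("step-result-updated")
bus.publish("step-result-updated", 2)

# Writing to a different topic succeeds even though the first topic is full.
cross_topic_ok = bus.publish("step-result-update-consumed", event)

# Writing back to the same (full) topic would still block.
same_topic_ok = bus.publish("step-result-updated", 99)

print(cross_topic_ok, same_topic_ok)
```

The second result is the remaining risk mentioned above for components that read from and write to the same topic.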

@flosell
Owner Author

flosell commented Dec 11, 2016

The new event-bus is still not active by default (activate it with :use-new-event-bus true).

Keeping this issue open until we have some evidence it's good in production and we can make it the default.

Also, the workarounds are probably still a smart idea to reduce overall resource consumption.

@flosell
Owner Author

flosell commented Jan 30, 2017

The workarounds and the new default are now tracked in their own issues, so I'm closing this one.

@flosell flosell closed this as completed Jan 30, 2017