Lots of step-results can still cause deadlocks #144
Comments
…load (#144) In the old (core.async pub/sub-based) event-bus, all events were written to the same channel and only distributed to mult-channels afterwards, so when the mult-channels blocked, all topics on the event-bus blocked. This change removes the single chan and works directly with one mult per topic. This way, topics are more independent from each other: for example, a persistence mechanism that subscribes to step updates and writes consumed-events afterwards no longer blocks on writing the consumed-event just because additional step-updates clog the event-bus, which previously deadlocked the whole system.
After further investigation, reading from and writing to the same event-bus seems to be the root cause, specifically the publisher-channel that is shared by all topics and pipelines. This way, the following could occur:
I have now rewritten the event-bus to use a separate publisher-channel per topic. This way, components that subscribe to one topic and write to another should no longer deadlock. Components that write to the same topic they read from (e.g. step result inheritance) could still be an issue. However, the fixes for #135 and #140 seem to have worked for now.
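To illustrate the difference, here is a minimal sketch in Python rather than core.async, so it is not the actual event-bus code. It models a channel as a bounded queue and treats "the queue is full" as "a blocking write would hang". The class names, the topic names `step-result-updated` and `consumed-event`, and the buffer size are all made up for the example.

```python
import queue

CAPACITY = 2  # hypothetical buffer size, kept tiny so the effect is easy to see


class SharedChannelBus:
    """Sketch of the old design: every topic goes through one bounded channel."""

    def __init__(self):
        self.channel = queue.Queue(maxsize=CAPACITY)

    def publish(self, topic, payload):
        # put_nowait raises queue.Full where a blocking write would hang
        self.channel.put_nowait((topic, payload))


class PerTopicBus:
    """Sketch of the new design: each topic gets its own bounded channel."""

    def __init__(self):
        self.channels = {}

    def publish(self, topic, payload):
        if topic not in self.channels:
            self.channels[topic] = queue.Queue(maxsize=CAPACITY)
        self.channels[topic].put_nowait(payload)


def consumed_event_blocks(bus):
    """Flood the bus with step updates nobody reads, then try to publish
    an event on a different topic. Returns True if that write would block."""
    for i in range(CAPACITY):
        bus.publish("step-result-updated", i)
    try:
        bus.publish("consumed-event", "ack")
        return False
    except queue.Full:
        return True


if __name__ == "__main__":
    print("shared channel blocks:", consumed_event_blocks(SharedChannelBus()))
    print("per-topic channel blocks:", consumed_event_blocks(PerTopicBus()))
```

In the shared variant, a subscriber that has to publish a consumed-event before it can read the next step update gets stuck once the channel is full. With one channel per topic, that write goes through. The sketch also shows why writing to the same topic you read from can still be a problem: the per-topic channel fills up just like the shared one did.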
The new event-bus is still not active by default (activate with …). Keeping this issue open until we have some evidence that it works well in production and we can make it the default. Also, the workarounds are probably still a smart idea to reduce overall resource consumption.
The workarounds and the new default now have their own issues, so closing this one.
Problem
PR #143 shows that writing a lot of step results (i.e. writing more events than the event-listeners can process) can still cause deadlocks, so clearly #135 and #140 have not entirely solved the problem.
Contributing Factors
Work so far
Potential Workarounds
Solutions