Large output in shell-script breaks pipelines in chaining-macro (since 0.9.4) #135
Comments
A workaround seems to be avoiding the always-chaining macro in your pipeline.
Thanks for the very detailed bug report! I'll try to reproduce this the next chance I get and get back to you as soon as possible. I refactored this part of the codebase a lot recently so there is a chance something went wrong there.
Just to give an update: I reproduced the issue and started investigating but haven't found the root cause yet. My current guess is an issue with very large and frequent step results in general. The commit you outlined (cee9694) most likely just increased the likelihood of the problem surfacing, because it makes step results available per chaining-step as well as overall and therefore essentially duplicates the data. I'll keep you posted...
OK, I finally figured out what the issue was and fixed it. The size of the step results doesn't matter; it's the frequent updates being sent that fill up the event-bus. Normally this isn't an issue because the updates get read by event-bus subscribers and life continues. Unfortunately, this doesn't work when step-result inheritance comes into play (which is what happens under the covers of the chaining-macro to merge the outputs of the different steps in a chain): the parent listens on the event-bus for updates from the child and sends its own update whenever the child updates. When the child sends lots of updates, the event-bus is saturated, so the parent blocks while trying to send updates, therefore stops consuming updates, and the whole thing deadlocks.

It's now fixed by adding a sliding buffer in the inheritance: if the event-bus is saturated, we can compress events and just send the most up-to-date version of the inherited state once the event-bus is unblocked again.

Also, while debugging this issue I noticed that every update also flushes to disk, which is probably a bit too much. I'll address this in #137.
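For readers unfamiliar with the term, here is a minimal sketch of what a sliding buffer does, written in Python rather than in the project's own code. It is not the actual fix; the class name `SlidingBuffer`, the `put`/`drain` methods, and the shape of the update dictionaries are all hypothetical and only illustrate the idea of dropping stale state instead of blocking the sender.

```python
from collections import deque


class SlidingBuffer:
    """Hypothetical bounded buffer whose writer never blocks.

    When the buffer is full, the oldest entry is discarded so that
    only the most recent `capacity` items are kept.
    """

    def __init__(self, capacity=1):
        # deque(maxlen=n) silently drops the oldest item on overflow.
        self._items = deque(maxlen=capacity)

    def put(self, item):
        # Never blocks and never raises, regardless of how full we are.
        self._items.append(item)

    def drain(self):
        # Hand over whatever is pending and empty the buffer.
        items = list(self._items)
        self._items.clear()
        return items


# Simulate a child step that emits 1000 updates while the (assumed)
# event-bus is saturated and the parent cannot publish anything.
buffer = SlidingBuffer(capacity=1)
for n in range(1000):
    buffer.put({"status": "running", "out": f"line {n}"})

# Once the bus is unblocked, the parent publishes only what is pending:
# a single, most up-to-date snapshot instead of 1000 queued events.
pending = buffer.drain()
```

The design trade-off is that intermediate states are lost. That is acceptable in a case like this one, assuming each update carries the complete inherited state rather than a delta, so the latest snapshot supersedes all earlier ones.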
While trying to update to a recent release, we encountered an issue in a pipeline that uses the 'stepsupport/always-chaining' macro:
If a step produces a large amount of output at once, this regularly breaks the pipeline.
The visible effects are: The output stops in the middle of the step. The step itself never finishes (seems to be in a deadlock). The UI is still responding, but no new builds can be triggered, making the pipeline unusable until a restart.
After various tests, it seems that having a lot of output from a shell script at the same time breaks the pipeline.
The bug can be reproduced in (probably) all versions since 0.9.4. More specifically, the commit cee9694 seems to introduce the problem.
Steps to reproduce:
The patch adds two shell scripts to the dev-resources directory and changes the example pipeline to have a chaining step that executes those scripts. The first script calls the second (to resemble our real-life problem) and the second script produces a large chunk of output.
Disclaimer: Unfortunately the bug is not 100% reproducible. In some cases, especially if the machine running the pipeline is started fresh, the step will not stop. After some restarts of the pipeline it should break and end up in the described behavior. This might indicate problems with memory or number of threads, etc.