-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: delay parallel pool termination to prevent false negatives #4959
Conversation
I found that moving `await pool.terminate()` into the finally block here could prevent some false negatives, where tests could fail but node would mysteriously exit 0 anyway.
|
This PR hasn't had any recent activity, and I'm labeling it |
This is still applicable. |
This is highly relevant for me 👍🏾 Can we have it merged? |
This PR hasn't had any recent activity, and I'm labeling it |
This is still applicable. It's waiting for review by a maintainer. |
Note that per #5027 we're a new group of maintainers and not intimately familiar with Mocha's internals yet. Changes to async/pool logic are scary in general and especially to us. Could someone please post an isolated reproduction please? Either in a new 🐛 Bug issue, or as a casual comment here if that's inconvenient for you? We can't reasonably triage this PR without an isolated reproduction. I'm also particularly interested in seeing whether the bug can be reproduced with the native |
👋 ping @echuber2, is this still something you have time for? |
Closing out as it's been a while since PR activity. If anybody wants to take this over, please do - and post a co-author attribution if you take code from this PR. Cheers! 🤎 |
Sorry, I missed the earlier ping. I don't have time to put together a test case right now but I may be able to think about it later this fall. The application we were using this in was fairly elaborate so it may be difficult to reproduce. However, just at a sight reading of the code, it seems more correct to do the cleanup task in the finally block and not in the try (which was the effect of my PR), so I did not expect a full repro would be necessary. At any rate I'll see if I can return to it later. |
I found that moving
await pool.terminate()
into the finally block here could prevent some false negatives, where tests could fail but node would mysteriously exit 0 anyway. (That is, by "false negative", I mean that the mocha test suite gave a passing exit code when it should have failed.)Description of the Change
We saw some sporadic cases where a mixture of tests run with
--parallel
would exit clean with code 0 despite there being some test failures. It's difficult to reproduce. (One specific case can be found in the PrairieLearn project issue linked on this PR.) After some experimentation, I found that the call toawait pool.terminate()
here is sometimes causing the entire node process to immediately exit with 0. I'm not entirely sure of the cause, but it may be due to the unusual implementation of cancellable promises in theworkerpool
library (namedPromise
like the official JS version, but different). When the false negative issue arises inpool.terminate()
, the calls seem to descend into the version ofPromise.all
inworkerpool
and then suddenly exit. It may be that some signal handlers for uncaught exceptions are misconfigured somewhere.At any rate, when I move this call to
await pool.terminate();
later in the code (into thefinally
block), the test results appear to accurately show when at least one test has failed or not. Maybe someone else can test this if they've seen anything similar.Alternate Designs
I tried to figure out a way to fix the issue in the
workerpool
library's handling of Promise.all (which uses its own Promise type, not the standard JS one), since the call toterminate
here ultimately ends up there. But I couldn't conclusively determine if the issue was arising there.Even without this change, it does work to use
--bail
with--parallel
and still catch the first failure that way. But then, the downside would be that the CI test run won't include test results beyond the first failure.Why should this be in core?
The
--parallel
feature sometimes has false negatives (failed tests that show as passing with exit 0 in CI).Benefits
This change successfully catches some unusual failing cases in the particular situation we saw on the PrairieLearn project. I'm not entirely certain, but I think it doesn't hurt anything to terminate the pool later, in the finally block.
Possible Drawbacks
I don't know if this will cause new false negatives or false positives for other users in strange cases. The underlying issue(s) may be in the
workerpool
library, in which case it could theoretically be fixed without changing the mocha code at all.Applicable issues
Maybe related somehow to #4559
Hopefully fixes PrairieLearn/PrairieLearn#6940
Possibility of breaking change
This may be a harmless bug fix or a breaking change. I won't know unless more people test it.