Please take the following with a grain of salt, as I am new to using Hydra.
If I make a commit N that adds a new derivation to nixpkgs (or upgrades an existing one) with the correct hash in its fetchurl call but a bad source URL, then any jobs that depend on this derivation will fail (because the source file cannot be downloaded).
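For concreteness, here is a minimal sketch of what commit N might contain (the package name, URL, and hash are invented for illustration):

# Hypothetical sketch of commit N: the sha256 is assumed to be the
# correct hash of foo-1.0.tar.gz, but the URL has a typo, so realizing
# this fixed-output derivation always fails with a download error.
{ fetchurl }:

fetchurl {
  url = "https://downloads.example.org/fo/foo-1.0.tar.gz";  # typo: should be /foo/
  # Placeholder standing in for the correct sha256 of the tarball.
  sha256 = "0000000000000000000000000000000000000000000000000000000000000000";
}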
However, if I fix the URL in commit N+1, Hydra will not actually restart the job; instead it will report that the build is still failing, with the same error as before.
From a user's perspective this is misleading, because you can clearly see that the input to the newest jobset evaluation is the new commit N+1, yet the build still reflects the results from commit N.
In fact, even if you "clear failed builds cache" and "clear VCS caches" and then push a new, unrelated commit (to force a new jobset evaluation), the failure still persists, which is even more confusing. And the nix.conf man page says the following:
build-cache-failures (...) Failures in fixed-output derivations (such as fetchurl calls) are never cached. (...)
... which is even more misleading if you're trying to understand the problem, given that this is a fetchurl failure, yet it appears to have become a cached failure.
As far as I can see, the problem seems to be in hydra-evaluator, specifically src/lib/Hydra/Helper/AddBuilds.pm#492, function checkBuild():
492 # Don't add a build that has already been scheduled for this
493 # job, or has been built but is still a "current" build for
494 # this job. (...)
(...)
503     if (defined $prevEval) {
504         # Only check one output: if it's the same, the other will be as well.
505         my $firstOutputName = $outputNames[0];
506         my ($prevBuild) = $prevEval->builds->search(
507             # The "project" and "jobset" constraints are
508             # semantically unnecessary (because they're implied by
509             # the eval), but they give a factor 1000 speedup on
510             # the Nixpkgs jobset with PostgreSQL.
511             { project => $jobset->project->name, jobset => $jobset->name, job => $jobName,
512               name => $firstOutputName, path => $firstOutputPath },
513             { rows => 1, columns => ['id'], join => ['buildoutputs'] });
514         if (defined $prevBuild) {
515             print STDERR "  already scheduled/built as build ", $prevBuild->id, "\n";
516             $buildMap->{$prevBuild->id} = { id => $prevBuild->id, jobName => $jobName, new => 0, drvPath => $drvPath };
517             return;
518         }
519     }
So it seems that if the first output's name and path haven't changed since the previous evaluation, hydra-evaluator won't schedule a new build. But changing the URL of a fixed-output derivation changes neither of those: the output path is determined by the declared hash, not by the URL. So even though I fixed the underlying problem in building the derivation, a new build won't be scheduled.
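To see why the URL doesn't matter here, consider this sketch (placeholder URLs and hash): the output path of a fixed-output derivation is computed from its name and declared hash alone, so two fetchurl calls that differ only in URL evaluate to the same outPath, and the output-path comparison above cannot tell them apart.

let
  pkgs = import <nixpkgs> {};
  # Placeholder hash; both calls declare the same one.
  hash = "0000000000000000000000000000000000000000000000000000000000000000";
  bad  = pkgs.fetchurl { url = "https://example.org/wrong/foo-1.0.tar.gz"; sha256 = hash; };
  good = pkgs.fetchurl { url = "https://example.org/right/foo-1.0.tar.gz"; sha256 = hash; };
in
  # Evaluates to true: the URL is part of the derivation (bad.drvPath !=
  # good.drvPath), but not of the output path that checkBuild() compares.
  bad.outPath == good.outPath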
Correct me if I'm wrong, but the code above doesn't seem to take transient fetchurl failures into account, and it also seems redundant, considering that Nix already caches build successes and failures.
A similar problem (although more of a race condition) seems to exist just a few lines below:
521 # Prevent multiple builds with the same (job, outPath) from
522 # being added.
523     my $prev = $$jobOutPathMap{$jobName . "\t" . $firstOutputPath};
524     if (defined $prev) {
525         print STDERR "  already scheduled as build ", $prev, "\n";
526         return;
527     }
This means that if I put a bad URL in commit N and Hydra evaluates the jobset, and then I fix the URL in commit N+1 and Hydra evaluates it again while the previous job is still queued, a new job won't be queued; the result for commit N+1 will instead reflect the failure from commit N, even though the problem has already been fixed.