When I want to figure out why a cached task was unexpectedly re-executed, the current approach is to do both runs with `-dump-hashes` and diff the logs. But I have to do a lot of manual work to extract the relevant information from the logs in order to produce a proper diff.
@abhi18av used some bash magic in this blog post to do some of this work. But it is only a partial solution, and it still requires some manual effort to inspect the component hashes.
A simple improvement would be to log the hashes for a task in a single line. That way, Abhinav's bash script could produce a diff with the component hashes.
The ideal solution would be to incorporate this bash script into Nextflow itself as some kind of report, or a new command like we did with `nextflow inspect`. Nextflow needs the information from two different runs, so it would be difficult to do in the `run` command, beyond logging separate reports during each run that the user must compare.
On the other hand, if the task cache could contain all the components of a task that are used to compute the task hash, we could reproduce the `-dump-hashes` output by querying the task cache and re-computing the component hashes. It could be a nice extension to `nextflow log` or a new command like `nextflow diff`.
The task cache currently cannot do this, but we are planning to add inputs and outputs metadata to the cache as part of #3802 and #3849, so maybe we can make it work for this purpose as well.
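To illustrate the single-line logging idea, here is a rough sketch of what the comparison could look like once each task's component hashes sit on one line. The record format (`<task name>`, a tab, then comma-separated hashes) and the function names `parse_hash_lines` and `diff_task_hashes` are hypothetical and only for illustration; this is not the current `-dump-hashes` output or any existing Nextflow API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record format, NOT the real -dump-hashes output:
#   <task name>\t<hash1>,<hash2>,...
Diff = Tuple[int, Optional[str], Optional[str]]


def parse_hash_lines(log_text: str) -> Dict[str, List[str]]:
    """Collect the component hashes of each task from single-line records."""
    tasks: Dict[str, List[str]] = {}
    for line in log_text.splitlines():
        if "\t" not in line:
            continue  # skip lines that are not hash records
        name, _, hashes = line.partition("\t")
        tasks[name.strip()] = [h.strip() for h in hashes.split(",") if h.strip()]
    return tasks


def diff_task_hashes(
    first: Dict[str, List[str]], second: Dict[str, List[str]]
) -> Dict[str, List[Diff]]:
    """For tasks present in both runs, list the component positions that differ."""
    changes: Dict[str, List[Diff]] = {}
    for name in sorted(first.keys() & second.keys()):
        a, b = first[name], second[name]
        diffs: List[Diff] = []
        for i in range(max(len(a), len(b))):
            old = a[i] if i < len(a) else None
            new = b[i] if i < len(b) else None
            if old != new:
                diffs.append((i, old, new))
        if diffs:
            changes[name] = diffs
    return changes


# Made-up records standing in for two runs of the same pipeline.
run1 = "foo (1)\taaa,bbb,ccc\nbar (1)\t111,222\n"
run2 = "foo (1)\taaa,xxx,ccc\nbar (1)\t111,222\n"

result = diff_task_hashes(parse_hash_lines(run1), parse_hash_lines(run2))
for task, diffs in result.items():
    for index, old, new in diffs:
        print(f"{task}: component {index} changed {old} -> {new}")
```

With the made-up records above, only the second component of `foo (1)` is reported as changed. Matching tasks by name and components by position is a simplifying assumption; a real report would also need the type and value of each component to explain why a hash changed.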
Related: #844