Bug report
Summary
When Nextflow creates output files on S3 using the output block, tags specified in that block do not propagate to files inside subdirectories created by processes. As a result, some published files do not have the specified tags, but instead inherit the tags of the files they were copied from in the working directory. This causes problems for tag-based applications like S3 lifecycle rules.
Expected behavior and actual behavior
Under the new workflow-level output feature, files can be published using the publish section of a workflow combined with an output block configuring the output directory. This includes the option to add tags to the published files, which can then be used in downstream applications on AWS.
For various reasons [1], some Nextflow processes output directories containing files, rather than flat files. In this case, when the output of that process is published, I would expect the specified tags from the output block to be applied to the files in those process output directories. Instead, these descendant files inherit their tags from the files in the working directory they are copied from. Among other things, this means that tag-based downstream applications will treat these files as though they were temporary files in a working directory; for example, S3 lifecycle rules intended to delete temporary files after a certain time period will also delete these files.
To my knowledge (which is certainly not exhaustive) there is currently no way to tag these files in subdirectories correctly using Nextflow.
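As a stopgap outside Nextflow, the published prefix could be retagged after the run finishes. The following is only a minimal sketch, not something Nextflow provides: `retag_prefix` is a hypothetical helper, and it assumes a boto3-style S3 client is passed in (the `list_objects_v2` paginator and `put_object_tagging` are the boto3 calls it relies on). Note that `put_object_tagging` replaces an object's whole tag set, so any other tags on the object are dropped.

```python
def retag_prefix(s3, bucket, prefix, tags):
    """Overwrite the tag set of every object under `prefix`.

    `s3` is assumed to be a boto3-style S3 client. Returns the keys
    that were retagged, in listing order.
    """
    # S3 expects tags as a list of {"Key": ..., "Value": ...} dicts.
    tag_set = [{"Key": key, "Value": value} for key, value in tags.items()]
    retagged = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix carry no "Contents" entry.
        for obj in page.get("Contents", []):
            s3.put_object_tagging(
                Bucket=bucket,
                Key=obj["Key"],
                Tagging={"TagSet": tag_set},
            )
            retagged.append(obj["Key"])
    return retagged
```

With boto3 installed, a call might look like `retag_prefix(boto3.client("s3"), "<bucket>", "output/results/", {"nextflow_file_class": "publish", "nextflow.io/temporary": "false"})`, where the bucket name and prefix depend on your own `params.base_dir`.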
Steps to reproduce the problem
A minimum working example of the problem is available here. To run it, first edit params.base_dir in nextflow.config to point to an S3 bucket you have write access to.
The workflow produces two output files, <base_dir>/output/results/flat_file.txt and <base_dir>/output/results/file_dir/dir_file.txt.
In my hands, when inspecting the output files on S3, the workflow tags flat_file.txt as expected, with nextflow_file_class = publish and nextflow.io/temporary = false.
Conversely, dir_file.txt is not tagged as expected, and just inherits nextflow.io/temporary = true from its workflow parent. This causes it to be swept up by lifecycle rules that use nextflow.io/temporary as a marker of temporary status.
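To make the consequence concrete, here is a small sketch of how a tag-filtered lifecycle rule would treat the two files. The rule is hypothetical (its ID and the 30-day expiration are assumptions, written in the shape boto3's `put_bucket_lifecycle_configuration` accepts), and `rule_matches` is an illustrative helper that only mimics a single-tag filter; the object tags are the ones observed above.

```python
# Hypothetical lifecycle rule: expire anything tagged as temporary.
TEMP_RULE = {
    "ID": "expire-nextflow-temporary",   # assumed name
    "Status": "Enabled",
    "Filter": {"Tag": {"Key": "nextflow.io/temporary", "Value": "true"}},
    "Expiration": {"Days": 30},          # assumed retention period
}

def rule_matches(rule, object_tags):
    """Return True if the rule's single-tag filter matches the object's tags."""
    tag = rule["Filter"]["Tag"]
    return object_tags.get(tag["Key"]) == tag["Value"]

# Tags as observed on the published objects in this report.
OBSERVED_TAGS = {
    "output/results/flat_file.txt": {
        "nextflow_file_class": "publish",
        "nextflow.io/temporary": "false",
    },
    "output/results/file_dir/dir_file.txt": {
        "nextflow.io/temporary": "true",
    },
}

expiring = [key for key, tags in OBSERVED_TAGS.items()
            if rule_matches(TEMP_RULE, tags)]
print(expiring)  # ['output/results/file_dir/dir_file.txt']
```

Only the file inside the published subdirectory is selected for expiration, even though both files were published by the same output block.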
Program output
N/A, execution does not fail -- the workflow completes but does not tag all output files correctly.
Environment
Nextflow version: 24.10.3 build 5933
Java version: Temurin-17.0.10+7 (build 17.0.10+7)
Operating system: Linux (Amazon Linux 2023)
Bash version: GNU bash, version 5.2.15(1)-release (x86_64-amazon-linux-gnu)
Additional context
N/A
Footnotes
[1] In our case, mostly involving creating indexes for bioinformatic programs (e.g. Bowtie2, Kraken, BBMap).