Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow output definition #4784

Merged
merged 54 commits into from
May 17, 2024
Merged

Workflow output definition #4784

merged 54 commits into from
May 17, 2024

Conversation

bentsherman
Copy link
Member

Close #4670

This PR adds a DSL for defining workflow outputs. It is meant to completely replace the publishDir directive, and centralize the definition of published outputs into one location.

This first iteration is essentially a builder DSL for a directory structure, which also collects the essential elements of the existing publish mechanism: process selectors to select process outputs, and publish options like pattern and enabled (other options like mode can also be supported this way).

Under the hood, there is now a WorkflowPublisher which contains all of the publish "selectors" and uses PublishDir just like the directive. So the publishing itself is also handled in one place rather than during the task finalization.

See nf-core/fetchngs#275 for a prototype with nf-core/fetchngs

Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Copy link

netlify bot commented Feb 28, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit 0db73c8
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/664665b20da96700083892ac

@bentsherman

This comment was marked as outdated.

@pditommaso

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@pditommaso

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@pditommaso

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@marcodelapierre

This comment was marked as outdated.

@bentsherman

This comment was marked as outdated.

@pinin4fjords

This comment was marked as outdated.

@bentsherman
Copy link
Member Author

With static types we are moving towards a syntax where every process output will have a name by definition (<type> <name> = <value>), so in time this will not be an issue

@marcodelapierre
Copy link
Member

thanks for the reminder Ben

@marcodelapierre
Copy link
Member

marcodelapierre commented May 8, 2024

I am happy with the testing of the feature with the Fetchngs POC (both linux vm and Slurm cluster); there I disabled nf-validation.

For the rnaseq POC, how do I get around the following error?

ERROR ~ Install is not supported on dev mode - Missing plugin nf-validation 1.1.3

(nf-validation here seems needed to use fromSamplesheet)

@bentsherman
Copy link
Member Author

You can install the local build with make install. It will replace the installation of your current Nextflow version and allow you to run pipelines with third-party plugins. It won't use the local build of core plugins like nf-amazon, but that's okay because this PR doesn't modify any core plugins.

@marcodelapierre
Copy link
Member

Testing Rnaseq was also successful. As Ben expected, output directory layout is not identical, but output files are there.

@pditommaso
Copy link
Member

While testing the output index I've got this

ERROR ~ Index file record must be a list or map: /Users/pditommaso/Projects/rnaseq-nf/work/10/f200de889f31fcba450bab2d2964a3/fastqc_ggal_gut_logs [UnixPath]

 -- Check '.nextflow.log' file for details

What about instead creating a simple index as "file name", "file path" ?

@pditommaso
Copy link
Member

Made a few tests and works like a charm. I was expecting a much bigger impact in the codebase, great work Ben.

I've commented a few things to address above

@bentsherman bentsherman requested a review from pditommaso May 16, 2024 15:14
@pditommaso
Copy link
Member

Better, but it's not an index file

"/Users/pditommaso/Projects/rnaseq-nf/results/fastqc/fastqc_ggal_gut_logs"

I was expecting as

"fastqc_ggal_gut_logs", "/Users/pditommaso/Projects/rnaseq-nf/results/fastqc/fastqc_ggal_gut_logs"

@bentsherman
Copy link
Member Author

That can be done with the index mapper:

index {
  mapper { file -> [file.name, file] }
}

@pditommaso
Copy link
Member

I know, but it would be nice to have it as default behaviour

@pditommaso
Copy link
Member

and it's ready!

@pditommaso pditommaso merged commit cf0546b into master May 17, 2024
22 checks passed
@pditommaso pditommaso deleted the 4670-workflow-outputs branch May 17, 2024 10:12
marcodelapierre pushed a commit that referenced this pull request May 31, 2024
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Dr Marco Claudio De La Pierre <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Workflow output definitions and schema
5 participants