Workflow output definition #4784

Merged
merged 54 commits into from
May 17, 2024
0de777a
Add initial prototype
bentsherman Feb 28, 2024
14beedc
Fix race condition
bentsherman Feb 28, 2024
062b421
Simplify output DSL
bentsherman Feb 28, 2024
06a5872
Support default publish options in path method
bentsherman Feb 29, 2024
eecf619
Rename OutputDsl -> WorkflowPublishDsl
bentsherman Mar 19, 2024
b238a81
Replace process selector with channel/topic selectors, add e2e test
bentsherman Mar 20, 2024
d0fa980
cleanup PublishOp
bentsherman Mar 20, 2024
8279743
Add topic operator (not working)
bentsherman Mar 22, 2024
7681e2d
Merge branch 'master' into 4670-workflow-outputs
pditommaso Mar 26, 2024
88d7ead
Add topic op test
pditommaso Mar 26, 2024
1e3d539
clean up e2e test
bentsherman Mar 27, 2024
925bc67
Fix issue with topic operator
bentsherman Mar 27, 2024
e4608b3
Apply suggestions from review
bentsherman Mar 27, 2024
1e69f3e
Add output directory option to CLI, config, output DSL
bentsherman Mar 27, 2024
01b0570
Validate publish options
bentsherman Mar 27, 2024
c43afad
Update docs
bentsherman Mar 27, 2024
570da27
Add defaults to directory statement
bentsherman Mar 27, 2024
0ad12eb
Update docs
bentsherman Mar 28, 2024
3e0823b
Apply suggestions from review
bentsherman Mar 30, 2024
5a1a7bb
Fix workflow binding
bentsherman Mar 30, 2024
3bdd0fe
Fix dynamic path name
bentsherman Apr 6, 2024
59d83c0
Apply suggestions from review
bentsherman Apr 10, 2024
5beca46
Merge branch 'master' into 4670-workflow-outputs
pditommaso Apr 12, 2024
b67376c
Add `publish:` section to process
bentsherman Apr 12, 2024
65d9111
Update docs
bentsherman Apr 12, 2024
35df924
Add publish options to output DSL
bentsherman Apr 12, 2024
509f271
Change publish op to handle multiple task dirs
bentsherman Apr 12, 2024
6fcbf6f
Fix error when no output block is specified
bentsherman Apr 15, 2024
550766a
Rename output -> publish, rule -> target
bentsherman Apr 15, 2024
72d8a6f
Disallow absolute path in publish target
bentsherman Apr 15, 2024
370aa0b
Add feature flag
bentsherman Apr 15, 2024
76af5c9
Allow multi-channel output to be published
bentsherman Apr 15, 2024
0833e11
Add overwrite modes for deep / lenient / standard hash comparison
bentsherman Apr 16, 2024
4614f26
Merge branch 'master' into 4670-workflow-outputs
pditommaso Apr 17, 2024
02745f7
Factor out HashBuilder from CacheHelper
bentsherman Apr 17, 2024
0583b3d
Add index file definition
bentsherman Apr 18, 2024
e60403d
Fix failing tests
bentsherman Apr 18, 2024
12c2720
Apply suggestions from review
bentsherman Apr 22, 2024
d998878
Don't write index file if no records were published
bentsherman Apr 24, 2024
95a110a
Redirect to `null` to disable publishing
bentsherman Apr 24, 2024
7953e79
Remove ternary hack, require parentheses instead
bentsherman Apr 24, 2024
856209b
Replace publish path option with ability to reroute targets in publis…
bentsherman Apr 29, 2024
50484d9
Apply suggestions from review
bentsherman May 1, 2024
ab9118e
Merge branch 'master' into 4670-workflow-outputs
pditommaso May 3, 2024
84e6382
Merge branch 'master' into 4670-workflow-outputs
pditommaso May 9, 2024
2b80f47
Merge branch 'master' into 4670-workflow-outputs
pditommaso May 12, 2024
d504e45
Minor change [ci skip]
pditommaso May 12, 2024
6185d84
Apply suggestions from review
bentsherman May 13, 2024
79cc68c
Fix typo [ci skip]
pditommaso May 15, 2024
a70af4e
Merge branch 'master' into 4670-workflow-outputs
pditommaso May 15, 2024
ef4305d
Add shorthand for publishing single file to index
bentsherman May 15, 2024
b8cf823
Fold PublishIndexOp into PublishOp, add test for OutputDsl,
bentsherman May 16, 2024
02ba0c7
Update index file default for single file
bentsherman May 16, 2024
0db73c8
Use file base name for default index
bentsherman May 16, 2024
43 changes: 26 additions & 17 deletions docs/channel.md
Original file line number Diff line number Diff line change
@@ -398,7 +398,7 @@ The `interval` method emits an incrementing index (starting from zero) at a peri
Channel.interval('1s').view()
```

The above snippet will emit 0, 1, 2, and so on, every second, forever. You can use an operator such as {ref}`operator-take`, {ref}`operator-timeout`, or {ref}`operator-until` to close the channel based on a stopping condition.
The above snippet will emit 0, 1, 2, and so on, every second, forever. You can use an operator such as {ref}`operator-take` or {ref}`operator-until` to close the channel based on a stopping condition.

An optional closure can be used to transform the index. Additionally, returning `Channel.STOP` will close the channel. For example:

@@ -467,17 +467,9 @@ See also: [channel.fromList](#fromlist) factory method.
This feature requires the `nextflow.preview.topic` feature flag to be enabled.
:::

A *topic* is a channel type introduced as of Nextflow 23.11.0-edge along with {ref}`channel-type-value` and
{ref}`channel-type-queue`.
A *topic channel*, similar to a *queue channel*, is a non-blocking unidirectional FIFO queue, with the ability to implicitly receive values from multiple sources based on a *topic name*.

A *topic channel*, similarly to a *queue channel*, is non-blocking unidirectional FIFO queue, however it connects
multiple *producer* processes with multiple *consumer* processes or operators.

:::{tip}
You can think about it as a channel that is shared across many different process using the same *topic name*.
:::

A process output can be assigned to a topic using the `topic` option on an output, for example:
A process output can be sent to a topic using the `topic` option, for example:

```groovy
process foo {
@@ -491,21 +483,38 @@ process bar {
}
```

The `channel.topic` method allows referencing the topic channel with the specified name, which can be used as a process
input or operator composition as any other Nextflow channel:
Additionally, the `topic:` section of a workflow definition can be used to send channels defined in a workflow to a topic:

```groovy
workflow foobar {
main:
foo()
bar()

topic:
foo.out >> 'my_topic'
bar.out >> 'my_topic'

emit:
bar.out
}
```

Finally, the `Channel.topic()` factory can be used to consume the resulting channel for a given topic name, which can be used like any other channel:

```groovy
channel.topic('my_topic').view()
```

This approach is a convenient way to collect related items from many different sources without explicitly defining
the logic connecting many different queue channels altogether, commonly using the `mix` operator.
The same topic can be consumed using `Channel.topic()` any number of times, similar to referencing a channel multiple times.

This approach is a convenient way to collect related items from many different sources without all of the logic that is required to connect them, e.g. using the `mix` operator.

:::{warning}
Any process that consumes a channel topic should not send any outputs to that topic, or else the pipeline will hang forever.
Avoid creating a circular dependency within a topic (e.g. a process that consumes a channel topic and sends outputs to that same topic), as it will cause the pipeline to run forever.
:::

See also: {ref}`process-additional-options` for process outputs.
See also: {ref}`process-additional-options` for process outputs and the {ref}`workflow topic section <workflow-topics>`.

(channel-value)=

2 changes: 2 additions & 0 deletions docs/cli.md
@@ -1109,6 +1109,8 @@ Checking nextflow-io/hello ...
checkout-out at AnyObjectId[1c3e9e7404127514d69369cd87f8036830f5cf64] - revision: 1c3e9e7404 [v1.1]
```

(cli-run)=

### run

Execute a pipeline.
4 changes: 4 additions & 0 deletions docs/operator.md
@@ -1466,6 +1466,8 @@ An optional {ref}`closure <script-closure>` can be used to transform each item b
:language: console
```

(operator-take)=

## take

*Returns: queue channel*
@@ -1669,6 +1671,8 @@ The difference between `unique` and `distinct` is that `unique` removes *all* du

See also: [distinct](#distinct)

(operator-until)=

## until

*Returns: queue channel*
7 changes: 5 additions & 2 deletions docs/process.md
@@ -1008,7 +1008,7 @@ Some caveats on glob pattern behavior:
Although the input files matching a glob output declaration are not included in the resulting output channel, these files may still be transferred from the task scratch directory to the original task work directory. Therefore, to avoid unnecessary file copies, avoid using loose wildcards when defining output files, e.g. `path '*'`. Instead, use a prefix or a suffix to restrict the set of matching files to only the expected ones, e.g. `path 'prefix_*.sorted.bam'`.
:::

Read more about glob syntax at the following link [What is a glob?][what is a glob?]
Read more about glob syntax at the following link [What is a glob?][glob]

### Dynamic output file names

@@ -2163,6 +2163,10 @@ The following options are available:

### publishDir

:::{deprecated} 24.04.0
The `publishDir` directive has been deprecated in favor of the new {ref}`workflow output definition <workflow-output-dsl>`.
:::

The `publishDir` directive allows you to publish the process output files to a specified folder. For example:

```groovy
@@ -2668,4 +2672,3 @@ process foo {
```

[glob]: http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob
[what is a glob?]: http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob