[proposal] Add local module named configuration template #4422

pditommaso · 2023-10-17T13:45:26Z

Context

The use of DSL2 introduces the use of module components that allow the inclusion and re-use a process more than one time with different names.

A common requirement in this scenario is that a process configuration needs to be customised, for example, to provide a specific publishDir path or command arguments.

Currently, this need is addressed by creating a monolithic configuration file in which the process configuration is provided by using the target process name. For example here

https://github.com/nf-core/rnaseq/blob/master/conf/modules.config#L174-L192

In some case, there's a conditional logic that needs to be duplicated in the workflow script. For example here:

https://github.com/nf-core/rnaseq/blob/master/conf/modules.config#L209-L224

This approach has several disadvantages:

Monolithic configuration file
The process configuration is out of the context from the workflow code
Very fragile; it's enough a typo in the config file to break the process configuration
The use of monolithic breaks the workflow modularization
Very hard to maintain.

Solution

A possible solution could be to introduce a module-level configuration file that allows the definition of named configuration template that can applied to one or more processes in the same module.

For example the following configuration file could be added in a workflow module:

config.FOO.ext.args = "--x one --y two" 
config.FOO.publishDir = [path: "/some/path"]

config.BAR.ext.args = "--x three --y four" 
config.BAR.publishDir = [path: "/other/path/${params.suffix}"]

Note: in the example above, FOO and BAR are arbitrary identifiers the user chooses. They have no relationship with the process name where they are meant to be used.

In the workflow module, they could be used using the following syntax:

include { PROC1 } from './module/path1'
include { PROC2 } from './module/path2'

workflow NAME {
  PROC1(config.FOO)
  PROC2(config.BAR) 
}

Note: explorative syntax, it may change

In the above example, the process PROC1 is executed having the config setting defined by the template with name FOO is applied. Same with with process PROC2 with config BAR.

Conditional configuration could be defined using usual if and expression statement e.g.

workflow NAME {
  def cfg = params.something ? config.FOO : config.BAR
  PROC1(cfg)
}

This approach has several advantages:

keep the config settings in the nextflow config file, aside the module where it's expected to be used
promote a declarative approach for configuration files
decouple the configuration from the target process name and therefore, increase reusability
Avoid the verbosity of withNameselector mechanism

Caveats

Named config templates should only accessible in the module where they are defined
A conventional config file should used e.g. module.config or main.config
The withName selector should be able to override settings defined by a config template (to not break existing config e.g. groundswell)

The text was updated successfully, but these errors were encountered:

pditommaso · 2023-10-17T13:46:27Z

See also: #4205

Additional notes from hackathon:

need to evaluate process selectors vs new proposed syntax for module config
nf-core modules will only come with a test.config to be loaded by the test harness like a normal config, not used in real pipelines
pipeline developers will install a module and add a nextflow.config / module.config to each module with pipeline-specific config
process modules will not have module config, only subworkflows and above, because they are invoking the processes
module config can be overwritten by outer subworkflows, then pipeline-level config, then user config, then CLI options

I don't like that the proposed syntax (which I'll call the "programmatic config") requires you to create a new name and reference it in the pipeline script, but I see why you did it, so that the config can be set programmatically in the pipeline code.

The question is whether this programmatic config is needed. All of the if statements in the modules.config are used only to replicate the workflow logic to target a specific process, but they are eliminated if the config is defined in the module scope, even with conventional process selectors.

My impression from talking to developers at the hackathon is that they like process selectors but they just need a way to use them in the context of a module, to ensure that the selector won't be applied to completely different processes which they didn't intend.

Both process selectors and the programmatic config should work equally well, but the former is more familiar while the latter imposes some extra steps with dubious value.

For example, suppose workflow A calls subworkflow B which calls process C, and A wants to apply some config to C (taking precedence over config from B). By nf-core convention, each component will reside in its own module. The programmatic config would look like this:

// A/module.config
CONFIG_C.ext.args = '--foo'

// A/main.nf
workflow A {
  B(CONFIG_C)
}

// B/module.config
CONFIG_C.ext.args = '--bar'
CONFIG_C.publishDir = params.outdir

// B/main.nf
workflow B {
  take:
  config_c_extra
  main:
  C.config = CONFIG_C + config_c_extra
  C()
}

// C/main.nf
process C {
}

This strikes me as more complicated than it needs to be. The user is basically re-implementing the logic of process selectors in their pipeline code. While this programmability enables things that are not possible with (declarative) process selectors, I don't think uses are asking for those things.

As a baseline, I will first implement the module config with process selectors. If it enables the config to be fully declarative then that should be enough. Perhaps then we can also try the programmatic config and see what users prefer.

bentsherman · 2023-10-22T07:08:50Z

Rewriting the above example with process selectors:

// A/module.config
withName:'A:B:C' {
  ext.args = '--foo'
}

// A/main.nf
workflow A {
  B()
}

// B/module.config
withName:'B:C' {
  ext.args = '--bar'
  publishDir = params.outdir
}

// B/main.nf
workflow B {
  C()
}

// C/main.nf
process C {
}

Much simpler!

pditommaso · 2023-10-25T12:04:05Z

Ok, we talked. Let's go ahead for the first iteration using standard process selector syntax applied to process *included* in the local module.

bentsherman · 2023-11-14T21:39:18Z

Okay, the PR and rnaseq POC are working quite well. I was able to remove all of the if statements without triggering any config selector warnings, even for processes that aren't always used. This is because the module config is not applied to the process until the process is actually invoked, so there will never be any hanging configs. On the other hand, if you misspell a process selector then it should still warn you.

One minor detail is that the module config file name must always be the same, currently it is module.config. This is because Nextflow loads the module config by convention, the module script does not reference the config file directly.

As an alternative, we could do something like includeConfig for module scripts, which would allow the module script to explicitly import a config file by name:

// main.nf
includeConfig 'module.config'

process FOO {
  // ...
}

This way, users could name the config file whatever they want. On the other hand, it wouldn't make sense to support both approaches, so if we went this way then users would just have to specify the same include line of code many times. So I think I would rather not do this.

ewels · 2024-01-16T08:33:52Z

Related: made an issue about reducing config duplication of publishDir: #4661

bentsherman · 2024-04-25T15:50:25Z

Closing, see #4510 (comment)

pditommaso changed the title ~~Add module local named configuration template~~ [proposal] Add module local named configuration template Oct 17, 2023

pditommaso changed the title ~~[proposal] Add module local named configuration template~~ [proposal] Add local module named configuration template Oct 17, 2023

bentsherman mentioned this issue Oct 22, 2023

[proposal] Add process programmatic configuration context #4322

Closed

bentsherman added lang/config lang/modules labels Oct 22, 2023

pditommaso assigned bentsherman Oct 25, 2023

This was referenced Nov 9, 2023

Move process config into pipeline code nf-core/rnaseq#1081

Closed

Module config #4510

Closed

Use module config nf-core/rnaseq#1112

Closed

marcodelapierre modified the milestone: 23.11.0-edge Nov 17, 2023

bentsherman mentioned this issue Jan 12, 2024

load any nextflow.config that is at the same level as a .nf #4205

Closed

bentsherman closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[proposal] Add local module named configuration template #4422

[proposal] Add local module named configuration template #4422

pditommaso commented Oct 17, 2023 •

edited

Loading

pditommaso commented Oct 17, 2023

pditommaso commented Oct 17, 2023

pditommaso commented Oct 18, 2023

bentsherman commented Oct 22, 2023

bentsherman commented Oct 22, 2023

pditommaso commented Oct 25, 2023

bentsherman commented Nov 14, 2023

ewels commented Jan 16, 2024

bentsherman commented Apr 25, 2024

[proposal] Add local module named configuration template #4422

[proposal] Add local module named configuration template #4422

Comments

pditommaso commented Oct 17, 2023 • edited Loading

Context

Solution

Caveats

pditommaso commented Oct 17, 2023

pditommaso commented Oct 17, 2023

pditommaso commented Oct 18, 2023

bentsherman commented Oct 22, 2023

bentsherman commented Oct 22, 2023

pditommaso commented Oct 25, 2023

bentsherman commented Nov 14, 2023

ewels commented Jan 16, 2024

bentsherman commented Apr 25, 2024

pditommaso commented Oct 17, 2023 •

edited

Loading