Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Module config #4510

Closed
wants to merge 13 commits into from
Closed

Module config #4510

wants to merge 13 commits into from

Conversation

bentsherman
Copy link
Member

Close #4422

Signed-off-by: Ben Sherman <[email protected]>
Copy link

netlify bot commented Nov 13, 2023

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit cf27fd4
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/65de0ff7a6dad4000863ec74

@bentsherman bentsherman mentioned this pull request Nov 13, 2023
6 tasks
Signed-off-by: Ben Sherman <[email protected]>
@marcodelapierre marcodelapierre added this to the 23.11.0-edge milestone Nov 17, 2023
docs/module.md Outdated Show resolved Hide resolved
Signed-off-by: Ben Sherman <[email protected]>
marcodelapierre

This comment was marked as outdated.

@marcodelapierre marcodelapierre self-requested a review November 22, 2023 06:28
docs/config.md Outdated Show resolved Hide resolved
docs/module.md Outdated Show resolved Hide resolved
docs/module.md Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we have unit tests about the composition of configs from multiple sources (ideally, process, module and workflow) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsherman for your consideration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit tests for edits in ExecutionStack ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsherman for your consideration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new unit tests required for IncludeDef?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new unit tests required for ScriptMeta?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bentsherman for your consideration

@marcodelapierre
Copy link
Member

Removed from edge November milestone, to allow time for team discussion

@marcodelapierre
Copy link
Member

@pditommaso @bentsherman Ran out of capacity to finalise some testing and articulate my thoughts on this one. The general lines is that I am not entirely bought on the proposed syntax.

Keen to resume on this in the new year, however if you two are confident on the way forward, I do not want to be the blocker.

@marcodelapierre
Copy link
Member

marcodelapierre commented Jan 10, 2024

OK, had a good review of the PR today, from the point of view of the design, also having a look at the Rnaseq POC.

I like a lot how a simple design change, i.e. having the module/subwf specific config files, can make such a big impact, i.e. enabling to modularise these configuration entries for improved readability, maintainability, reusability.

One thing I reflected on was Paolo's point on the fragility coming from process selector typos. I had a conversation with @bentsherman around this, and he mentioned the existence of a parameter, I believe nextflow.enable.configProcessNamesValidation, that enables spitting Warnings for no matches of process selectors.
Nf-core developers used to turn it off, to avoid false positives with the old monolithic process configuration.
Now, with modular configurations, Ben points out that false positives can be eliminated, and hence the parameter can be turned on. This will inform pipeline devs/users of no match / typo situations with process selectors.

In summary, I thing the design of this PR, beside matching most original requirements in #4422, provides at least 2 additional advantages:

  1. simplicity
  2. use of existing syntax (process selector): devs do not have to learn new syntax

@pditommaso , further comments on the design of this functionality?

I will go ahead tomorrow and review documentation and implementation.

[edit] PS: here I am assuming that we can decouple andd postpone tackling a more declarative approach to config, in reference to one of Paolo's initial points in #4422 . Happy to discuss further.

Copy link
Member

@marcodelapierre marcodelapierre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bentsherman ,

I have finished reviewing, see my outstanding comments from my previous review (docs and tests), plus the new ones on the docs

docs/module.md Outdated Show resolved Hide resolved
docs/module.md Outdated Show resolved Hide resolved
docs/module.md Outdated Show resolved Hide resolved
docs/config.md Outdated Show resolved Hide resolved
docs/config.md Outdated Show resolved Hide resolved
@marcodelapierre
Copy link
Member

@pditommaso

#4416 to be also merged as part of this unit of work

@bentsherman bentsherman changed the title Add module config Module config Mar 18, 2024
@bentsherman bentsherman requested a review from a team as a code owner April 3, 2024 14:39
@bentsherman bentsherman modified the milestones: 24.01.0-edge, 24.04.0 Apr 15, 2024
@bentsherman
Copy link
Member Author

Now that we're converging on a solution for the workflow publish definition, I'm having second thoughts about this module config idea.

The module config further complicates the rules for config resolution, and at this point it would only be used to define ext config for processes. In other words, the module config only serves as a convenient shortcut for what should really be pipeline params.

I think we should discourage the use of ext to provide inputs to the process (i.e. values that are used in the script body), and instead encourage those settings to be made explicit inputs. If they need to be configurable by the user, they can be made so with pipeline params. This can be done today, but it isn't done because process selectors + ext is more convenient.

Moving ext config to explicit inputs will make the params list much longer, at least for nf-core/rnaseq, but it's more transparent and makes the params more comprehensive. Right now, the "inputs" of a process could come from the actual inputs, or pipeline params, or ext config, and that makes it harder to read. Instead, everything should be a process input, and the calling workflow can decide how to provide those inputs.

The big picture that I'm beginning to see is, there are many language features which were fine in DSL1 when everything was in a single script, but become troublesome in DSL2 with modules and subworkflows:

  • publishDir (already discussed)
  • using params anywhere (they should only be used in the anonymous workflow)
  • arbitrary Groovy code anywhere (it should only be allowed in the process / workflow / function body)
  • process selectors (overcomplicated by all the rules for defined vs imported name, simple vs fully qualified name, etc)

Another way to look at it is, if you consider the list of process directives, they all affect how tasks are executed (executor, hardware resources, software deps, error handling, etc), as opposed to what is executed (inputs, outputs, script). With two notable exceptions:

  1. publishDir, which we are moving away from
  2. ext, if it is used to provide inputs into the script (which I have argued is an anti-pattern)
    a. there is a valid use case for ext to define the how, e.g. YellowDog uses it for custom executor settings

So if process directives are used to control how a process is executed, and the how is usually specific to a pipeline-level config profile, then there is no need for module config because modules typically do not describe how they are executed, or when they do, such as conda / container, it is usually in the process definition.

@pditommaso
Copy link
Member

Completely agree. We should consider deprecating ext config option

@bentsherman
Copy link
Member Author

Okay, then I will close this PR. There are some general docs improvements here about the config resolution, I will make a separate PR for those

@pditommaso
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[proposal] Add local module named configuration template
3 participants