Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kraken2/add module #4703

Merged
merged 36 commits into from
Feb 22, 2024
Merged

Kraken2/add module #4703

merged 36 commits into from
Feb 22, 2024

Conversation

alxndrdiaz
Copy link
Member

PR checklist

Closes #2953

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

@alxndrdiaz alxndrdiaz added new module Adding a new module nf-test labels Jan 10, 2024
@alxndrdiaz alxndrdiaz requested a review from jfy133 January 10, 2024 02:37
@alxndrdiaz alxndrdiaz self-assigned this Jan 10, 2024
@alxndrdiaz alxndrdiaz requested a review from a team as a code owner January 10, 2024 02:37
@alxndrdiaz
Copy link
Member Author

alxndrdiaz commented Jan 10, 2024

Files created in output directory:

        └── [4.0K]  test
            └── [4.0K]  library
                └── [4.0K]  added
                    ├── [ 564]  prelim_map_Mkw71vtzg1.txt
                    ├── [ 13K]  SYkwAEUmm7.fna
                    └── [   0]  SYkwAEUmm7.fna.masked

Error in .command.err when running nf-core modules test kraken2/add :

Status: Downloaded newer image for quay.io/biocontainers/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0
Masking low-complexity regions of new file... done.
Added "proteome.fasta" to library (test)
chown: unrecognized option '--from'
BusyBox v1.32.1 (2021-04-13 11:15:36 UTC) multi-call binary.

Usage: chown [-RhLHPcvf]... USER[:[GRP]] FILE...

Change the owner and/or group of each FILE to USER and/or GRP

        -R      Recurse
        -h      Affect symlinks instead of symlink targets
        -L      Traverse all symlinks to directories
        -H      Traverse symlinks on command line only
        -P      Don't traverse symlinks (default)
        -c      List changed files
        -v      List all files
        -f      Hide errors
╭────────────────────────────────────────────────────────────────────────── nf-test output ──────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                    │
│ 🚀 nf-test 0.8.2                                                                                                                                                   │
│ https://code.askimed.com/nf-test                                                                                                                                   │
│ (c) 2021 - 2023 Lukas Forer and Sebastian Schoenherr                                                                                                               │
│                                                                                                                                                                    │
│ Found 1 files in test directory.                                                                                                                                   │
│ Error: java.io.IOException: Launch Directory '/home/path/nf-core/modules/.nf-test/tests/cf01ae0d03dee5f3be28629449d283d6' could not be deleted or created:       │
│ java.nio.file.AccessDeniedException:                                                                                                                               │
│ /home/path/nf-core/modules/.nf-test/tests/cf01ae0d03dee5f3be28629449d283d6/work/16/81ff13303c18ac143b4d118cce0cc0/test/library/added/SYkwAEUmm7.fna.masked       │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
INFO     All tests passed!     

@jfy133
Copy link
Member

jfy133 commented Jan 10, 2024

I'm not sure about the chown error, but for the nf-test error, I seem to get this regularly, and have to run e.g., in your case:

sudo rm -rf /home/path/nf-core/modules/.nf-test/tests/cf01ae0d03dee5f3be28629449d283d6

And then run the nf-core module test command again to get it to work.

@alxndrdiaz
Copy link
Member Author

alxndrdiaz commented Jan 11, 2024

I'm not sure about the chown error, but for the nf-test error, I seem to get this regularly, and have to run e.g., in your case:

sudo rm -rf /home/path/nf-core/modules/.nf-test/tests/cf01ae0d03dee5f3be28629449d283d6

And then run the nf-core module test command again to get it to work.

Thanks, this is helpful to clean the nf-test output. One thing I noticed after this output was cleaner is that $prefix is not assigned as the name of the output database as expected (it should be test/ instead of null/):

  Nextflow stdout:                                                                                                                                            │
│                                                                                                                                                               │
│   ERROR ~ Error executing process > 'KRAKEN2_ADD (test)'                                                                                                      │
│                                                                                                                                                               │
│   Caused by:                                                                                                                                                  │
│     Missing output file(s) `null/` expected by process `KRAKEN2_ADD (test)`                                                                                   │
│                                                                                                                                                               │
│   Command executed:                                                                                                                                           │
│                                                                                                                                                               │
│     kraken2-build \                                                                                                                                           │
│         --add-to-library \                                                                                                                                    │
│         proteome.fasta \                                                                                                                                      │
│         --db test \                                                                                                                                           │
│         --threads 2 \                                                                                                                                         │
│                                                                                                                                                               │
│     cat <<-END_VERSIONS > versions.yml                                                                                                                        │
│     "KRAKEN2_ADD":                                                                                                                                            │
│         kraken2: $(echo $(kraken2 --version 2>&1) | sed 's/^.*Kraken version //; s/ .*$//')                                                                   │
│     END_VERSIONS                                                                                                                                              │
│                                                                                                                                                               │
│   Command exit status:                                                                                                                                        │
│     0                                                                                                                                                         │
│                                                                                                                                                               │
│   Command output:                                                                                                                                             │
│     (empty)    

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a lot of comments here, but it's more structural/design rather than the quality of the module against the guidelines (which is good) .

I also identified the error and the reason why you couldn't build the snapshot (I thnk)

Can chat about this on slack if it doesn't make sense.

modules/nf-core/kraken2/add/main.nf Show resolved Hide resolved

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-5799ab18b5fc681e75923b2450abaa969907ec98:87fc08d11968d081f3e8a37131c1f1f6715b6542-0' :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is in this mulled container exactly, when the conda environment is just kraken2? I wonder if this is also possibly causing the issue with the chown error message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried a different container:

   container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/kraken2:2.1.2--pl5321h9f5acd7_2' :
        'biocontainers/kraken2:2.1.2--pl5321h9f5acd7_2' }"

But the error persists...

modules/nf-core/kraken2/add/main.nf Show resolved Hide resolved
modules/nf-core/kraken2/add/main.nf Outdated Show resolved Hide resolved
modules/nf-core/kraken2/add/main.nf Outdated Show resolved Hide resolved
modules/nf-core/kraken2/add/meta.yml Show resolved Hide resolved
Comment on lines 12 to 14
path taxonomy_names
path taxonomy_nodes
path accession2taxid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF we stick with the standalone ADD module - would this still be necessary? I'm assuming yes, but we need to be careful how this works as these would have to be staged into an existing $prefix/taxonomy directory, and make sure this doesn't get overwritten on every add...

Which also leads to the question, what do we do if we want to add multiple FASTAs in one (so path(fasta) is a list)...? The examples in the kraken2 manual suggest using for loops or find/xargs combination

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF we stick with the standalone ADD module - would this still be necessary? I'm assuming yes, but we need to be careful how this works as these would have to be staged into an existing $prefix/taxonomy directory, and make sure this doesn't get overwritten on every add...

Which also leads to the question, what do we do if we want to add multiple FASTAs in one (so path(fasta) is a list)...? The examples in the kraken2 manual suggest using for loops or find/xargs combination

That's correct, I forgot to put all those files in the taxonomy directory (db/taxonomy) as expected by kraken2.

modules/nf-core/kraken2/add/meta.yml Show resolved Hide resolved
Comment on lines 36 to 39
file(params.test_data['sarscov2']['genome']['proteome_fasta'], checkIfExists: true)
]
input[1] = file(params.test_data['sarscov2']['metagenome']['prot_names_dmp'], checkIfExists: true)
input[2] = file(params.test_data['sarscov2']['metagenome']['prot_nodes_dmp'], checkIfExists: true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really work with protein accessions!?

Copy link
Member Author

@alxndrdiaz alxndrdiaz Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really work with protein accessions!?

It should work , there is protein flag:

--protein                  Build a protein database for translated search

Kraken2: Translated search: "either along with --standard, or with all steps if building a custom database"

@jfy133
Copy link
Member

jfy133 commented Feb 14, 2024

So you mean it works manually in the docker image with the original?!

@alxndrdiaz
Copy link
Member Author

alxndrdiaz commented Feb 14, 2024

So you mean it works manually in the docker image with the original?!

Yes. The only problem is that the permissions of the output files are changed to root when using the Docker image (still don't know why), it seems that this prevents any further file operations. Also running the test with:

nf-core modules test kraken2/add --update 

Generates the output files but the permissions are changed as well:

# test directory
ldir="/home/nf-core/modules/.nf-test/tests/cf01ae0d03dee5f3be28629449d283d6/work/5a/3c65cbd2f4e9e57e0cde9a618d9e48" 
# check permissions of output files
ls -l $ldir/test/library/added
total 20
-rw-r--r-- 1 root root 13383 feb 14 08:59 mg7bS0iePD.fna
-rw-r--r-- 1 root root     0 feb 14 08:59 mg7bS0iePD.fna.masked
-rw------- 1 root root   564 feb 14 08:59 prelim_map_Wokcrk78XD.txt

Important: this is only when running the test with Docker, I have not checked Singularity or Conda.

@jfy133
Copy link
Member

jfy133 commented Feb 15, 2024

I built a mulled container that works with the module 🎉

I just need to request the official one :), and then you can go about fighting nf-test (the output isn't stable 😅 )

 java.lang.RuntimeException: Different Snapshot:                                                                                                                              │
│ Found:                                                                                                                                                                       │
│ [                                                                                                                                                                            │
│     [                                                                                                                                                                        │
│         [                                                                                                                                                                    │
│             {                                                                                                                                                                │
│                 "id": "test"                                                                                                                                                 │
│             },                                                                                                                                                               │
│             [                                                                                                                                                                │
│                 [                                                                                                                                                            │
│                     [                                                                                                                                                        │
│                         "prelim_map_9AtSQCgKaG.txt:md5,229620e9ad30fc411f025597f4e6379f",                                                                                    │
│                         "qIS93ZUMlY.fna:md5,befde39f5f4d5cebb25e57328830f321",                                                                                               │
│                         "qIS93ZUMlY.fna.masked:md5,d41d8cd98f00b204e9800998ecf8427e"                                                                                         │
│                     ]                                                                                                                                                        │
│                 ],                                                                                                                                                           │
│                 [                                                                                                                                                            │
│                     "prot.accession2taxid:md5,c0f96ba5dbb00150b4b805ba6dab7bea",                                                                                             │
│                     "prot_names.dmp:md5,130f9132095562e09c732679c562f5e9",                                                                                                   │
│                     "prot_nodes.dmp:md5,c471c27a4ce85ae74d2c63633c9ce1e3"                                                                                                    │
│                 ]                                                                                                                                                            │
│             ]                                                                                                                                                                │
│         ]                                                                                                                                                                    │
│     ],                                                                                                                                                                       │
│     [                                                                                                                                                                        │
│         "versions.yml:md5,9a7b40921622d8c873fb3bfd8bac4c3d"                                                                                                                  │
│     ]                                                                                                                                                                        │
│ ]                                                                                                                                                                            │
│                                                                                                                                                                              │
│ Expected:                                                                                                                                                                    │
│ [                                                                                                                                                                            │
│     [                                                                                                                                                                        │
│         [                                                                                                                                                                    │
│             {                                                                                                                                                                │
│                 "id": "test"                                                                                                                                                 │
│             },                                                                                                                                                               │
│             [                                                                                                                                                                │
│                 [                                                                                                                                                            │
│                     [                                                                                                                                                        │
│                         "aABaLor5yN.fna:md5,befde39f5f4d5cebb25e57328830f321",                                                                                               │
│                         "aABaLor5yN.fna.masked:md5,d41d8cd98f00b204e9800998ecf8427e",                                                                                        │
│                         "prelim_map_AE_Kctf91Q.txt:md5,229620e9ad30fc411f025597f4e6379f"                                                                                     │
│                     ]                                                                                                                                                        │
│                 ],                                                                                                                                                           │
│                 [                                                                                                                                                            │
│                     "prot.accession2taxid:md5,c0f96ba5dbb00150b4b805ba6dab7bea",                                                                                             │
│                     "prot_names.dmp:md5,130f9132095562e09c732679c562f5e9",                                                                                                   │
│                     "prot_nodes.dmp:md5,c471c27a4ce85ae74d2c63633c9ce1e3"                                                                                                    │
│                 ]                                                                                                                                                            │
│             ]                                                                                                                                                                │
│         ]                                                                                                                                                                    │
│     ],                                                                                                                                                                       │
│     [                                                                                                                                                                        │
│         "versions.yml:md5,9a7b40921622d8c873fb3bfd8bac4c3d"                                                                                                                  │
│     ]                                                                                                                                                                        │
│ ]   

@jfy133
Copy link
Member

jfy133 commented Feb 15, 2024

@alxndrdiaz there you go! Singularity will come tomorrow I guess.

in the meantime (if you have time) can you:

  1. Add a bash for loop, so it runs through all possible fastas in the fasta input channel
  2. Complete the nf-test assertions, but note it is unstable (there are files made with random strings... might have to ignore those...)

@alxndrdiaz
Copy link
Member Author

@alxndrdiaz there you go! Singularity will come tomorrow I guess.

in the meantime (if you have time) can you:

1. Add a bash for loop, so it runs through all possible fastas in the fasta input channel

2. Complete the nf-test assertions, but note it is unstable (there are files made with random strings... might have to ignore those...)

@jfy133 [2] One (dummy) solution is to ignore the output directory (this makes the Docker test to work) in the main.nf.test file :

  assertAll(
                { assert process.success }
            )

[1] Tried to use to pass multiple FASTA files but still need to find out how to pass a path() within the main.nf.test file instead of file(), alsoaneed to specify the file extensions (fastaext) to allow the command to know which files to pass, but I am still trying to find out how to do this with nf-test :

 find -L $ {fastas} -name ${fastaext} -print0 | \\
    xargs -0 -I{} -n1 kraken2-build --add-to-library {}  --db ${prefix} --threads $task.cpus $args

@alxndrdiaz
Copy link
Member Author

alxndrdiaz commented Feb 16, 2024

@alxndrdiaz there you go! Singularity will come tomorrow I guess.
in the meantime (if you have time) can you:

1. Add a bash for loop, so it runs through all possible fastas in the fasta input channel

2. Complete the nf-test assertions, but note it is unstable (there are files made with random strings... might have to ignore those...)

@jfy133 [2] One (dummy) solution is to ignore the output directory (this makes the Docker test to work) in the main.nf.test file :

  assertAll(
                { assert process.success }
            )

[1] Tried to use to pass multiple FASTA files but still need to find out how to pass a path() within the main.nf.test file instead of file(), also need to specify the file extensions (fastaext) to allow the command to know which files to pass, but I am still trying to find out how to do this with nf-test :

input:
   tuple val(meta), path(fastas)
   val fastaext
   path taxonomy_names, stageAs: 'taxonomy/*'
   path taxonomy_nodes, stageAs: 'taxonomy/*'
   path accession2taxid, stageAs: 'taxonomy/*'

   output:
   tuple val(meta), path("$prefix"), emit: db
   path "versions.yml"             , emit: versions

   when:
   task.ext.when == null || task.ext.when

   script:
   def args = task.ext.args ?: ''
   prefix = task.ext.prefix ?: "${meta.id}"
   """
   mkdir ${prefix}
   mv "taxonomy" ${prefix}
   find -L ${fastas} -name ${fastaext} -print0 | \\
   xargs -0 -I{} -n1 kraken2-build --add-to-library {} --db ${prefix} --threads $task.cpus $args
   cat <<-END_VERSIONS > versions.yml
   "${task.process}":
       kraken2: \$(echo \$(kraken2 --version 2>&1) | sed 's/^.*Kraken version //; s/ .*\$//')
   END_VERSIONS
   """

This is how the main.nf.test look like:

        when {
            process {
                """
                input[0] = [
                    [ id:'test' ], // meta map
                    path(params.test_data['sarscov2']['genome'], checkIfExists: true)
                    ]
                input[1] = 'proteome.fasta'
                input[2] = file(params.test_data['sarscov2']['metagenome']['prot_names_dmp'], checkIfExists: true)
                input[3] = file(params.test_data['sarscov2']['metagenome']['prot_nodes_dmp'], checkIfExists: true)
                input[4] = GUNZIP.out.gunzip.map{ it[1] }
                """
            }

@jfy133
Copy link
Member

jfy133 commented Feb 16, 2024

I should have got the (1) parallle adding now; so I think we can go back to making (2) properly :)

@alxndrdiaz
Copy link
Member Author

I should have got the (1) parallle adding now; so I think we can go back to making (2) properly :)

These assertions only check [1] if the process is run successfully, [2] an output test directory is created, [3] versions file.

  then {
            assertAll(
                { assert process.success },
                { assert process.out.db.get(0).get(1) ==~ ".*/test" },
                { assert snapshot(
                        process.out.versions
                    ).match() }
            )
        }

Trying to use assert process.out.db.get(0).get(1).get(0) does not work (probably because the channel is a directory and not a single file), the following error is raised in this case:

 groovy.lang.MissingMethodException: No signature of method: java.lang.String.get() is applicable for argument types: (Integer) values: [0]                         │
│ Possible solutions: getAt(int), getAt(int), grep(), next(), getAt(groovy.lang.Range), getAt(groovy.lang.Range) 

@jfy133
Copy link
Member

jfy133 commented Feb 20, 2024

I tweaked the tests slightly to make them more explicit, but I think we are there!!!

Copy link
Contributor

@TCLamnidis TCLamnidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very minor suggestions and formatting things.
The nf-tests assertions look pants, as promised, but I've been reassured by @jfy133 all output files have ever-changing nonsensical names, so I am gonna give this a green light.

modules/nf-core/kraken2/add/meta.yml Show resolved Hide resolved
modules/nf-core/kraken2/add/meta.yml Outdated Show resolved Hide resolved
modules/nf-core/kraken2/add/tests/main.nf.test Outdated Show resolved Hide resolved
@alxndrdiaz alxndrdiaz added this pull request to the merge queue Feb 22, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 22, 2024
@alxndrdiaz alxndrdiaz added this pull request to the merge queue Feb 22, 2024
@alxndrdiaz alxndrdiaz removed this pull request from the merge queue due to a manual request Feb 22, 2024
@alxndrdiaz alxndrdiaz added this pull request to the merge queue Feb 22, 2024
Merged via the queue into nf-core:master with commit a414035 Feb 22, 2024
11 checks passed
@alxndrdiaz alxndrdiaz deleted the kraken2/add branch February 22, 2024 20:57
jch-13 pushed a commit to jch-13/modules that referenced this pull request Mar 19, 2024
* first commit

* arrange inputs

* nf-test template

* fix test_data directories

* fix inputs

* updated test

* added tag:untar

* first commit

* arrange inputs

* nf-test template

* fix test_data directories

* fix inputs

* updated test

* added tag:untar

* remove whitespace

* Update modules/nf-core/kraken2/add/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

* Update modules/nf-core/kraken2/add/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

* move taxonomy files

* Update modules/nf-core/kraken2/add/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

* Update modules/nf-core/kraken2/add/meta.yml

Co-authored-by: James A. Fellows Yates <[email protected]>

* use gzip instead of untar

* use stageAs for taxonomy file

* run test with --update flag

* Add working container (singularity to come later)

* remove output directory assertions

* Add parallelisation

* check test output directory

* check versions nf-test

* Improve tests

* Fix singularity path

* Apply suggestions from code review

Co-authored-by: Thiseas C. Lamnidis <[email protected]>

---------

Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: Thiseas C. Lamnidis <[email protected]>
jennylsmith pushed a commit to RSC-RP/modules that referenced this pull request Mar 20, 2024
* first commit

* arrange inputs

* nf-test template

* fix test_data directories

* fix inputs

* updated test

* added tag:untar

* first commit

* arrange inputs

* nf-test template

* fix test_data directories

* fix inputs

* updated test

* added tag:untar

* remove whitespace

* Update modules/nf-core/kraken2/add/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

* Update modules/nf-core/kraken2/add/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

* move taxonomy files

* Update modules/nf-core/kraken2/add/main.nf

Co-authored-by: James A. Fellows Yates <[email protected]>

* Update modules/nf-core/kraken2/add/meta.yml

Co-authored-by: James A. Fellows Yates <[email protected]>

* use gzip instead of untar

* use stageAs for taxonomy file

* run test with --update flag

* Add working container (singularity to come later)

* remove output directory assertions

* Add parallelisation

* check test output directory

* check versions nf-test

* Improve tests

* Fix singularity path

* Apply suggestions from code review

Co-authored-by: Thiseas C. Lamnidis <[email protected]>

---------

Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: Thiseas C. Lamnidis <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new module Adding a new module nf-test
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

new module: kraken2/build
3 participants