[WIP] CNVkit tool definitions #93

anton-khodak · 2016-05-31T17:41:53Z

Standing PR to add tool descriptions (created by argparse2cwl) and tests for CNVkit tools.

Issues I encountered on first steps:

.gitignore prevents adding bioinformatics files, such as test data, to the repository. How are the tools supposed to be tested then?
Since I have no experience with bioinformatics yet, I don't know which data to use for running the tools. I used random files with the proper extensions that I downloaded from the Internet, but that approach doesn't work. For example, I got an error while running:

$ cnvkit.py batch --processes 1 \
    --normal test-files/s5DE199B-D6AF-C6EC-678A-DEC1179D1B97.fastq \
    --fasta test-files/cnvkit-batch/ERCC92.fa \
    --targets test-files/InfiniumPsychArray-24v1-1_A1.bed \
    --annotate test-files/cnvkit-batch/refFlat.txt \
    --split --access test-files/InfiniumPsychArray-24v1-1_A1.bed \
    --output-dir . --scatter --diagram
Detected file format: BED
Applying annotations as target names
Splitting large targets
Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 11, in <module>
    args.func(args)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/commands.py", line 96, in _cmd_batch
    args.processes, args.count_reads)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/commands.py", line 138, in batch_make_reference
    else {}))
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/commands.py", line 327, in do_targets
    ['chromosome', 'start', 'end', 'name'])
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/gary.py", line 66, in from_rows
    table = pd.DataFrame.from_records(rows, columns=columns)
  File "/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py", line 939, in from_records
    first_row = next(data)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 287, in split_targets
    for chrom, start, end, name in region_rows:
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 21, in assign_names
    ref_genes = read_refflat_genes(refflat_fname)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 80, in read_refflat_genes
    name, _rx, chrom, strand, start, end, _ex = parse_refflat_line(line)
  File "/usr/local/lib/python3.4/dist-packages/cnvlib/target.py", line 133, in parse_refflat_line
    assert len(exons) == int(exon_count), (
TypeError: object of type 'zip' has no len()

I think this error might be caused by irrelevant data.
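The TypeError in the traceback above appears to be independent of the input data: the installed cnvlib runs under Python 3.4, where `zip()` returns a lazy iterator that has no `len()` (in Python 2 it returned a list). A minimal sketch of the kind of fix follows, assuming the standard 11-column, tab-separated refFlat layout. The function below is a hypothetical stand-in written for illustration, not the actual cnvlib source.

```python
def parse_refflat_line(line):
    """Parse one refFlat line (hypothetical re-implementation for illustration).

    Assumed columns: geneName, name, chrom, strand, txStart, txEnd,
    cdsStart, cdsEnd, exonCount, exonStarts, exonEnds.
    """
    fields = line.rstrip("\n").split("\t")
    name, refseq, chrom, strand = fields[:4]
    start, end = int(fields[4]), int(fields[5])
    exon_count = int(fields[8])
    # Exon coordinates are comma-separated, usually with a trailing comma.
    exon_starts = [int(x) for x in fields[9].rstrip(",").split(",")]
    exon_ends = [int(x) for x in fields[10].rstrip(",").split(",")]
    # list(...) materializes the pairs so len() works on Python 2 and 3 alike.
    exons = list(zip(exon_starts, exon_ends))
    assert len(exons) == exon_count, "exon count does not match coordinates"
    return name, refseq, chrom, strand, start, end, exons


# Made-up example record, not taken from a real annotation file.
example = "GENE1\tNM_000001\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\n"
parsed = parse_refflat_line(example)
```

Wrapping the `zip()` call in `list()` is the smallest change that keeps the existing assertion intact; running the tool under Python 2.7 would be another way to sidestep the problem.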

Also, I couldn't find any sample copy number reference profile files (.cnn) at all. If somebody who uses CNVkit frequently could give me a hint about where to get proper data, it would make my testing work much easier.

I couldn't find where the tool-name-test.yaml file format is specified. It was intuitively clear what to write there, but I would appreciate a pointer to the specification for those files.
I haven't worked with Docker images before, so I need to spend some time learning how to write Dockerfiles.

mr-c · 2016-05-31T17:49:51Z

tools/cnvkit-batch.cwl

+                log-CNR of chrX; otherwise male samples would have -1 chrX).
+  inputBinding:
+    position: 2
+    prefix: --male-reference 


CWL tip: with argparse, only the position-dependent arguments need their position specified. Arguments that have a prefix, like --male-reference, can occur in any order, so it would be nice if cwlargparse didn't specify the unneeded positions in these cases.
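The tip above can be sketched as a small post-processing step over a generated tool description. This is only an illustration, assuming the description has already been loaded into plain Python dicts and lists; the function name `drop_redundant_positions` and the example inputs are hypothetical and not part of argparse2cwl.

```python
import copy


def drop_redundant_positions(tool):
    """Return a copy of a tool description with `position` removed from
    every input binding that also carries a `prefix`."""
    cleaned = copy.deepcopy(tool)
    inputs = cleaned.get("inputs", {})
    # `inputs` may be a mapping (id -> definition) or a list of definitions.
    entries = inputs.values() if isinstance(inputs, dict) else inputs
    for entry in entries:
        binding = entry.get("inputBinding") if isinstance(entry, dict) else None
        if isinstance(binding, dict) and "prefix" in binding:
            # Prefixed options can appear in any order, so position is not needed.
            binding.pop("position", None)
    return cleaned


# Hypothetical fragment of a generated description.
tool = {
    "inputs": {
        "male_reference": {
            "type": "boolean?",
            "inputBinding": {"position": 2, "prefix": "--male-reference"},
        },
        "bam_files": {"type": "File[]", "inputBinding": {"position": 1}},
    }
}
cleaned = drop_redundant_positions(tool)
```

Purely positional inputs such as `bam_files` keep their `position`, and the original description is left unmodified.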

mr-c · 2016-06-08T13:05:54Z

@anton-khodak Did you look at https://travis-ci.org/common-workflow-language/workflows/builds/134234968 ?

mr-c · 2016-06-08T13:09:06Z

I think it is fine to just check in the generated descriptions; don't worry about writing a specific test. As long as the generated output parses, that is good enough for now.

brainstorm · 2016-06-10T13:10:45Z

I'm totally with @mr-c: we should focus first on CWL, not on specific tools, since the amount of work can be quite substantial. If you want to see whether one of the CNVkit subtools works, it's fine to dedicate some focused effort, but by no means aim to cover the whole suite of tools.

Hope that makes sense ;)

brainstorm · 2016-06-10T13:17:16Z

OTOH, for a good example of how to test different tools (in my case SV callers), MetaSV has it quite well wrapped up:

https://github.com/bioinform/metasv

But this is just an example, don't spend too much time looking through it.

anton-khodak · 2016-06-10T13:34:31Z

@brainstorm, that's great! I misinterpreted the goal of the PR: it was not to pass the Travis checks but merely to validate those tools. In that case, I'll fix the job file (@mr-c indirectly pointed out that issue) and push all the other tools.

UPD. I should have looked more closely at test/cwltest.py... Travis CI checks the mere validity of tools, not how they are executed (with or without errors).

brainstorm · 2016-06-13T08:38:25Z

tools/cnvkit-docker.cwl

+  #################################################################
+
+  FROM python:2.7
+  MAINTAINER Anton Khodak <[email protected]>


This is great! Thanks for wrapping this in a Docker container 👍

etal · 2016-06-21T14:50:09Z

Hi guys, I'm happy to help with testing CNVkit and/or tweaking the test suite to play better with argparse2cwl. You can skip wrapping anything marked "deprecated" (e.g. loh, genome2access), those parts will be removed in the next release. Just let me know anything else you need.

brainstorm · 2016-06-21T15:26:22Z

@etal, very happy to have you help Anton with that. I was looking at the outputs generated by argparse2cwl yesterday but since I never used CNVkit before, I'm missing a few bits of domain expertise there, so help is super welcome, thanks!

etal · 2016-07-04T03:05:41Z

tools/cnvkit-batch.cwl

+    prefix: --diagram 
+
+outputs:
+    []


There are several outputs from this command and they vary based on the input BAM filenames and the options given.

For each tumor/test-sample BAM named e.g. Sample.bam, the outputs are: "Sample.targetcoverage.cnn", "Sample.antitargetcoverage.cnn", "Sample.cnr", "Sample.cns"

If the --scatter option is given, then for each tumor/test sample, "Sample-scatter.pdf" is created

Similarly, the --diagram option creates "Sample-diagram.pdf"

For all of the above, if -d/--output-dir is specified, the created file names are relative to (i.e. in) that specified directory

If the -r/--reference option is not given, then a .cnn file is created either with the filename given by --output-reference (regardless of the -d/--output-dir path) or by default "cnv_reference.cnn"
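The naming rules described above can be written down as a small helper, for example to check which files a wrapper's `outputs` section would need to collect. This is a sketch under stated assumptions: the function name `expected_batch_outputs` and its parameters are hypothetical, and the location of the default "cnv_reference.cnn" relative to the output directory is not spelled out in the comment, so it is returned as a bare file name here.

```python
import os


def expected_batch_outputs(sample_bams, output_dir=".", scatter=False,
                           diagram=False, reference=None,
                           output_reference=None):
    """List the files `cnvkit.py batch` is described as creating."""
    outputs = []
    for bam in sample_bams:
        # "Sample.bam" -> "Sample"
        sample = os.path.splitext(os.path.basename(bam))[0]
        names = [
            sample + ".targetcoverage.cnn",
            sample + ".antitargetcoverage.cnn",
            sample + ".cnr",
            sample + ".cns",
        ]
        if scatter:
            names.append(sample + "-scatter.pdf")
        if diagram:
            names.append(sample + "-diagram.pdf")
        # Per-sample files are placed in the output directory.
        outputs.extend(os.path.join(output_dir, name) for name in names)
    if reference is None:
        # No -r/--reference given: a new reference .cnn is built.
        # --output-reference is used as-is, regardless of output_dir.
        # Assumption: the default name is also left bare.
        outputs.append(output_reference or "cnv_reference.cnn")
    return outputs


files = expected_batch_outputs(["/data/Sample.bam"], output_dir="out",
                               scatter=True)
```

Because the file names depend on the input BAM names and on the options given, a CWL description would likely need glob patterns or expressions rather than a fixed list of output names.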

etal · 2016-09-19T18:26:13Z

I've released a new minor version of CNVkit that drops the deprecated parts and introduces a few new options. I think the current CWL wrappers in Anton's repo should still work, but batch has a new --method option that's worth exposing. Let me know if there's anything else I can do to help complete and maintain these wrappers.

Initial incomplete cnvkit-batch tool definition

f205c3f

anton-khodak mentioned this pull request May 31, 2016

Open PR to add cnvkit wrappers to https://github.com/common-workflow-language/workflows common-workflow-lab/gxargparse#4

Closed

mr-c reviewed May 31, 2016

anton-khodak mentioned this pull request May 31, 2016

Remove posititon field for optional arguments common-workflow-lab/gxargparse#7

Closed

Anton Khodak added 5 commits June 10, 2016 17:00

Fix cnvkit-batch job

1de8a71

Fix cnvkit-batch job once again

467076d

remove stdout from test file

54a3847

confused letters

4a0d359

Add a few more tools

5da14f6

brainstorm reviewed Jun 13, 2016

etal mentioned this pull request Jun 21, 2016

Common Workflow Language tool descriptions etal/cnvkit#39

Open

etal reviewed Jul 4, 2016

Anton Khodak added 4 commits July 23, 2016 21:19

Update to CWL v.1.0

9d8fec8

Delete tools without jobs

1065d24

Remove all tools except from cnvkit

375acb2

Revert deleting files

5ce1aeb
