Merge branch 'vfan001'
CallumWalley committed Dec 5, 2023
2 parents 07ae0b0 + 4c6d448 commit 1a170b4
Showing 3 changed files with 46 additions and 102 deletions.
31 changes: 3 additions & 28 deletions docs/Scientific_Computing/Supported_Applications/GATK.md
@@ -10,15 +10,6 @@
zendesk_article_id: 6443618773519
zendesk_section_id: 360000040076
---

The Genome Analysis Toolkit (GATK), developed at the [Broad
Institute](http://www.broadinstitute.org/), provides a wide variety of
tools focusing primarily on variant discovery and genotyping. It is
@@ -28,21 +19,17 @@
germline DNA and RNAseq data.
General documentation for running GATK can be found on their website
[here](https://gatk.broadinstitute.org/hc/en-us).

## Running GATK

GATK requires the Java Runtime Environment. The appropriate version
of Java is already included as part of the GATK module, so you will not
need to load a Java module separately.

!!! note

    - `--time` and `--mem` defined in the following example are just
      placeholders.
    - Please load the GATK version of your choice.

``` sl
#!/bin/bash -e
@@ -69,8 +56,6 @@
export _JAVA_OPTIONS=-Djava.io.tmpdir=${TMPDIR}
gatk MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt
```

### GATK-Picard

GATK versions 4.0 or higher all contain a copy of the Picard toolkit,
@@ -89,8 +74,6 @@
the function of interest.
Please also note that there are some inconsistencies between Picard and
GATK flag naming conventions, so it is best to double-check them.

## Common Issues

### Out of Memory or Insufficient Space for Shared Memory File
@@ -114,17 +97,9 @@
mkdir -p ${TMPDIR}
export _JAVA_OPTIONS=-Djava.io.tmpdir=${TMPDIR}
```

### File is not a supported reference file type

The error message "File is not a supported reference file type" appears
in one of the log files. GATK sometimes requires FASTA files to have the
file extension ".fasta" or ".fa". Please make sure your file
extensions correctly reflect the file type.
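A minimal sketch of the fix (the file names here are hypothetical):

``` sl
# A FASTA file saved with an unrecognised extension:
printf '>chr1\nACGTACGT\n' > reference.txt
# Renaming it is enough for GATK to recognise it as FASTA:
mv reference.txt reference.fasta
```

If you have already built index or dictionary files (`.fai`, `.dict`), you may
need to regenerate them after renaming, since they are matched to the reference
by file name.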

74 changes: 32 additions & 42 deletions docs/Scientific_Computing/Supported_Applications/Trinity.md
@@ -10,18 +10,9 @@
zendesk_article_id: 360000980375
zendesk_section_id: 360000040076
---

Trinity, developed at the [Broad
Institute](http://www.broadinstitute.org/) and the [Hebrew University of
Jerusalem](http://www.cs.huji.ac.il/), performs _de&nbsp;novo_ reconstruction
of transcriptomes from RNA-seq data. It combines three independent
software modules: Inchworm, Chrysalis, and Butterfly, applied
sequentially to process large volumes of RNA-seq reads. Trinity
@@ -77,10 +68,10 @@
The following Slurm script is a template for running Trinity Phase 1

**Note**:

- `--cpus-per-task` and `--mem` defined in the following example are
  just placeholders.
- Run a test first on a subset of your sample to find the
  amount of CPUs and memory required for your dataset.



@@ -105,16 +96,16 @@
srun Trinity --no_distributed_trinity_exec \

The extra Trinity arguments are:

- `--no_distributed_trinity_exec` tells Trinity to stop before running
  Phase 2
- `--CPU ${SLURM_CPUS_PER_TASK}` tells Trinity to use the number of
  CPUs specified by the sbatch option `--cpus-per-task` (i.e. you only
  need to update it in one place if you change it)
- `--max_memory` should be the same as (or slightly lower than, so you
  have a small buffer) the value specified with the sbatch option
  `--mem`
- `[your_other_trinity_options]` should be replaced with the other
  Trinity options you would usually use, e.g. `--seqType fq`, etc.
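The `--max_memory` point can be sketched as a small calculation inside the
script (the 220 GB request and 4 GB buffer are arbitrary illustrative
numbers, not recommendations):

``` sl
MEM_GB=220                                # matches an sbatch directive like --mem=220G
TRINITY_MAX_MEM="$((MEM_GB - 4))G"        # slightly below --mem, leaving a buffer
echo "--max_memory ${TRINITY_MAX_MEM}"    # prints: --max_memory 216G
```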

### Running Trinity Phase 2

@@ -186,16 +177,16 @@
cmds_per_node=100

The important details are:

- `cmds_per_node` is the size of each batch of commands, i.e. here
  each Slurm sub-job runs 100 commands and then exits
- `max_nodes` is the number of sub-jobs that can be in the queue at
  any given time (each sub-job is single threaded, i.e. it uses just
  one core)
- name this file `SLURM.conf` in the directory you will submit the job
  from
- memory usage may be low enough that the sub-jobs can be run on
  either the large or bigmem partitions, which should improve
  throughput compared to bigmem alone
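Putting those points together, a `SLURM.conf` might look like the sketch
below. The layout follows the sample HPC GridRunner configs shipped with
Trinity; the partition names, memory, and time limit are placeholders, so
check the sample files for your Trinity version before using it.

``` sl
[GRID]
# grid type:
gridtype=SLURM
# template command used to submit each batch of commands:
cmd=sbatch -p large,bigmem --mem=4G --time=02:00:00 --ntasks=1 --cpus-per-task=1
# number of sub-jobs allowed in the queue at any one time:
max_nodes=20
# number of commands bundled into each sub-job:
cmds_per_node=100
```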

A template Slurm submission script for Trinity Phase 2 is shown below:

@@ -221,12 +212,12 @@
srun Trinity --CPU ${SLURM_CPUS_PER_TASK} --max_memory 20G \
[your_other_trinity_options]
```

- This assumes that you named the HPC GridRunner configuration script
  `SLURM.conf` and placed it in the same directory that you submit this
  job from
- The options `--CPU` and `--max_memory` aren't used by Trinity in
  "grid mode" but are still required to be set (i.e. it shouldn't
  matter what you set them to)

## Benchmarks

@@ -255,9 +246,8 @@
mini-assemblies to run in Phase 2.
The table below summarises the timings for Phase 2, comparing the
default single-node way of running Phase 2 with Trinity's "grid mode".
| **Type of run** | **Number of cores / grid specification** | **Run time (hrs:mins:secs)** | **Approximate core hour cost** |
|-----------------------|------------------------------------------|------------------------------|--------------------------------|
| Single node (default) | 16 cores | 24:09:36 | 387 |
| Grid | max\_nodes=20; cmds\_per\_node=500 | 07:59:58 | 168 |
| Grid | max\_nodes=40; cmds\_per\_node=500 | 04:10:45 | 171 |
43 changes: 11 additions & 32 deletions docs/Scientific_Computing/Supported_Applications/snpEff.md
@@ -10,57 +10,38 @@
zendesk_article_id: 7403361932431
zendesk_section_id: 360000040076
---

- [Description](#h_01HA8MKM9Z3D2QHTDCW5R6V2S5)
- [Configuration File](#h_01HA8M29QKYGBY6EA8Q6C5YS57)
- [Example Script](#h_01HA8M29QKGQ7JFP2E0YV2Q849)

## Description

snpEff is a genetic variant annotation and functional effect prediction
tool.

## Configuration File

snpEff requires a one-off set-up of its `.config` file. The following
instructions describe how to create the configuration file required by
snpEff.

1. Load the latest version of the `snpEff` module.

2. Make a copy of the snpEff config file, replacing `<project_id>`
   with your project ID.

``` sl
cp $EBROOTSNPEFF/snpEff.config /nesi/project/<project_id>/my_snpEff.config
```

3. Open the `my_snpEff.config` file, and edit **line 17** from the top
   to point to a preferred path within your project directory or home
   directory, e.g., edit line 17 `data.dir = ./data/` to something
   like: `data.dir = /nesi/project/<project_id>`.
   Please note that you must have read and write permissions to this
   directory.

4. Run `snpEff.jar` using the `-c` flag to point to your new config
   file, e.g., `-c path/to/snpEff/my_snpEff.config`. For example:

``` sl
java -jar $EBROOTSNPEFF/snpEff.jar -c /nesi/project/<project_id>/my_snpEff.config
```
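Step 3 can also be scripted. In this sketch the `printf` line merely stands
in for the copy made in step 2, and `<project_id>` remains a placeholder for
your project ID:

``` sl
# Stand-in for the config file copied in step 2:
printf 'data.dir = ./data/\n' > my_snpEff.config
# Rewrite the data.dir line (line 17 in the real file) to point at your project:
sed -i 's|^data.dir.*|data.dir = /nesi/project/<project_id>|' my_snpEff.config
grep '^data.dir' my_snpEff.config   # prints: data.dir = /nesi/project/<project_id>
```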

## Example Script

You will need to set up your configuration file before you run snpEff.
@@ -86,5 +67,3 @@
java -jar $EBROOTSNPEFF/snpEff.jar -h
# run snpEff
java -jar $EBROOTSNPEFF/snpEff.jar -c /nesi/project/<project_id>/my_snpEff.config <other flags>
```