Merge branch 'vfan001'
CallumWalley committed Dec 5, 2023
2 parents 07ae0b0 + 4c6d448 commit 1a170b4
Showing 3 changed files with 46 additions and 102 deletions.
31 changes: 3 additions & 28 deletions docs/Scientific_Computing/Supported_Applications/GATK.md
@@ -10,15 +10,6 @@
zendesk_article_id: 6443618773519
zendesk_section_id: 360000040076
---

The Genome Analysis Toolkit (GATK), developed at the [Broad
Institute](http://www.broadinstitute.org/), provides a wide variety of
tools focusing primarily on variant discovery and genotyping. It is
@@ -28,21 +19,17 @@
germline DNA and RNAseq data.
General documentation for running GATK can be found on their website
[here](https://gatk.broadinstitute.org/hc/en-us).

## Running GATK

GATK requires the Java Runtime Environment. The appropriate version
of Java is already included as part of the GATK module, so you will not
need to load a Java module separately.

!!! note

    - `--time` and `--mem` defined in the following example are just
      placeholders.
    - Please load the GATK version of your choice.

``` sl
#!/bin/bash -e
@@ -69,8 +56,6 @@
export _JAVA_OPTIONS=-Djava.io.tmpdir=${TMPDIR}
gatk MarkDuplicates I=input.bam O=marked_duplicates.bam M=marked_dup_metrics.txt
```

### GATK-Picard

GATK versions 4.0 or higher all contain a copy of the Picard toolkit,
@@ -89,8 +74,6 @@
the function of interest.
Please also note that there are some inconsistencies between Picard and
GATK flag naming conventions, so it is best to double-check them.

## Common Issues

### Out of Memory or Insufficient Space for Shared Memory File
@@ -114,17 +97,9 @@
mkdir -p ${TMPDIR}
export _JAVA_OPTIONS=-Djava.io.tmpdir=${TMPDIR}
```

### File is not a supported reference file type

The error message "File is not a supported reference file type" appears
in one of the log files. GATK sometimes requires FASTA files to have the
file extension ".fasta" or ".fa". Please make sure your file
extensions correctly reflect the file type.
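A minimal sketch of the fix (the file names here are hypothetical):

``` sl
# A FASTA file saved with an unrecognised extension:
printf '>chr1\nACGTACGT\n' > reference.txt
# Renaming it is enough for GATK to recognise it as FASTA:
mv reference.txt reference.fasta
```

If you have already built index or dictionary files (`.fai`, `.dict`), you may
need to regenerate them after renaming, since they are matched to the reference
by file name.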

74 changes: 32 additions & 42 deletions docs/Scientific_Computing/Supported_Applications/Trinity.md
@@ -10,18 +10,9 @@
zendesk_article_id: 360000980375
zendesk_section_id: 360000040076
---

Trinity, developed at the [Broad
Institute](http://www.broadinstitute.org/) and the [Hebrew University of
Jerusalem](http://www.cs.huji.ac.il/), performs _de&nbsp;novo_ reconstruction
of transcriptomes from RNA-seq data. It combines three independent
software modules: Inchworm, Chrysalis, and Butterfly, applied
sequentially to process large volumes of RNA-seq reads. Trinity
@@ -77,10 +68,10 @@
The following Slurm script is a template for running Trinity Phase 1

**Note**:

- `--cpus-per-task` and `--mem` defined in the following example are
  just placeholders.
- Run a test first on a subset of your sample to find the
  amount of CPUs and memory required for your dataset.



@@ -105,16 +96,16 @@
srun Trinity --no_distributed_trinity_exec \

The extra Trinity arguments are:

- `--no_distributed_trinity_exec` tells Trinity to stop before running
  Phase 2
- `--CPU ${SLURM_CPUS_PER_TASK}` tells Trinity to use the number of
  CPUs specified by the sbatch option `--cpus-per-task` (i.e. you only
  need to update it in one place if you change it)
- `--max_memory` should be the same as (or slightly lower than, so you
  have a small buffer) the value specified with the sbatch option
  `--mem`
- `[your_other_trinity_options]` should be replaced with the other
  Trinity options you would usually use, e.g. `--seqType fq`, etc.
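The `--max_memory` point can be sketched as a small calculation inside the
script (the 220 GB request and 4 GB buffer are arbitrary illustrative
numbers, not recommendations):

``` sl
MEM_GB=220                                # matches an sbatch directive like --mem=220G
TRINITY_MAX_MEM="$((MEM_GB - 4))G"        # slightly below --mem, leaving a buffer
echo "--max_memory ${TRINITY_MAX_MEM}"    # prints: --max_memory 216G
```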

### Running Trinity Phase 2

@@ -186,16 +177,16 @@
cmds_per_node=100

The important details are:

- `cmds_per_node` is the size of each batch of commands, i.e. here
  each Slurm sub-job runs 100 commands and then exits
- `max_nodes` is the number of sub-jobs that can be in the queue at
  any given time (each sub-job is single threaded, i.e. it uses just
  one core)
- name this file `SLURM.conf` in the directory you will submit the job
  from
- memory usage may be low enough that the sub-jobs can be run on
  either the large or bigmem partitions, which should improve
  throughput compared to bigmem alone
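Putting those points together, a `SLURM.conf` might look like the sketch
below. The layout follows the sample HPC GridRunner configs shipped with
Trinity; the partition names, memory, and time limit are placeholders, so
check the sample files for your Trinity version before using it.

``` sl
[GRID]
# grid type:
gridtype=SLURM
# template command used to submit each batch of commands:
cmd=sbatch -p large,bigmem --mem=4G --time=02:00:00 --ntasks=1 --cpus-per-task=1
# number of sub-jobs allowed in the queue at any one time:
max_nodes=20
# number of commands bundled into each sub-job:
cmds_per_node=100
```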

A template Slurm submission script for Trinity Phase 2 is shown below:

@@ -221,12 +212,12 @@
srun Trinity --CPU ${SLURM_CPUS_PER_TASK} --max_memory 20G \
[your_other_trinity_options]
```

- This assumes that you named the HPC GridRunner configuration script
  `SLURM.conf` and placed it in the same directory that you submit this
  job from
- The options `--CPU` and `--max_memory` aren't used by Trinity in
  "grid mode" but are still required to be set (i.e. it shouldn't
  matter what you set them to)

## Benchmarks

@@ -255,9 +246,8 @@
mini-assemblies to run in Phase 2.
The table below summarises the timings for Phase 2, comparing the
default single-node way of running Phase 2 with Trinity's "grid mode".
| **Type of run** | **Number of cores / grid specification** | **Run time (hrs:mins:secs)** | **Approximate core hour cost** |
|-----------------------|------------------------------------------|------------------------------|--------------------------------|
| Single node (default) | 16 cores | 24:09:36 | 387 |
| Grid | max\_nodes=20; cmds\_per\_node=500 | 07:59:58 | 168 |
| Grid | max\_nodes=40; cmds\_per\_node=500 | 04:10:45 | 171 |
43 changes: 11 additions & 32 deletions docs/Scientific_Computing/Supported_Applications/snpEff.md
@@ -10,57 +10,38 @@
zendesk_article_id: 7403361932431
zendesk_section_id: 360000040076
---

- [Description](#h_01HA8MKM9Z3D2QHTDCW5R6V2S5)
- [Configuration File](#h_01HA8M29QKYGBY6EA8Q6C5YS57)
- [Example Script](#h_01HA8M29QKGQ7JFP2E0YV2Q849)

## Description

snpEff is a genetic variant annotation and functional effect prediction
tool.

## Configuration File

snpEff requires a one-off set-up of its `.config` file. The following
instructions describe how to create the configuration file required by
snpEff.

1. Load the latest version of the `snpEff` module.

2. Make a copy of the snpEff config file, replacing `<project_id>`
   with your project ID.

``` sl
cp $EBROOTSNPEFF/snpEff.config /nesi/project/<project_id>/my_snpEff.config
```

3. Open the `my_snpEff.config` file, and edit **line 17** from the top
   to point to a preferred path within your project directory or home
   directory, e.g., edit line 17 `data.dir = ./data/` to something
   like: `data.dir = /nesi/project/<project_id>`.
   Please note that you must have read and write permissions to this
   directory.

4. Run `snpEff.jar` using the `-c` flag to point to your new config
   file, e.g., `-c path/to/snpEff/my_snpEff.config`. For example:

``` sl
java -jar $EBROOTSNPEFF/snpEff.jar -c /nesi/project/<project_id>/my_snpEff.config
```
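Step 3 can also be scripted. In this sketch the `printf` line merely stands
in for the copy made in step 2, and `<project_id>` remains a placeholder for
your project ID:

``` sl
# Stand-in for the config file copied in step 2:
printf 'data.dir = ./data/\n' > my_snpEff.config
# Rewrite the data.dir line (line 17 in the real file) to point at your project:
sed -i 's|^data.dir.*|data.dir = /nesi/project/<project_id>|' my_snpEff.config
grep '^data.dir' my_snpEff.config   # prints: data.dir = /nesi/project/<project_id>
```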

## Example Script

You will need to set up your configuration file before you run snpEff.
@@ -86,5 +67,3 @@
java -jar $EBROOTSNPEFF/snpEff.jar -h
# run snpEff
java -jar $EBROOTSNPEFF/snpEff.jar -c /nesi/project/<project_id>/my_snpEff.config <other flags>
```