layout | title | author | author_url | date |
---|---|---|---|---|
protocols | repp-assisted plasmid design | Shaohe Wang | | 2023-11-08 |
`repp` stands for repository-based plasmid design. It is a command line tool that automates several steps in plasmid design. See Timmons, J.J. & Densmore, D. Repository-based plasmid design. PLOS One. Because `repp` is currently only available as a command line tool, the instructions below assume some familiarity with the command line interface. If you are new to it, please see this page. We assume Windows users use the Git Bash program for their command line interface.
The original `repp` needs a multi-FASTA format sequence file to build a database and writes its output in JSON format, which is not convenient for a typical molecular cloning workflow. Cristian Goina from the Janelia Scientific Computing Software team helped us implement several important I/O features to fit our typical cloning workflow, including building a database from a directory of plasmid sequences in FASTA or GenBank format, re-using existing primers, and outputting convenient CSV spreadsheets.
Git is a powerful tool for version control. You don't need Git to install and use `repp`, but I encourage you to learn about it if you are unfamiliar with it. I highly recommend the Software Carpentry Lesson for Git.
-
Learn basic concepts of Git here.
-
Install Git.
  - If you are using a new Mac, you probably already have a recent version of Git installed. Check by typing `git --version` in your terminal. If the version number is close to the current version, you can just use the pre-installed version.
  - Otherwise, follow instructions here to install Git on Windows, Mac or Linux systems.
-
Set up Git on your computer.
-
Install 3 dependencies: `go`, `primer3`, and `blast`. Skip to the next step if you are updating `repp`.
  - Install `go` (version >= 1.19) following instructions here.
  - Install `blast`:
    - Go to the NCBI website to download the BLAST+ software.
    - From this website, follow this ftp download link.
    - For Mac (M1 or M2 chip OK), download "ncbi-blast-2.13.0+.dmg" for installation.
      - (Optional but good practice) Check the md5 sum of the downloaded installation package: `cd` into the download folder and run `md5 ncbi-blast-2.13.0+.dmg`. Make sure the md5 sum matches the one in ncbi-blast-2.13.0+.dmg.md5 in the ftp download list.
      - Open the dmg file to install blast.
      - Mac will warn you that this file is from an "unidentified developer" and cannot be opened. Go to the "Security and Privacy" settings and click "Open Anyway" to open the installer.
      - Check whether installation is OK by running `which blastn`, which should print out the path to the `blastn` program (e.g., /usr/local/ncbi/blast/bin/blastn).
    - For Windows, download "ncbi-blast-2.13.0+-win64.exe".
      - (Optional but good practice) Check the md5 sum of the downloaded installation package: `cd` into the download folder and run `md5sum ncbi-blast-2.13.0+-win64.exe` in Git Bash. Make sure the md5 sum matches what is listed in ncbi-blast-2.13.0+-win64.exe.md5 in the ftp download list.
      - Open the exe file to install blast.
      - Check whether installation is OK by running `blastn -h` in Git Bash, which should print out the help message of the blastn program.
  - Install `primer3`.
    - If you use Git, run `git clone https://github.com/primer3-org/primer3.git`.
    - If you don't use Git, go to the Primer3 GitHub page, click on the green button `Code` and `Download ZIP`. Unzip it.
    - For Mac:
      - Assuming the source code of `primer3` is in the Downloads folder, run the following to compile, test, and install `primer3`:

            cd ~/Downloads/primer3/src
            make
            make test
            sudo make install

      - Check whether installation is OK by running `which primer3_core`, which should print out the path to the `primer3_core` program (e.g. /usr/local/bin/primer3_core).
    - For Windows:
      - Download and install TDM-GCC MinGW Compiler.
      - (Optional) Assuming the source code of `primer3` is in the Downloads folder, run the following to compile and test `primer3`:

            cd ~/Downloads/primer3/src
            mingw32-make TESTOPTS=--windows

      - Copy the "primer3/src" folder that contains the compiled binaries to your preferred location, e.g., "C:\Program Files\primer3\src".
      - Add the above folder to your `Path` Environment Variable.
        - Open Start Menu then type `Advanced system settings` and press Enter.
        - Click `Environment Variables`.
        - Select `Path` in the variable list and click `Edit...` to add the above directory.
-
Install the Janelia SciComp version of `repp`.
  - If you use Git, run `git clone https://github.com/JaneliaSciComp/repp.git`.
  - If you don't use Git, go to the Janelia SciComp GitHub page, click on the green button `Code` and `Download ZIP`. Unzip it.
  - Assuming the source code of `repp` is in the Downloads folder, run the following to compile `repp`:

        cd ~/Downloads/repp/cmd/repp
        go build

  - The above generates an executable in the same folder.
    - For Mac, run `sudo mv repp /usr/local/bin/.` to move the executable to `/usr/local/bin` or your preferred location. Type your password to give permission if prompted.
    - For Windows, copy the "repp.exe" file to your preferred location, e.g., "C:\Program Files\repp". Add this folder to your `Path` Environment Variable.
      - Open Start Menu then type `Advanced system settings` and press Enter.
      - Click `Environment Variables` towards the bottom of the dialog.
      - Select `Path` in the variable list and click `Edit...` to add the above directory.
  - Check whether installation is OK by running `which repp`, which should print out the path to the `repp` program (e.g. /usr/local/bin/repp).
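The individual installation checks above (`which blastn`, `which primer3_core`, `which repp`) can be combined into one pass. This is a minimal sketch; `check_tools` is a hypothetical helper name, and it only tests that each program is found on the `PATH`, not that it runs correctly.

```shell
#!/usr/bin/env bash
# Sketch: report which of the given command-line tools can be found on PATH.
# Returns 0 if all are found, 1 if at least one is missing.

check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_tools go blastn primer3_core repp` in the terminal (Git Bash on Windows) lists the path of each tool that is installed and names any that are not.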
-
Download repp_test.zip for testing.
-
(Optional) Add sequence databases from remote repositories (e.g., Addgene, iGEM, DNASU).

As direct synthesis of DNA fragments becomes more affordable, the advantage of PCR amplification from existing plasmids diminishes. Moreover, acquiring a plasmid from repositories introduces additional time costs. If you have access to basic plasmid backbones, you may omit this step.

The original `repp` author has assembled FASTA files from Addgene, iGEM, and DNASU, and made them available from an S3 bucket. Run the following commands to download and add them to the `repp` sequence database on your computer:

    # download repository FASTA files
    for db in igem addgene dnasu; do
      curl -o "$db.fa.gz" "https://repp.s3.amazonaws.com/$db.fa.gz"
      gzip -d "$db.fa.gz"
    done

    # add sequence DBs with the cost of ordering a plasmid from each source
    repp add database --name igem --cost 0.0 < igem.fa
    repp add database --name addgene --cost 85.0 < addgene.fa
    repp add database --name dnasu --cost 99.0 < dnasu.fa
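Before adding a downloaded repository file, it can be worth confirming that it decompressed into a plausible multi-FASTA file. The sketch below is an illustration only (the helper name `count_fasta_records` is hypothetical); it counts the header lines that start with ">", which is one per sequence record in FASTA format.

```shell
#!/usr/bin/env bash
# Sketch: count the records in a FASTA file by counting header lines starting with ">".
# Prints 0 for a file that has no FASTA headers.

count_fasta_records() {
  grep -c '^>' "$1"
}

# Example use after the download loop (commented out because it needs the downloaded files):
# for db in igem addgene dnasu; do
#   echo "$db: $(count_fasta_records "$db.fa") records"
# done
```

A count of 0, or a file that is only a few bytes long, suggests that the download or decompression did not complete.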
-
Add sequence databases from local collections.

Put all sequence files (GenBank or FASTA format) in a directory. In the repp_test example, this directory is called "lab-plasmid-collection". Run the following command to add them:

    # -n is shorthand for --name
    # -c is shorthand for --cost
    # run "repp add database --help" for more options
    repp add database -n lab -c 0 lab-plasmid-collection

Note that adding a sequence database is a one-time operation that stores the database files in a hidden directory. Adding a new directory to a database with the same name (e.g., -n lab) will overwrite it.
  - On Mac, they are in "~/.repp/dbs".
  - On Windows, they are in "C:/users/username/.repp/dbs".
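Because re-adding a database under an existing name overwrites it, you may want a copy of the hidden database directory first. The following is a minimal sketch under stated assumptions: `backup_repp_dbs` is a hypothetical helper, the default path is the Mac/Git Bash location given above, and whether `repp` can be restored simply by copying the folder back has not been verified here.

```shell
#!/usr/bin/env bash
# Sketch: copy the repp database directory to a timestamped backup folder.
# Usage: backup_repp_dbs [directory]   (defaults to ~/.repp/dbs)
# Prints the path of the backup on success.

backup_repp_dbs() {
  local src="${1:-$HOME/.repp/dbs}"
  local dest
  dest="${src}.backup-$(date +%Y%m%d-%H%M%S)"
  if [ ! -d "$src" ]; then
    echo "no database directory at $src" >&2
    return 1
  fi
  cp -R "$src" "$dest" && echo "$dest"
}
```

Calling `backup_repp_dbs` with no argument before `repp add database -n lab ...` leaves a dated copy next to the original directory.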
-
(Recommended) Organize your primer database.

Although not strictly required, it is highly recommended to create an organized primer database for re-using primers. The primer database parameter "-m" of the Janelia version of repp accepts either a single spreadsheet or a folder containing multiple spreadsheets. When your primer database contains hundreds or thousands of primers, it can be cumbersome to scroll down to the bottom of the spreadsheet to add new primers. Instead, it is much easier to maintain a small active spreadsheet with one or more archived spreadsheets. The primer database spreadsheet must have the "primer_id" and "sequence" columns, and optionally other columns for additional notes.

In the repp_test example, the "primer_database" folder has two spreadsheets: "1_archived_primer.csv" and "2_active_primer.csv". This is what "2_active_primer.csv" looks like:

primer_id | sequence |
---|---|
oS41 | ACTTTTCGGGGAAATGTGCG |
oS42 | GTGAGCAAAAGGCCAGCAAA |
oS43 | GTGCCAGTGGTCTCTTGTTG |
oS44 | CTATTACCATGGTGATGCGGTTTTGGCAGTAC |
oS45 | ACTGGATCTCTGCTGTCCCT |
oS46 | GGCATGGACGAGCTGTACAA |
oS47 | TTCAAGTCTGTTCACACGCC |
oS48 | CTTGCAGCAGATTCAGACCC |
oS49 | CCACGTGGGCTTTATCTTCC |

-
(Optional) Organize your fragment database.

Following the same logic of re-using primers, you may also wish to re-use synthesized fragments. For this purpose, you can organize your synthesized fragment database in the same way as the primer database. Like the primer database parameter, the synthesized fragment database parameter "-s" accepts either a single spreadsheet or a folder containing multiple spreadsheets.

In the repp_test example, the "fragment database" folder has two spreadsheets: "1_archived_frag.csv" and "2_active_frag.csv". This is what "2_active_frag.csv" looks like:

frag_id | sequence |
---|---|
syn4 | atgtca...(long sequence)...ataacc |
syn5 | CAGGGA...(long sequence)...TCAAAG |

-
Put together the target plasmid sequence in GenBank format.
In the repp_test example, the target plasmid is called "pW256.gb".
We use ApE (A plasmid Editor) to edit and annotate DNA sequences. ApE is free software written by M. Wayne Davis from the University of Utah. You can use whatever software you prefer, but make sure to save the sequence in GenBank format. Note that the default .ape file uses GenBank format and is compatible with `repp`.
-
Run the `repp make sequence` command.

The simplest command is `repp make sequence -i pW256.gb`, which uses all available databases (in this case, "igem,addgene,dnasu,lab") and default parameters.

To specify the database (e.g., only use the local collection "lab"), use `repp make sequence -i pW256.gb -d lab`.

To also specify the primer database and a synthesized fragment database, use `repp make sequence -i pW256.gb -d lab -m primer_database -s frag_database`. This results in two csv files: "pW256.output-strategy.csv" and "pW256.output-reagents.csv".

The first 5 columns of "pW256.output-strategy.csv" are shown below. Other columns (Match Pct, GC%, 50 low GC%, 50 high GC%, and Homopolymer) can help users decide whether certain fragments may be difficult to PCR or to synthesize.

    # 2023/11/08 13:37:36
    # Solution 1
    # Fragments:5 (3 - pcr 2 - synth)
    # Cost: 179.510000 Adjusted Cost: 179.510000
    Frag ID      Fwd Primer  Rev Primer  Template  Size
    pW256_1_pcr  oS44        oS45        pW212     1983
    syn5         N/A         N/A         N/A       350
    pW256_3_pcr  oS50        oS51        pR92      2876
    syn6         N/A         N/A         N/A       766
    pW256_5_pcr  oS52        oS53        pW212     5555

The "pW256.output-reagents.csv" is shown below. The priming region and Tm columns can help users optimize PCR amplification conditions.

    # Solution 1
    Reagent ID  Seq                                  Priming Region           Tm
    *oS44       CTATTACCATGGTGATGCGGTTTTGGCAGTAC     TGATGCGGTTTTGGCAGTAC     59.12
    *oS45       ACTGGATCTCTGCTGTCCCT                 ACTGGATCTCTGCTGTCCCT     59.96
    oS50        GCGGAGTGCAACATCAAAGT                 GCGGAGTGCAACATCAAAGT     59.41
    oS51        GACTGCTTGCCTCCACCAC                  GACTGCTTGCCTCCACCAC      60.97
    oS52        ACGCGTTAAGTCGACAATCA                 ACGCGTTAAGTCGACAATCA     57.95
    oS53        CAAAACCGCATCACCATGGTAATAGCGATGACTAA  ACCATGGTAATAGCGATGACTAA  57.2
    *syn5       CAGGGA...(long sequence)...CAAAGTG   N/A                      N/A
    syn6        TGGTGG...(long sequence)...CAATCAA   N/A                      N/A

Note that pre-existing primers and synthesized fragments are marked with an asterisk. The IDs of new primers and synthesized fragments are incremented from the last entry of the last spreadsheet of the database.

Sometimes `repp` generates multiple solutions, because some solutions may have a larger cost but fewer fragments. You can choose your preferred solution to move forward.
-
Order new reagents in the "output-reagents.csv" file.
For primers, copy the first two columns of the new primers (those not marked with an asterisk) from the reagents file to the active primer database spreadsheet. These can then be ordered from your preferred supplier; our standard choice is IDT.
Likewise, for synthesized fragments, copy the first two columns of the new entries (again, those without an asterisk) to the active synthesized fragment database spreadsheet. You can order these from your favorite DNA fragment provider. We recommend Twist for its exceptionally low cost of 9 cents per base pair.
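The copy step for new primers can be sketched as a small filter. This is an illustration under assumptions, not a feature of `repp`: `new_primers` is a hypothetical helper, and it assumes the reagents file is comma-separated with the four columns shown earlier (Reagent ID, Seq, Priming Region, Tm), that comment lines start with "#", and that synthesized fragments are the rows with "N/A" in the Tm column.

```shell
#!/usr/bin/env bash
# Sketch: print "id,sequence" for new primers in an output-reagents.csv file.
# New primers are rows without a leading asterisk whose Tm column is not N/A.

new_primers() {
  awk -F',' '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    /^#/ || $1 == "Reagent ID" { next }     # skip comment and header lines
    $1 ~ /^\*/                 { next }     # skip reagents that already exist
    $4 == "N/A"                { next }     # skip synthesized fragments
    NF >= 4                    { print $1 "," $2 }
  ' "$1"
}
```

For example, `new_primers pW256.output-reagents.csv` prints the rows to review before pasting them into "2_active_primer.csv". If the file contains several solutions, keep only the rows that belong to the solution you chose.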
-
PCR amplification and Gibson Assembly.
Print the table in "output-strategy.csv" for bench work reference.
Follow this protocol for PCR amplification using a high-fidelity DNA polymerase.
Follow this protocol for assembling the fragments using Gibson Assembly.