This section is designed to give you a "very" brief introduction to the command line.
We will explain the commands, flags and special characters you need to be aware of in order to run a Nextflow pipeline.
This is not meant to be exhaustive, but gives you the main info you need to know.
The terminal is where you give instructions to tell the computer what to do!
We use a UNIX terminal. UNIX is a family of operating systems (OS) with a core set of commands/programs available to run code and operate a computer.
The following commands in this tutorial are all UNIX commands and are the main way to interact with your computer.
These are some additional terms you may hear:
LINUX: A flavor of unix with extensions on the base OS <br> CLI: Command line interface (your terminal) <br> UI: User interface (your desktop layout with click icons)
On a macOS or Linux machine you will have a UNIX terminal by default. On Windows machines you may have to install a terminal app, such as the Windows Subsystem for Linux (WSL), to get a UNIX-like terminal to run bioinformatics in. For this tutorial we are on a UNIX computer, so a UNIX terminal is available as standard.
These are some basic unix commands that you need to familiarize yourself with:
Help and localisation commands
man
Print the manual of a command
ls
Show all the files in the current directory
pwd
Tell me which directory I am in right now
tree
Print the directory structure
cd
Change directories
(use as cd ./directory, change to a folder called directory in my current dir)
(or cd .., go up one directory)
(or cd -, back to previous directory)
(or cd, takes you to home directory)
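The navigation commands above can be tried out safely with a small sketch like the one below. It assumes mktemp is available to create a scratch directory, and the names sandbox, project and data are made up for illustration.

```shell
# Scratch directory so the example never touches real files
# (sandbox, project and data are hypothetical names).
here=$(pwd)
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"

cd "$sandbox/project"       # move into the project directory
start_dir=$(pwd)            # pwd reports where we are

cd ./data                   # down into a folder inside the current directory
inside_data=$(pwd)

cd ..                       # up one level, back to project
after_up=$(pwd)

cd - > /dev/null            # back to the previous directory (data); cd - also prints it
after_dash=$(pwd)

echo "$start_dir"
echo "$inside_data"
echo "$after_up"
echo "$after_dash"

cd "$here"                  # return to where we started
```

The second and fourth lines printed should be identical, because cd - returns to the directory you were in before the last cd.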
Creating, moving and removing commands
cp
Copy a file/folder (keep original; -r to copy recursively, which is needed to copy a directory)
mv
Move a file/folder (and delete original)
rm
Remove/Delete a file (-r for recursively [also to remove a directory])
mkdir
Make a new directory
nano
Open the nano command line text editor
(nano file_name, then save and exit by typing Control X, entering y, checking the name is correct and pressing Enter)
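Here is a minimal sketch of the create, copy, move and remove commands working together. It runs inside a scratch directory made with mktemp (an assumption), and the file and directory names are hypothetical.

```shell
sandbox=$(mktemp -d)              # scratch directory (hypothetical)
cd "$sandbox"

mkdir results                     # make a new directory
echo "sample text" > notes.txt    # create a small file to work with

cp notes.txt notes_copy.txt       # copy: the original stays in place
mv notes_copy.txt results/        # move: the file now only exists in results/
cp -r results results_backup      # -r copies a directory and its contents

rm notes.txt                      # delete a single file
rm -r results_backup              # -r deletes a directory and everything in it

ls results                        # only notes_copy.txt is left here
```

Note that rm does not use a recycle bin, so it is worth practising in a scratch directory like this one first.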
Text manipulation commands
wc
Word count (with flag -l prints the number of lines in a file). By default prints the number of lines, words and characters
grep
Search for a string/word inside a file and print lines
echo
Prints statement to terminal or prints the contents of a variable ($)
history
Check out all my previous commands
cat
Print all lines or concatenate files (zcat prints gzipped files, ending .gz)
head
Print the top lines of a file (-n number of lines)
tail
Print the bottom lines of a file (-n number of lines)
uniq
Print unique lines (only removes repeated lines that are next to each other, so sort first)
sort
Sort a list
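As a rough illustration of the text commands above, the sketch below builds a tiny four-line file and inspects it. The file name fruit.txt and its contents are invented for the example, and mktemp is assumed to be available.

```shell
sandbox=$(mktemp -d)                                  # scratch directory (hypothetical)
fruit="$sandbox/fruit.txt"
printf 'banana\napple\ncherry\napple\n' > "$fruit"

line_count=$(cat "$fruit" | wc -l | tr -d ' ')        # count lines; tr strips padding some systems add
apple_lines=$(grep apple "$fruit")                    # only the lines containing "apple"
first_line=$(head -n 1 "$fruit")                      # first line of the file
last_line=$(tail -n 1 "$fruit")                       # last line of the file

echo "lines: $line_count"
echo "first: $first_line, last: $last_line"
echo "$apple_lines"
```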
Other commands
wget
Download a file or webpage to the current directory (-O to specify output name)
curl
Fetch the contents of a webpage and print it to the terminal (-o to save it to a named file instead)
which
Tell me the path to the script/program (e.g. which perl)
Extra commands
Extra commands you should know (but not needed in this course):
ssh
Access a remote server/cluster
export
Set an environment variable (make a variable available to the programs you run)
open
Open a file with its default application, e.g. a PDF
cut
Cut out sections (e.g. columns) from each line of a file
gzip
Compresses or Decompresses files (to save space)
chmod
Change the access rights (mode) of a file
(u:user g:group o:other a:all)
(+ add, - remove)
(r:read w:write x:execute)
(e.g. chmod a+r file, make file readable by all users)
Flags for ls
-l
long format
-r
reverse order
-a
show hidden files
-h
human readable (size)
-t
sort by time changed
-G
colour the output
-S
sort by size
To know the flags of other commands use man command_name
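Taking ls as the command these flags belong to, the sketch below shows what -a changes and how several flags can be combined after a single dash. The scratch directory and file names are hypothetical.

```shell
sandbox=$(mktemp -d)                           # scratch directory (hypothetical)
touch "$sandbox/visible.txt" "$sandbox/.hidden.txt"

plain_listing=$(ls "$sandbox")                 # names starting with . are left out
all_listing=$(ls -a "$sandbox")                # -a includes them, plus . and ..

echo "$plain_listing"
echo "$all_listing"
ls -lah "$sandbox"                             # combined: long format, hidden files, human-readable sizes
```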
Special characters
$
A variable (or a prompt)
>
Save output to a file. If this file already exists, its contents are overwritten
>>
Append output to a file. If this file already exists, the output is added to the end; otherwise the file is created
.
Current directory
..
Directory one level up
/
Separator between folders in a path (a path that starts with / begins at the root folder)
-
Flag symbol
~
Home directory
|
Pipe (send output to another command)
*
Wildcard (matches any characters in a file name, e.g. *.txt)
#
Comment (the rest of the line is ignored)
Example:
cat file | sort | uniq > sorted_uniq_file
We read a file, then sort the output, find the unique lines, and save them to a new file.
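To see the difference between > and >> alongside a pipe, here is a small sketch using a scratch directory. The file names log.txt and letters.txt are made up; sorted_uniq_file is the name from the example above.

```shell
sandbox=$(mktemp -d)                 # scratch directory (hypothetical)
out="$sandbox/log.txt"

echo "first"  > "$out"               # > creates the file
echo "second" > "$out"               # > again: the old contents are overwritten
echo "third"  >> "$out"              # >> appends, so the file now has two lines

printf 'b\na\nb\n' > "$sandbox/letters.txt"
# Each | hands the output of one command to the next one
cat "$sandbox/letters.txt" | sort | uniq > "$sandbox/sorted_uniq_file"

log_contents=$(cat "$out")
unique_letters=$(cat "$sandbox/sorted_uniq_file")
echo "$log_contents"
echo "$unique_letters"
```

The sort step matters here: uniq only drops repeated lines that sit next to each other, so the two b lines have to be brought together first.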
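A full path starts with /
e.g. /workspace/gitpod/eco-flow-training
A relative path is read from your current directory and often starts with . or ..
(a path starting with ~ is shorthand for a full path that begins at your home directory)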
e.g. ./data/SRR6357070_1.fastq.gz
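The sketch below reads the same file through a full path and through two relative paths, to show that they are simply different routes to one place. The directory layout and the file sample.txt are hypothetical.

```shell
sandbox=$(mktemp -d)                              # scratch directory (hypothetical)
mkdir -p "$sandbox/data"
echo "reads" > "$sandbox/data/sample.txt"

cd "$sandbox"
via_relative=$(cat ./data/sample.txt)             # relative: resolved from the current directory
via_full=$(cat "$sandbox/data/sample.txt")        # full: starts at / and works from anywhere

cd data
via_parent=$(cat ../data/sample.txt)              # .. climbs one level before going down again

echo "$via_relative $via_full $via_parent"
echo ~                                            # ~ expands to the full path of your home directory
```

A relative path only works from the right starting directory, which is why the first cat would fail if you ran it from somewhere else.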
Variables in your terminal hold information. You set a variable by its name (e.g. my_var=value) and put the $ sign in front of the name to read its value.
Environment variables are also passed on to every program you start from the terminal, and are normally written in capital letters:
PATH – The directories that are searched for executable programs
HOME – The path to your home directory
NXF_VER – Nextflow version to use
USER – Your user name
Try echo-ing all of the above variables (e.g. echo $PATH).
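A short sketch of setting, reading and exporting a variable follows. The variable name greeting is invented for the example; sh -c is used only to start a child program that tries to read it.

```shell
greeting="hello"                 # set: no $ and no spaces around =
echo "$greeting"                 # read: put $ in front of the name

# A plain variable is not passed on to programs started from this shell...
child_before=$(sh -c 'echo "${greeting:-unset}"')

export greeting                  # ...until it is exported as an environment variable
child_after=$(sh -c 'echo "${greeting:-unset}"')

echo "before export: $child_before"
echo "after export:  $child_after"

echo "$HOME"                     # environment variables that are set for you
echo "$PATH"
```

The ${greeting:-unset} form prints the word unset when the variable is empty or missing, which makes the difference visible.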
$PATH shows all the locations on your machine (or gitpod environment in our case) where executable files can be found.
If you put executable files in one of these directories, then you don't need to type the full path to the script.
You can add directories to the $PATH using the command export as follows:
export PATH=$PATH:/workspace/gitpod/eco-flow-training
The above would add the eco-flow-training directory to the $PATH environment variable, so any executable script there can be run by name no matter what directory you are in.
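The same idea can be tested without touching your real setup. In this sketch a throwaway directory and a tiny script called say_hi (both hypothetical) stand in for the training directory, and command -v, a shell built-in that answers the same question as which, is used to look the command up.

```shell
tools_dir=$(mktemp -d)                                  # hypothetical directory holding our script
printf '#!/bin/sh\necho "hello from say_hi"\n' > "$tools_dir/say_hi"
chmod +x "$tools_dir/say_hi"

# Before: the shell cannot find say_hi by name
if command -v say_hi > /dev/null; then found_before=yes; else found_before=no; fi

export PATH="$PATH:$tools_dir"                          # add the directory to the end of PATH

found_after=$(command -v say_hi)                        # now resolves to the script in tools_dir
output=$(say_hi)                                        # and runs from any directory

echo "found before: $found_before"
echo "found after:  $found_after"
echo "$output"
```

A change made with export only lasts for the current terminal session unless it is also saved in a startup file.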
bash
A unix command language interpreter (used as: bash my_script.sh)
perl
A versatile programming language (used as: perl my_script.pl)
python
A modern versatile programming language
R
A statistical programming language
java
A high-level, object-oriented programming language
Many scripts have a shebang (#!) on their first line. This first line tells unix which program should run the script. This means that, once the script is executable, you don't need to type the name of the program before running it.
e.g.
#!/bin/bash
#!/usr/bin/env Rscript
#!/usr/bin/env python3
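As a small sketch of what the shebang buys you, the script below is run twice: once by naming the interpreter and once by letting the shebang choose it. The script name greet.sh is hypothetical, and #!/bin/sh is used here only because it is available almost everywhere.

```shell
sandbox=$(mktemp -d)                     # scratch directory (hypothetical)
script="$sandbox/greet.sh"

# Line 1 is the shebang: it names the program that should run the file
printf '#!/bin/sh\necho "run via the shebang line"\n' > "$script"
chmod u+x "$script"                      # the file must be executable to be run directly

first_line=$(head -n 1 "$script")
with_interpreter=$(sh "$script")         # naming the interpreter ourselves
with_shebang=$("$script")                # letting the shebang pick it

echo "$first_line"
echo "$with_interpreter"
echo "$with_shebang"
```

Both runs print the same message; the shebang simply saves you from typing the interpreter each time.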
Now it's your turn!
Step 0. Change directory and create new directories:
You can make new directories using the VS Code environment by going to the explorer on the left hand side and clicking the new folder button... But we will do this all using the command line.
First, check your location in the command line using pwd, which prints the working directory: where you are right now.
You can see we are in:
/workspace/gitpod/eco-flow-training
Use the ls command to check which files and folders we already have in this directory.
Or use the tree command to see the directory structure.
You can see we have one directory called data (containing some fq files for our RNA-Seq tutorial later).
Equally, you could have typed ls data to see inside this directory.
Now create a new directory using mkdir and name it "rnaseq_experiment", as we will use this during the day to run the RNA-Seq experiment. Then cd into this directory.
Cheat sheet
mkdir rnaseq_experiment
cd rnaseq_experiment
Now go back up one directory to be in /workspace/gitpod/eco-flow-training
Cheat sheet
cd ..
Step 1. Create a new file
Now make a file called list.sh (.sh indicates it is a unix/bash script) containing the text "ls -la", using nano or another command line text editor (nano instructions are in the help list above, or use man).
Cheat sheet
nano list.sh
type ls -la
quit nano using Control X
and type y (to agree to save)
then press enter
Step 2. Run a bash script
Now try to run the bash script you just wrote in the previous exercise.
Start by simply typing its name (list.sh) into the terminal.
It should say:
bash: list.sh: command not found
This is because the command line only looks for commands in the directories listed in $PATH, so it doesn't find list.sh even though it's in our current directory.
To execute the script as a command we need to point to the file (in current directory "." execute list.sh):
./list.sh
Again this should fail, because scripts need to be executable before they can be run as commands.
It should say:
bash: ./list.sh: Permission denied
That's where the chmod command comes in. Change the mode of the file so it is executable for the user.
chmod u+x ./list.sh
Now we have changed the user's rights to make the file executable (a script).
(Typing bash list.sh would have worked even before this change, because then bash is the program being run and it only needs to read the file.)
Now if you run the command, it should run:
./list.sh
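If you would like to replay this step in one go, the sketch below does so inside a scratch directory (made with mktemp, an assumption) and records the file mode before and after chmod.

```shell
sandbox=$(mktemp -d)                           # scratch directory (hypothetical)
cd "$sandbox"
echo "ls -la" > list.sh                        # the same one-line script as in Step 1

mode_before=$(ls -l list.sh | cut -c1-10)      # permission string, e.g. -rw-r--r--
if ./list.sh > /dev/null 2>&1; then ran_before=yes; else ran_before=no; fi

chmod u+x list.sh                              # add execute (x) permission for the user
mode_after=$(ls -l list.sh | cut -c1-10)       # the 4th character is now x
if ./list.sh > /dev/null 2>&1; then ran_after=yes; else ran_after=no; fi

echo "before: $mode_before ran=$ran_before"
echo "after:  $mode_after ran=$ran_after"
```

The first ten characters of ls -l are the file type followed by read, write and execute flags for the user, the group and everyone else, in that order.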
Step 3. Download a program
In this step, you will download nextflow and make the nextflow command available in your terminal.
nextflow is already pre-installed, but we will download and install it again (just for fun!).
First go to the Nextflow documentation to see how to install the program: https://www.nextflow.io/docs/latest/install.html
The first thing it asks you to do is check which version of the prerequisite java you have:
java -version
Luckily in this environment we already have java installed (v17.0.10), so we can skip this.
Next, we can install Nextflow (https://www.nextflow.io/docs/latest/install.html#install-nextflow):
curl -s https://get.nextflow.io | bash
* The above command uses curl (similar to wget) to download the Nextflow installer script from a webserver and pipes it to bash, which runs it.
* -s is the silent option (it hides the progress meter and error messages).
Now check you have the downloaded file with ls -l (long format), which also shows the current file modes. Is the file executable?
If it is not, then you can change the mode using chmod (a: all is the default, so it is skipped here)
chmod +x nextflow
Now it should be executable. You could run it by running the command ./nextflow
But it is still not the default nextflow. As we mentioned earlier, nextflow is already installed.
We can see that by using:
which nextflow
So to make our new copy the one that is used by default, we need to put this script in a directory that is listed in $PATH.
We can check which directories are in $PATH by typing:
echo $PATH
which should give you:
/ide/bin/remote-cli:/opt/conda/bin:/home/gitpod/.local/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
$PATH is a special environment variable that stores the directories searched for executables. Any executable script in any of these paths will be found no matter what directory you are currently in. BUT the command line will use the first copy of a script that it finds, searching the directories from left to right.
So now mv the copy of nextflow to /ide/bin/remote-cli/nextflow. Then check where the default nextflow script is using which. Finally type nextflow info to see the version of Nextflow you have downloaded.
Cheat sheet
mv nextflow /ide/bin/remote-cli/nextflow
which nextflow
nextflow info
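The "first copy wins" behaviour can be checked with a harmless stand-in. In the sketch below, two throwaway directories each hold a script called mytool (a hypothetical name), and command -v, a shell built-in that answers the same question as which, shows which copy is picked.

```shell
first_dir=$(mktemp -d)       # hypothetical stand-ins for two install locations
second_dir=$(mktemp -d)
printf '#!/bin/sh\necho "new copy"\n' > "$first_dir/mytool"
printf '#!/bin/sh\necho "old copy"\n' > "$second_dir/mytool"
chmod +x "$first_dir/mytool" "$second_dir/mytool"

# Put both directories at the front of PATH; first_dir is searched first
export PATH="$first_dir:$second_dir:$PATH"

resolved=$(command -v mytool)    # path of the copy that will be used
output=$(mytool)

echo "$resolved"
echo "$output"
```

This is why moving nextflow into the first directory listed in $PATH makes it the default.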
Step 4 (extra) grep and wc (word count)
Now move to the directory called "exercise".
There is a poem in the file called "cancao_do_exilio".
Using unix commands alone:
- Count the number of lines, words and characters in the file.
- Count the number of times "palmeiras" is used.
Cheat sheet
wc cancao_do_exilio
#Then
grep palmeiras cancao_do_exilio | wc -l
#or
grep -c palmeiras cancao_do_exilio
Now try to find the line number that contains the word "Deus". (hint: check out the flags on grep)
Cheat sheet
grep -n Deus cancao_do_exilio
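One detail worth knowing: grep -c and grep | wc -l both count matching lines, so a word that appears twice on one line is counted once. The sketch below uses a made-up three-line file (not the poem) to show the difference, with grep -o printing each match on its own line.

```shell
sandbox=$(mktemp -d)                                   # scratch directory (hypothetical)
sample="$sandbox/sample.txt"
# "palm" appears twice on the first line and once on the third
printf 'palm palm\nno match here\none palm\n' > "$sample"

matching_lines=$(grep -c palm "$sample")                        # lines that contain the word
occurrences=$(grep -o palm "$sample" | wc -l | tr -d ' ')       # every single occurrence
line_numbers=$(grep -n one "$sample")                           # -n prefixes each hit with its line number

echo "matching lines: $matching_lines"
echo "occurrences:    $occurrences"
echo "$line_numbers"
```

When every line holds the word at most once, the two counts agree.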
Step 5. Learn to use aliases
In unix you often have to use the same commands again and again, and this is where aliases come in handy.
An alias assigns a command or set of commands to a single word.
Aliases are saved in a file called .bash_profile, which is normally in your home directory. On gitpod, it is in /workspace/gitpod/.bash_profile
These are a couple of examples that already reside in your .bash_profile:
alias lss='ls -al'
# Now lss will list the files in the directory in long form and with hidden files.
alias h1='head -n 1'
# Now h1 will head the top 1 line of a file
Now make your own command to print the last 5 commands you used from history
Cheat sheet
Save the following line in /workspace/gitpod/.bash_profile:
alias hist5='history | tail -n 5'
Then use the command source on the /workspace/gitpod/.bash_profile file to tell unix to add this alias to the command line:
source /workspace/gitpod/.bash_profile
"hist5" was the name I used, but you can call it whatever command you wish, as long as it doesn't already exist.
Also, try out the other commands lss and h1.
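To experiment with aliases without editing your real .bash_profile, a sketch like the following may help. The files profile_demo, poem.txt and demo.sh are hypothetical, and bash is assumed to be installed. Inside a script bash ignores aliases unless shopt -s expand_aliases is set; in your interactive terminal this is already on.

```shell
sandbox=$(mktemp -d)                                  # scratch directory (hypothetical)
cd "$sandbox"

echo "alias h1='head -n 1'" > profile_demo            # stand-in for .bash_profile
printf 'top line\nsecond line\n' > poem.txt

# A three-line bash script: allow aliases, load the profile, use the alias
printf 'shopt -s expand_aliases\nsource ./profile_demo\nh1 poem.txt\n' > demo.sh

alias_output=$(bash demo.sh)
echo "$alias_output"
```

Here source plays the same role as in the cheat sheet: it reads the file and makes the alias known to the shell that is running.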
Step 6. Save your history
Finally, it is a good idea to save your command history.
Print your current session's command history with history and save the output to a file called "my_history.txt".
Then find the file in the VS Code file explorer, in the panel on the left hand side.
Right click the my_history.txt file and select DOWNLOAD, to download the file to your local machine.