added command line refresher (#28)
* added link to command line refresher

* reknit the changes

* added project management and RStudio set up

* added project structure and file paths explanation

* updated paths explanation

* spelling
elikreuz authored Sep 1, 2023
1 parent 5980cc1 commit 91b2eb8
Showing 3 changed files with 363 additions and 20 deletions.
Binary file added figs/projectStruture.png
105 changes: 99 additions & 6 deletions index.Rmd
@@ -1,5 +1,5 @@
---
pagetitle: "Tutorial 1: Linux & version control"
pagetitle: "Linux & version control"
author: "Loïc Dutrieux, Jan Verbesselt, Johannes Eberenz, Dainius Masiliūnas"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output:
@@ -33,11 +33,11 @@ code[class^="sourceCode bash"]::before { content: "Bash Source"; }
If you are a student following the course at Wageningen University (WUR), **please read** the information in the course guide in Teams and on [Brightspace](https://brightspace.wur.nl). All course-specific information and exercises can be found there. Information in the course guide overrules any information written in these pages, so **please read it carefully** and **check it often**. You will also find all the information on deliverables and exercises there.
```

# Week 1, Tutorial 1: Linux & version control
# Linux & version control

## Introduction

Welcome to the Geoscripting course! Today we will get familiar with Linux, which is an advanced environment optimised for scripting, and with version control software that helps you collaborate with one another and keep track of your file versions. These tools are very important, as we will use them throughout the course for all course activities, and they will continue to be very useful after the end of the course for all your scripting work.
Welcome to the Geoscripting course! Today we will get familiar with Linux, which is an advanced environment optimised for scripting, and with version control software that helps you collaborate with one another and keep track of your file versions. These tools are very important, as we will use them throughout the course for all course activities, and they will continue to be very useful after the end of the course for all your scripting work. Additionally, you will learn how to structure a project and get familiar with RStudio.

```{block type="alert alert-info"}
Throughout the whole course, we will be working in a Linux environment, and all of **the material has only been tested on (and assumes) a Linux environment**. Every WUR student will get access to a Linux virtual machine.
@@ -51,6 +51,8 @@ At the end of the tutorial, you should be able to:
* Explain why software licenses are important and what software license options there are
* Apply a software license to your own code
* Use version control to develop, maintain, and share your code with others
* Set up a project structure
* Work with relative and absolute file paths
* Submit an exercise using Git and GitLab

# Linux
@@ -118,9 +120,10 @@ Have you ever worked on a project and ended up having so many versions of your w

The video below explains some basic concepts of version control and what the benefits of using it are.

<center>
<iframe src="https://player.vimeo.com/video/41027679" width="500" height="300" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p><a href="https://vimeo.com/41027679">What is VCS? (Git-SCM) &bull; Git Basics #1</a> from <a href="https://vimeo.com/github">GitHub</a> on <a href="https://vimeo.com">Vimeo</a>.</p>

</center>

So, to sum up, version control allows you to keep track of:

@@ -143,7 +146,8 @@ Additionally, version control:

The three most popular version control systems are **Git**, **Mercurial** (abbreviated as hg) and **Subversion** (abbreviated as svn). *Git* is by far the most modern and popular one, so we will only use *Git* in this course.

## Git <img src="figs/Git_logo.png" alt="git" style="width: 80px"/>
## Git
<img src="figs/Git_logo.png" alt="git" style="width: 80px"/>

### What git does

@@ -179,7 +183,9 @@ Effective use of git includes two components: local software to manage the files
In this course, we will primarily use Git GUI as the client. It is a simple client that is included with Git itself. There are other graphical clients as well, including one integrated into RStudio, but these are outside the scope of this course. Note that Git is language-agnostic, and we will be using it with both R and Python, so it's best to learn the language-neutral GUI rather than an R-specific one.

```{block, type="alert alert-info"}
**Protip**: For those who are comfortable with working from the Linux terminal, the command line client is often the most efficient choice. Knowing how to use git from the command line is also useful when working on cloud virtual machines/servers for big data processing. So in protip boxes like this you will find command line equivalents to the GUI actions we will perform. Choose whichever way you find the most convenient for yourself.
**Protip**: For those who are comfortable with working from the Linux terminal, the command line client is often the most efficient choice. Knowing how to use git from the command line is also useful when working on cloud virtual machines/servers for big data processing. So in pro-tip boxes like this you will find command line equivalents to the GUI actions we will perform. Choose whichever way you find the most convenient for yourself.
In the [next tutorial](https://geoscripting-wur.github.io/Intro2Linux/), we will introduce more command-line commands.
In addition, while this tutorial describes how to use the built-in `git-gui`, you can also use the more modern `git-cola`, which may be more intuitive to you. The general steps are the same as in `git-gui`. You can find Git Cola in the applications menu (or else you may need to install it).
```
@@ -429,6 +435,93 @@ You can also browse the history of a repository from your Git hosting service, a

That's it: now you know how to keep track of all your files, so you will never lose them again, and no longer have to worry about making backups or saving multiple versions. In addition, this is how free and open-source code development happens in practice. Also, the exercises and assignments in the course will be delivered and submitted this way, so make sure you are familiar with the whole process!

# Project structure

Another good practice for scripting work is to maintain a consistent project structure: it makes it easier to switch from one project to another and immediately understand how things work. In most cases, the project structure is entirely up to you. In some cases, however, the structure is (partly) mandated, for instance if you want to make a package. In this course, we will follow the structure below:

![Project Structure Schema](figs/projectStruture.png)

* A `main` script at the root of the project. This script performs the different operations of your project step by step. It is the only non-generic part of your project (it contains paths, pre-set variables, etc.). The file extension of this file depends on the language you use for your project.
* As we will be working with multiple languages throughout this course, we keep things organized by placing the scripts into sub-directories for their respective languages (`R/`, `Python/`, and `Bash/`). These directories contain the functions you define as part of your project. These functions should be as generic as possible, and are *sourced* and called by the `main` script. How this is done depends on the language of the `main` script: for example, in R you would write `source("R/myfunction.R")`, whereas in Python you would use `import Python.myfunction`. You will see this in action in later tutorials.
* A `data/` sub-directory containing the datasets of the project. Since Git handles non-text (binary) files less efficiently, and GitLab has storage limits, you should only put small datasets in this directory (<2-3 MB), such as GeoPackages, small rasters, or CSV files.
* An `output/` sub-directory (when applicable). This directory should not be tracked by Git: your scripts create the output, so there is no need to store it.
* A `README.md` file containing the name of your project, a description of it, the names of the authors, and a description of the other packages your project needs to run correctly.
* Finally, as you learned at the beginning of this tutorial, a `LICENSE.txt` file with the software license you would like your code to have.
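
If you prefer to create this layout from a script rather than by hand, the following is a minimal Python sketch using only the standard library. The function name `create_project`, the `main.py` file name, and the `.gitignore` entry used to keep `output/` untracked are assumptions for illustration; adapt them to your own project and language.

```python
from pathlib import Path


def create_project(root):
    """Create the course project layout under `root` (an illustrative sketch)."""
    root = Path(root)
    # Language sub-directories for functions, plus data and output folders
    for subdir in ["R", "Python", "Bash", "data", "output"]:
        (root / subdir).mkdir(parents=True, exist_ok=True)
    # Empty placeholder files; fill in the README and license text yourself
    for filename in ["main.py", "README.md", "LICENSE.txt"]:
        (root / filename).touch(exist_ok=True)
    # Tell Git to ignore generated output (overwrites any existing .gitignore)
    (root / ".gitignore").write_text("output/\n")
    return root
```

For example, `create_project("Project_Structure")` creates the folders and files in the current directory. Keep in mind that Git does not track empty directories, so the sub-directories only appear in your repository once they contain at least one file.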

### Example `main` file

Typically, the header of your main script will look like this:

#### In R (`main.R`)

```{r, eval=FALSE}
# Team Teamname (John Doe and Jane Smith)
# January 2020
# Import packages
library(raster)
library(sf)
# Source functions
source('R/function1.R')
source('R/function2.R')
# Load datasets
postboxes <- st_read('data/postbox_locations.gpkg')
# Then the actual commands
```

#### In Python (`main.py`)

```{python, eval=FALSE}
# Team Teamname (John Doe and Jane Smith)
# January 2020
# Import packages
import geopandas as gpd
import matplotlib.pyplot as plt
# Import functions
import Python.functions as funcs
# Load datasets
postboxes = gpd.read_file('data/postbox_locations.gpkg')
# Then the actual commands
```
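
The `import Python.functions as funcs` line works because, when you run `python main.py`, Python puts the directory containing `main.py` (here, the project root) first on its module search path, `sys.path`, so the `Python/` sub-directory can be imported as a package. The minimal sketch below mimics this in a throwaway folder. The helper `count_features` is hypothetical, and the empty `__init__.py` is an assumption that makes `Python/` an explicit package (since Python 3.3, a directory without it can also be imported as a namespace package).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway project root standing in for your real project folder
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "Python"
package_dir.mkdir()
# An empty __init__.py marks Python/ as an explicit package
(package_dir / "__init__.py").touch()
# Hypothetical module Python/functions.py with one generic function
(package_dir / "functions.py").write_text(
    "def count_features(features):\n"
    "    # Return the number of features in a list\n"
    "    return len(features)\n"
)

# Running `python main.py` puts the directory of main.py first on sys.path;
# here we add the throwaway project root by hand to get the same effect
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

# Equivalent to `import Python.functions as funcs` in main.py
funcs = importlib.import_module("Python.functions")
print(funcs.count_features(["postbox_1", "postbox_2"]))  # prints 2
```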

## Working directory, relative and absolute file paths

*By the end of the following section, you should be able to explain the difference between the following:*

* relative path
* absolute path
* working directory
* the special directories:
    * ` . `
    * ` .. `
    * ` / `, or the **root** directory

In the R and Python examples above, we load the dataset by giving its location within the data directory: `"data/postbox_locations.gpkg"`. However, you may have many data folders on your computer for all kinds of different projects. So how does the system know which one to look in? Moreover, if you share your script with a friend, the locations of their project and data folders will be different from yours. It would be a nuisance if they had to change all references to these files in the script. To deal with these issues, we use **relative file paths**.

**Relative** file paths do not include the location of the project (**working**) directory itself; they are *relative* to the *working directory* (the "Project_Structure" folder). In the example above, the **relative** file path for the post box locations file is `data/postbox_locations.gpkg`, whereas the **absolute** file path would be `"/home/osboxes/Geoscripting/Project_Structure/data/postbox_locations.gpkg"`. The **absolute** file path refers to a file starting from the **root** of the entire file system. On Linux (and other UNIX-like systems, such as macOS), absolute file paths **always** start with ` / `.
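
To make this concrete, the minimal Python sketch below uses the standard-library `pathlib` module to show that the absolute path is simply the working directory joined with the relative path. `PurePosixPath` only manipulates path strings and never touches the file system, so it runs even if these folders (taken from the example above) do not exist on your computer.

```python
from pathlib import PurePosixPath

# Working directory from the example above (yours will differ)
working_dir = PurePosixPath("/home/osboxes/Geoscripting/Project_Structure")
relative = PurePosixPath("data/postbox_locations.gpkg")

print(relative.is_absolute())  # False: no leading slash

# Joining the working directory and the relative path gives the absolute path
absolute = working_dir / relative
print(absolute)  # /home/osboxes/Geoscripting/Project_Structure/data/postbox_locations.gpkg
print(absolute.is_absolute())  # True: starts at the root /
```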

```{block, type="alert alert-danger"}
**Note**: on Windows, you might see a backslash (`\`) being used as a path separator instead of a slash (`/`). **Don't do this**! In many languages, including R and Python, a backslash denotes an [escape sequence](https://en.wikipedia.org/wiki/Escape_sequence). In addition, a backslash is not a valid path separator on non-Windows platforms, whereas both a slash and a backslash are valid on Windows. So save yourself the trouble and **always use a slash as a path separator**!
```
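
The escape-sequence problem is easy to reproduce. In this minimal Python sketch (the file name is made up), the `\n` in a Windows-style path is silently turned into a newline character, while the forward-slash version is read as intended. `PureWindowsPath` from `pathlib` also shows that Windows path handling accepts both separators.

```python
from pathlib import PureWindowsPath

# In an ordinary string, "\n" is an escape sequence meaning "newline",
# so this Windows-style path silently contains a line break
bad = "data\new_postboxes.gpkg"
print("\n" in bad)  # True

# The same path written with a forward slash needs no escaping
good = "data/new_postboxes.gpkg"
print("\n" in good)  # False

# Windows path handling accepts both separators, so "/" is safe there too
print(PureWindowsPath(good) == PureWindowsPath("data\\new_postboxes.gpkg"))  # True
```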

So what is the **working directory**? It is the directory from which your code is run, and relative paths are resolved against it. By convention, the working directory is the directory containing the script you are working on. This means that you can assume that whoever runs your script will run it from the directory in which the script is located.

```{block, type="alert alert-danger"}
**Note**: this also means that when you test others' code, you should make sure to run it from the directory in which the script is located, unless stated otherwise!
```
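
A quick way to see the working directory in action is to resolve the same relative path from two different directories. The minimal Python sketch below uses throwaway folders with illustrative file names; `os.getcwd()` prints the working directory and `os.chdir()` changes it (in R, `getwd()` and `setwd()` play the same role). Here `os.chdir()` only serves to simulate running the script from different places.

```python
import os
import tempfile
from pathlib import Path

# Throwaway project folder with an (empty) data file; names are illustrative
project_dir = Path(tempfile.mkdtemp())
(project_dir / "data").mkdir()
(project_dir / "data" / "postbox_locations.gpkg").touch()

relative = Path("data/postbox_locations.gpkg")

# Simulate running the script from the project directory
os.chdir(project_dir)
print(os.getcwd())        # the project directory
print(relative.exists())  # True: resolved against the working directory

# Simulate running the same script from some other (empty) directory
os.chdir(tempfile.mkdtemp())
print(relative.exists())  # False: the same relative path now points elsewhere
```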

To refer to directories or files that are within the working directory, we simply use their names. So to refer to a directory called `R` in our working directory, we type `R`. To refer to a file or directory within another directory, we type the name of the directory, a slash, and then the name of the file/directory, for instance, `R/function1.R`.

If we want to refer to a file or directory one level above a directory, we use the special directory ` .. `, which refers to the parent of the directory it follows (or of the working directory, at the start of a relative path). For instance, if our `main.R` is not in our project root but in the sub-directory `demo` (so our working directory is `demo`), we would refer to our `function1.R` file as `../R/function1.R`. Another special directory is ` . `, which refers to the directory itself.
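
If you want to check your reasoning about ` . ` and ` .. `, Python's standard-library `posixpath.normpath` collapses them in a path string. The minimal sketch below uses the `demo/` example from the paragraph above; `posixpath` is used instead of `os.path` so that the result uses slashes on every operating system.

```python
import posixpath

# From the project root, demo/../R/function1.R goes into demo/,
# back up one level with "..", and then into R/
print(posixpath.normpath("demo/../R/function1.R"))  # R/function1.R

# "." refers to the directory itself, so it can simply be dropped
print(posixpath.normpath("./R/./function1.R"))  # R/function1.R
```

Note that `normpath` works purely on the text of the path: it does not check whether the directories exist, and in the presence of symbolic links a collapsed ` .. ` may point somewhere other than where the operating system would resolve it.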

When making a Git repository, make sure that all your code is *portable* and self-contained, i.e. that it can run on any computer and, ideally, on any operating system. As a rule of thumb, you should therefore **always use relative file paths** in your scripts.

```{block, type="alert alert-success"}
> **Question 4**: what would be the location of this file: `././R/./.././R/./././function2.R`? How about `/./R/./.././R/./././function2.R`? What would be the meaning of `C:/Windows/cmd.exe` on Linux? Is it a relative or absolute file path?
```


# References

* Great 15 min interactive git commands tutorial: [try.github.io](https://try.github.io)