index.Rmd

---
pagetitle: "Tutorial 8: Python Environments"
author: "Arno Timmer, Jan Verbesselt, Jorge Mendes de Jesus, Aldo Bergsma, Johannes Eberenz, Dainius Masiliunas, David Swinkels, Judith Verstegen, Corné Vreugdenhil"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
  rmdformats::html_clean:
    title: "Tutorial 8: Python Environments"
    theme: "simplex"
    highlight: zenburn
    menu: FALSE
    theme.chooser: TRUE
    highlight.chooser: TRUE
---

```{css, echo=FALSE}
@import url("https://netdna.bootstrapcdn.com/bootswatch/3.0.0/simplex/bootstrap.min.css");
.main-container {max-width: none;}
pre {color: inherit; background-color: inherit;}
code[class^="sourceCode"]::before {
  content: attr(class);
  display: block;
  text-align: right;
  font-size: 70%;
}
code[class^="sourceCode r"]::before { content: "R Source";}
code[class^="sourceCode python"]::before { content: "Python Source"; }
code[class^="sourceCode bash"]::before { content: "Bash Source"; }
```

<font size="6">[WUR Geoscripting](https://geoscripting-wur.github.io/)</font> <img src="https://www.wur.nl/upload/854757ab-168f-46d7-b415-f8b501eebaa5_WUR_RGB_standard_2021-site.svg" alt="WUR logo" style="height: 35px; margin:inherit;"/>

# Tutorial 8: Python Environments

## Introduction

Good afternoon and welcome to the python part of this course! Today we will introduce how we will work with Python during this course and show some alternative methods. If you are unfamiliar with Python and/or feel that you need more training, follow one of the Datacamp courses as introduction into Python *before today*:

* [Introduction to Python](https://www.datacamp.com/courses/intro-to-python-for-data-science) | recommended to follow if you haven't any scripting experience so far
* [Python for R users](https://www.datacamp.com/courses/python-for-r-users) | recommended if you have experience already in R

## Today’s Learning objectives

- Know how to work with virtual environments: *Conda* + *Mamba*
- Know how to run a Python script from the terminal
- Get introduced to Python editors and IDEs
- Refresh Python programming knowledge
- Familiarize yourself with some visualization techniques

# Introduction to Python

Python is a jack-of-all-trades programming language that is free, flexible, open-source, cross-platform and has a very large community behind it. If you ask Python programmers what they like most about Python, they will often cite its high readability and high availability of good packages. There are many Python packages out there for geoscripting, data wrangling, visualization, machine learning and for almost everything else. Relevant packages for this course are for example:

* Geoscripting
    * GeoPandas (Vector Processing)
    * Rasterio (Raster Processing)
    * GDAL/OGR (Vector and Raster Processing)
    * QGIS plugins (Open Source GIS)
    * ArcPy (Propietary GIS)
* Data Handling
    * Pandas (Dataframes and Data Analysis)
    * NumPy (Scientific Computing)
* Visualization
    * Matplotlib (General Graphics)
    * Seaborn (Statistical Graphics)
    * Folium (Interactive Maps)
* Machine Learning
    * scikit-learn (Machine Learning)
    * Keras + TensorFlow (Deep Learning)
    * PyTorch (Deep Learning)

## Python package management with Conda
The high availability of packages is also a threat sometimes. If a piece of software is developed depending on a package, but this packages changes later on, the code might not work anymore. Also, different packages require different dependencies that they are built upon. It is important to make sure all these dependencies are working together and that the right versions are used. Luckily, a set of tools exist for installing and managing Python packages. It is possible to install packages on your main Python installation (called the **base** python interpreter), but sooner or later you will get conflicting Python packages since packages have varying dependencies and you might have installed several versions of the same package. It can even [break your system Python interpreter](https://askubuntu.com/questions/95037/what-is-the-best-way-to-install-python-packages).

Therefor, we recommend to use a Python package manager that uses of virtual environments, such as *Conda* or *Mamba*. That way, you can create a separate environment on your machine for each project. In these environments, any dependencie of the project, such as software, C libraries or R packages can be installed. We will use them here for installing Python packages. Packages installed in one environment do not interfere with your base Python or with other environments. Additionally, it is possible to export and share the requirements for your (open source) project with collaborators or users of your code. 

## Mamba installation
For this course, we will make use of *Mamba*, a fast drop-in reimplementation of the *Conda* package manager. It has its core parts implemented in C++ for maximum efficiency, makes use of parallel downloading of repository data and package files using multi-threading, and uses `libsolv` for (much) faster dependency solving. 
To install *Mamba* in your Linux environment, we have prepared a short *Bash* script for you. Just run the following lines of code, line by line, in a new terminal window.

```{bash, eval=FALSE}
git clone https://github.com/GeoScripting-WUR/InstallLinuxScript.git
cd InstallLinuxScript/user
chmod u+x ./install.sh
./install.sh
```

<!-- This should be kept for external students, TODO: think about how to combine this
```{r, eval=FALSE, engine='bash'}
MINICONDA_VERSION="Miniconda3-latest-Linux-x86_64"
pushd /tmp
curl -O https://repo.continuum.io/miniconda/${MINICONDA_VERSION}.sh
## This installation script will require user input
bash ${MINICONDA_VERSION}.sh
rm ${MINICONDA_VERSION}.sh
popd
```

Scroll down the license with enter. Accept License (i.e. type yes). Use default install location (i.e. press Enter). When prompted to prepend the *Conda* install location to path agree (i.e. type yes). This makes the Python interpreter of *Conda* the standard Python interpreter and allows you to run *Conda* commands from command line. -->

This will install *Mamba* into `~/mamba`. Finally, restart your terminal to be able to use *Mamba* and *Conda* in the terminal. Next, let's see how to use *Mamba* in case you want make new virtual environments by yourself, or install packages after creating the environment.

## Mamba usage
*Mamba* creates isolated conda environments with sets of packages, that do not interfere with your base Python or with other conda environments. To create an environment:

```{r, eval=FALSE, engine='bash'}
mamba create --name geotest python numpy spyder
```

This creates a new environment called *geotest* with *Python*, *NumPy* and *Spyder* installed into the conda environment. Another option is to create an environment from a file, a YAML file. In this file all required packages are listed and if required which version should be used. An example of a YAML file is the following: 
```
name: geotest
dependencies:
  - python
  - numpy
  - spyder
```
The first line defines what the evnironment will be called (`geotest` in this example) and what packages should be installed (python, numpy and spyder). As you can see, this definition of the `geotest` environment is the exact same as the geotest environment as defined before. To create the geotest environment from such a file, save this yaml to a new file named `env.yaml`, or however you want to call it and use the argument `--file` (or `--f` in short):

```{r, eval=FALSE, engine='bash'}
mamba env create -f env.yaml
```


Let's list the currently available environments:

```{r, eval=FALSE, engine='bash'}
mamba info --envs
```

*Mamba* puts an asterisk (\*) in front of the active environment. Now we activate the environment. While *Mamba* replaces *Conda* for most commands, this is not the case for (de)activating environments:

```{r, eval=FALSE, engine='bash'}
# Cross-platform (but not always working, like in our VM, so we use the next option)
conda activate geotest

# Linux, macOS
source activate geotest

# Windows
activate geotest
```

After this, the current environment is shown in parentheses in front of your prompt (`(geotest)$`). Note that the activated environment is only valid for the shell in which you activated it. For instance, if you close the shell window and open a new one you will have to activate it again.

After creating a conda environment, (additional) Python packages can be installed. There are three possible ways to install packages, which we list below.

* Using *Mamba* to install and manage conda packages. This downloads conda packages using conda channels, which are URLs to directories containing the conda packages. **Generally, installing conda packages using *Mamba* is the preferred method.**
* Using *pip* to install packages and *Mamba* to manage these packages. *pip* is available for Windows, macOS and Linux. *pip* can also install [binary wheels on Windows](https://www.lfd.uci.edu/~gohlke/Pythonlibs). You should generally not install packages from *pip* in a conda environment unless it's the last resort. This is because after you use *pip* to modify an environment, you can no longer use `conda`/`mamba` to do so (trying that will break your environment, because *pip* does not communicate its changes to *Conda*). Hence install packages with `mamba` that you can first, and only then use `pip`, and then never touch the environment with `mamba` again (delete and start fresh if you need to).
* Using the distribution's package manager (only on Ubuntu, that is `sudo apt-get install python-*`).

The `mamba search` command searches a set of channels. By default, packages are automatically downloaded and updated from the default channel. To search for a package, type:

```{r, eval=FALSE, engine='bash'}
mamba search pandas
```

This gives a list of all packages that have "pandas" in the name and lists all available versions. To install:

```{r, eval=FALSE, engine='bash'}
mamba install pandas
```

This installs the latest compatible version of _Pandas_. Note that this would install it into your currently activated environment.

Note that you can also install multiple packages at the same time: 

```{r, eval=FALSE, engine='bash'}
mamba install geopandas matplotlib
```

As you saw with *Spyder* (which is an IDE, more on that later), *Mamba* is also able to install some non-Python packages that have Python bindings. This is useful for making sure your Python and binary versions match and do not interfere with the system-wide ones.

Some additional helpful utilities for package management in this context are:

* `mamba list` to check which packages are installed in `root` or in the active environment;
* `python --version` or `gdal-config --version` to check which Python or GDAL version is used in the environment;
* `which spyder` or `type spyder` to find out which *Spyder* executable is used either from system or conda environment.

Removing packages is just as simple:

```{r, eval=FALSE, engine='bash'}
mamba remove geopandas pandas folium
```

Now, we deactivate the environment and return to base environment.

```{r, eval=FALSE, engine='bash'}
# Cross-platform
conda deactivate

# Linux, macOS
source deactivate

# Windows
deactivate
```

When we are finished, and do not need the environment for next time, we can remove the environment `geotest`.

```{r, eval=FALSE, engine='bash'}
mamba remove --name geotest --all
```


## Running a Python script in the terminal

Within a conda environment, Python can be started directly, or can be called to run a script file. To start Python directly:

```{r, eval=FALSE, engine='bash'}
python
```

Now, you can type Python expressions that will be executed one by one:

```{Python,eval=FALSE, engine.path='/usr/bin/python3'}
import sys
print('Good morning, you are running Python:', sys.version)
```

To go back, type:
```{Python,engine.path='/usr/bin/python3', eval=FALSE}
exit()
# or 
quit()
```

Usually, we do not want to run expressions one by one, but build scripts instead, to ensure transferability and reproducibilty. Create a new text file and (re)name it (to) `test.py`. Open it, for example with a text editor, paste in the code you used above (`import sys` etc.), and save the script. Navigate in the terminal to the location where this script is stored, using `cd`. Finally, run the script with:

```{r, eval=FALSE, engine='bash'}
python test.py
```

The output is printed to the terminal. Running a script from the terminal is less error-prone than running it from an IDE (see the next section), such as Spyder, as IDEs often keep variables in memory after the script has finished running. Therefore, running a script from the terminal is a good final test before submitting an exercise or assignment. 

## Python editors and IDEs
There are many Integrated Development Environments (IDEs) for Python, and every programmer has their own preference. An IDE is a software application that provides facilities for software development.

* [Spyder](https://www.spyder-ide.org/) is a lightweight IDE. *In this course, Spyder is the recommended Python IDE.*
* [Jupyter notebook](http://jupyter.org/) integrates visualization with code and is suitable to make tutorials, simple dashboards, quick visualizations, and do prototype testing. Jupyter Notebook run in your browser on a localhost server or on a web server. They allow for various programming languages, e.g. Python, R, Julia, Spark or PySpark. 
* [PyCharm Community Edition](https://www.jetbrains.com/help/pycharm/install-and-set-up-pycharm.html) is a free professional Python IDE with a lot of advanced functionality, such as integrated GIT version control, code completion, code checking, debugging and navigation. This IDE can optionally be used by more advanced scripters during this course instead of Spyder, but do know that you will not be assisted for solving IDE-related issues.

### Spyder

Spyder is a IDE for developing python mainly for scientific purposes. Fun fact, it is [completely written in python](https://github.com/spyder-ide/spyder)! Spyder is a very complete IDE that looks a bit like Rstudio. It shows the variables present in the current session, it has a code editor, a console and a figures pane in the main view. 

The [Spyder IDE](https://docs.spyder-ide.org/) can be started in a terminal when the *Spyder* package is installed in the active conda environment. So, using *Mamba*, make an environment and install Spyder to that environment. Activate the environment. Spyder will automatically make use of the Python interpreter of the active conda environment. To start Spyder:

```{r, eval=FALSE,engine='bash'}
spyder
```

In Spyder you should see an editor, a file explorer and a console. Have a look at the toolbar. Some important shortcuts are:

* F5 to run your script
* CTRL + S to save your script
* CTRL + 1 to comment/uncomment your code
* TAB to indent your code
* SHIFT + TAB to unindent your code

Open a new file and save it somewhere as `main.py` (File -- > New File --> Save As). Test writing a few lines of code and running the script.


### Jupyter Notebook

Jupyter Notebook is actually not a IDE but it is very useful for writing code. Jupyter stands for the languages that once can use (*JU*lia, *PY*thon and *R*) and notebooks means that they are actually files instead of an IDE (such as Rstudio or Spyder). The notebooks can be interpreted and run by varying interpreters of which we will cover two later on. Jupyter Notebook integrates code and visualization, and are therefore very helpful for demonstration purposes and to be run by online interpreters (such as Google Colab). First we will show how to run Jupyter Notebook locally. To do this install `jupyter` and the module `folium` in an existing or new environment that includes Python. To start Jupyter type:

```{r, eval=FALSE, engine='bash'}
jupyter notebook
```

Jupyter should pop up in your browser. Note that although jupyter is opened in your browser, internet is not used, the code is interpreted and run locally. You will see a menu with all files in your working directory. The Jupyter Notebook will only see files that are accessible from the working directory in which you launched the notebook!

Make a new folder: *New* → *Folder*, rename the folder (check the box next to the new 'Untitled Folder' and click **'Rename'** in the top) and, in this folder, create a new Python3 Jupyter Notebook *New* → *Python 3*. Give your notebook a name by clicking on *untitled*. Note that this creates a file with the extension *.ipynb*, short for 'Interactive Python Notebook', which is the file format of Jupyter Notebook. 

Feel free to have a go at the user interface tour (*Help* → *User Interface Tour*), or hover over the toolbar to check out the tools. The main tools are:

- _Save and checkpoint_
- _Insert cell below_
- _Run_
- _Code/Markdown/Heading_ (List box)

Similar to RMarkdown, Jupyter Notebooks have code cells (*Code*) and text cells (*Markdown*). Insert two extra cells by clicking the + button and change the first cell from code to markdown. Enter some documentation for your code (e.g. your team name, exercise and date). Leave the other cell on code. 

Type the following Python code in the code cell:

```{Python,engine.path='/usr/bin/python3', eval=FALSE}
import folium

m = folium.Map(location=[51.9700000, 5.6666700], zoom_start=13)
m
```

Run the code cell by selecting it and pressing the *Run* button, or press *CTRL + Enter* or *Shift + Enter*. You'll see a map visualized below your code, similar to the one below. Try to drag the map to play around with it. 

<img src="images/WUR_Basic_Folium_Map.png" alt="Wageningen University Basic Folium map" width="100%"></img>

Your Jupyter Notebook is automatically saved as an `.ipynb` file on your computer. The notebook can be downloaded as a Python script, pdf or html. You can also save it manually. 

To exit a notebook properly, use *File* → *Close and Halt*. After that, by pressing Ctrl + c in the terminal where Jupyter Notebook server is running, you cancel the running process. The terminal goes back to command line and you can exit the virtual environment by typing `conda deactivate`.

```{r, engine='bash', eval=FALSE}
conda deactivate
```
### Google Colab
As said before, Jupyter is locally opened in your browser. It does not connect to the internet, but it does show the possibilities, one could create something online that can run your notebooks for you on the cloud. This is exactly what Google does with Google Colab. Google Colab is a cloud service that allows you to run your Jupyter notebooks on the Google cloud for free. Let's see what this looks like: 

* Go to https://colab.research.google.com/notebooks/empty.ipynb (note the similaritie and differences between Jupyter locally and on Google Colab);
* Type `!pip install folium` and press ctrl+enter to run and  install folium;
* In a new cell run the same python code as locally to create and show a new folium map.

For this course we will rarely use Jupyter Notebook and or Google Colab, but it is good to know they exist. Especially Google Colab is being used more and more in the scientific community and you are likely to come across these during other courses. 

# Putting it to the test

## Setting up the environment

Now that we know how to set up an environment and run code, lets use this new knowledge and run some Python code. Again, During this course advise you to code in Spyder, as this IDE is the recommended IDE for the Python part of this course. To practice you might also want to try out Jupyter locally and Google Colab to run the same code.

First, make a directory structure for this tutorial:

```{r, eval=FALSE,engine='bash'}
cd ~/Documents/
mkdir PythonRefresher #or give the directory a name to your liking
cd ./PythonRefresher
mkdir output
```

We only make a directory for output, because no input data or separate scripts are created in this tutorial. Next, we will create a conda environment from a file. First create a text file in your preferred text editor, e.g. `gedit`. Then, (re)name it (to) `refresher.yaml`, and copy the following content into the file:

```
name: refresher
dependencies:
  - python
  - numpy
  - matplotlib
  - geopandas
  - spyder
```

Now, create a new conda environment based on this file:

```{r, eval=FALSE, engine='bash'}
mamba env create --file refresher.yaml
```

Once everything is installed, activate the environment and start Spyder:

```{r, eval=FALSE, engine='bash'}
source activate refresher
spyder
```

Create a new Python script and save it. 

Important to note: for compatibility, it is best to install packages from the same channel as much as possible. Given that packages in the file `refresher.yaml` are installed from the `conda-forge` channel, it is wise to use this same channel when you want to install additional packages in your environment. 


## Quick refresher
In the tutorial about R and Python we have gone over the differences and similarities of python and R. This tutorial also contains some basic python syntax, in this tutorial we assume you know this content, but we will go over a few basics here as well. The examples below are mostly meant for reference purposes, we  assume you understand most of this refresher already. 

### Printing and basic data types
In Python we assign variable using the equals sign (`=`):

Printing in Python is done using the `print` function. We can print variables directly: 

```{r, engine = 'Python', eval=FALSE}
# Integer
age = 25

# Float
height = 1.75

# String
name = "John Doe"

# Boolean
is_student = True

# Print a name
print(name)
```

We can use string formatting to use flexible strings, for example for printing. to start a formatted string, we put a `f` before the string. We can use curly brackets `{}` in this formatted string. The text between these curly brackets is executed as regular Python code. 

```{r, engine = 'Python', eval=FALSE}
# String formatting and printing 
print(f'{name} is {age} years old and is {height} meters tall.')
```

### Basic arithmetic operations:
```{r, engine = 'Python', eval=FALSE}
a = 10
b = 5

addition = a + b
subtraction = a - b
multiplication = a * b
division = a / b
modulo = a % b
exponentiation = a ** b

print(addition, subtraction, multiplication, division, modulo, exponentiation)
```


### Conditional statements
```{r, engine = 'Python', eval=FALSE}
x = 15

if x > 10:
    print("x is greater than 10")
elif x == 10:
    print("x is equal to 10")
else:
    print("x is less than 10")
```


### Loops (for and while)
```{r, engine = 'Python', eval=FALSE}
# For loop
for i in range(5):
    print(i)

# While loop
count = 0
while count < 5:
    print(count)
    count += 1
```

### Lists and basic list operations
```{r, engine = 'Python', eval=FALSE}
# Creating a list
fruits = ["apple", "banana", "orange"]

# Accessing elements
print(fruits[0])  # Output: "apple"

# Adding elements
fruits.append("grape")

# Removing elements
fruits.remove("banana")

# Length of the list
print(len(fruits))  # Output: 3
```


### Functions
```{r, engine = 'Python', eval=FALSE}
# Function to add two numbers and return the result
def add_numbers(a, b):
    return a + b

result = add_numbers(5, 3)
print(result)  # Output: 8
```


### Dictionaries
```{r, engine = 'Python', eval=FALSE}
# Creating a dictionary
person = {
    "name": "Alice",
    "age": 30,
    "is_student": False
}

# Accessing values
print(person["name"])  # Output: "Alice"

# Adding a new key-value pair
person["occupation"] = "Engineer"

# Removing a key-value pair
del person["is_student"]
```


### Importing packages
Python is used by a very large community, as is said before. One of the reasons for this is that this entire community builds a lot of (open source) packages. It is therefor very useful to be able to build upon these packages. In R you have worked a with *dataframes* and *spatial dataframes*. In Python these are not standard datatypes, but they are implemented in very well known packages called `Pandas` and its spatial counterpart `GeoPandas.` We will go in much more detail during the Python-Vector tutorial but we will introduce them quickly here. 

In Python we import a package using the `import` statement (instead of th the `library` function in R) . For example importing the pandas package goes as follows:

```{r, engine = 'Python', eval=FALSE}
import pandas as pd
```

As you can see we can import a package *as* something. We use this if we want to point at specific functionality of this package. If we want to point at for example the `read_csv` function from pandas we we call `pd.read_csv`. This function is also implemented in other packages, but now we are sure we use the pandas version of this function. Importing pandas is a convention, used very widely in the python community. 

We can create a `dataframe` as follows: 
```{r, engine = 'Python', eval=FALSE}
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'San Francisco', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
```   

We can access some information from this `dataframe` as follows:

```{r, engine = 'Python', eval=FALSE}
# Display the first few rows of the DataFrame
print(df.head())

# Get statistical information about the DataFrame
print(df.describe())

# Access a specific column
print(df['Age'])
```   

### GeoDataFrame
The spatial counterpart of a `dataframe` is a 'GeoDataFrame', which we normally import *as* `gpd`: 
```{r, engine = 'Python', eval=FALSE}
import geopandas as gpd

# Dummy data for the GeoDataFrame
data = {
    'Name': ['Location A', 'Location B', 'Location C'],
    'Latitude': [40.7128, 34.0522, 41.8781],
    'Longitude': [-74.0060, -118.2437, -87.6298]
}

# Create the GeoDataFrame with a single line of code
gdf = gpd.GeoDataFrame(data, geometry=gpd.points_from_xy(data['Longitude'], data['Latitude']))

# Display the GeoDataFrame
print(gdf)
```


# Python help

There are several ways to find help with programming in Python. Searching the internet typically solves your problem the quickest, because it finds answers on multiple platforms, such as StackOverflow and GitHub. During Geoscripting we have the forum to ask and give help. Asking your friends or colleagues in person is also a great way to learn and fix programming problems. Another good option is get documentation from the package website or inside Python:

```{Python,engine.path='/usr/bin/python3', eval=FALSE}
import sys
help(sys)
```

See how the objects and functions in the `sys` package got listed.

```{block, type="alert alert-success"}
> **Question 4**: What kind of functionality does the `sys` package provide?
```

# More info
- [Official Python tutorial](https://docs.Python.org/3/contents.html)
- [Python Style guide ](https://www.python.org/dev/peps/pep-0008/)
- [Python 3 Cheatsheet](https://ugoproto.github.io/ugo_py_doc/py_cs/)
- [Overview Python package Cheatsheets](https://www.datacamp.com/community/data-science-cheatsheets?tag=python)