diff --git a/python-data/notebooks/ex01_numpy_arrays.ipynb b/python-data/notebooks/ex01_numpy_arrays.ipynb
index 1a3cc78..c74be41 100644
--- a/python-data/notebooks/ex01_numpy_arrays.ipynb
+++ b/python-data/notebooks/ex01_numpy_arrays.ipynb
@@ -143,7 +143,7 @@
"\n",
"- Create an array of shape (2, 3, 4) of zeros\n",
"- Create an array of shape (2, 3, 4) of ones.\n",
- "- Create an array with values 0 to 999 using the `np.arrange` function."
+ "- Create an array with values 0 to 999 using the `np.arange` function."
]
},
{
diff --git a/python-data/slides/02_python_text_formats.pdf b/python-data/slides/02_python_text_formats.pdf
index 655cb24..bec9544 100644
Binary files a/python-data/slides/02_python_text_formats.pdf and b/python-data/slides/02_python_text_formats.pdf differ
diff --git a/python-data/slides/02_python_text_formats.pptx b/python-data/slides/02_python_text_formats.pptx
index 099d47b..7f8b006 100644
Binary files a/python-data/slides/02_python_text_formats.pptx and b/python-data/slides/02_python_text_formats.pptx differ
diff --git a/python-data/slides/04_binary_formats.pdf b/python-data/slides/04_binary_formats.pdf
index 75ad03b..f22cd0a 100644
Binary files a/python-data/slides/04_binary_formats.pdf and b/python-data/slides/04_binary_formats.pdf differ
diff --git a/python-data/slides/04_binary_formats.pptx b/python-data/slides/04_binary_formats.pptx
index 1859e30..bbe2d01 100644
Binary files a/python-data/slides/04_binary_formats.pptx and b/python-data/slides/04_binary_formats.pptx differ
diff --git a/python-data/slides/05_netcdf_overview.pdf b/python-data/slides/05_netcdf_overview.pdf
index 1fa0bb7..ae55f18 100644
Binary files a/python-data/slides/05_netcdf_overview.pdf and b/python-data/slides/05_netcdf_overview.pdf differ
diff --git a/python-data/slides/05_netcdf_overview.pptx b/python-data/slides/05_netcdf_overview.pptx
index 6c98780..06e2b0a 100644
Binary files a/python-data/slides/05_netcdf_overview.pptx and b/python-data/slides/05_netcdf_overview.pptx differ
diff --git a/python-data/slides/07_ncgen_ncdump_cdl.pdf b/python-data/slides/07_ncgen_ncdump_cdl.pdf
index ef55bad..923e50c 100644
Binary files a/python-data/slides/07_ncgen_ncdump_cdl.pdf and b/python-data/slides/07_ncgen_ncdump_cdl.pdf differ
diff --git a/python-data/slides/07_ncgen_ncdump_cdl.pptx b/python-data/slides/07_ncgen_ncdump_cdl.pptx
index ed5f216..1f2837e 100644
Binary files a/python-data/slides/07_ncgen_ncdump_cdl.pptx and b/python-data/slides/07_ncgen_ncdump_cdl.pptx differ
diff --git a/python-data/slides/09_cfchecker.pdf b/python-data/slides/09_cfchecker.pdf
index 6f0e118..b69121a 100644
Binary files a/python-data/slides/09_cfchecker.pdf and b/python-data/slides/09_cfchecker.pdf differ
diff --git a/python-data/slides/09_cfchecker.pptx b/python-data/slides/09_cfchecker.pptx
index 8b3cc34..7dd9523 100644
Binary files a/python-data/slides/09_cfchecker.pptx and b/python-data/slides/09_cfchecker.pptx differ
diff --git a/python-data/solutions/ex01_numpy_arrays_solutions.ipynb b/python-data/solutions/ex01_numpy_arrays_solutions.ipynb
index 8947b66..09ec90c 100644
--- a/python-data/solutions/ex01_numpy_arrays_solutions.ipynb
+++ b/python-data/solutions/ex01_numpy_arrays_solutions.ipynb
@@ -173,7 +173,7 @@
"\n",
"- Create an array of shape (2, 3, 4) of zeros and print.\n",
"- Create an array of shape (2, 3, 4) of ones and print.\n",
- "- Create an array with values 0 to 999 using the `np.arrange` function and print."
+ "- Create an array with values 0 to 999 using the `np.arange` function and print."
]
},
{
diff --git a/python-data/solutions/ex05_pandas.ipynb b/python-data/solutions/ex05_pandas.ipynb
new file mode 100644
index 0000000..47c1975
--- /dev/null
+++ b/python-data/solutions/ex05_pandas.ipynb
@@ -0,0 +1,1680 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "1a4fb627-2190-43ac-b4b5-93352cd23aa4",
+ "metadata": {},
+ "source": [
+ "# Working with Pandas DataFrames in Python\n",
+ "\n",
+ "Part of: Data Analysis and Visualization in Python for Ecologists (**Data Carpentry**)\n",
+ "\n",
+ "From: https://datacarpentry.org/python-ecology-lesson/02-starting-with-data/index.html\n",
+ "\n",
+ "teaching: 30 mins\n",
+ "exercises: 30 mins\n",
+ "\n",
+ "Questions:\n",
+ "- \"How can I import data in Python?\"\n",
+ "- \"What is Pandas?\"\n",
+ "- \"Why should I use Pandas to work with data?\"\n",
+ " \n",
+ "Objectives:\n",
+ "- \"Navigate the workshop directory and download a dataset.\"\n",
+ "- \"Explain what a library is and what libraries are used for.\"\n",
+ "- \"Describe what the Python Data Analysis Library (Pandas) is.\"\n",
+ "- \"Load the Python Data Analysis Library (Pandas).\"\n",
+ "- \"Use `read_csv` to read tabular data into Python.\"\n",
+ "- \"Describe what a DataFrame is in Python.\"\n",
+ "- \"Access and summarize data stored in a DataFrame.\"\n",
+ "- \"Define indexing as it relates to data structures.\"\n",
+ "- \"Perform basic mathematical operations and summary statistics on data in a Pandas DataFrame.\"\n",
+ "- \"Create simple plots.\"\n",
+ " \n",
+ "Key points:\n",
+ "- \"Libraries enable us to extend the functionality of Python.\"\n",
+ "- \"Pandas is a popular library for working with data.\"\n",
+ "- \"A DataFrame is a Pandas data structure that allows one to access data by column (name or index) or row.\"\n",
+ "- \"Aggregating data using the `groupby()` function enables you to generate useful summaries of data quickly.\"\n",
+ "- \"Plots can be created from DataFrames or subsets of data that have been generated with `groupby()`.\"\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c62fdbe-6cd7-473a-b1d8-288bd5dec187",
+ "metadata": {},
+ "source": [
+ "We can automate the process of performing data manipulations in Python. It's efficient to spend time\n",
+ "building the code to perform these tasks because once it's built, we can use it\n",
+ "over and over on different datasets that use a similar format. This makes our\n",
+ "methods easily reproducible. We can also easily share our code with colleagues\n",
+ "and they can replicate the same analysis.\n",
+ "\n",
+ "### Starting in the same spot\n",
+ "\n",
+ "To help the lesson run smoothly, let's ensure everyone is in the same directory.\n",
+ "This should help us avoid path and file name issues. At this time please\n",
+ "navigate to the workshop directory. If you are working in Jupyter Notebook be sure\n",
+ "that you start your notebook in the workshop directory.\n",
+ "\n",
+ "As a quick aside, Python libraries such as the [OS Library][os-lib] can work with our\n",
+ "directory structure; however, that is not our focus today."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d3fd3a90-3455-4df0-a4f4-9a327d81b177",
+ "metadata": {},
+ "source": [
+ "### Our Data\n",
+ "\n",
+ "For this lesson, we will be using the Portal Teaching data, a subset of the data\n",
+ "from Ernest et al.\n",
+ "[Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal,\n",
+ "Arizona, USA][ernst].\n",
+ "\n",
+ "We will be using files from the [Portal Project Teaching Database][pptd].\n",
+ "This section will use the `surveys.csv` file that can be downloaded here:\n",
+ "[https://ndownloader.figshare.com/files/2292172][figshare-ndownloader]\n",
+ "\n",
+ "We are studying the species and weight of animals caught in sites in our study\n",
+ "area. The dataset is stored as a `.csv` file: each row holds information for a\n",
+ "single animal, and the columns represent:\n",
+ "\n",
+ "| Column | Description |\n",
+ "|------------------|------------------------------------|\n",
+ "| record_id | Unique id for the observation |\n",
+ "| month | month of observation |\n",
+ "| day | day of observation |\n",
+ "| year | year of observation |\n",
+ "| plot_id | ID of a particular site |\n",
+ "| species_id | 2-letter code |\n",
+ "| sex | sex of animal (\"M\", \"F\") |\n",
+ "| hindfoot_length | length of the hindfoot in mm |\n",
+ "| weight | weight of the animal in grams |\n",
+ "\n",
+ "\n",
+ "The first few rows of our first file look like this:\n",
+ "\n",
+ "```\n",
+ "record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight\n",
+ "1,7,16,1977,2,NL,M,32,\n",
+ "2,7,16,1977,3,NL,M,33,\n",
+ "3,7,16,1977,2,DM,F,37,\n",
+ "4,7,16,1977,7,DM,M,36,\n",
+ "5,7,16,1977,3,DM,M,35,\n",
+ "6,7,16,1977,1,PF,M,14,\n",
+ "7,7,16,1977,2,PE,F,,\n",
+ "8,7,16,1977,1,DM,M,37,\n",
+ "9,7,16,1977,1,DM,F,34,\n",
+ "```\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "0e8a0333-7d52-4739-a9d7-ee2388a07033",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Download the file\n",
+ "import requests\n",
+ "url = \"https://ndownloader.figshare.com/files/2292172\"\n",
+ "content = requests.get(url).text\n",
+ "\n",
+ "datafile = \"surveys.csv\"\n",
+ "with open(datafile, \"w\") as csv:\n",
+ " csv.write(content)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "f73c654d-65fc-40bc-a02f-b708eec455dd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6c9ee6ef-0b55-404e-8757-678e7a760dd1",
+ "metadata": {},
+ "source": [
+ "Each time we call a function that's in a library, we use the syntax\n",
+ "`LibraryName.FunctionName`. Adding the library name with a `.` before the\n",
+ "function name tells Python where to find the function. In the example above, we\n",
+ "have imported Pandas as `pd`. This means we don't have to type out `pandas` each\n",
+ "time we call a Pandas function.\n",
+ "\n",
+ "\n",
+ "# Reading CSV Data Using Pandas\n",
+ "\n",
+ "We will begin by locating and reading our survey data, which are in CSV format. CSV stands for\n",
+ "Comma-Separated Values and is a common way to store formatted data. Other symbols may also be used, so\n",
+ "you might see tab-separated, colon-separated or space-separated files. It is quite easy to replace\n",
+ "one separator with another, to match your application. The first line in the file often has headers\n",
+ "to explain what is in each column. CSV files (and files using other separators) are easy to share, and can be\n",
+ "imported and exported from many applications, including Microsoft Excel. For more details on CSV\n",
+ "files, see the [Data Organisation in Spreadsheets][spreadsheet-lesson5] lesson.\n",
+ "We can use Pandas' `read_csv` function to pull the file directly into a [DataFrame][pd-dataframe].\n",
+ "\n",
+ "## So What's a DataFrame?\n",
+ "\n",
+ "A DataFrame is a 2-dimensional data structure that can store data of different\n",
+ "types (including characters, integers, floating point values, categorical data and more)\n",
+ "in columns. It is similar to a spreadsheet or an SQL table or the `data.frame` in\n",
+ "R. A DataFrame always has an index (0-based by default). An index refers to the position of\n",
+ "an element in the data structure."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "eb36110d-ebc1-4376-8e13-abe486c128f9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " record_id month day year plot_id species_id sex hindfoot_length \\\n",
+ "0 1 7 16 1977 2 NL M 32.0 \n",
+ "1 2 7 16 1977 3 NL M 33.0 \n",
+ "2 3 7 16 1977 2 DM F 37.0 \n",
+ "3 4 7 16 1977 7 DM M 36.0 \n",
+ "4 5 7 16 1977 3 DM M 35.0 \n",
+ "... ... ... ... ... ... ... ... ... \n",
+ "35544 35545 12 31 2002 15 AH NaN NaN \n",
+ "35545 35546 12 31 2002 15 AH NaN NaN \n",
+ "35546 35547 12 31 2002 10 RM F 15.0 \n",
+ "35547 35548 12 31 2002 7 DO M 36.0 \n",
+ "35548 35549 12 31 2002 5 NaN NaN NaN \n",
+ "\n",
+ " weight \n",
+ "0 NaN \n",
+ "1 NaN \n",
+ "2 NaN \n",
+ "3 NaN \n",
+ "4 NaN \n",
+ "... ... \n",
+ "35544 NaN \n",
+ "35545 NaN \n",
+ "35546 14.0 \n",
+ "35547 51.0 \n",
+ "35548 NaN \n",
+ "\n",
+ "[35549 rows x 9 columns]"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Note that pd.read_csv is used because we imported pandas as pd\n",
+ "pd.read_csv(datafile)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3686eebc-e266-4496-99ff-475f69ec72ce",
+ "metadata": {},
+ "source": [
+ "We can see that there were 35,549 rows parsed. Each row has 9\n",
+ "columns. The leftmost column in the output is the index of the DataFrame. The index is used to\n",
+ "identify the position of the data, but it is not an actual column of the DataFrame.\n",
+ "It looks like the `read_csv` function in Pandas read our file properly. However,\n",
+ "we haven't stored the data in a variable, so we can't work with it yet. We need to assign the\n",
+ "DataFrame to a variable. Remember that a variable is a name for a value, such as `x`,\n",
+ "or `data`. We can create a new object with a variable name by assigning a value to it using `=`.\n",
+ "\n",
+ "Let's call the imported survey data `surveys_df`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "9c692fd0-32cf-4dad-be80-64458c563ab1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "surveys_df = pd.read_csv(datafile)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dcefc541-b4ac-4ebc-93ed-8348180f7779",
+ "metadata": {},
+ "source": [
+ "Notice that when you assign the imported DataFrame to a variable, Python does not\n",
+ "produce any output on the screen. We can view the value of the `surveys_df`\n",
+ "object by typing its name into the Python command prompt."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "5429befb-4dba-4251-9071-a259e96738e4",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " record_id month day year plot_id species_id sex hindfoot_length \\\n",
+ "0 1 7 16 1977 2 NL M 32.0 \n",
+ "1 2 7 16 1977 3 NL M 33.0 \n",
+ "2 3 7 16 1977 2 DM F 37.0 \n",
+ "3 4 7 16 1977 7 DM M 36.0 \n",
+ "4 5 7 16 1977 3 DM M 35.0 \n",
+ "... ... ... ... ... ... ... ... ... \n",
+ "35544 35545 12 31 2002 15 AH NaN NaN \n",
+ "35545 35546 12 31 2002 15 AH NaN NaN \n",
+ "35546 35547 12 31 2002 10 RM F 15.0 \n",
+ "35547 35548 12 31 2002 7 DO M 36.0 \n",
+ "35548 35549 12 31 2002 5 NaN NaN NaN \n",
+ "\n",
+ " weight \n",
+ "0 NaN \n",
+ "1 NaN \n",
+ "2 NaN \n",
+ "3 NaN \n",
+ "4 NaN \n",
+ "... ... \n",
+ "35544 NaN \n",
+ "35545 NaN \n",
+ "35546 14.0 \n",
+ "35547 51.0 \n",
+ "35548 NaN \n",
+ "\n",
+ "[35549 rows x 9 columns]"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "surveys_df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c8b9fbf-5b0a-44b2-ae9b-2cbc094f2eeb",
+ "metadata": {},
+ "source": [
+ "Note: if the output is too wide to print on your narrow terminal window, you may see something\n",
+ "slightly different as the large set of data scrolls past. You may see simply the last column\n",
+ "of data.\n",
+ "\n",
+ "Never fear, all the data is there, if you scroll up. Selecting just a few rows, so it is\n",
+ "easier to fit on one window, you can see that pandas has neatly formatted the data to fit\n",
+ "our screen:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "7fa897f3-1237-434e-a4dd-44c1654578bd",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " record_id month day year plot_id species_id sex hindfoot_length \\\n",
+ "0 1 7 16 1977 2 NL M 32.0 \n",
+ "1 2 7 16 1977 3 NL M 33.0 \n",
+ "2 3 7 16 1977 2 DM F 37.0 \n",
+ "3 4 7 16 1977 7 DM M 36.0 \n",
+ "4 5 7 16 1977 3 DM M 35.0 \n",
+ "\n",
+ " weight \n",
+ "0 NaN \n",
+ "1 NaN \n",
+ "2 NaN \n",
+ "3 NaN \n",
+ "4 NaN "
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "surveys_df.head() # The head() method displays the first several rows of a DataFrame. It\n",
+ "                  # is discussed below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e1a68826-2a82-4700-8f3f-b543fb1c3c13",
+ "metadata": {},
+ "source": [
+ "## Exploring our Species Survey Data\n",
+ "\n",
+ "Again, we can use the `type` function to see what kind of thing `surveys_df` is:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "86fe32ef-6617-479c-9776-c0efa316e65f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "pandas.core.frame.DataFrame"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "type(surveys_df)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7002ed40-3eca-49f9-8bb9-541e37c332e8",
+ "metadata": {},
+ "source": [
+ "As expected, it's a DataFrame (or, to use the full name that Python uses to refer\n",
+ "to it internally, a `pandas.core.frame.DataFrame`).\n",
+ "\n",
+ "What kind of things does `surveys_df` contain? DataFrames have an attribute\n",
+ "called `dtypes` that answers this:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "c586df32-ef8e-45c3-ad46-21c7f6aab133",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "record_id int64\n",
+ "month int64\n",
+ "day int64\n",
+ "year int64\n",
+ "plot_id int64\n",
+ "species_id object\n",
+ "sex object\n",
+ "hindfoot_length float64\n",
+ "weight float64\n",
+ "dtype: object"
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "surveys_df.dtypes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38184683-6906-4893-8289-3d756b1fbf08",
+ "metadata": {},
+ "source": [
+ "All the values in a column have the same type. For example, months have type\n",
+ "`int64`, which is a kind of integer. Cells in the month column cannot have\n",
+ "fractional values, but the weight and hindfoot_length columns can, because they\n",
+ "have type `float64`. The `object` type doesn't have a very helpful name, but in\n",
+ "this case it represents strings (such as 'M' and 'F' in the case of sex).\n",
+ "\n",
+ "We'll talk a bit more about what the different formats mean in a different lesson.\n",
+ "\n",
+ "### Useful Ways to View DataFrame objects in Python\n",
+ "\n",
+ "There are many ways to summarize and access the data stored in DataFrames,\n",
+ "using attributes and methods provided by the DataFrame object.\n",
+ "\n",
+ "To access an attribute, use the DataFrame object name followed by the attribute\n",
+ "name `df_object.attribute`. Using the DataFrame `surveys_df` and attribute\n",
+ "`columns`, an index of all the column names in the DataFrame can be accessed\n",
+ "with `surveys_df.columns`.\n",
+ "\n",
+ "Methods are called in a similar fashion using the syntax `df_object.method()`.\n",
+ "As an example, `surveys_df.head()` gets the first few rows in the DataFrame\n",
+ "`surveys_df` using **the `head()` method**. With a method, we can supply extra\n",
+ "information in the parentheses to control behaviour.\n",
+ "\n",
+ "Let's look at the data using these.\n",
+ "\n",
+ "> ## Challenge - DataFrames\n",
+ ">\n",
+ "> Using our DataFrame `surveys_df`, try out the attributes & methods below to see\n",
+ "> what they return.\n",
+ ">\n",
+ "> 1. `surveys_df.columns`\n",
+ "> 2. `surveys_df.shape` Take note of the output of `shape` - what format does it\n",
+ "> return the shape of the DataFrame in?\n",
+ ">\n",
+ "> HINT: [More on tuples, here][python-datastructures].\n",
+ "> 3. `surveys_df.head()` Also, what does `surveys_df.head(15)` do?\n",
+ "> 4. `surveys_df.tail()`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "65d9b5c5-187b-495c-af05-1d29ae10e20a",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e52d475d-1ab0-4d63-b40f-79d4c101d6eb",
+ "metadata": {},
+ "source": [
+ "## Calculating Statistics From Data In A Pandas DataFrame\n",
+ "\n",
+ "We've read our data into Python. Next, let's perform some quick summary\n",
+ "statistics to learn more about the data that we're working with. We might want\n",
+ "to know how many animals were collected in each site, or how many of each\n",
+ "species were caught. We can perform summary stats quickly using groups. But\n",
+ "first we need to figure out what we want to group by.\n",
+ "\n",
+ "Let's begin by exploring our data:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "9a898fab-d2cb-42c2-bf7d-231b1c674c1f",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Index(['record_id', 'month', 'day', 'year', 'plot_id', 'species_id', 'sex',\n",
+ " 'hindfoot_length', 'weight'],\n",
+ " dtype='object')"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Look at the column names\n",
+ "surveys_df.columns"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f62f48b9-2915-49a4-8b8d-2249f5f61698",
+ "metadata": {},
+ "source": [
+ "Let's get a list of all the species. The `pd.unique` function tells us all of\n",
+ "the unique values in the `species_id` column."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "026c7b71-f64a-4dd8-9c52-8b539c15f36c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array(['NL', 'DM', 'PF', 'PE', 'DS', 'PP', 'SH', 'OT', 'DO', 'OX', 'SS',\n",
+ " 'OL', 'RM', nan, 'SA', 'PM', 'AH', 'DX', 'AB', 'CB', 'CM', 'CQ',\n",
+ " 'RF', 'PC', 'PG', 'PH', 'PU', 'CV', 'UR', 'UP', 'ZL', 'UL', 'CS',\n",
+ " 'SC', 'BA', 'SF', 'RO', 'AS', 'SO', 'PI', 'ST', 'CU', 'SU', 'RX',\n",
+ " 'PB', 'PL', 'PX', 'CT', 'US'], dtype=object)"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.unique(surveys_df['species_id'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5aa04d1b-5743-4f76-b6e9-b72dbbec3db8",
+ "metadata": {},
+ "source": [
+ "> ## Challenge - Statistics\n",
+ ">\n",
+ "> 1. Create a list of unique site IDs (\"plot_id\") found in the surveys data. Call it\n",
+ "> `site_names`. How many unique sites are there in the data? How many unique\n",
+ "> species are in the data?\n",
+ ">\n",
+ "> 2. What is the difference between `len(site_names)` and `surveys_df['plot_id'].nunique()`?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "77564088-9128-4d94-ace5-110e1518f86b",
+ "metadata": {},
+ "source": [
+ "# Groups in Pandas\n",
+ "\n",
+ "We often want to calculate summary statistics grouped by subsets or attributes\n",
+ "within fields of our data. For example, we might want to calculate the average\n",
+ "weight of all individuals per site.\n",
+ "\n",
+ "We can calculate basic statistics for all records in a single column using the\n",
+ "syntax below:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "9f78f9f4-e387-405f-9dfa-1549eea21769",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "count 32283.000000\n",
+ "mean 42.672428\n",
+ "std 36.631259\n",
+ "min 4.000000\n",
+ "25% 20.000000\n",
+ "50% 37.000000\n",
+ "75% 48.000000\n",
+ "max 280.000000\n",
+ "Name: weight, dtype: float64"
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "surveys_df['weight'].describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f9cccd0-5861-42c8-ac47-94d241760afe",
+ "metadata": {},
+ "source": [
+ "We can also extract one specific metric if we wish:\n",
+ "\n",
+ "```\n",
+ "surveys_df['weight'].min()\n",
+ "surveys_df['weight'].max()\n",
+ "surveys_df['weight'].mean()\n",
+ "surveys_df['weight'].std()\n",
+ "surveys_df['weight'].count()\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2546bf29-ae70-4897-ba4f-3892122b2d3b",
+ "metadata": {},
+ "source": [
+ "But if we want to summarize by one or more variables, for example sex, we can\n",
+ "use **Pandas' `.groupby` method**. Once we've created a groupby DataFrame, we\n",
+ "can quickly calculate summary statistics by a group of our choice."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "00fc49b2-c1a3-4db5-ba2c-0f6b2627c927",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Group data by sex\n",
+ "grouped_data = surveys_df.groupby('sex')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "612c8906-b9a7-491e-8fb4-1a2388277119",
+ "metadata": {},
+ "source": [
+ "The **pandas function `describe`** will return descriptive stats including: mean,\n",
+ "median, max, min, std and count for a particular column in the data. Pandas'\n",
+ "`describe` function will only return summary values for columns containing\n",
+ "numeric data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "421e6cc7-2f04-4732-93d3-c65eb844ec10",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " record_id month day year plot_id \\\n",
+ "sex \n",
+ "F 18036.412046 6.583047 16.007138 1990.644997 11.440854 \n",
+ "M 17754.835601 6.392668 16.184286 1990.480401 11.098282 \n",
+ "\n",
+ " hindfoot_length weight \n",
+ "sex \n",
+ "F 28.836780 42.170555 \n",
+ "M 29.709578 42.995379 "
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
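+ "# Summary statistics for all numeric columns by sex\n",
+ "# (only the last expression in a cell is displayed, so this result is not shown)\n",
+ "grouped_data.describe()\n",
+ "# Provide the mean for each numeric column by sex;\n",
+ "# numeric_only=True skips text columns such as species_id (needed in pandas 2.0+)\n",
+ "grouped_data.mean(numeric_only=True)"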
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f32f9aeb-3a0e-45cf-a58d-e1e855a4f142",
+ "metadata": {},
+ "source": [
+ "The `groupby` command is powerful in that it allows us to quickly generate\n",
+ "summary stats.\n",
+ "\n",
+ "> ## Challenge - Summary Data\n",
+ ">\n",
+ "> 1. How many recorded individuals are female `F` and how many male `M`?\n",
+ "> 2. What happens when you group by two columns using the following syntax and\n",
+ "> then calculate mean values?\n",
+ "> - `grouped_data2 = surveys_df.groupby(['plot_id', 'sex'])`\n",
+ "> - `grouped_data2.mean()`\n",
+ "> 3. Summarize weight values for each site in your data. HINT: you can use the\n",
+ "> following syntax to only create summary statistics for one column in your data.\n",
+ "> `by_site['weight'].describe()`\n",
+ ">\n",
+ ">\n",
+ ">> ## Did you get #3 right?\n",
+ ">> **A Snippet of the Output from challenge 3 looks like:**\n",
+ ">>\n",
+ ">> ```\n",
+ ">> site\n",
+ ">> 1 count 1903.000000\n",
+ ">> mean 51.822911\n",
+ ">> std 38.176670\n",
+ ">> min 4.000000\n",
+ ">> 25% 30.000000\n",
+ ">> 50% 44.000000\n",
+ ">> 75% 53.000000\n",
+ ">> max 231.000000\n",
+ ">> ...\n",
+ ">> ```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "00512ec3-e3be-4fd9-a3b0-f99633a0737e",
+ "metadata": {},
+ "source": [
+ "## Quickly Creating Summary Counts in Pandas\n",
+ "\n",
+ "Let's next count the number of samples for each species. We can do this in a few\n",
+ "ways, but we'll use `groupby` combined with **a `count()` method**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "56735fff-81c5-47bc-abe3-f57a206a2b8e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "species_id\n",
+ "AB 303\n",
+ "AH 437\n",
+ "AS 2\n",
+ "BA 46\n",
+ "CB 50\n",
+ "CM 13\n",
+ "CQ 16\n",
+ "CS 1\n",
+ "CT 1\n",
+ "CU 1\n",
+ "CV 1\n",
+ "DM 10596\n",
+ "DO 3027\n",
+ "DS 2504\n",
+ "DX 40\n",
+ "NL 1252\n",
+ "OL 1006\n",
+ "OT 2249\n",
+ "OX 12\n",
+ "PB 2891\n",
+ "PC 39\n",
+ "PE 1299\n",
+ "PF 1597\n",
+ "PG 8\n",
+ "PH 32\n",
+ "PI 9\n",
+ "PL 36\n",
+ "PM 899\n",
+ "PP 3123\n",
+ "PU 5\n",
+ "PX 6\n",
+ "RF 75\n",
+ "RM 2609\n",
+ "RO 8\n",
+ "RX 2\n",
+ "SA 75\n",
+ "SC 1\n",
+ "SF 43\n",
+ "SH 147\n",
+ "SO 43\n",
+ "SS 248\n",
+ "ST 1\n",
+ "SU 5\n",
+ "UL 4\n",
+ "UP 8\n",
+ "UR 10\n",
+ "US 4\n",
+ "ZL 2\n",
+ "Name: record_id, dtype: int64\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Count the number of samples by species\n",
+ "species_counts = surveys_df.groupby('species_id')['record_id'].count()\n",
+ "print(species_counts)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "91527637-b894-4ab7-89d3-3baaf6c7c564",
+ "metadata": {},
+ "source": [
+ "Or, we can also count just the rows that have the species \"DO\":"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "014171dd-62da-4bcb-95e0-98b4818085a8",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "3027"
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "surveys_df.groupby('species_id')['record_id'].count()['DO']"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d402cd18-b262-4f1c-9cf9-d95fd68de038",
+ "metadata": {},
+ "source": [
+ "> ## Challenge - Make a list\n",
+ ">\n",
+ "> What's another way to create a list of species and associated `count` of the\n",
+ "> records in the data? Hint: you can perform `count`, `min`, etc. functions on\n",
+ "> groupby DataFrames in the same way you can perform them on regular DataFrames."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e7e8ae98-dd6c-4e70-b6ed-866d67160f8d",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2e4f40b1-6568-4db2-acfe-affbf491cfc1",
+ "metadata": {},
+ "source": [
+ "## Basic Math Functions\n",
+ "\n",
+ "If we wanted to, we could perform math on an entire column of our data. For\n",
+ "example, let's multiply all weight values by 2. A more practical use of this might\n",
+ "be to normalize the data according to a mean, area, or some other value\n",
+ "calculated from our data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "a5acc742-0c2b-42d9-b91b-189e01f7f5bf",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0 NaN\n",
+ "1 NaN\n",
+ "2 NaN\n",
+ "3 NaN\n",
+ "4 NaN\n",
+ " ... \n",
+ "35544 NaN\n",
+ "35545 NaN\n",
+ "35546 28.0\n",
+ "35547 102.0\n",
+ "35548 NaN\n",
+ "Name: weight, Length: 35549, dtype: float64"
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Multiply all weight values by 2\n",
+ "surveys_df['weight']*2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d5e6fb78-a9b1-45b1-8205-08b93ab3c9fd",
+ "metadata": {},
+ "source": [
+ "## Quick & Easy Plotting of Data Using Pandas\n",
+ "\n",
+ "We can plot our summary stats using Pandas, too."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "da1e26aa-ce14-42a5-854b-945c98b7c000",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEOCAYAAABmVAtTAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAi3klEQVR4nO3df7xUVb3/8ddbUEGNhEQzMEGl/Fn+ILO8t1tiSVqhpYlmUlmUV029t7pQ3bQffK+Vt/yRWpYpplcl8wf90DTUyh5eFcVEVC4koiTpMS3JHyDw+f6x1uyzz5w9c2DO4ZyDvJ+Pxzxmz1p7r732zJ712WvtPXsUEZiZmQFs1NcVMDOz/sNBwczMCg4KZmZWcFAwM7OCg4KZmRUcFMzMrDCwryvQqq222ipGjRrV19UwM1uv3HPPPU9HxPBG+ettUBg1ahSzZ8/u62qYma1XJC1ulu/hIzMzKzgomJlZwUHBzMwKDgpmZlZwUDAzs4KDgpmZFRwUzMys4KBgZmaF9fbHa9a6UVN+WUw/esYhfVgTM+tv3FMwM7OCg4KZmRUcFMzMrOCgYGZmBQcFMzMrOCiYmVnBQcHMzApdBgVJP5b0lKQHSmnDJN0saUF+HlrKmyppoaT5kg4qpe8jaW7OO0eScvqmkq7K6XdKGtXD22hmZmtoTXoKlwDj69KmALMiYgwwK79G0q7ARGC3vMz5kgbkZS4AJgNj8qNW5nHAsxGxE/Bd4JutboyZmXVPl0EhIn4HPFOXPAGYnqenA4eW0q+MiOURsQhYCOwraVtgSETcEREBXFq3TK2sq4FxtV6EmZn1rlbPKWwTEUsB8vPWOX0E8HhpviU5bUSerk/vsExErAT+DrymxXqZmVk39PSJ5qoj/GiS3myZzoVLkyXNljS7ra2txSqamVkjrQaFJ/OQEPn5qZy+BNiuNN9I4ImcPrIivcMykgYCr6bzcBUAEXFhRIyNiLHDhw9vsepmZtZIq0FhJjApT08Cri+lT8xXFI0mnVC+Kw8xLZO0Xz5fcGzdMrWyDgduyecdzMysl3V562xJVwDvBLaStAQ4DTgDmCHpOOAx4AiAiJgnaQbwILASOCEiVuWijiddyTQYuCE/AC4CfiJpIamHMLFHtszMzNZal0EhIo5qkDWuwfzTgGkV6bOB3SvSXyIHFTMz61v+RbOZmRUcFMzMrOCgYGZmBQcFMzMrOCiYmVnBQcHMzAoOCmZmVnBQMDOzgoOCmZkVHBTMzKzgoGBmZgUHBTMzKzgomJlZwUHBzMwKDgpmZlZwUDAzs4KDgpmZFRwUzMys4KBgZmYFBwUzMys4KJiZWcFBwczMCg4KZmZWcFAwM7OCg4KZmRUcFMzMrOCgYGZmBQcFMzMrOCiYmVnBQcHMzArdCgqSTpU0T9IDkq6QNEjSMEk3S1qQn4eW5p8qaaGk+ZIOKqXvI2luzjtHkrpTLzMza03LQUHSCOCzwNiI2B0YAEwEpgCzImIMMCu/RtKuOX83YDxwvqQBubgLgMnAmPwY32q9zMysdd0dPhoIDJY0ENgMeAKYAEzP+dOBQ/P0BODKiFgeEYuAhcC+krYFhkTEHRERwKWlZczMrBe1HBQi4s/AmcBjwFLg7xFxE7BNRCzN8ywFts6LjAAeLxWxJKeNyNP16WZm1su6M3w0lHT0Pxp4HbC5pGOaLVKRFk3Sq9Y5WdJsSbPb2trWtspmZtaF7gwfHQgsioi2iHgZuAZ4O/BkHhIiPz+V518CbFdafiRpuGlJnq5P7yQiLoyIsRExdvjw4d2oupmZVelOUHgM2E/SZvlqoXHAQ8BMYFKeZxJwfZ6eCUyUtKmk0aQTynflIaZlkvbL5RxbWsbMzHrRwFYXjIg7JV0N3AusBOYAFwJbADMkHUcKHEfk+edJmgE8mOc/ISJW5eKOBy4BBgM35IeZmfWyloMCQEScBpxWl7yc1Guomn8aMK0ifTawe3fqYmZm3edfNJuZ
WcFBwczMCg4KZmZWcFAwM7OCg4KZmRUcFMzMrOCgYGZmBQcFMzMrOCiYmVnBQcHMzAoOCmZmVnBQMDOzgoOCmZkVHBTMzKzgoGBmZgUHBTMzKzgomJlZwUHBzMwKDgpmZlZwUDAzs4KDgpmZFRwUzMys4KBgZmYFBwUzMys4KJiZWcFBwczMCg4KZmZWcFAwM7OCg4KZmRUcFMzMrNCtoCBpS0lXS3pY0kOS3iZpmKSbJS3Iz0NL80+VtFDSfEkHldL3kTQ3550jSd2pl5mZtaa7PYWzgRsjYmfgzcBDwBRgVkSMAWbl10jaFZgI7AaMB86XNCCXcwEwGRiTH+O7WS8zM2tBy0FB0hDgHcBFABGxIiL+BkwApufZpgOH5ukJwJURsTwiFgELgX0lbQsMiYg7IiKAS0vLmJlZL+pOT2EHoA24WNIcST+StDmwTUQsBcjPW+f5RwCPl5ZfktNG5On6dDMz62XdCQoDgb2BCyJiL+B58lBRA1XnCaJJeucCpMmSZkua3dbWtrb1NTOzLnQnKCwBlkTEnfn11aQg8WQeEiI/P1Waf7vS8iOBJ3L6yIr0TiLiwogYGxFjhw8f3o2qm5lZlZaDQkT8BXhc0htz0jjgQWAmMCmnTQKuz9MzgYmSNpU0mnRC+a48xLRM0n75qqNjS8uYmVkvGtjN5U8CLpe0CfAI8HFSoJkh6TjgMeAIgIiYJ2kGKXCsBE6IiFW5nOOBS4DBwA35YWZmvaxbQSEi7gPGVmSNazD/NGBaRfpsYPfu1MXMzLrPv2g2M7OCg4KZmRUcFMzMrOCgYGZmBQcFMzMrOCiYmVnBQcHMzAoOCmZmVnBQMDOzgoOCmZkVHBTMzKzgoGBmZgUHBTMzKzgomJlZwUHBzMwKDgpmZlZwUDAzs4KDgpmZFRwUzMys4KBgZmYFBwUzMys4KJiZWcFBwczMCg4KZmZWcFAwM7OCg4KZmRUcFMzMrOCgYGZmBQcFMzMrOCiYmVmh20FB0gBJcyT9Ir8eJulmSQvy89DSvFMlLZQ0X9JBpfR9JM3NeedIUnfrZWZma68negonAw+VXk8BZkXEGGBWfo2kXYGJwG7AeOB8SQPyMhcAk4Ex+TG+B+plZmZrqVtBQdJI4BDgR6XkCcD0PD0dOLSUfmVELI+IRcBCYF9J2wJDIuKOiAjg0tIyZmbWi7rbUzgL+AKwupS2TUQsBcjPW+f0EcDjpfmW5LQRebo+3czMelnLQUHS+4CnIuKeNV2kIi2apFetc7Kk2ZJmt7W1reFqzcxsTXWnp7A/8AFJjwJXAgdIugx4Mg8JkZ+fyvMvAbYrLT8SeCKnj6xI7yQiLoyIsRExdvjw4d2oupmZVWk5KETE1IgYGRGjSCeQb4mIY4CZwKQ82yTg+jw9E5goaVNJo0knlO/KQ0zLJO2Xrzo6trSMmZn1ooHroMwzgBmSjgMeA44AiIh5kmYADwIrgRMiYlVe5njgEmAwcEN+mJlZL+uRoBARtwG35em/AuMazDcNmFaRPhvYvSfqYmZmrfMvms3MrOCgYGZmBQcFMzMrOCiYmVnBQcHMzAoOCmZmVnBQMDOzgoOCmZkVHBTMzKzgoGBmZgUHBTMzKzgomJlZYV3cJdXM+sCoKb/s8PrRMw7po5rY+sw9BTMzK7inYB2UjzZ9pGm24XFPwczMCg4KZmZW8PCRvaL4ZKtZ97inYGZmBQcFMzMrOCiYmVnBQcHMzAoOCmZmVvDVR2bWEl/p9crknoKZmRXcU7A14qNCsw2DewpmZlZwUDAzs4KDgpmZFRwUzMys4BPN1i/5xLZZ32i5pyBpO0m3SnpI0jxJJ+f0YZJulrQgPw8tLTNV0kJJ8yUdVErfR9LcnHeOJHVvs8zMrBXdGT5aCfx7ROwC7AecIGlXYAowKyLGALPya3LeRGA3YDxwvqQBuawLgMnAmPwY3416mZlZi1oOChGxNCLuzdPLgIeAEcAEYHqebTpwaJ6eAFwZEcsjYhGw
ENhX0rbAkIi4IyICuLS0jJmZ9aIeOdEsaRSwF3AnsE1ELIUUOICt82wjgMdLiy3JaSPydH161XomS5otaXZbW1tPVN3MzEq6HRQkbQH8DDglIp5rNmtFWjRJ75wYcWFEjI2IscOHD1/7ypqZWVPdCgqSNiYFhMsj4pqc/GQeEiI/P5XTlwDblRYfCTyR00dWpJuZWS/rztVHAi4CHoqI75SyZgKT8vQk4PpS+kRJm0oaTTqhfFceYlomab9c5rGlZczMrBd153cK+wMfBeZKui+nfRE4A5gh6TjgMeAIgIiYJ2kG8CDpyqUTImJVXu544BJgMHBDfpiZWS9rOShExO1Unw8AGNdgmWnAtIr02cDurdbFzMx6hm9zYWZmBQcFMzMr+N5HZl3wfZhsQ+KegpmZFRwUzMys4KBgZmYFBwUzMys4KJiZWcFXH5nhK4zMahwUbJ1yY2u2fvHwkZmZFRwUzMys4KBgZmYFBwUzMyv4RLP1GZ+ENut/3FMwM7OCewpm6xH3rmxdc0/BzMwKDgpmZlZwUDAzs4KDgpmZFRwUzMys4KBgZmYFBwUzMys4KJiZWeEV/eO18g99/CMfM7OuuadgZmYFBwUzMyus98NHHiIyM+s5631QMDNbFzbUA85+M3wkabyk+ZIWSprS1/UxM9sQ9YuegqQBwHnAu4ElwN2SZkbEg31bMzNb322oR/yt6hdBAdgXWBgRjwBIuhKYADgomPWh3v7/Bjfgfa+/BIURwOOl10uAt67LFTbb+V7peT2tLxuOtVnfuqhnszLXRV5v17Mn1tdT+2Zvl7ku6tKf8hpRRKzRjOuSpCOAgyLik/n1R4F9I+KkuvkmA5PzyzcC80vZWwFPN1hFK3k9XZ7znOc85/WHemwfEcMbzAsR0ecP4G3Ar0uvpwJT17KM2T2Z19PlOc95znNef6pHo0d/ufrobmCMpNGSNgEmAjP7uE5mZhucfnFOISJWSjoR+DUwAPhxRMzr42qZmW1w+kVQAIiIXwG/6kYRF/ZwXk+X5zznOc95/akelfrFiWYzM+sf+ss5BTMz6wccFMzMrOCg0ISk7SR9vgfKGSJpSE/Uqck6Pidpu3W5jrr1Ndx3JG3ZYpm/aLlCPUzS6/u6DjWSTunl9XX7XKOkLSRtXn7dZN4dG6Rv3N169BRJ+/V1HZqRNFSSupjnQ2tU1vp2TkHSGOBLwDPAd4AfAu8AFgK30PGX0WVDgTMj4u+5nHcBhwKLge9FxIqcvhVwBHAU6ZfWt5JOyFet7yHgqSbVfQT4PDCIFIDbgK8AdwCbR763U11d3hsR726w7R+sSwrSj1LuA74GHA4sAq4AfhoRT+fl/q1JHcnbMRXYNZf5IPBN4Dbg5Yh4OZfzRuDgXM8vA8dHxJ11dfwk8MWI2KHBNjwWEZUNrqQVEbFJg7y3RMTdDfI+GhE/qUvbCvgrsCOwTUT8oS7/n4FDImJKfn1ERPy0lP+XiHhtnv5ZRHyolDcBGBkR5+XXdwK1HwM9TNo3KkXEZxvlNSLpBeA3DbI3ynk7AXOBiyJiZV6u/nOv7S+3AysjovK7Iml+RLwxT58bpR+RSjqni+o+DEwBNgcELCPtS/9O+u3RjFJZg0j70ZERMSanCXgXcDTwIeBTEXF1Xf0+AoyJiNPz66ER8Wwpf27e1k6bBkREvCl/53bL8z0YEbdK+hRwW0QsyPX4ca7Do8DGEbFLg/fr53Xrq73Pt0bEZY3eKEl/zJM7kj6746KL+71JejdwAfC+iHhY0qbAjcCbgZXA0RFRua80++51mG89DAq3A5cCQ4BTgVOAnwP/DFxMarQBPg38oLTop0i/kn5C0p6kL9J/AW/K+bNIO+IbgGtJO+rILtb3a1KDPAN4grTT1RwKbAqcGO33dNoBOJv0Aa6KiNEVdTkkIrZqsO0XVyQPy8sdRwpg7yD9zmMC8EdSgLgk1/MGYHldPfcmBb8vALNz2ljgDGDLXJ8FknYC7gIuJwWPJ/J6
7wL+A9geOJ90i5JTI2JJg214PCIqezSSXibdB6vqiOdK0mc0NSL+luffPa8z8uMZ4OvAT0i/4tyIFCQ/FRH3161rLPCbiNgyv743IvYu5b8QEZvl6TkRsVcp7w/AxFqjKuk+YBypIbwB+Fae9avAaaXVfh94ubR9tS9frbGq7E1KWkX7Z3ln3ftzGumz+D3wXmBxRJyclzuNzoYBB5EOkv4b+E4piGyT0w6LiM0bvC8rgAeo3uffD2xG9T6/ANiFdIB1PKlBPhO4Lr9Pu5G+f4flOp4AnAgcHBFtde/Ha0n3StuiQR23r00CvyQdyNRsA5wLvATck+fZGxhM+o7vEREvSzqaFMjeA+wFXFdbXz1J/1KRPAw4BlhQO/CoWG4FcAjwO+ADwCcj4qCcdwBpf3ldfo/+H6kdEukAZPuIiHyXh6OAA0lt1/SI2LfB+hp+9zpYm1+69YcHcF9pemGTvDl1efeXps8EvpWnNwJWA78lNfS1QPnIGqxvLvAZUmN8M/BJYGjOWwAMqqj/YGAV8IEGdVkOfLDRo8F7sj1wZ13aANKXfw7wIqmRvw+4KO9Ate18EBhWUeZrgJdKr78OnJenN8nbPgD4BvAsqYf2njX4/B5rkrea1Nu7tcFjKqmR/wTwXdJtTt5HCmbvIfXwngX2y+XtDLzYZH0vlqbr95cXStP31uXdXff6e6Xp/21UZjf2+ceA8cD0/Hl+A9ittg+W5htYX9cG5Q0jBZkf5M/xAOBkUg/whHIZFdv+mib7/Pwm+/z/5enPk45ol5ACwTTSd2VWLus1wKL67+zafHZ189XX/1rgYxXzHQv8vfT6f4CTS69Xkn5QW/losO4BlNqPivwVjeqaP+d3kg4sDwWeq9WnvL3Az4BPN9re+v1oTfa3fvM7hbWwujT9XJO8+i5Q+YjmAFIDQ0SslvQX0hDPBcD/SLpqDdf3ckR8H/i+pBGkiD1P0n8AqyPipfrKR8SLklZGRO0X2/V1GUBq6KqOlgO4pqLMxeXxV0l7kHoLR5KGUKZGxFnAFElvz/U8N9dTEfFMRZl/rRuiPAD4ds5bIWk17cNsF5ACzZGSZgMfq6g7eZteK6nq1+q1o+UDGixb27aVwI9IR6m1nt83IuKmnP+1iPjfXM+Hm537oON7XL+/DJb0XJ6nNl1bZvPyjBFxYull+Z4yHcrMwyWfIQ313E/6kWbtKH1Znr+qFzE4Im4EbszDBUcBt0n6GqnnUavHyi6GlWvzPSNpdUR8WtLJpJ7qE6RgukTStyXdn9e9Y56u1SUi4k1U7/M02edXS5pK6tH+K+no/RzSUfiDpH3oFxHxkqTatg+SNLD2HpXex42BAZL2Ih1IDZK0d3meiLi3webvGhGHVdTxUkkXStqWdGAxjhSwalaTelFrLCJWSXp1xbAvpPdyQF3elqXXr46I2/L0dZLaIuLs/Hp57iU/SRpq+1x5+0qfV/36tlmTeq+PQWHnJjts5Th2doukGcBSUtf5FoC8E/w5It6au7pHkbprr8s7+i5drS/vkEeR/g/iBlK3dImkcRExq1yJ3C18qkldlkfEJ9bmDZG0c3rSf+Z6rCINt7wncjc+zzec9CXcg3Sk9hTwnKQ3R8Qf68p8M/APSWcCfyY1ZLWGd8u87R8BDoyIRZK+ROru3w3My+9BlctIR7yDgTGkL9ufSL2ZnXP5g/L6AvhTbih2JA0VrSINQ7wX+J2kaXQM3C/Wre95SZ+KiB+WEyUdB2zSpOFfGY2Hcy5vUOanScNpjUwnNeK/JzWKu5GO0ImIVzVZjhwMDiF9vqNIDeo1wFmlbYCOwSyqtiHvg8sk/YB0N+LxuT435CBROXZeV8ba7vMjScOU+0Q6r3ehpPeResp/Jg2fnCXp1rwNA/P2/VDSiRHxfC5rc9Lwz1JSIy3gL7mcmi0kfab0fpQDxmYNtmcj0vDjbNIR/szId1XIw0MvRsRvGyw7TNKwuuShpN7HatJBXodF8vMjdXm/JQ3B
AWxVFzBUen0tcDXpAOQ7EbEoz3Awqed3BNXnVI6sqn+n7cndivVGabywQzJpp/sV6aQQpEZlYSl/NWncfltgRkT8OZf3DuDiiOhwBUQ+2j6a1PAdTecT2NuTTuzuTzpReyVwY+nIbzfgetJJvXtIH9Jb8vwnksbtn6+oy83AAVF9YvT0vEzZsLxNo0ld+lMiYm7dcm8jHekPIu1MMyLiqZz/T6TzBBfX1XMSaZhm31z+j2uBI/c2PhwRp9TVpTbe+98R8ZH6vJy/MekI7BOkYZHaZ3cJKTi+k3Q0uZh0FDgy1+2DpBOYv6A9YDxDGkY6knRCU6Rg80JtdXmb7wZW0B6oxpKGwA6LiL80qGezo/qtSQcOy4HaEek+pK7+HqR9raoum0XEgFzGQOCuyGPhXaxvOrA7qfG9MiIeqKpzxTZUnXAdRuoVbEsa6z+rtJ49SYF3cUQcVVHeAOCnpH1/bff5qVE6kV8qczDw5Yj4Un4P3kcKNvuT9ufFpGGlxXmR15OGQK8HHo2IpbmcSbSfFN6L6nM3kA5EbiR9T8qB5rukcwpfAJZFxLOSjs1lPkk6j3Z4bX8p5S0mnb9bVVrfalIP/Tbgb6T9pCZIF5zcTtqny1bTfiHAV+rqXe4C7ko6p1MrL0rLLSQFl4/W2pWigLpzL42sd0GhLO/ERwMfJo0130I6GdNhNlLD8sWIOLjBctdExLl1ZdeuXvl5XrbqROXdtB/lQscPcRPSzvwG0hGhSEfQC0iN+ykNyryRFBSq8i4ifSl/X1rfX0njsF8ljTk2quf9pEa4vp6Qbkf+r3X1PK/cYOZeBlF30q8qT9JX6ucpGZ/L/7eIWJbnH0I60nsLqTE5tSJvFanh/3jejlrAuAS4uTZ81IjS1Sa755fzIuKWLhriq2g/qu9wArdU5gH5PSvK7KIO9SdEi9fN1qc0VFc+GCgPLQWpAanahvoDqAD+GhHPSxoZjS8GOAnYgnRkP5N0oHIiaZji9VTv87W67Ev6bpX3pcurhpXyut5K6j2cn18fSzrYaiMN136L1LDuRDpgeD/pCqd/Ad6Vh8LeQQpQJwF7Am8HjmkQML5BOq/xcVKDHqQgN500ZDSuQZknAju2sL4X6Nx7rZ3sn0s6aV+VdzfpHGBNOWAcW/FW1pbblHRS+iuk71j5iro5UbpgopH1LihIegNpvPwoUoN4FfC5iNi+br496djw/47UsHRaTuka5DOovnrlhYgY1aAu80lHtVW9iNNp3PDf0mRo4qWIGNQg7zngn1oo8xHSl6BSrVtc0biLtHOdSHovNiKdcDs3Ir4m6fScp3IenXszkMbhjwO2AwZE3Y6Xj0JfBDZtkPc0qZdTFUxWkI6QqhrGcsNff8lms4Z4bkTskaebHdXXl9ks0KwqvTflnkTTXkQzXWzDmtazPu960rj6HaSGcijpIOfknN5QRCyuT8uf38dJwx31geZMYFZEHLyWDfFJ0X7l2HlAW7RfovoC6ZLhhg047YHmXaTeycOkg7HdG5T5YkQMbmF9u0TE4RXvyTDS1W+dPuOc9wAdr56E9ob/9Ii4ssFySyJis9xOXp7LOSEiXljTnkK3r4zo7QftVwrtVEqrXSn0BlIj9hApop5E+pJ0tVyzq1eWN6nL88CbKtLHAs81Wa5ZmStaXK5Z3sLS9HBgeOm1SAGsjRQsn6H99xSnkr68o0vz70C6FPe6JnmnltJeRboWfRHpevU/tbjtK8gHMXXpA0g9iMtIlyFfB5xdyr+qLu+sUl7DK3fofNXKvWtYZn3e2Y22qVH5FesbRLoU+nukXt3ANdyGhnXpIq9c5gDSd+JVXdR/AKlnPDXX89153zqRdES+lNSr+zRp6ONm0vdxfqmM80gNXu31i83yau8DqUF/xxou9wL5ajvS5dtPkI7qvw78vUmZL7W4vvuavGdz1jaPFBiaXWFUvmpuIOlgdz7p3FGXV6ZFxHoZFA7LO/TjpN8kjKP9ErZmDX+z5e4rzf9Q
3fqeIV3nXl+P4yhdwlaR36yRXtakzMVN8pa1WOZVpIb/aVLD/yxr1vAvAbaqKHN4/nI1ypuTd95vkILB6bRftngdcGzFcseQGo5Gef9Yk/eazg1jqw3/KtLwxXP5vV1Zml7Vyvq62K+bre9lGjfgzbah2ba3+r4Mob3hfw+p4T+J5g3/njQINKQj2VYa4ieBP5DOLcyhfdRjJ9LBWisN+NImZS5ucX0PNPi8DyD17tcqL+fPabJcpzaCNOz2SFVe1WO9u/ooIq4Frs0nhw4lNWrbSLqA9KveXYBbJd1I6sZpDZYrX2JYP/63BPi40q8o609UdrqUs2SFGl/1MqtJmYeQrsyoyvtNi2XeQPoNxlui/UqFHUiXAe4G7Bn518/5vXpE0jGk65o7/cVfRLRJ2qhJ3kjSmOiFpB8D/aM0ywnANZI+QceTkYNJ3fjzG+TdLunYiOhwzijXs/jMovNlmc0u2XyzOl5xVL5y5/loPBxXXO64lutrKPLQUYP1zY2IY/L0RXS8wqnZNhRX2vTg+zKINCxxB6ln8HnSPjYB+ElEfCzX80ekg5DXR8QypR8m1ta3StKinH4F8FtJT5M+x9/n5XciNfyN8haSfly2LXBT5NaPNIx5cZPlVqv9MtdxtP+9L6QDpkZlHpa3c23XN1qdLxGtnezfskle1XmD2nmsMU2W6/RDuYi4TdI+pGDdpfXunEKVPJZ2BOlXyAeUGv6jSNFzOnBt1J2MLC33fZpcvRIRGzc4UXkFKaJXNdLvB7amyVUvVWWWyqha3zaky9HWqkxJc4B31zfi+RzCY5HHSive1xeb5BW/+K3IC9JR3ko6X0ERETFE7Sdples5q7R8pzyla+KvIX3p6gPGPjQeq9+C9NlW5UWjhr+ZLs4NrIv1NTxB3dv1rDvXUjvXU2v4m51Ib1aXjUi9wZui/YqgN+Q6lhviDnnR+LcIKJ0nrFrus6Srk54mnTTfOyIiN+DTI2L/Jm9pQ03WtxPpZHtN0H6yf/u6Ysp5za4cm0I62d9puVbq3mlbXglBoZn6gNHDZbfUSPfAeteqTEkPRD6BVpHXrOEP2huODll0bFTq8wZFxDq5mVmzYPJK1UWD2lKg6UZdWm34e7WezTRpwJsGmt7ULGCs83W/0oNCb1gXDX9PanZk2UXDv84ad1s/rS8Nv7XOQWEDUPdF7pCFG34zK3FQMDOzwkZ9XQEzM+s/HBTMzKzgoGBmZgUHBbMeIulXavH/qSvK+pqkAyvS36l+9F/W9sqz3v2i2ay/inwX3h4qq9mdZs3WGfcUbIMiaXNJv5T0R0kPSDpS0qOSvinprvzYKc87XNLPJN2dH/vn9C0kXSxprqT7JX0opz+qdMt1JB2Ty7pP0g8kDciPS/J650o6tUk9L5F0eJ4eL+lhpf8Lr78Hv1mPck/BNjTjgSci4hAASa8m3b31uYjYV+l+/meR7sN0NvDdiLhd0utJNwncBfhP0s0Qa7d7GFpegaRdSH/8s3+kP4E/n/RnTfOAEdF+e+Ytu6qs0i2uf0i6XctC0s0NzdYZ9xRsQzMXODD3DP450l9DAlxRen5bnj4Q+J6k+0j/ATBE0qty+nm1AiPi2bp1jCPdj+nuvOw40p1nHwF2kHSupPF0/s/vKjuT7ua7IN+E7bK12lqzteSegm1QIuL/8h0jDwb+S1LtJonlX3GW74D5tojocOdcSaLzzco6zEK6udrUThnpv68PIt0t9sOkvyXtstprMI9Zj3BPwTYokl5H+iOSy0j/+lW7J9SRpec78vRNpD+JqS27Z4P0DsNHpNuYH670X84o/bH79vl8w0YR8TPSEFTX/4KV7s8/WlLtP8Q7/XeyWU9yT8E2NHsA31b63+OXgeNJf/O5qaQ7SQdKtYb3s8B5SveuH0j6S9fPkP486DxJD5D+HOerpNt6AxARD0r6MnCTpI3yek4g3fb74pwG6c9qmoqIlyRNBn6pdL/+22m/+aJZj/O9j2yDJ+lRYGzV
nwaZbWg8fGRmZgX3FMz6kKTzgPp/+zo7Ii7ui/qYOSiYmVnBw0dmZlZwUDAzs4KDgpmZFRwUzMys4KBgZmaF/w9kYYqYettt6gAAAABJRU5ErkJggg==\n",
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "total_count = surveys_df.groupby('plot_id')['record_id'].nunique()\n",
+ "# Let's plot that too\n",
+ "total_count.plot(kind='bar')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2aa1252-3611-4c2a-aafa-12c0c892e43a",
+ "metadata": {},
+ "source": [
+ "> ## Challenge - Plots\n",
+ ">\n",
+ "> 1. Create a plot of average weight across all species per site.\n",
+ "> 2. Create a plot of total males versus total females for the entire dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "778470a0-3584-4b48-87c3-1d5f631bbe57",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a2945e0-d51f-487d-81a6-44395601df1c",
+ "metadata": {},
+ "source": [
+ "> ## Summary Plotting Challenge\n",
+ ">\n",
+ "> Create a stacked bar plot, with weight on the Y axis, and the stacked variable\n",
+ "> being sex. The plot should show total weight by sex for each site. Some\n",
+ "> tips are below to help you solve this challenge:\n",
+ ">\n",
+ "> * For more information on pandas plots, see [pandas' documentation page on visualization](http://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#basic-plotting-plot).\n",
+ "> * You can use the code that follows to create a stacked bar plot, but the data to stack\n",
+ "> need to be in individual columns. Here's a simple example with some data where\n",
+ "> 'a', 'b', and 'c' are the groups, and 'one' and 'two' are the subgroups.\n",
+ ">\n",
+ "> ```\n",
+ "> d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}\n",
+ "> pd.DataFrame(d)\n",
+ "> ```\n",
+ ">\n",
+ "> shows the following data\n",
+ ">\n",
+ "> ```\n",
+ "> one two\n",
+ "> a 1 1\n",
+ "> b 2 2\n",
+ "> c 3 3\n",
+ "> d NaN 4\n",
+ "> ```\n",
+ ">\n",
+ "> We can plot the above with\n",
+ ">\n",
+ "> ```\n",
+ "> # Plot stacked data so columns 'one' and 'two' are stacked\n",
+ "> my_df = pd.DataFrame(d)\n",
+ "> my_df.plot(kind='bar', stacked=True, title=\"The title of my graph\")\n",
+ "> ```\n",
+ ">\n",
+ "> * You can use the `.unstack()` method to transform grouped data into columns\n",
+ "> for plotting. Try running `.unstack()` on some DataFrames above and see\n",
+ "> what it yields.\n",
+ ">\n",
+ "> Start by transforming the grouped data (by site and sex) into an unstacked layout, then create a stacked plot.\n",
+ ">"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e1c9d455-6187-40ac-9953-b4ec0ac4b95d",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fb555c44-643c-4287-b80a-608ee0344b55",
+ "metadata": {},
+ "source": [
+ ">> ## Solution to Summary Challenge\n",
+ ">>\n",
+ ">> First we group data by site and by sex, and then calculate the total weight for each group.\n",
+ ">>\n",
+ ">> ```\n",
+ ">> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])\n",
+ ">> site_sex_count = by_site_sex['weight'].sum()\n",
+ ">> ```\n",
+ ">>\n",
+ ">> This calculates the sum of weights for each sex within each site as a table:\n",
+ ">>\n",
+ ">> ```\n",
+ ">> site sex\n",
+ ">> plot_id sex\n",
+ ">> 1 F 38253\n",
+ ">> M 59979\n",
+ ">> 2 F 50144\n",
+ ">> M 57250\n",
+ ">> 3 F 27251\n",
+ ">> M 28253\n",
+ ">> 4 F 39796\n",
+ ">> M 49377\n",
+ ">> \n",
+ ">> ```\n",
+ ">>\n",
+ ">> Below we'll use `.unstack()` on our grouped data to figure out the total weight that each sex contributed to each site.\n",
+ ">>\n",
+ ">> ```\n",
+ ">> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])\n",
+ ">> site_sex_count = by_site_sex['weight'].sum()\n",
+ ">> site_sex_count.unstack()\n",
+ ">> ```\n",
+ ">>\n",
+ ">> The `unstack` method above will display the following output:\n",
+ ">>\n",
+ ">> ```\n",
+ ">> sex F M\n",
+ ">> plot_id\n",
+ ">> 1 38253 59979\n",
+ ">> 2 50144 57250\n",
+ ">> 3 27251 28253\n",
+ ">> 4 39796 49377\n",
+ ">> \n",
+ ">> ```\n",
+ ">>\n",
+ ">> Now, create a stacked bar plot with that data where the weights for each sex are stacked by site.\n",
+ ">>\n",
+ ">> Rather than display it as a table, we can plot the above data by stacking the values of each sex as follows:\n",
+ ">>\n",
+ ">> ```\n",
+ ">> by_site_sex = surveys_df.groupby(['plot_id', 'sex'])\n",
+ ">> site_sex_count = by_site_sex['weight'].sum()\n",
+ ">> spc = site_sex_count.unstack()\n",
+ ">> s_plot = spc.plot(kind='bar', stacked=True, title=\"Total weight by site and sex\")\n",
+ ">> s_plot.set_ylabel(\"Weight\")\n",
+ ">> s_plot.set_xlabel(\"Plot\")\n",
+ ">> ```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fa3cc636-053b-4b92-9ed3-be349d4d1285",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a864b9b-918d-4d08-98b6-7a140ce679ca",
+ "metadata": {},
+ "source": [
+ "## References\n",
+ "\n",
+ "- ernst: http://www.esapubs.org/archive/ecol/E090/118/default.htm\n",
+ "- figshare-ndownloader: https://ndownloader.figshare.com/files/2292172\n",
+ "- os-lib: https://docs.python.org/3/library/os.html\n",
+ "- matplotlib: https://matplotlib.org\n",
+ "- numpy: https://www.numpy.org/\n",
+ "- pandas: https://pandas.pydata.org\n",
+ "- pandas-plot: http://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html#basic-plotting-plot\n",
+ "- pd-dataframe: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html\n",
+ "- pptd: https://figshare.com/articles/Portal_Project_Teaching_Database/1314459\n",
+ "- python-datastructures: https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences\n",
+ "- spreadsheet-lesson5: http://www.datacarpentry.org/spreadsheet-ecology-lesson/05-exporting-data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9e4c4a30-de40-4ffd-befd-9b8dda7da968",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7c6d9e59-e673-4ae3-9ca9-0a1f5b6b7509",
+ "metadata": {},
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 + Jaspy",
+ "language": "python",
+ "name": "jaspy"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.5"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/python-data/solutions/ex06_pandas_rainfall.ipynb b/python-data/solutions/ex06_pandas_rainfall.ipynb
new file mode 100644
index 0000000..b5e9192
--- /dev/null
+++ b/python-data/solutions/ex06_pandas_rainfall.ipynb
@@ -0,0 +1,1009 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "af6d2be1-db21-4ad7-9c12-ebf4ec6b79e1",
+ "metadata": {},
+ "source": [
+ "# Pandas to read CSV data\n",
+ "\n",
+ "Let's see Pandas in action, to understand some of its power and utility..."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1be9623f-bdbb-402f-8d74-7a8ef03ac2bd",
+ "metadata": {},
+ "source": [
+ "## If we only want the first 6 rows, use `readline()`\n",
+ "\n",
+ "We know there are 6 header lines, so we can get those using Python's `open()` and `f.readline()`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "d179c1f2-9730-492b-b6fe-9c6fddb33c96",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Item: UK Rainfall (mm)\n",
+ "Item: Areal series, starting from 1910\n",
+ "Item: Allowances have been made for topographic, coastal and urban effects where relationships are found to exist.\n",
+ "Item: Seasons: Winter=Dec-Feb, Spring=Mar-May, Summer=June-Aug, Autumn=Sept-Nov. (Winter: Year refers to Jan/Feb).\n",
+ "Item: Values are ranked and displayed to 1 dp. Where values are equal, rankings are based in order of year descending.\n",
+ "Item: Data are provisional from December 2014 & Winter 2015. Last updated 07/04/2015\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Set the path and read metadata\n",
+ "fpath = \"../example_data/uk_rainfall.txt\"\n",
+ "with open(fpath) as f:\n",
+ " metadata = [f.readline().strip() for i in range(6)]\n",
+ " \n",
+ "for item in metadata:\n",
+ " print(\"Item:\", item)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3294a316-66fc-4234-a226-a2ba4bf72480",
+ "metadata": {},
+ "source": [
+ "## Now let's see what Pandas can do to read the actual tabular data\n",
+ "\n",
+ "Pandas can read many formats, and stores data very efficiently. In this case we use:\n",
+ "\n",
+ "`pandas.read_csv()`\n",
+ "\n",
+ "See docs: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html\n",
+ "\n",
+ "In one call, we tell it to:\n",
+ "- read from file `fpath`\n",
+ "- skip the first 6 rows of the header (captured above)\n",
+ "- use a regular expression to split the fields (i.e. `\"\\s+\"`, which means split on whitespace)\n",
+ "- use the first column (Year) as the index\n",
+ "- values specified as `\"---\"` should be treated as missing values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "3f9a4009-7ae6-450c-8a8c-ceee826d55f5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Read it in one line with Pandas!\n",
+ "\n",
+ "import pandas as pd\n",
+ "df = pd.read_csv(fpath, skiprows=6, sep=r\"\\s+\",\n",
+ "                 index_col=0, na_values=\"---\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c1a49fbb-b387-4b6c-86de-95d84149cefb",
+ "metadata": {},
+ "source": [
+ "View the data as the DataFrame `df`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "65d6658e-1cf3-452a-be82-b3d16d1fd8f3",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "