Etherpad text.txt

Welcome to AARNet's 'Introduction to Jupyter Notebooks' Etherpad!

This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.

This etherpad is from etherpad.wikimedia.org. Please keep in mind all current as well as past content in any pad is public.

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
 ____________________________________________________________________________________________________________

Sign in: Name, Institution, Email, Twitter (optional)
Please sign in below:
________________________________________________________________________________________________________
  
I. Welcome

Instructors introduce themselves: Name, work, and aims for the day.

_____________________________________________________________________________________________________________

II. Introductions

Information for Today’s Learners
Add your name to the Etherpad above
Introduce yourselves! In your introduction, (a) explain your work in 3 words and (b) say something that happened to you or that you saw on the way to the workshop (either this morning or in your travels to the workshop).
_____________________________________________________________________________________________________________

III. A Brief Overview of AARNet
www.aarnet.edu.au

_____________________________________________________________________________________________________________

IV. Introduction to Jupyter Notebooks - Workshop Overview

This workshop will introduce you to Jupyter Notebooks. You will learn what they are, what they do and why you might like to use them. It is an introductory set of lessons for those who are brand new, have little or no knowledge of coding and computational methods in research. By the end of the workshop you will have a good understanding of what notebooks can do, how to open one up, perform some basic tasks and save it for later. If you are really into it, you will also be able to continue to experiment after the workshop by using other people's notebooks as springboards for your own adventures!

EPISODE 1 - 90 mins
Introduction and jargon-busting
What do Jupyter Notebooks do?
Why use Jupyter Notebooks?
How do Jupyter Notebooks work?
How to open a Jupyter Notebook - hands on 10
Introduction to Markdown - hands on 15

30 MIN BREAK

EPISODE 2 - 90 mins
Working in Jupyter Notebooks in R - hands on 30
Working in Jupyter Notebooks in Python - hands on 30
Using Jupyter Notebooks in the cloud 
How to choose the right notebook for you
Wrapping up

_____________________________________________________________________________________________________________

V. Introduction to Jupyter Notebooks - Episode 1A - 15 mins

Introduction

Computational notebooks have been around since the late 1980s. Essentially a notebook is an advanced word processor. Also known as a notebook interface, the concept is that it is a virtual notebook environment used for literate programming. 
'Literate programming' pairs the functionality of word processing software with both the shell and kernel of that notebook's programming language.
Notebooks are documents that contain both code and rich text elements, such as links, equations and different ways of visualising data via graphs, tables and figures. 
Because of the mix of code and text, notebooks are an ideal place to bring together results and an analysis description. 
Notebooks are really smart documents - they can be executed to perform the data analysis in real time.

Jupyter Notebooks

Jupyter is named after three computer programming languages - Julia, Python and R. 
It is a free, open-source, interactive web tool which researchers use so they can combine software code, computational output, explanatory text and multimedia resources in a single document.
Jupyter has exploded in popularity over the past couple of years and now supports more languages and is being used by more and more people from different disciplines.

Jargon Busting

This exercise is an opportunity to begin to ask questions and to get a firmer grasp on the concepts around data, code or software development in libraries.

Activity

In pairs, talk about the language used in the introductions. Are you familiar with these terms? What are the words that trip you up? Think of a way to remember what that word means in this context that might help others understand it better. How could you re-write some of the introductory text above to make it easier to understand? 

Add your definitions to some of the terms we'll be using in today's workshop here, and remember to keep adding them as we go. This will be a useful resource for us all later!

Computational notebook 
Literate programming
Code
Rich text
Open-source
Computational output
Documentation
Cell
Kernel
Markdown
Command line
Vector
Array 
_____________________________________________________________________________________________________________

V. Introduction to Jupyter Notebooks - Episode 1B - 10 mins

What do Jupyter Notebooks do?

Jupyter Notebooks offer a hybrid environment in which you can perform computational tasks while also using text to annotate or describe what you and your code blocks are doing. It's like a mix between the command line and a word processor.
What can Jupyter Notebooks do?

“The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.” –description from Project Jupyter

Data cleaning

Data cleaning is about finding and correcting (or removing) inaccuracies from a dataset, a table, or a database. The process involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting them.
In Jupyter Notebooks you can preview and analyse a limited number of columns and rows of data at a time, so you can see if there are blanks or repeated errors or inaccuracies. In addition, working directly with a large dataset without having to download it can save a lot of time.
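
To make the idea concrete, here is a minimal sketch in Python. The records are made up for illustration and this is only one simple way to do it, not a complete cleaning method. It skips a row with a blank name and a row that is an exact repeat:

```python
# Hypothetical records: a name and an age, with one blank and one repeat.
rows = [
    {"name": "Ana", "age": "34"},
    {"name": "", "age": "51"},      # blank name: an incomplete row
    {"name": "Ben", "age": "27"},
    {"name": "Ana", "age": "34"},   # exact repeat of the first row
]

cleaned = []
seen = set()
for row in rows:
    key = (row["name"], row["age"])
    if row["name"] == "":   # skip incomplete rows
        continue
    if key in seen:         # skip rows we have already kept
        continue
    seen.add(key)
    cleaned.append(row)

print(cleaned)
```

You don't need to understand every line yet. The point is that a few short instructions can check every row for you.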

Data transformation

Data transformation is the process of converting data values from one source format or structure to another so they become consistent or intelligible to a target structure or system. A typical scenario where information needs to be shared involves the extraction of the data from the source application or data custodian, the transformation of that data into another format, and finally loading the transformed data into the target location.
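
As a small illustration, here is a sketch in Python that changes dates from one format to another. The dates and the choice of formats are assumptions made for this example only:

```python
from datetime import datetime

# Hypothetical source values written as day/month/year.
source_dates = ["01/11/2010", "25/03/2008", "14/03/2007"]

# Convert each one to year-month-day so that a target system can sort them.
target_dates = [
    datetime.strptime(text, "%d/%m/%Y").strftime("%Y-%m-%d")
    for text in source_dates
]

print(target_dates)
```

The information stays the same. Only the format changes, so that another system can read it.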

Numerical simulation

Numerical simulation is when you use maths to create models, essentially computer programs that are designed to simulate what might or what did happen in a situation.  By using numerical analysis you can approximate the real solution of the problem.
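
Here is a very small sketch of a simulation in Python. The starting size, growth rate and number of years are invented for illustration. It steps forward one year at a time to model how a population might grow:

```python
# Assumed numbers, for illustration only.
population = 1000.0   # starting size
growth_rate = 0.10    # 10% growth per year
years = 5

history = [population]
for year in range(years):
    # Each year the population grows by a fixed share of its current size.
    population = population + population * growth_rate
    history.append(population)

print(round(history[-1], 2))
```

Changing the assumed numbers and running the cell again lets you explore 'what if' questions quickly.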

Statistical modeling

A statistical model can be thought of as a statistical assumption (or set of statistical assumptions) with a certain property: that the assumption allows us to calculate the probability of any event. Purposes of statistical models can be for prediction, estimation or description.

Visualisation

One of the really powerful attributes of Jupyter Notebooks is that of visualisation. In notebooks you can create graphs, tables, plots, heatmaps, charts, mathematical equations and so on. These tools are very helpful for exploration as well as demonstration.

A Word on Languages (teehee)

Jupyter Notebooks can be used with a variety of different programming languages. Initially they were for Julia, Python and R but now they support many more. If you don't know any languages, it might be helpful to think about the types of tasks you want to perform. Python is currently the most popular language used in Jupyter Notebooks but you should also consider what is commonly used in your field.

Python and R

Python and R are currently the two most popular programming languages for data science work. Python is often praised for being a general-purpose language with an easy-to-understand syntax, while R's functionality is developed with statisticians in mind. Python is often said to be simpler to learn, and although it does have packages for statistical computing, R's are generally considered more specialised.

Activity

In pairs, talk about which programming language might make the most sense for you and why. When you are ready, think about how you would recommend Python or R to someone else in your field. Would you consider using both? Share this with the group.

https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
_____________________________________________________________________________________________________________

VI. Introduction to Jupyter Notebooks - Episode 1C - 10 mins

Why use Jupyter Notebooks?

Even if you think you don't use computational methods, if you use Excel or even advanced search terms in a library catalogue or on Google, you are already doing it!

Jupyter Notebooks help you to perform some tasks really quickly. 
They are great for exploration in data analysis, presenting results, and sharing ideas.
You can experiment and work on large datasets without having to download them. 
Jupyter Notebooks are also great at performing rapid visualisations that you can test out, change and share easily. 
They are also freely available and you can use them in a normal browser (no license fee!).

Jupyter Notebooks offer a way to experiment with data processing without having to be a programmer. You can learn from others’ efforts and understand their data and research processes. Because you work in code blocks (not whole scripts) they help you learn how to code just enough for you to do what you need to do. 

Learn how to code and experiment with data processing.
Interactive, provides immediate feedback.
Work in code blocks (not whole scripts).
Learn from others’ efforts and understand their data and research processes.
Test out calculations and visualisations that highlight important data points.

The notebook environment lets you test out calculations and visualisations that highlight important data points in a way that is immediate and easy to understand. 
Notebooks permit a quick set of steps: you can document and run code then look at code outcome, e.g. equations or visualisations, all in one place.

Importantly, they also help you keep track of your methods so you have a record of how you performed an analysis and came up with a conclusion. They are interactive and provide instant feedback, which is helpful for those just starting out.

What are Jupyter Notebooks used for?

Notebooks are being used in an ever-increasing number of domains, by a large range of researchers. Currently the main fields using Jupyter Notebooks are the following:
    
Programming and Computer Science
Statistics, Machine Learning and Data Science
Mathematics, Physics, Chemistry, Biology
Earth Science and Geo-Spatial data
Linguistics and Text Mining
Signal Processing
Engineering Education

Activity

More humanities and social science researchers are adopting Jupyter Notebooks as part of their research practice. Discuss with your partner how Jupyter Notebooks might be useful in different fields. For example, under Linguistics and Text Mining, see the workshop on text analysis by Neal Caren at https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#linguistics-and-text-mining

_____________________________________________________________________________________________________________

VII. Introduction to Jupyter Notebooks - Episode 1D - 20 mins 

How do Jupyter Notebooks work?

Jupyter Notebooks don't need much to get going. They are editable and viewable in a web browser. You can also run them on a local machine with no internet or a remote machine with internet. They are very flexible and free!

Jupyter Notebooks use a “kernel”, which is kind of like an interpreter. This is what turns a programming language into instructions the computer understands so it can do the work. In regular computers a kernel connects the application software to the computer hardware. In the case of Jupyter Notebooks, this application permits displaying, editing and running program commands via a web browser.
Different kernels can be installed for different types and versions of programming languages. The kernel in the notebook is a program that runs code written in a specific programming language.
Notebooks use blocks of code to perform computational processes resulting in outputs, or results.
In this workshop we will look at the two most used languages in data analysis, Python and R.

What makes them different to other applications?

Notebooks can run and store code and output with “markdown” notes.

Let's break that down:
    
Code
"Running code" means making the computer do what you are telling it to do. "Executing code" is the same thing.
Output
In Jupyter Notebooks "output" is the result of the computational process, such as a visualisation, graph, model, equation and so on.
Markdown
"Markdown" is the material you want to include that isn't code. It's just writing - "markdown" is the name of the language used here to turn plain text into formatted text so you can add headings, italics, quotes and other types of styling. It might be a description, a note, a question. These do not interact with the code, but are very useful in helping you understand the steps in your process and what you are trying to achieve.

Jupyter notebooks are a series of “cells” containing executable code, or markdown and outputs.
Cells might contain code executed (through the kernel) or markdown formatted text (including LaTeX) to embed the description of the work process next to the code.

How are they different to the command line?

The command line does not include notes. In Jupyter Notebooks you can also go back and delete or change code or text as you go, which you cannot do using the command line. Notebooks present markdown and visualisations inline - meaning you can see them both at the same time and the parts that aren't code do not interfere with the code. The result is a highly flexible but user-friendly environment that can perform complicated tasks very quickly.

What is the file type?

Jupyter Notebooks are saved as a JSON (JavaScript Object Notation) file with an .ipynb extension.
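
To see what that means, here is a simplified sketch in Python that builds a tiny notebook by hand and reads it back. The field names follow version 4 of the notebook format as we understand it. The cell contents are made up, and a real notebook saved by Jupyter holds more detail than this:

```python
import json

# A hand-written, minimal notebook with one markdown cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My heading"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

text = json.dumps(notebook, indent=1)   # the text that would sit inside the .ipynb file
loaded = json.loads(text)               # reading that text back in as plain data

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain text, other tools (such as GitHub) can read it and display the cells.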

Short summary

A notebook can either run on your desktop with no internet or on a remote server via the internet
A notebook requires a kernel (computational engine) to execute code e.g. Python or R
A notebook runs and stores the code and output, with markdown notes
A notebook is an editable document with input and output cells

Activity

In small groups, take a look at an example of a Jupyter Notebook in GitHub. Start here: https://github.com/ingridbmason/Intro-to-Jupyter/blob/master/AARNet_Intro_Jupyter.ipynb
See if you can identify the cells, what is input and what is output, and what is markdown. Discuss the types of output.
Examine the code. Different colours are used. Have you seen that before? Why do you think different colours are used?
If you have seen or used the command line before, can you think of any reasons why Notebooks might be easier to use? Discuss your ideas and experiences with the group. If you haven't used the command line before, have a think about why notebooks could be less daunting for beginners.

_____________________________________________________________________________________________________________
VIII. Introduction to Jupyter Notebooks - Episode 1E - 5 mins

How to open a Jupyter Notebook

Follow these step-by-step instructions to get started with Jupyter Notebooks in CloudStor:

LOG IN TO CLOUDSTOR

1. Open AARNet website: https://www.aarnet.edu.au/
2. Click on 'Log In and Tools' in the top righthand corner of the page.
3. Select 'CloudStor'.
4. Choose your organisation and click on 'Login at AARNet'.
5. Sign-in with your credentials - user name and password - and click 'Login'.  

You are now in CloudStor, which is a cloud storage environment.

CREATE A NOTEBOOK

1. At the top of the page there is a black banner that shows several icons. Double-click on the swan.
2. From the 'Welcome to SWAN' (service for web-based analysis) page, click on 'Go to my Notebooks'.
3. You will notice here that you can see 'Spawning new notebook' come up on the screen. This means that a notebook is being created. This can take a minute or so.
4. When the next screen comes up you will see a menu for files. On the right hand side there is a button called 'New', with a triangle next to it. If you click on this you will see a dropdown menu.
5. Underneath the heading 'Notebook' you will see a list of computer languages. Click on 'R'.
6. Select 'File' at the top left hand side of the screen and select 'Save As'. Name your notebook 'Intro to Jupyter Notebooks'.

...
If you don't have access to CloudStor, follow these instructions:
    
Open up MyBinder: https://mybinder.org/
Paste GitHub Repo: https://github.com/ingridbmason/Intro-to-Jupyter/
Open your new notebook, select Python 3 and save. (The free version of MyBinder does not support R - please be patient while we do that bit).

There are many different ways you can access Jupyter Notebooks, such as MyBinder.org or via Anaconda - we will talk about these options at the end of the workshop.

FEATURES OF THE NOTEBOOK

Take a good look around the dashboard. You can see there is a menu bar showing some titles that might be recognisable, like the 'File' menu we used before. 

Click on each of these to see what is in the menu. Make sure you click on the 'Help' function to see what kind of options there are when you hit a problem.

Underneath the menu bar there are some buttons that you can use to perform certain tasks, such as saving your notebook, adding a cell, deleting a cell, running a code cell and so on. Hover your mouse over each of these to see what these buttons do.

_____________________________________________________________________________________________________________

IX. Introduction to Jupyter Notebooks - Episode 1F - 10 mins

Introduction to Markdown

We talked about Markdown earlier in the workshop. Markdown is a lightweight markup language with plain text formatting syntax. An example of a markup language is HTML. In Jupyter Notebooks you use it to create the text you want to accompany your analyses. Remember that Markdown is for writing down comments outside of the code cells, so you can describe what you are doing as you go.

Let's get hands on with Markdown

Let's now start with some basic markdown. Remember that [markdown](https://en.wikipedia.org/wiki/Markdown) is how you can make rich (or formatted) text in a plain text editor.

In Jupyter Notebooks the first thing you need to do is select the role of the cell you are typing into. We are going to select 'Markdown' from the dropdown menu on the righthand side of the row of buttons showing the various icons (save, cut, copy etc). 

Headings

Let's start with a heading. To create a heading in Markdown you use a hash and a space before the words in the heading:

 - Type

# Introduction to Jupyter Notebooks

into the cell, making sure you have selected 'Markdown' from the dropdown menu above where it shows 'Code' as the default. 

Already here you can see how notebooks are flexible, as you can choose what kind of cell you are writing in (and toggle it at any time!)
  
  - Click on 'Run' - the button with the triangle next to a vertical line (it looks  like a 'play' icon), or use the shortcut Shift+Enter to execute the cell.
   
  - You have just created a heading in your notebook! Hooray!
 
 Now let's add a subheading. This time you use two hashes and a space before the words in your subheading.
 
  - Type
  
  ## A lesson in Markdown

  - Click on 'Run' - the button with the triangle next to a vertical line (it looks  like a 'play' icon), or use the shortcut Shift+Enter to execute the cell. You now have a subheading.
  
  Body text
 
 To write in your notebook in normal body text, you just have to type your text in the Markdown cell and press 'Run' or use the shortcut Shift+Enter.
 
  - Type
  
  This is my first lesson in Markdown. 

 - Click on 'Run' or use the shortcut Shift+Enter.
 
  You can now type your comments in your Jupyter Notebook.
  
  Editing a cell
  
 Let's say we want to add some text to the cell you executed above. Double-click on that line and you can open up the cell again.
  
   -  After the first sentence, type
   
I'm doing really well! 
  
 To add a new cell, click on the 'plus' icon from the buttons above. To edit or delete a cell, first select it; you can move up and down through the cells to find the one you want.
 
 Adding a new cell
 
 Let's add a new cell. Under your subheading, you can add another heading. Go to your subheading 'A lesson in Markdown' and click on the 'plus' button. This will create a new cell. Select 'Markdown' from the drop down menu.
 
 - Type
 
### Use it to create rich text in a plain text editor 
 
 - Press 'Run' or use the shortcut Shift+Enter.
 
 You now have a level three heading.

Bold

Now let's try bold font. In a new cell, select 'Markdown' from the dropdown menu again.

 - Type
  
 This is **really** interesting.
  
   
  - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell.
  - Voila! Bold!
 
 Italics

Now let's try italics. In a new cell, select 'Markdown' from the dropdown menu again.

 - Type
  
 This is really _interesting_.
  
   
  - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell.
  - Voila! Italics!
 
Activity

Spend a couple of minutes practicing these skills: Headings, plain/bold/italics text, adding, removing and editing cells.
 
It can feel a little strange, as you already know how to do formatting in programs like Word. However, what we are doing here is 'speaking' directly to the computer, with a different kind of interface so you can also perform calculations, create visualisations and use computational methods. Remember that the reason Jupyter Notebooks are becoming so popular is that the format allows commenting and text to sit within the same 'document' as code, mathematical equations and visualisations. You can tell the story of what you are doing as you go, and this is a really useful way of being able to reproduce your results.

If you want to know more about markdown, take a look at these pages: 

https://guides.github.com/features/mastering-markdown/
https://www.firstpythonnotebook.org/markdown/

_____________________________________________________________________________________________________________

TAKE A 30 MIN BREAK
_____________________________________________________________________________________________________________

X. Introduction to Jupyter Notebooks - Episode 2A - 45 mins

Working in Jupyter Notebooks with R

Now we are going to start using the code cells. We selected the kernel for R when we opened the notebook. This means that the code we write using R can be run in the notebook. It's important to know here that you do not need to be a programmer to use Jupyter Notebooks. It is absolutely fine to know just a little bit - what you need - to get to do the tasks you want to do. Lots of people find this is a great way to start, and as you find better and faster ways of doing things you will gain the motivation to learn more about the language. But for now, what we are doing is showing you a couple of commands that can help you automate certain tasks, and all you have to do is copy and paste!

Add a new cell using the 'plus' icon. This time we can leave it as a default 'code' cell. 

If you want to add a comment within the code cell, you can do this if you like. Just place a hash in front of the comment.

 - Type
 
 # For comments inside the code cell use a hash

  - Click on 'Run' or use the shortcut Shift+Enter to execute the cell. You can see that the text remains in the cell, which shows 'In' with a number in square brackets next to it. This number counts the order in which code cells have been run, and it helps you see that it is a code cell, not a Markdown cell.
  
Sequence
  
  Now let's create a sequence of numbers, or integers:
      
 - In a new cell, type
 
 1:19

  - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell. See how quickly you can create a sequence, automating a task that you might otherwise have to do manually? 
   
Sum  

This next bit of code adds up that sequence. 

 - In a new cell, type
 
 sum(1:19)

  - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell. This command has added up all of the numbers in the sequence. 
  
Vector
 
 This sounds  'mathsy' and can go in the 'jargon' type of list that we looked at initially. Sometimes you will come across some terms that look unfamiliar or remind you of a bad experience in a high school maths class! Fear not! We are here to help.
 
A vector is a sequence of data elements of the same basic type. In programming, a vector is a type of array for storing and structuring data. Here we are going to assign to 'x' the combination of four components (3, 5, 8 and 9) using the 'c' command. This then creates a shortcut for any manipulation of that sequence of elements.

 - In a new cell, type
 
  x = c(3, 5, 8, 9)
  sum(x)
  
 - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell. This command has added up all of the components of the array. 
    
You can create vectors using different kinds of data. Let's try some text and see how we can isolate one component of the array:
    
 - In a new cell, type
 
  y = c("Jack", "Queen", "King")
  y[1]
  
   - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell. This command shows you just the first component of the array.
   
 Matrix
 
 A matrix is an array of arrays, made up of collections of the same data types. A matrix in R is like a mathematical matrix, containing all the same type of thing. Put really simply, it's like data in a table, with rows and columns, but all columns in a matrix must have the same data type.
 
  - In a new cell, type
 
matrix(y, 2, 3, byrow = TRUE)
  
   - Click on 'Run' or use the  shortcut Shift+Enter to execute the cell. This command shows you all of the components of the array as a matrix. The '2' is the number of rows. The '3' is the number of columns.
 
   
Activity

1. Create your own sequence, changing the numbers so you can see how you can create long sequences of numbers.
2. Use the sum command on the sequence you created.
3. Create a new vector using numbers, and use the sum command to calculate the sum of all components.
4. Create a new vector using text, and use square brackets to select any single element of the list (selecting different positions in that list).
5. Have a go at manipulating the matrix data display. Can you change it to one column and three rows? Three columns and ten rows?


Dataframe

A dataframe combines features of matrices and lists. All columns in a matrix must have the same data type (numeric, character, etc.). A dataframe is more general than a matrix, in that different columns can have different types of data (numeric, character, factor, etc.), just like a table in a database or an Excel sheet.

Let's create a dataframe, with column headings.  

 - In a new cell, type

employee <- c("Juanita Lopez", "Peter Gynn", "Jolie Talofa")
salary <- c(81000, 83400, 96800)
startdate <- as.Date(c("2010-11-1", "2008-3-25", "2007-3-14"))

In these commands we are assigning the data to a heading.
Continue in the same cell. Type

employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data

In these commands we are assigning each of the data groups to the main employee data table. 

   - Click on 'Run' or use the shortcut Shift+Enter to execute the cell. This group of commands shows you the dataframe as a table with a heading for each column. You have also given the dates a machine-readable date format.


Activity

Add a new employee, salary and start date to this dataframe.

** The difference between a matrix and a dataframe can be hard to understand - this might help (might not!) https://www.quora.com/What-is-the-difference-between-a-matrix-and-a-dataframe-in-R

_____________________________________________________________________________________________________________

XI. Introduction to Jupyter Notebooks - Episode 2B - 45 mins

Working with Jupyter Notebooks in Python

Now let's try using Python for some of the things we did in R. The first thing we need to do is change the kernel. Click on the 'Kernel' menu from the menu bar at the top of the page. Select 'change kernel' and click on 'Python 3'. Watch the top right hand corner of the screen to see it working on changing the kernel. When it changes to 'Trusted' we're ready to go.

In this part of the workshop we'll be having a go at using Python, doing some of the same things we did in R, though I'll also be introducing a couple of new concepts as we go, because the two programming languages work differently. 

Sequence

Let's start with creating a sequence of numbers again.

In a new cell, select 'code'. Remember that the code cell looks different to the markdown cell. How can you tell?

- Type the following inside the cell:

list(range(1, 20))

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

You have just created a list of the numbers from 1 to 19. Python's 'range' stops just before the second number, so 20 itself is not included. Woohoo!
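
If you would like to check where the list starts and stops, here is a short sketch. The name 'numbers' is just one we chose for this example:

```python
numbers = list(range(1, 20))

print(numbers[0])     # the first number in the list
print(numbers[-1])    # the last number in the list
print(len(numbers))   # how many numbers there are
```

In Python, range(start, stop) counts up to, but does not include, the stop number. This is why the list matches the 1:19 sequence we made in R.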

Value

 - In the next cell type the following:

     a = list(range(1, 20))

This command tells the computer that the list of numbers you created can now be called 'a'. The name 'a' is called a 'variable', and the list it holds is its 'value'.

Sum

- In the SAME cell type the following underneath:

x = sum(a)

This second instruction tells the computer to add each of those numbers in the list ('a') together and to call that total 'x'.

Print

- In the SAME cell type the following underneath:
    
print(x)

-  Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

This last instruction tells the computer to print the total of the list of numbers on the screen.

You just did some computing! Hooray! 

Mindbender: With these three instructions you have performed a 'sequence'. In computing, this is a list of instructions to be carried out in order and forms one of the backbones of programming. It is different to the sequence of numbers we created above. 

Activity

Take a few minutes to change the range of numbers, and/or change the values and see what happens. Have a bit of fun with it - see if you can beat the computer with your lightning speed mental arithmetic skills. Or just be amazed at how fast it can be.

Creating a list
 
Lists are ordered sequences of elements, and values can be repeated. 

- In your 'code' cell, write the following series of commands:
    
arr = ["Jack", "Queen", "King"]
print(arr[0])
print(arr[1])
print(arr[2])

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

Using 'arr =' gives the list inside the square brackets the name 'arr' (short for 'array'). Python automatically gives each item in the list a number, based on its order in the list. The first item, the word "Jack", is at position 0, the second, "Queen", is at position 1, and so on.
Those numbers are what you use to pick out items for your computation or visualisation, as you see when you use the command 'print' to show each one as printed output on the screen. The number for each word is also known as its index (or array index).
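
Here is a short sketch of a few other ways to work with positions in a list. It reuses the same three cards. 'len' and 'enumerate' are built into Python:

```python
arr = ["Jack", "Queen", "King"]

print(len(arr))   # the number of items in the list
print(arr[-1])    # a negative position counts back from the end

# enumerate() pairs each item with its position, starting from 0.
for position, card in enumerate(arr):
    print(position, card)
```

Counting from 0 can feel odd at first, especially after R, where the first position is 1.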

Activity

What would you do to print the items in the list in a different order? (HINT: There are more ways than one!)

Creating a dictionary or set

Now let's create a dictionary, using data of different types. In this example we'll use text and numbers (the dates are written as text here).

Sets are collections of unique elements and you cannot order them. Lists are ordered sequences of elements, and values can be repeated. 

Curly braces are used in Python to define a dictionary. A dictionary is a data structure that maps one value to another - kind of like how an English dictionary maps a word to its definition.
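
To see the difference between the three structures side by side, here is a small sketch. The variable names and the point values for each card are made up for illustration only.

```python
# A list keeps every value, in order, including repeats
cards_list = ["Jack", "Queen", "King", "Jack"]
print(len(cards_list))   # 4 items, "Jack" is counted twice

# A set keeps only the unique values
cards_set = {"Jack", "Queen", "King", "Jack"}
print(len(cards_set))    # 3 items, the repeated "Jack" is dropped

# A dictionary maps each key to a value (the points are invented)
card_points = {"Jack": 11, "Queen": 12, "King": 13}
print(card_points["Queen"])
```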

- In your 'code' cell, write the following series of commands:

d = {'employee': 'Juanita Lopez','salary':81000, 'startdate': '2010-11-1'}
e = {'employee': 'Peter Gynn','salary':83400, 'startdate': '2008-3-25'}
f = {'employee': 'Jolie Talofa','salary':96800, 'startdate': '2007-3-14'}
print (d)
print (e)
print (f)

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

Here we have given each of these three rows (or dictionaries) a name: 'd', 'e' or 'f'. Using the print command you can print them on the screen.

Dictionaries map keys to values, and the keys must be unique. This and other restrictions help Python find each value quickly and make sure the keys stay unique.

In a Python dictionary, the key is the term before the colon and the value is the term after it. Quote marks enclose each piece of text, commas separate the key-value pairs, and the curly braces hold the whole 'dictionary'.
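
A short sketch of what keys are for, and of what 'unique' means in practice. It works on a copy called 'updated' (a name made up for this example) so that your dictionary 'd' stays unchanged, and the new salary figure is invented.

```python
d = {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'}

# Look up a single value by putting its key in square brackets
print(d['employee'])

# Work on a copy so the original dictionary 'd' is not changed
updated = dict(d)

# Keys are unique: assigning to an existing key replaces the old value
updated['salary'] = 85000
print(updated['salary'])

# The copy still has exactly three keys
print(len(updated))
```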

- In your 'code' cell, write the following series of commands:
    
d.keys()
d.values()
d.items()

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

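Each of these commands looks inside the dictionary: 'keys()' gives its keys, 'values()' gives its values and 'items()' gives its key-value pairs. By default a notebook only displays the result of the last line in a cell, so here you will see the output of 'd.items()'. Wrap a command in 'print()' if you want to see its result as well. The 'd.' prefix refers to the dictionary we called 'd' above.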
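
To see all three results from a single cell, each command can be wrapped in 'print()'. In this sketch 'list()' is also used, which is optional and simply makes the printed output easier to read.

```python
d = {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'}

# The keys: the terms before each colon
print(list(d.keys()))

# The values: the terms after each colon
print(list(d.values()))

# The items: each key paired with its value
print(list(d.items()))
```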

- In your 'code' cell, write the following series of commands:
    
for k,v in d.items():
    print (k, v)

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

The tab key is important here: the second line must be indented so that Python knows it belongs to the 'for' command. In this case we are showing the keys and values in the dictionary called 'd'.

- In your 'code' cell, write the following series of commands:

for k,v in e.items():
    print(k, v)

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

In this case we are showing the keys and values in the dictionary called 'e'.

Remember: Curly braces create dictionaries or sets. Square brackets create lists.

Activity

Print the dictionary called 'f' using the 'for' command. 
Create a new dictionary to add to the employee dataset.

A new dictionary

Dictionaries can be contained in lists and vice versa. A list is an ordered sequence of objects, whereas a dictionary is a collection of key-value pairs (since Python 3.7, dictionaries also remember the order in which items were added). But the main difference is that items in dictionaries are accessed via keys and not via their position.

More theoretically, we can say that dictionaries are the Python implementation of an abstract data type, known in computer science as an associative array.

Associative arrays, like dictionaries, consist of (key, value) pairs, such that each possible key appears at most once in the collection. Any key of the dictionary is associated with (or mapped to) a value. The values of a dictionary can be any Python data type.
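
As a sketch of dictionaries contained in a list, the three employee dictionaries from earlier can be collected into one list. The names 'staff', 'person' and 'total' are made up for this example.

```python
d = {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'}
e = {'employee': 'Peter Gynn', 'salary': 83400, 'startdate': '2008-3-25'}
f = {'employee': 'Jolie Talofa', 'salary': 96800, 'startdate': '2007-3-14'}

# A list that contains the three dictionaries
staff = [d, e, f]

# Position first (which dictionary), then key (which value)
print(staff[0]['employee'])

# Add up the salaries by visiting each dictionary in turn
total = 0
for person in staff:
    total = total + person['salary']
print(total)
```

The list is accessed by position ('staff[0]') and each dictionary inside it is accessed by key ('['employee']').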

Let's make an English-German dictionary:

- In your 'code' cell, write the following series of commands:
    
en_de = {"red" : "rot", "blue" : "blau", "yellow" : "gelb"}
print (en_de)

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

You now have the beginnings of a dictionary of colours in both languages. Let's see if we can make it work:

- In your 'code' cell, write the following series of commands:
    
print (en_de["red"])

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

Hooray! You just translated 'red' into German.

Now let's add some French.

- In your 'code' cell, write the following series of commands:
    
de_fr = {"rot" : "rouge", "blau" : "bleu", "gelb" : "jaune"}
print ("The French word for red is: " + de_fr[en_de["red"]])

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

By creating a dictionary structure you can now go to French via German.
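
The same idea can be applied to every word at once. This sketch builds a new English-French dictionary by going via German for each entry; the name 'en_fr' is made up for this example.

```python
en_de = {"red": "rot", "blue": "blau", "yellow": "gelb"}
de_fr = {"rot": "rouge", "blau": "bleu", "gelb": "jaune"}

# Start with an empty dictionary and fill it one word at a time
en_fr = {}
for english, german in en_de.items():
    # Use the German word as the key into the German-French dictionary
    en_fr[english] = de_fr[german]

print(en_fr)
```

This only works while every German word in 'en_de' also appears as a key in 'de_fr'. A missing word would cause a 'KeyError'.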

Activity

See if you can translate from French to English, and German to French.
Expand on this dictionary.

_____________________________________________________________________________________________________________

EXTENSION ACTIVITIES FOR THOSE WHO ARE MORE ADVANCED:
  
1. Using data in CloudStor

- In a code cell type 

import pandas

- Execute the cell.

- In a new code cell type

pandas.read_csv ("")

and place the public link to the data saved in CloudStor between the quotes:  https://cloudstor.aarnet.edu.au/plus/s/x2uHIEZubsNuqEh/download

- Upload your own data set to CloudStor, create a public link to it, and do it again.
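
Giving the result of 'read_csv' a name makes it easier to keep working with the data. This is a sketch only: it assumes pandas is installed, and it reads a tiny made-up table held in memory instead of a real link, so that it runs without a network connection. The names 'sample_csv' and 'df' are made up for this example. In your own notebook you would put the public link between the quotes instead.

```python
import io
import pandas

# A tiny made-up CSV held in memory, standing in for the file behind a public link
sample_csv = io.StringIO("name,count\napples,3\npears,5\n")

# Give the table a name ('df') so you can keep working with it
df = pandas.read_csv(sample_csv)

# Show the first rows of the table
print(df.head())

# Show the number of rows and columns
print(df.shape)
```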

2. Using data from Google Sheets

Using Google Sheets, you can quickly import a dataset into your Jupyter Notebook.

Take a look at the GIF attached to this Tweet: https://twitter.com/choldgraf/status/1141436794359046144?s=12 

You can create a basic dataset using Google Sheets and use the 'pandas' library to import the data into Jupyter Notebooks. REMEMBER: Data used in this way is made public via the public link. Not for sensitive data!

Here's the code to copy, with a link to a Fortune 500 dataset we prepared earlier :)
    
- In a code cell, type:

import pandas

- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.

- In the next code cell, type: 
    
    pandas.read_csv ("")
    
    And then, in between the quote marks, paste this link: https://docs.google.com/spreadsheets/d/e/2PACX-1vQctQqQu1baZQJfhV333sEcjnkmvnRFtCGF0HVfoV3WnSmeDhhFneZ7bYtaxe3xFeMS9-pmzk83AuR4/pub?output=csv (the long string of letters and numbers in the link is what is known as a 'token')

You can continue to work with this data set, following along with the tutorial here: https://www.dataquest.io/blog/jupyter-notebook-tutorial/


3. Scraping data from Wikipedia

https://github.com/mboudour/var/blob/master/Boudourides_ScrapingWebPageTablesForBipartiteGraphs.ipynb

4. CloudStor access via WebDAV by Tim Sherratt

CloudStor is a data storage service provided by AARNet. Individual researchers in AARNet-connected institutions get 100 GB of storage space for free, and research projects can apply for additional space.

CloudStor is an instance of OwnCloud, and OwnCloud provides WebDAV access, so I thought I'd have a go at using WebDAV to access file data on CloudStor.

It works, but there are a few tricks: https://nbviewer.jupyter.org/github/wragge/sydney-stock-exchange/blob/master/notebooks/Cloudstor-access-via-WebDAV.ipynb

More on publicly shared data: https://nbviewer.jupyter.org/github/wragge/sydney-stock-exchange/blob/master/notebooks/Cloudstor-access-to-a-public-share-via-WebDAV.ipynb

_____________________________________________________________________________________________________________

VIII. Introduction to Jupyter Notebooks - Episode 2C - 15 mins

Jupyter Notebooks in the researcher's toolkit

Top three data science/analytics tools, technologies and languages used in the past year:
    
Python 60%
R 46%
Jupyter notebooks 32%

Source: a 2017 survey by Kaggle of 16,000 data professionals. Respondents who were employed were asked: “For work, which data science/analytics tools, technologies, and languages have you used in the past year? (Select all that apply).”

Designed to make data analysis easier to share and reproduce
Used increasingly by researchers who want to keep detailed records of their work
Used to devise teaching modules and collaborate with colleagues 
Researchers are publishing the notebooks to back up their research papers
Using Jupyter notebooks as a new form of interactive research publishing

Example of Jupyter Notebooks in the field:

OzGLAM Data Workbench - Dr Tim Sherratt (University of Canberra) 
https://github.com/GLAM-Workbench/ozglam-workbench
https://github.com/wragge/ozglam-workbench/blob/master/1-Introduction-and-table-of-contents.ipynb

Stored in GitHub
Viewable in nbviewer (nbviewer.jupyter.org)
Rendered with MyBinder 

Something extra: Jupyter Notebooks and teaching: https://jupyter4edu.github.io/jupyter-edu-book/ 

Using Jupyter Notebooks in the cloud 

One of the benefits of using Jupyter notebooks is that you can run them in the cloud, without having to use anything other than your browser. This lesson will review six services you can use to run your notebook in the cloud.

Services available

* [Binder](https://mybinder.org/)
* [Kaggle Kernels](https://www.kaggle.com/kernels)
* [Google Colaboratory (Colab)](https://colab.research.google.com)
* [Microsoft Azure Notebooks](https://notebooks.azure.com/)
* [CoCalc](https://cocalc.com/doc/jupyter-notebook.html)
* [Datalore](https://datalore.io/)

Benefits across all services

* No need to install anything on your local machine
* Free (or free plan)
* Access to Jupyter Notebook environment (or Jupyter-like environment)
* Ability to import and export notebooks using the standard .ipynb file format
* Support for the Python language (and most support other languages)

Comparisons

* (https://docs.google.com/spreadsheets/d/12thaaXg1Idr3iWST8QyASNDs08sjdPd6m9mbCGtHFn0/edit#gid=1505836451)

Activity

Create a Jupyter Notebook then export it to a different platform. If you don't have your own notebook, find one you are interested in on GitHub then import it to one of the services described above. Choose the platform you think is the one you might find the most useful to try out. 

How to choose the right notebook for you

CloudStor (via AARNet) 
Jupyter.org (example notebooks) https://jupyter.org/try
MyBinder (notebooks in GitHub) https://mybinder.org/
Anaconda (desktop app) https://anaconda.org/anaconda/python
CoLaboratory (Google) https://research.google.com/colaboratory/faq.html

There are many options. Think about how you work, and whether desktop or cloud is more useful or reliable. Do you work in the field, away from the network, for example? Desktop might be better. Do you want to work with integrated storage so you can use your datasets in the same location? Try your cloud service. Is there data available in an integrated environment, so you can work directly with the data, without having to download large datasets? Here in Australia the Tinker Studio https://app.tinker.edu.au/ offers Jupyter Notebooks along with a collection of datasets. 


Wrapping up

Thanks everyone for coming along on the big Jupyter Notebooks ride! You now know what they are, what they look like and a little bit of what they can do. You also know about CloudStor and where you can keep your research data safe and warm. Working on datasets within CloudStor helps to make your life and research that little bit easier! Tell your friends and go out to see which Jupyter Notebooks communities are out there in your field, or if there isn't one, make one!