Welcome to AARNet's 'Introduction to Jupyter Notebooks' Etherpad!
This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.
This etherpad is from etherpad.wikimedia.org. Please keep in mind all current as well as past content in any pad is public.
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
____________________________________________________________________________________________________________
Sign in: Name, Institution, Email, Twitter (optional)
Please sign in below:
________________________________________________________________________________________________________
I. Welcome
Instructors introduce themselves: Name, work, and aims for the day.
_____________________________________________________________________________________________________________
II. Introductions
Information for Today’s Learners
Add your name to the Etherpad above
Introduce yourselves! In your introduction, (a) explain your work in 3 words and (b) say something that happened to you or that you saw on the way to the workshop (either this morning or in your travels to the workshop).
_____________________________________________________________________________________________________________
III. A Brief Overview of AARNet
www.aarnet.edu.au
_____________________________________________________________________________________________________________
IV. Introduction to Jupyter Notebooks - Workshop Overview
This workshop will introduce you to Jupyter Notebooks. You will learn what they are, what they do and why you might like to use them. It is an introductory set of lessons for those who are brand new to coding and computational methods in research, or who have little knowledge of them. By the end of the workshop you will have a good understanding of what notebooks can do, how to open one, how to perform some basic tasks and how to save your work for later. If you are really into it, you will also be able to keep experimenting after the workshop, using other people's notebooks as springboards for your own adventures!
EPISODE 1 - 90 mins
Introduction and jargon-busting
What do Jupyter Notebooks do?
Why use Jupyter Notebooks?
How do Jupyter Notebooks work?
How to open a Jupyter Notebook (hands-on, 10 mins)
Introduction to Markdown (hands-on, 15 mins)
30 MIN BREAK
EPISODE 2 - 90 mins
Working in Jupyter Notebooks in R (hands-on, 30 mins)
Working in Jupyter Notebooks in Python (hands-on, 30 mins)
Using Jupyter Notebooks in the cloud
How to choose the right notebook for you
Wrapping up
_____________________________________________________________________________________________________________
V. Introduction to Jupyter Notebooks - Episode 1A - 15 mins
Introduction
Computational notebooks have been around since the late 1980s. Essentially, a notebook is an advanced word processor. Also known as a notebook interface, a computational notebook is a virtual notebook environment used for literate programming.
'Literate programming' pairs the functionality of word processing software with both the shell and kernel of that notebook's programming language.
Notebooks are documents that contain both code and rich text elements, such as links, equations and different ways of visualising data via graphs, tables and figures.
Because of the mix of code and text, notebooks are an ideal place to bring together results and an analysis description.
Notebooks are really smart documents - they can be executed to perform the data analysis in real time.
Jupyter Notebooks
Jupyter is named after three computer programming languages - Julia, Python and R.
It is a free, open-source, interactive web tool which researchers use so they can combine software code, computational output, explanatory text and multimedia resources in a single document.
Jupyter has exploded in popularity over the past couple of years and now supports more languages and is being used by more and more people from different disciplines.
Jargon Busting
This exercise is an opportunity to begin to ask questions and to get a firmer grasp on the concepts around data, code or software development in libraries.
Activity
In pairs, talk about the language used in the introductions. Are you familiar with these terms? What are the words that trip you up? Think of a way to remember what that word means in this context that might help others understand it better. How could you re-write some of the introductory text above to make it easier to understand?
Add your definitions of the terms we'll be using in today's workshop here, and remember to keep adding them as we go. This will be a useful resource for us all later!
Computational notebook
Literate programming
Code
Rich text
Open-source
Computational output
Documentation
Cell
Kernel
Markdown
Command line
Vector
Array
_____________________________________________________________________________________________________________
VI. Introduction to Jupyter Notebooks - Episode 1B - 10 mins
What do Jupyter Notebooks do?
Jupyter Notebooks offer a hybrid environment in which you can perform computational tasks while also using text to annotate or describe what you and your code blocks are doing. It's like a mix between the command line and a word processor.
What can Jupyter Notebooks do?
“The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.” –description from Project Jupyter
Data cleaning
Data cleaning is about finding and correcting (or removing) inaccuracies from a dataset, a table, or a database. The process involves identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting them.
In Jupyter Notebooks you can preview and analyse a limited number of columns and rows of data at a time, so you can see if there are blanks or repeated errors or inaccuracies. In addition, working directly with a large dataset without having to download it can save a lot of time.
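To make "finding blanks and repeated errors" concrete, here is a minimal sketch using only Python's standard library. The sample data is made up for illustration (it reuses names from the R section later on), and the helper find_problems is hypothetical, not part of any library. It flags rows with empty cells and rows that exactly repeat an earlier row, two of the most common checks in data cleaning.

```python
import csv
import io

# Hypothetical sample data: one blank salary and one repeated row.
SAMPLE = """name,salary
Juanita Lopez,81000
Peter Gynn,
Jolie Talofa,96800
Jolie Talofa,96800
"""

def find_problems(text):
    """Return (rows with blank cells, rows that repeat an earlier row)."""
    blanks, duplicates = [], []
    seen = set()
    reader = csv.DictReader(io.StringIO(text))
    # start=2 because line 1 of the file is the header
    for row_number, row in enumerate(reader, start=2):
        if any((value or "").strip() == "" for value in row.values()):
            blanks.append(row_number)
        key = tuple(row.values())
        if key in seen:
            duplicates.append(row_number)
        seen.add(key)
    return blanks, duplicates

blanks, duplicates = find_problems(SAMPLE)
print("Rows with blank cells:", blanks)
print("Repeated rows:", duplicates)
```

Finding problems is only half the job: whether a blank is filled in, corrected or deleted is a judgement the researcher still has to make.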
Data transformation
Data transformation is the process of converting data values from one source format or structure to another so they become consistent or intelligible to a target structure or system. A typical scenario where information needs to be shared involves the extraction of the data from the source application or data custodian, the transformation of that data into another format, and finally loading the transformed data into the target location.
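A small example of the extract, transform and load steps described above: the sketch below assumes a hypothetical source system that stores dates as DD/MM/YYYY and a target system that expects ISO 8601 dates (YYYY-MM-DD). The records and the transform function are illustrative only.

```python
from datetime import datetime

# Extract: hypothetical source records with DD/MM/YYYY dates
source_records = [
    {"employee": "Juanita Lopez", "startdate": "01/11/2010"},
    {"employee": "Peter Gynn", "startdate": "25/03/2008"},
]

def transform(record):
    """Return a copy of the record with the date converted to YYYY-MM-DD."""
    parsed = datetime.strptime(record["startdate"], "%d/%m/%Y")
    return {**record, "startdate": parsed.date().isoformat()}

# Load: here "loading" simply means collecting the results into a new list
target_records = [transform(r) for r in source_records]
print(target_records)
```

The source records are left untouched, so the original data is still available if the transformation needs to be checked or redone.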
Numerical simulation
Numerical simulation is when you use maths to create models, essentially computer programs that are designed to simulate what might or what did happen in a situation. By using numerical analysis you can approximate the real solution of the problem.
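As an illustration of approximating a real solution numerically, the sketch below simulates something decaying over time (for example, a hypothetical substance losing half its amount per unit of time in proportion to how much is left) using Euler's method: many small steps instead of one exact formula. The scenario and numbers are assumptions chosen because the exact answer is known, so the approximation can be checked against it.

```python
import math

def simulate_decay(initial, rate, total_time, step):
    """Approximate dy/dt = -rate * y with Euler's method (many small steps)."""
    y = initial
    for _ in range(round(total_time / step)):
        y += -rate * y * step  # change in y over one small step
    return y

exact = 100 * math.exp(-0.5 * 10)  # the exact answer, for comparison
for step in (1.0, 0.1, 0.01):
    approx = simulate_decay(100, 0.5, 10, step)
    print(f"step={step}: approx={approx:.4f}, exact={exact:.4f}")
```

Running it shows the approximation getting closer to the exact value as the step size shrinks, which is the trade-off at the heart of most numerical simulation: smaller steps mean more accuracy but more computation.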
Statistical modeling
A statistical model can be thought of as a statistical assumption (or set of statistical assumptions) with a certain property: that the assumption allows us to calculate the probability of any event. Statistical models can be used for prediction, estimation or description.
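Here is a tiny example of "an assumption that lets us calculate the probability of any event". Assuming a coin is fair and each toss is independent (a binomial model), we can calculate the probability of any number of heads. The function name binomial_probability is our own; math.comb is part of Python's standard library (Python 3.8 or later).

```python
from math import comb

def binomial_probability(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Model assumptions: fair coin (p = 0.5), 10 independent tosses
p_seven_heads = binomial_probability(7, 10, 0.5)
print(f"P(7 heads in 10 tosses) = {p_seven_heads:.4f}")  # 120/1024, about 0.1172
```

If the assumptions are wrong (say the coin is biased), the calculated probabilities will be wrong too, which is why checking a model's assumptions matters as much as the calculation itself.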
Visualisation
One of the really powerful attributes of Jupyter Notebooks is that of visualisation. In notebooks you can create graphs, tables, plots, heatmaps, charts, mathematical equations and so on. These tools are very helpful for exploration as well as demonstration.
A Word on Languages (teehee)
Jupyter Notebooks can be used with a variety of different programming languages. Initially they were for Julia, Python and R but now they support many more. If you don't know any languages, it might be helpful to think about the types of tasks you want to perform. Python is currently the most popular language used in Jupyter Notebooks but you should also consider what is commonly used in your field.
Python and R
At the moment, the two most popular programming tools for data science work are Python and R. Python is often praised as a general-purpose language with a syntax that is easy to understand and learn, while R was developed with statisticians in mind and offers a large collection of specialised packages for statistical computing.
Activity
In pairs, talk about which programming language might make the most sense for you and why. When you are ready, think about how you would recommend Python or R to someone else in your field. Would you consider using both? Share this with the group.
https://www.datacamp.com/community/tutorials/r-or-python-for-data-analysis
_____________________________________________________________________________________________________________
VII. Introduction to Jupyter Notebooks - Episode 1C - 10 mins
Why use Jupyter Notebooks?
Even if you think you don't use computational methods, if you use Excel or even advanced search terms in a library catalogue or on Google, you are already doing it!
Jupyter Notebooks help you to perform some tasks really quickly.
They are great for exploration in data analysis, presenting results, and sharing ideas.
You can experiment and work on large datasets without having to download them.
Jupyter Notebooks are also great at performing rapid visualisations that you can test out, change and share easily.
They are also freely available and you can use them in a normal browser (no license fee!).
Jupyter Notebooks offer a way to experiment with data processing without having to be a programmer. You can learn from others’ efforts and understand their data and research processes. Because you work in code blocks (not whole scripts) they help you learn how to code just enough for you to do what you need to do.
Learn how to code and experiment with data processing.
Interactive, provides immediate feedback.
Work in code blocks (not whole scripts).
Learn from others’ efforts and understand their data and research processes.
Test out calculations and visualisations that highlight important data points.
The notebook environment lets you test out calculations and visualisations that highlight important data points in a way that is immediate and easy to understand.
Notebooks permit a quick set of steps: you can document and run code, then look at the outcome (e.g. equations or visualisations), all in one place.
Importantly, they also help you keep track of your methods so you have a record of how you performed an analysis and came up with a conclusion. They are interactive and provide instant feedback, which is helpful for those just starting out.
What are Jupyter Notebooks used for?
Notebooks are being used in an ever-increasing number of domains, by a wide range of researchers. Currently the main fields using Jupyter Notebooks are the following:
Programming and Computer Science
Statistics, Machine Learning and Data Science
Mathematics, Physics, Chemistry, Biology
Earth Science and Geo-Spatial data
Linguistics and Text Mining
Signal Processing
Engineering Education
Activity
More humanities and social science researchers are adopting Jupyter Notebooks as part of their research practice. Discuss with your partner how Jupyter Notebooks might be useful in different fields. For example, see Neal Caren's workshop on text analysis, listed under 'Linguistics and Text Mining' at https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#linguistics-and-text-mining
_____________________________________________________________________________________________________________
VIII. Introduction to Jupyter Notebooks - Episode 1D - 20 mins
How do Jupyter Notebooks work?
Jupyter Notebooks don't need much to get going. They are editable and viewable in a web browser. You can also run them on a local machine with no internet or a remote machine with internet. They are very flexible and free!
Jupyter Notebooks use a “kernel”, which is a bit like an interpreter: it turns code written in a programming language into instructions the computer understands, so it can do the work. In an operating system, the kernel connects application software to the computer hardware. A Jupyter kernel plays a similar go-between role: it receives the code you type in the web browser, runs it, and sends the results back to the notebook to be displayed.
Each kernel is a program that runs code written in one specific programming language, so different kernels can be installed for different languages and different versions of a language.
Notebooks use blocks of code to perform computational processes resulting in outputs, or results.
In this workshop we will look at the two most used languages in data analysis, Python and R.
What makes them different to other applications?
Notebooks can run and store code and output with “markdown” notes.
Let's break that down:
Code
"Running code" means making the computer do what you are telling it to do. "Executing code" is the same thing.
Output
In Jupyter Notebooks "output" is the result of the computational process, such as a visualisation, graph, model, equation and so on.
Markdown
"Markdown" is the material you want to include that isn't code. It's just writing: Markdown is the name of the simple formatting language used to turn plain text into formatted text, so you can add headings, italics, quotes and other types of styling. It might be a description, a note or a question. Markdown does not interact with the code, but it is very useful in helping you understand the steps in your process and what you are trying to achieve.
A Jupyter Notebook is a series of “cells”, each containing either executable code (together with its output) or markdown.
Code cells are executed through the kernel; markdown cells hold formatted text (including LaTeX equations) that describes the work process right next to the code.
How are they different to the command line?
The command line does not include notes. In Jupyter Notebooks you can also go back and delete or change code or text as you go, which you cannot do using the command line. Notebooks present markdown and visualisations inline, meaning you can see both at the same time, and the parts that aren't code do not interfere with the code. The result is a highly flexible yet user-friendly environment that can perform complicated tasks very quickly.
What is the file type?
Jupyter Notebooks are saved as a JSON (JavaScript Object Notation) file with an .ipynb extension.
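To see what "a JSON file" means in practice, the sketch below builds a simplified notebook as a Python dictionary and turns it into the kind of text an .ipynb file contains. It follows the general shape of notebook format version 4 (a list of cells, plus metadata and version numbers) but is a minimal illustration, not a complete file: real notebooks usually also record details such as the kernel in their metadata.

```python
import json

# A simplified notebook (format version 4): one markdown cell, one code cell
notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Introduction to Jupyter Notebooks"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],  # results are stored here once the cell is run
            "source": ["sum(range(1, 20))"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

text = json.dumps(notebook, indent=1)  # this text is what an .ipynb file holds
print(text[:80])

loaded = json.loads(text)  # reading it back gives the same structure
print([cell["cell_type"] for cell in loaded["cells"]])
```

This is why notebooks store code, markdown and outputs together: they are all just entries in one structured text file.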
Short summary
A notebook can either run on your desktop with no internet or on a remote server via the internet
A notebook requires a kernel (computational engine) to execute code e.g. Python or R
A notebook runs and stores the code and output, with markdown notes
A notebook is an editable document with input and output cells
Activity
In small groups, take a look at an example of a Jupyter Notebook in GitHub. Start here: https://github.com/ingridbmason/Intro-to-Jupyter/blob/master/AARNet_Intro_Jupyter.ipynb
See if you can identify the cells, what is input and what is output, and what is markdown. Discuss the types of output.
Examine the code. Different colours are used. Have you seen that before? Why do you think different colours are used?
If you have seen or used the command line before, can you think of any reasons why Notebooks might be easier to use? Discuss your ideas and experiences with the group. If you haven't used the command line before, have a think about why notebooks could be less daunting for beginners.
_____________________________________________________________________________________________________________
IX. Introduction to Jupyter Notebooks - Episode 1E - 5 mins
How to open a Jupyter Notebook
Follow these step-by-step instructions to get started with Jupyter Notebooks in CloudStor:
LOG IN TO CLOUDSTOR
1. Open AARNet website: https://www.aarnet.edu.au/
2. Click on 'Log In and Tools' in the top righthand corner of the page.
3. Select 'CloudStor'.
4. Choose your organisation and click on 'Login at AARNet'.
5. Sign in with your credentials (user name and password) and click 'Login'.
You are now in CloudStor, which is a cloud storage environment.
CREATE A NOTEBOOK
1. At the top of the page there is a black banner that shows several icons. Double-click on the swan.
2. From the 'Welcome to SWAN' (service for web-based analysis) page, click on 'Go to my Notebooks'.
3. You will notice here that you can see 'Spawning new notebook' come up on the screen. This means that a notebook is being created. This can take a minute or so.
4. When the next screen comes up you will see a menu for files. On the right hand side there is a button called 'New', with a triangle next to it. If you click on this you will see a dropdown menu.
5. Underneath the heading 'Notebook' you will see a list of computer languages. Click on 'R'.
6. Select 'File' at the top left hand side of the screen and select 'Save As'. Name your notebook 'Intro to Jupyter Notebooks'.
...
If you don't have access to CloudStor, follow these instructions:
Open up MyBinder: https://mybinder.org/
Paste GitHub Repo: https://github.com/ingridbmason/Intro-to-Jupyter/
Open your new notebook, select Python 3 and save. (R is not available in this MyBinder environment, so please be patient while we do that bit.)
There are many different ways you can access Jupyter Notebooks, such as MyBinder.org or via Anaconda - we will talk about these options at the end of the workshop.
FEATURES OF THE NOTEBOOK
Take a good look around the dashboard. You can see there is a menu bar showing some titles that might be recognisable, like the 'File' menu we used before.
Click on each of these to see what is in the menu. Make sure you click on the 'Help' function to see what kind of options there are when you hit a problem.
Underneath the menu bar there are some buttons that you can use to perform certain tasks, such as saving your notebook, adding a cell, deleting a cell, running a code cell and so on. Hover your mouse over each of these to see what these buttons do.
_____________________________________________________________________________________________________________
X. Introduction to Jupyter Notebooks - Episode 1F - 10 mins
Introduction to Markdown
We talked about Markdown earlier in the workshop. Markdown is a lightweight markup language with plain text formatting syntax. An example of a markup language is HTML. In Jupyter Notebooks you use it to create the text you want to accompany your analyses. Remember that Markdown is for writing down comments outside of the code cells, so you can describe what you are doing as you go.
Let's get hands on with Markdown
Let's now start with some basic markdown. Remember that [markdown](https://en.wikipedia.org/wiki/Markdown) is how you can make rich (or formatted) text in a plain text editor.
In Jupyter Notebooks the first thing you need to do is select the role of the cell you are typing into. We are going to select 'Markdown' from the dropdown menu on the righthand side of the row of buttons showing the various icons (save, cut, copy etc).
Headings
Let's start with a heading. To create a heading in Markdown you use a hash and a space before the words in the heading:
- Type
# Introduction to Jupyter Notebooks
into the cell, making sure you have selected 'Markdown' from the dropdown menu above where it shows 'Code' as the default.
Already here you can see how notebooks are flexible, as you can choose what kind of cell you are writing in (and toggle it at any time!)
- Click on 'Run' - the button with the triangle next to a vertical line (it looks like a 'play' icon), or use the shortcut Shift+Enter to execute the cell.
- You have just created a heading in your notebook! Hooray!
Now let's add a subheading. This time you use two hashes before the words in your subheading.
- Type
## A lesson in Markdown
- Click on 'Run' - the button with the triangle next to a vertical line (it looks like a 'play' icon), or use the shortcut Shift+Enter to execute the cell. You now have a subheading.
Body text
To write in your notebook in normal body text, you just have to type your text in the Markdown cell and press 'Run' or use the shortcut Shift+Enter.
- Type
This is my first lesson in Markdown.
- Click on 'Run' or use the shortcut Shift+Enter.
You can now type your comments in your Jupyter Notebook.
Editing a cell
Let's say you want to add some text to the cell you executed above. Double-click on that line to open up the cell again.
- After the first sentence, type
I'm doing really well!
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell again.
To add a new cell, click on the 'plus' icon in the row of buttons above. To move a cell, select it and use the 'up arrow' and 'down arrow' icons; to delete it, use the 'scissors' (cut) icon.
Adding a new cell
Let's add a new cell. Under your subheading, you can add another heading. Go to your subheading 'A lesson in Markdown' and click on the 'plus' button. This will create a new cell. Select 'Markdown' from the dropdown menu.
- Type
### Use it to create rich text in a plain text editor
- Press 'Run' or use the shortcut Shift+Enter.
You now have a level three heading.
Bold
Now let's try bold font. In a new cell, select 'Markdown' from the dropdown menu again.
- Type
This is **really** interesting.
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
- Voila! Bold!
Italics
Now let's try italics. In a new cell, select 'Markdown' from the dropdown menu again.
- Type
This is really _interesting_.
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
- Voila! Italics!
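Markdown works because it is a set of simple plain-text rules that a program turns into formatting, often HTML. The toy converter below is a hypothetical sketch that handles only the rules practised above (hash headings, **bold** and _italics_); real Markdown has many more rules and edge cases, so treat this as an illustration of the idea, not a Markdown implementation.

```python
import re

def toy_markdown_to_html(line):
    """Convert one line of a tiny Markdown subset to HTML (illustration only)."""
    heading = re.match(r"^(#{1,6}) (.+)$", line)
    if heading:
        level = len(heading.group(1))  # number of hashes = heading level
        return f"<h{level}>{heading.group(2)}</h{level}>"
    line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)  # **bold**
    line = re.sub(r"_(.+?)_", r"<em>\1</em>", line)  # _italics_
    return f"<p>{line}</p>"

for example in [
    "# Introduction to Jupyter Notebooks",
    "## A lesson in Markdown",
    "This is **really** interesting.",
    "This is really _interesting_.",
]:
    print(toy_markdown_to_html(example))
```

Notice how the number of hashes maps directly to the heading level (h1, h2, h3...), which is exactly the rule you used in your notebook.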
Activity
Spend a couple of minutes practising these skills: headings, plain/bold/italic text, and adding, removing and editing cells.
It can feel a little strange, as you already know how to do formatting in programs like Word. However, what we are doing here is 'speaking' directly to the computer, through a different kind of interface that also lets you perform calculations, create visualisations and use computational methods. Remember that Jupyter Notebooks are becoming so popular because they allow comments and text to sit within the same 'document' as code, mathematical equations and visualisations. You can tell the story of what you are doing as you go, and this is a really useful way of being able to reproduce your results.
If you want to know more about markdown, take a look at these pages:
https://guides.github.com/features/mastering-markdown/
https://www.firstpythonnotebook.org/markdown/
_____________________________________________________________________________________________________________
TAKE A 30 MIN BREAK
_____________________________________________________________________________________________________________
XI. Introduction to Jupyter Notebooks - Episode 2A - 45 mins
Working in Jupyter Notebooks with R
Now we are going to start using the code cells. We selected the kernel for R when we opened the notebook. This means that the code we write using R can be run in the notebook. It's important to know here that you do not need to be a programmer to use Jupyter Notebooks. It is absolutely fine to know just a little bit (what you need) to do the tasks you want to do. Lots of people find this a great way to start, and as you find better and faster ways of doing things you will gain the motivation to learn more about the language. But for now, what we are doing is showing you a couple of commands that can help you automate certain tasks, and all you have to do is copy and paste!
Add a new cell using the 'plus' icon. This time we can leave it as a default 'code' cell.
If you want to add a comment within a code cell, just place a hash (#) in front of the comment. The kernel ignores everything after the hash, so comments are notes for the people reading the code.
- Type
# For comments inside the code cell use a hash
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. Nothing is printed, because R ignores comments, and the text remains in the cell. Next to the cell you will see 'In' followed by a number in square brackets, which records the order in which cells have been run. This helps you see that it is a code cell, not a Markdown cell.
Sequence
Now let's create a sequence of numbers, or integers:
- In a new cell, type
1:19
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. See how quickly you can create a sequence, automating a task that you might otherwise have to do manually?
Sum
This next bit of code adds up that sequence.
- In a new cell, type
sum(1:19)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. This command has added up all of the numbers in the sequence, giving 190.
Vector
This sounds 'mathsy' and can go in the 'jargon' type of list that we looked at initially. Sometimes you will come across some terms that look unfamiliar or remind you of a bad experience in a high school maths class! Fear not! We are here to help.
A vector is a sequence of data elements of the same basic type; in programming, it is a type of array for storing and structuring data. Here we use the c() function (short for 'combine') to assign to 'x' a vector of four elements (3, 5, 8 and 9). The name 'x' then works as a shortcut for manipulating that whole sequence of elements.
- In a new cell, type
x = c(3, 5, 8, 9)
sum(x)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. This command has added up all of the elements of the vector (3 + 5 + 8 + 9 = 25).
You can create vectors using different kinds of data. Let's try some text and see how we can isolate one element of the vector:
- In a new cell, type
y = c("Jack", "Queen", "King")
y[1]
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. This command shows you just the first element of the vector, "Jack". Note that R counts positions starting from 1.
Matrix
A matrix is a two-dimensional arrangement of elements that are all the same data type. A matrix in R is like a mathematical matrix, containing all the same type of thing. Put really simply, it's like data in a table, with rows and columns, but every element in a matrix must have the same data type.
- In a new cell, type
matrix(y, 2, 3, byrow = TRUE)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. This command shows you all of the components of the array as a matrix. The '2' is the number of rows. The '3' is the number of columns.
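Why do both rows show Jack, Queen, King? R's matrix() reuses ("recycles") the values from the start when there are fewer values than cells: y has three elements but a 2 x 3 matrix has six cells. byrow = TRUE fills the values across each row; without it, R fills down each column. The function r_style_matrix below is a hypothetical Python helper written only to mimic that behaviour with nested lists. It is not part of R or of any Python library, and unlike R it does not warn when the number of values doesn't divide the matrix size evenly.

```python
def r_style_matrix(values, nrow, ncol, byrow=False):
    """Mimic R's matrix(values, nrow, ncol, byrow) using nested lists."""
    # Recycle the values from the start until every cell is filled
    filled = [values[i % len(values)] for i in range(nrow * ncol)]
    if byrow:
        # Fill across each row first
        return [filled[r * ncol:(r + 1) * ncol] for r in range(nrow)]
    # Otherwise fill down each column first (R's default)
    return [[filled[c * nrow + r] for c in range(ncol)] for r in range(nrow)]

y = ["Jack", "Queen", "King"]
for row in r_style_matrix(y, 2, 3, byrow=True):
    print(row)
```

Try changing byrow to False and compare the result with matrix(y, 2, 3) in R: the same six values appear, but in a different arrangement.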
Activity
1. Create your own sequence, changing the numbers so you can see how you can create long sequences of numbers.
2. Use the sum command on the sequence you created.
3. Create a new vector using numbers, and use the sum command to calculate the sum of all its elements.
4. Create a new vector using text, and use square brackets to select any single element of the list (try selecting different positions in that list).
5. Have a go at manipulating the matrix data display. Can you change it to one column and three rows? Three columns and ten rows?
Dataframe
A dataframe combines features of matrices and lists. All columns in a matrix must have the same data type (numeric, character, etc.). A data frame is more general than a matrix, in that different columns can have different types of data (numeric, character, factor, etc.), just like a table in a database or an Excel sheet.
Let's create a dataframe, with column headings.
- In a new cell, type
employee <- c("Juanita Lopez", "Peter Gynn", "Jolie Talofa")
salary <- c(81000, 83400, 96800)
startdate <- as.Date(c("2010-11-1", "2008-3-25", "2007-3-14"))
In these commands we are assigning each set of values to a name, which will become a column heading.
Continue in the same cell. Type
employ.data <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.data
In these commands we are assigning each of the data groups to the main employee data table.
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell. This group of commands shows you the data as a table, with a heading for each column and one row per employee. Because of as.Date, the start dates are also stored in a machine-readable date format.
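For comparison with the Python part of the workshop, here is a rough standard-library sketch of the same employee table: a list of dictionaries, where each dictionary is one row and its keys act as the column headings. This is a stand-in to show the idea of columns holding different data types, not a real data frame. In Python, data frames are usually provided by the pandas library, which this sketch deliberately avoids so it runs without extra installs.

```python
from datetime import date

# Each row is a dictionary; the keys play the role of column headings
employ_data = [
    {"employee": "Juanita Lopez", "salary": 81000, "startdate": date(2010, 11, 1)},
    {"employee": "Peter Gynn", "salary": 83400, "startdate": date(2008, 3, 25)},
    {"employee": "Jolie Talofa", "salary": 96800, "startdate": date(2007, 3, 14)},
]

# Different "columns" hold different types: text, numbers and dates
for row in employ_data:
    print(f'{row["employee"]:<15} {row["salary"]:>7} {row["startdate"].isoformat()}')

# Because the dates are real date objects, they can be compared and sorted
earliest = min(employ_data, key=lambda row: row["startdate"])
print("Longest-serving:", earliest["employee"])
```

This is the same benefit the as.Date step gives you in R: a date stored as a date, rather than as text, can be compared, sorted and used in calculations.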
Activity
Add a new employee, salary and start date to this dataframe.
** The difference between a matrix and a dataframe can be hard to understand - this might help (might not!) https://www.quora.com/What-is-the-difference-between-a-matrix-and-a-dataframe-in-R
_____________________________________________________________________________________________________________
XII. Introduction to Jupyter Notebooks - Episode 2B - 45 mins
Working with Jupyter Notebooks in Python
Now let's try using Python for some of the things we did in R. The first thing we need to do is change the kernel. Click on the 'Kernel' menu from the menu bar at the top of the page. Select 'Change kernel' and click on 'Python 3'. Watch the top right-hand corner of the screen while the kernel changes; when it shows 'Python 3', we're ready to go.
In this part of the workshop we'll be having a go at using Python, doing some of the same things we did in R, though I'll also be introducing a couple of new concepts as we go, because the two programming languages work differently.
Sequence
Let's start with creating a sequence of numbers again.
In a new cell, select 'code'. Remember that the code cell looks different to the markdown cell. How can you tell?
- Type the following inside the cell:
list(range(1, 20))
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
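You have just created a list of the numbers from 1 to 19. Woohoo! Notice that the list stops at 19, not 20: Python's range() includes the first number but stops just before the second. To get 1 to 20 you would type range(1, 21).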
Value
- In the next cell type the following:
a = list(range(1, 20))
This command stores the list of numbers you created under the name 'a'. 'a' is called a 'variable', and the list it holds is its 'value'.
Sum
- In the SAME cell type the following underneath:
x = sum(a)
This second instruction tells the computer to add together all the numbers in the list 'a' and store the total in a variable called 'x'.
Print
- In the SAME cell type the following underneath:
print(x)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
This last instruction tells the computer to print the total of the list of numbers on the screen.
You just did some computing! Hooray!
Mindbender: With these three instructions you have performed a 'sequence'. In computing, this is a list of instructions to be carried out in order and forms one of the backbones of programming. It is different to the sequence of numbers we created above.
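Here is the same three-step sequence written out in one cell, with comments (lines starting with #, which Python ignores). This is just a sketch of the commands above; the last line is an extra added here to show where range() stops.

```python
# Step 1: store the numbers 1 to 19 in a list called 'a'.
# range(1, 20) starts at 1 and stops just before 20.
a = list(range(1, 20))

# Step 2: add every number in the list together and store the total in 'x'.
x = sum(a)

# Step 3: show the total on the screen.
print(x)  # 190

# Extra: to include 20 as well, the stop value has to be one higher.
print(sum(range(1, 21)))  # 210
```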
Activity
Take a few minutes to change the range of numbers, and/or change the values and see what happens. Have a bit of fun with it - see if you can beat the computer with your lightning speed mental arithmetic skills. Or just be amazed at how fast it can be.
Creating a list
Lists are ordered sequences of elements, and values can be repeated.
- In your 'code' cell, write the following series of commands:
arr = ["Jack", "Queen", "King"]
print(arr[0])
print(arr[1])
print(arr[2])
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
The command 'arr =' stores the items in the square brackets under the name 'arr' (short for 'array'; in Python this structure is called a list). Each item in the list is automatically given a number based on its position. The first item, the word "Jack", is position 0, the second, "Queen", is position 1, and so on.
You then use those numbers to pick out items for your computation or visualisation, as you see when you use the command 'print' to show each one on the screen. The number assigned to each item is called its index.
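If you'd like to see every index at once, the short sketch below prints each card next to its position. It uses two built-in Python tools not covered above, enumerate() and the list method .index(), so treat it as an optional extra rather than part of the exercise.

```python
arr = ["Jack", "Queen", "King"]

# enumerate() pairs each item with its position (its index), counting from 0.
for position, card in enumerate(arr):
    print(position, card)
# 0 Jack
# 1 Queen
# 2 King

# .index() works the other way round: give it an item, get back its position.
print(arr.index("Queen"))  # 1
```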
Activity
What would you do to print the items in the list in a different order? (HINT: There are more ways than one!)
Creating a dictionary or set
Now let's create some dictionaries using data of different types. In these dictionaries we'll use text and number data, with the start dates written as text.
Sets are collections of unique elements and you cannot order them. Lists are ordered sequences of elements, and values can be repeated.
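To see the difference between a list and a set, the small sketch below turns a list with a repeated card into a set. The card list is made up just for this example. Because a set has no fixed order, the code checks its size and membership rather than printing it.

```python
# A list keeps every item, in the order you wrote them, including repeats.
cards = ["Jack", "Queen", "Queen", "King"]
print(cards)       # ['Jack', 'Queen', 'Queen', 'King']
print(len(cards))  # 4

# A set keeps only one copy of each item and has no fixed order.
unique_cards = set(cards)
print(len(unique_cards))        # 3
print("Queen" in unique_cards)  # True
```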
Curly braces are used in Python to define a dictionary. A dictionary is a data structure that maps one value to another - kind of like how an English dictionary maps a word to its definition.
- In your 'code' cell, write the following series of commands:
d = {'employee': 'Juanita Lopez','salary':81000, 'startdate': '2010-11-1'}
e = {'employee': 'Peter Gynn','salary':83400, 'startdate': '2008-3-25'}
f = {'employee': 'Jolie Talofa','salary':96800, 'startdate': '2007-3-14'}
print (d)
print (e)
print (f)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
Here we have stored the three rows, one dictionary per employee, under the names 'd', 'e' and 'f'. Using the print command you can print them on the screen.
Dictionaries map keys to values, and each key must be unique within a dictionary. Keys must also be of a type that cannot change, such as text or a number. These restrictions help Python look up values by key quickly and keep the keys unique.
In Python, the key is the term before the colon and the value is the term after it. Quote marks surround each piece of text (numbers such as 81000 don't need them), commas separate the key-value pairs, and the curly braces hold the whole dictionary.
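The sketch below shows what "keys must be unique" means in practice, using the dictionary 'd' from above. The new salary of 85000 is a made-up value for illustration only.

```python
d = {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'}

# Look up a value by writing its key in square brackets.
print(d['employee'])  # Juanita Lopez

# Assigning to an existing key replaces its value; it does not add a second 'salary'.
d['salary'] = 85000   # illustrative value
print(d['salary'])    # 85000
print(len(d))         # 3

# If a key is written twice when the dictionary is created, the last value wins.
dup = {'salary': 81000, 'salary': 85000}
print(dup)  # {'salary': 85000}
```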
- In your 'code' cell, write the following series of commands:
print(d.keys())
print(d.values())
print(d.items())
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
These commands show the keys of the dictionary, its values, and its items (each key paired with its value). The 'd.' prefix refers to the dictionary we called 'd' above.
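keys(), values() and items() give back special "view" objects. Wrapping them in list(), as in this optional sketch, makes the output easier to read. In current versions of Python (3.7 and later) the items come back in the order they were added.

```python
d = {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'}

print(list(d.keys()))    # ['employee', 'salary', 'startdate']
print(list(d.values()))  # ['Juanita Lopez', 81000, '2010-11-1']
print(list(d.items()))   # [('employee', 'Juanita Lopez'), ('salary', 81000), ('startdate', '2010-11-1')]
```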
- In your 'code' cell, write the following series of commands:
for k, v in d.items():
    print(k, v)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
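Indentation is important here: the print line must be indented (Jupyter indents it automatically when you press Enter after the colon, or you can press the Tab key). In this case we are printing the keys and values in the dictionary called 'd'.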
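To see why the indentation matters, try this sketch. The counter variable 'count' is added just for this example. Indented lines run once for every key-value pair, and the unindented line runs only once, after the loop has finished.

```python
d = {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'}

count = 0
for k, v in d.items():
    # Indented: these lines belong to the loop and run once per key/value pair.
    print(k, v)
    count = count + 1

# Not indented: this line runs once, after the loop has finished.
print("Number of items:", count)  # Number of items: 3
```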
- In your 'code' cell, write the following series of commands:
for k, v in e.items():
    print(k, v)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
In this case we are printing the keys and values in the dictionary called 'e'.
Remember: Curly braces create dictionaries or sets. Square brackets create lists.
Activity
Print the dictionary called 'f' using the 'for' command.
Create a new dictionary to add to the employee dataset.
A new dictionary
Dictionaries can be contained in lists and vice versa. A list is an ordered sequence of objects, and its items are accessed by their position. Items in a dictionary are accessed via their keys instead. (Since Python 3.7, dictionaries do remember the order in which items were added, but you still look items up by key, not by position.)
More theoretically, we can say that dictionaries are the Python implementation of an abstract data type, known in computer science as an associative array.
Associative arrays, like dictionaries, consist of (key, value) pairs, such that each possible key appears at most once in the collection. Each key of the dictionary is associated (or mapped) with a value. The values of a dictionary can be any Python data type.
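As an example of dictionaries inside a list, this sketch puts the three employee dictionaries from earlier into one list called 'employees' (a name chosen just for this example). You reach a row by its position in the list, and a field by its key in the dictionary.

```python
employees = [
    {'employee': 'Juanita Lopez', 'salary': 81000, 'startdate': '2010-11-1'},
    {'employee': 'Peter Gynn', 'salary': 83400, 'startdate': '2008-3-25'},
    {'employee': 'Jolie Talofa', 'salary': 96800, 'startdate': '2007-3-14'},
]

# Position first (list), then key (dictionary).
print(employees[1]['employee'])  # Peter Gynn

# Loop over the rows and pick out two fields from each.
for row in employees:
    print(row['employee'], row['salary'])

# Add up one field across all rows.
total = sum(row['salary'] for row in employees)
print(total)  # 261200
```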
Let's make an English-German dictionary:
- In your 'code' cell, write the following series of commands:
en_de = {"red" : "rot", "blue" : "blau", "yellow" : "gelb"}
print (en_de)
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
You now have the beginnings of a dictionary of colours in both languages. Let's see if we can make it work:
- In your 'code' cell, write the following series of commands:
print (en_de["red"])
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
Hooray! You just translated 'red' into German.
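What happens if you ask for a colour that isn't in the dictionary? Typing en_de["green"] stops with a KeyError. This sketch shows two built-in ways to avoid that: checking with 'in' first, or using .get() with a fallback value. The fallback text is just an example.

```python
en_de = {"red": "rot", "blue": "blau", "yellow": "gelb"}

# 'in' checks whether a key exists before you look it up.
print("green" in en_de)  # False

# .get() returns the fallback value instead of raising a KeyError.
print(en_de.get("green", "not in dictionary"))  # not in dictionary
print(en_de.get("blue", "not in dictionary"))   # blau
```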
Now let's add some French.
- In your 'code' cell, write the following series of commands:
de_fr = {"rot" : "rouge", "blau" : "bleu", "gelb" : "jaune"}
print ("The French word for red is: " + de_fr[en_de["red"]])
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
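The inner lookup, en_de["red"], gives the German word "rot", which is then used as the key in the German-French dictionary. By chaining the two dictionaries like this, you can go from English to French via German.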
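If you want to reuse the chained lookup, you could wrap it in a small function. The sketch below does that. The function name english_to_french is hypothetical, chosen just for this example.

```python
en_de = {"red": "rot", "blue": "blau", "yellow": "gelb"}
de_fr = {"rot": "rouge", "blau": "bleu", "gelb": "jaune"}

def english_to_french(word):
    """Translate an English colour to French by going through German."""
    german = en_de[word]   # first lookup: English -> German
    return de_fr[german]   # second lookup: German -> French

print(english_to_french("blue"))    # bleu
print(english_to_french("yellow"))  # jaune
```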
Activity
See if you can translate from French to English, and German to French.
Expand on this dictionary.
_____________________________________________________________________________________________________________
EXTENSION ACTIVITIES FOR THOSE WHO ARE MORE ADVANCED:
1. Using data in CloudStor
- In a code cell type
import pandas
- Execute the cell.
- In a new code cell type
pandas.read_csv ("")
and place the public link to the data saved in CloudStor between the quotes: https://cloudstor.aarnet.edu.au/plus/s/x2uHIEZubsNuqEh/download
Upload your own data set and do it again
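pandas.read_csv() on its own just displays the table. To keep working with it, store the result in a variable. The sketch below shows this offline. It feeds read_csv the employee table from earlier as CSV text through io.StringIO instead of a CloudStor link. In the workshop you would paste your link in its place. The variable names csv_text and df are just for this example.

```python
import io
import pandas

# Stand-in for a downloaded file: the employee table from earlier as CSV text.
csv_text = """employee,salary,startdate
Juanita Lopez,81000,2010-11-1
Peter Gynn,83400,2008-3-25
Jolie Talofa,96800,2007-3-14
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO.
df = pandas.read_csv(io.StringIO(csv_text))

print(df.shape)            # (3, 3): three rows, three columns
print(df["salary"].sum())  # 261200
print(df.head())           # first few rows as a table
```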
2. Using data from Google Sheets
Using Google Sheets, you can quickly import a dataset into your Jupyter Notebook.
Take a look at the GIF attached to this Tweet: https://twitter.com/choldgraf/status/1141436794359046144?s=12
You can create a basic dataset using Google Sheets and use the 'pandas' command to import the data into Jupyter Notebooks. REMEMBER: The data used in this way will be made public via the public link. Not for sensitive data!
Here's the code to copy, with a link to a Fortune 500 dataset we prepared earlier :)
- In a code cell, type:
import pandas
- Click on 'Run' or use the shortcut Shift+Enter to execute the cell.
- In the next code cell, type:
pandas.read_csv ("")
And then, in between the quote marks, paste this link: https://docs.google.com/spreadsheets/d/e/2PACX-1vQctQqQu1baZQJfhV333sEcjnkmvnRFtCGF0HVfoV3WnSmeDhhFneZ7bYtaxe3xFeMS9-pmzk83AuR4/pub?output=csv (this is the published CSV link to the spreadsheet; the long string of characters in the middle identifies the sheet)
You can continue to work with this data set, following along with the tutorial here: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
3. Scraping data from Wikipedia
https://github.com/mboudour/var/blob/master/Boudourides_ScrapingWebPageTablesForBipartiteGraphs.ipynb
4. CloudStor access via WebDAV by Tim Sherratt
CloudStor is a data storage service provided by AARNet. Individual researchers at AARNet-connected institutions get 100 GB of storage space for free, and research projects can apply for additional space.
CloudStor is an instance of ownCloud, and ownCloud provides WebDAV access, so I thought I'd have a go at using WebDAV to access file data on CloudStor.
It works, but there are a few tricks: https://nbviewer.jupyter.org/github/wragge/sydney-stock-exchange/blob/master/notebooks/Cloudstor-access-via-WebDAV.ipynb
More on publicly shared data: https://nbviewer.jupyter.org/github/wragge/sydney-stock-exchange/blob/master/notebooks/Cloudstor-access-to-a-public-share-via-WebDAV.ipynb
_____________________________________________________________________________________________________________
VIII. Introduction to Jupyter Notebooks - Episode 2C - 15 mins
Jupyter Notebooks in the researcher's toolkit
Top three data science/analytics tools, technologies and languages used in the past year:
Python 60%
R 46%
Jupyter notebooks 32%
Source: Kaggle's 2017 survey of 16,000 data professionals. Respondents who were employed were asked, “For work, which data science/analytics tools, technologies, and languages have you used in the past year? (Select all that apply).”
Designed to make data analysis easier to share and reproduce
Used increasingly by researchers who want to keep detailed records of their work
Used to devise teaching modules and collaborate with colleagues
Researchers are publishing the notebooks to back up their research papers
Using Jupyter notebooks as a new form of interactive research publishing
Example of Jupyter Notebooks in the field:
OzGLAM Data Workbench - Dr Tim Sherratt (University of Canberra)
https://github.com/GLAM-Workbench/ozglam-workbench
https://github.com/wragge/ozglam-workbench/blob/master/1-Introduction-and-table-of-contents.ipynb
Stored in GitHub
Viewable in nbviewer (nbviewer.jupyter.org)
Runnable in the browser with MyBinder
Something extra: Jupyter Notebooks and teaching: https://jupyter4edu.github.io/jupyter-edu-book/
Using Jupyter Notebooks in the cloud
One of the benefits of using Jupyter notebooks is that you can run them in the cloud, without having to use anything other than your browser. This lesson will review six services you can use to run your notebook in the cloud.
Services available
* [Binder](https://mybinder.org/)
* [Kaggle Kernels](https://www.kaggle.com/kernels)
* [Google Colaboratory (Colab)](https://colab.research.google.com)
* [Microsoft Azure Notebooks](https://notebooks.azure.com/)
* [CoCalc](https://cocalc.com/doc/jupyter-notebook.html)
* [Datalore](https://datalore.io/)
Benefits across all services
* No need to install anything on your local machine
* Free (or free plan)
* Access to Jupyter Notebook environment (or Jupyter-like environment)
* Ability to import and export notebooks using the standard .ipynb file format
* Support Python language (and most support other languages)
Comparisons
* [Comparison of these services](https://docs.google.com/spreadsheets/d/12thaaXg1Idr3iWST8QyASNDs08sjdPd6m9mbCGtHFn0/edit#gid=1505836451)
Activity
Create a Jupyter Notebook, then export it to a different platform. If you don't have your own notebook, find one you are interested in on GitHub, then import it into one of the services described above. Choose the platform you think you might find most useful.
How to choose the right notebook for you
CloudStor (via AARNet)
Jupyter.org (example notebooks) https://jupyter.org/try
MyBinder (notebooks in GitHub) https://mybinder.org/
Anaconda (desktop app) https://anaconda.org/anaconda/python
CoLaboratory (Google) https://research.google.com/colaboratory/faq.html
There are many options. Think about how you work, and whether desktop or cloud is more useful or reliable for you. Do you work in the field, away from the network, for example? Desktop might be better. Do you want to work with integrated storage, so you can use your datasets in the same location? Try a cloud service. Is there data available in an integrated environment, so you can work directly with the data without having to download large datasets? Here in Australia the Tinker Studio https://app.tinker.edu.au/ offers Jupyter Notebooks along with a collection of datasets.
Wrapping up
Thanks everyone for coming along on the big Jupyter Notebooks ride! You now know what they are, what they look like and a little bit of what they can do. You also know about CloudStor and where you can keep your research data safe and warm. Working on datasets within CloudStor helps to make your life and research that little bit easier! Tell your friends and go out to see which Jupyter Notebooks communities are out there in your field, or if there isn't one, make one!