-
Notifications
You must be signed in to change notification settings - Fork 4
/
coding_01_variables.Rmd
169 lines (108 loc) · 6.05 KB
/
coding_01_variables.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
---
title: 'Variables'
output:
pdf_document: default
html_document:
df_print: paged
html_notebook: default
author: Marius 't Hart
---
```{r setup, cache=FALSE, include=FALSE}
#install.packages('knitr')
library(knitr)
opts_chunk$set(comment='', eval=FALSE)
```
In this first tutorial you will be introduced to a few basic variable types and how to change them: numbers, text, sequences of numbers (and texts) and you'll briefly touch on matrices.
# Numeric variables
Most analyses and code will deal with numbers and these are introduced here.
## Simple Math
You could use R as a simple calculator by typing in the exact formula's and checking the results yb hand. For example, if we have some starting position A, and two other positions B and C, and we want to see if B or C is closer to A, we can just use the pythagorean theorem twice, and compare the results. Let's say A is at (2, 3), B is at (3, 0) and C is at (4, 5).
First we calculate the distance between A and B:
```{r distance_A_B}
sqrt((3-2)^2 + (0-3)^2)
```
If you press the green 'play' button at the top right of the chunk of code, you should see that the distance between A and B is 3.162278...
Then we calculate the distance between A and C, complete this yourself:
```{r distance_A_C}
sqrt(( - )^2 + ( - )^2)
```
You should see a distance of around 2.828, so that C is closer to A than B is to A. Of course, for just two points this is easy and you probably shouldn't bother writing code. But let's say you have a list with 100 cities and towns, and want to see all their distances to the capital, it becomes worth your time to automate this, and then usually, you would use variables.
## What Are Variables?
A short and incomplete answer is: variables are "containers" for data. They resemble the A's, B's, X's and Y's from high school math, in that they have some form of name, that represents a value. The names can be 'A' or 'X', but in code it is better to use descriptive labels, like 'slope' or 'age'. Such labels can refer to a single number, but also to vectors or matrices or other types of variables that can hold more than a single value. There are formal definitions, but for a practical understanding it is better to just try out how they work yourself, which is what you will do here.
## Numeric variables
Let's put the starting positions coordinates in the variables startX and startY, complete the code yourself:
```{r assign_start_position_coordinates}
startX <- 2
startY <-
```
You see the `<-` operator, called 'gets', is used to assign a value to a label, two times. You can check what the values are by typing the variable's names at the command line again:
```{r check_start_position_coordinates}
startX
startY
```
If you did not complete the previous chunk of code, the above chunk will generate an error, because startY has no value yet. Go back and fill in the value to complete the rest of this notebook.
## Integer numbers
If you only want round numbers, you can use `as.integer()` to create an integer number from any other number:
```{r as_integer_pi}
as.integer(pi)
```
In accordance with the [Indiana Pi Bill](https://en.wikipedia.org/wiki/Indiana_Pi_Bill). But if you then do some calculations with the numbers, the output will again be numeric, even when it could be an integer:
```{r integer_to_numeric}
a <- as.integer(pi)
class(a)
class(a + 3)
```
However, if you have very large datasets that can be effectively represented by integers, it could still make sense to use them, as they take up a little less computer memory, and some computations can be done a little faster with them.
## Floating point numbers
By default, R will create numeric variables that can hold floating point doubles. These can be very large or very small and they be fractions, like $\pi$:
```{r show_pi}
pi
```
But they are not infinitely precise, so that sometimes you get rounding errors:
```{r rounding_errors}
round((.575*100),0)
round(57.5,0)
```
Which also means that the number $\pi$ that R, or any other programming language, provides is a good approximation but not the exact value. It's also another reason to always check if your calculations make sense.
# Text
Not all data consists of numbers, or just vectors of numbers. Here we will briefly introduce character variables and how they can be used as keys in lists. But most data we deal with will be numeric, and in data frame format. Data frames are the topic of the next tutorial.
## Character variables
Sometimes data does not consist of numbers, but text. In that case, character variables can represent the data in R:
```{r obligatory_hello_world}
my_text <- "hello world!"
my_text
```
Numbers can also be converted to character variables:
```{r as_character_pi}
as.character(pi)
```
When you `paste()` character variables, the two bits of text are simply concatenated:
```{r paste_text}
paste(my_text, as.character(pi))
```
By default, `paste()` adds a space between the character variables, but this can be changed. If you don't want anything in between the pieces of text, you can use `paste0()`. Both function will convert any input that is not already a character variable to one -- if possible:
```{r paste0_text_numeric}
paste0(my_text, pi)
```
There are some operations you can do on character variables. Since they can be seen as a vector of single characters, you can use a kind of indexing on them. Correct this code chunk so it outputs the word 'hello':
```{r character_indexing}
substr(my_text, ,10)
```
Similarly you can find certain letters:
```{r find_letters_in_text}
grepRaw('l', my_text)
grepRaw('l', my_text, all=TRUE)
```
Change the code to find the location of the whole word 'world':
```{r find_hello}
grepRaw('', my_text)
```
You can split a character variable into several character variables. The split will occur at specified characters. Split `my_text` into the words 'hello' and 'world!' by changing this chunk of code:
```{r split_my_text}
strsplit(my_text, split='l')
```
Another example of an operation on a string is to capitalize all letters in the string:
```{r capitalize_my_text}
toupper(my_text)
```
But I don't recommend screaming.