added installation tutorial for R basics #2

Open
wants to merge 4 commits into
base: master
1 change: 1 addition & 0 deletions R_Basics.R
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
# R Basics


# First step: load in function libraries
install.packages("ggplot2") # plotting package
install.packages("psych")
22 changes: 22 additions & 0 deletions R_Installation.md
@@ -0,0 +1,22 @@
# Guide to Download R

R is a language and an environment for statistical programming.
To get up and running with R, you must first download it to your machine.


1. Go to [www.r-project.org](https://www.r-project.org).

2. Choose your Comprehensive R Archive Network (CRAN) mirror. Pick the one that is closest to you.

And that is it: you have R installed on your machine.

If you would like a richer experience with R, download RStudio.

RStudio is an open source Integrated Development Environment (IDE). An IDE is software that helps you write code.
RStudio comes with many features, such as a code editor and autocompletion.

1. Head to www.rstudio.com.


**Note:** To use RStudio, you MUST install R on your machine first.
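
One way to confirm that the installation worked is to run a few commands in the R console (or the console pane in RStudio). This is only a minimal sketch; the exact version string and session details printed will differ from machine to machine.

```r
# Print the version of R that is currently running
R.version.string

# Show platform details and the packages attached to this session
sessionInfo()

# A trivial calculation to confirm that the console evaluates code
1 + 1
```

If each command prints a result instead of an error, R is ready to use.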

18 changes: 18 additions & 0 deletions R_Tour_Resources.md
@@ -0,0 +1,18 @@
# Resources

Here are more resources to guide you in your journey to understanding R.

## R-Basics

[Software Carpentry - Programming with R](http://swcarpentry.github.io/r-novice-inflammation/)

[Software Carpentry - R for Reproducible Scientific Analysis](https://swcarpentry.github.io/r-novice-gapminder/)

## Tidyverse

[R for Data Science](https://r4ds.had.co.nz/)

[Jenny Bryan's STAT 545](http://stat545.com/topics.html)

[dplyr](https://dplyr.tidyverse.org/)

## Blogs/Podcasts

[R-bloggers](https://www.r-bloggers.com/)
16 changes: 16 additions & 0 deletions basicsProject/chicago_crime/crime.R
@@ -0,0 +1,16 @@


#Load the csv data

#How many rows of data (observations) are in this dataset?

#How many variables are in this dataset?

#Using the "max" function, what is the maximum value of the variable "ID"?

#What is the minimum value of the variable "Beat"?

#How many observations have value TRUE in the Arrest variable (this is the number of crimes for which an arrest was made)?

#How many observations have a LocationDescription value of ALLEY?

1 change: 1 addition & 0 deletions basicsProject/chicago_crime/mvtWeek1.csv

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions basicsProject/intro.md
@@ -0,0 +1,13 @@
# 1. R Projects - Hands on practice for R programming

Welcome to the R-Ladies Chicago organization's GitHub repo. Here you will find R programming challenges to
help you get a better grasp of R. Challenge problems are fun, hands-on coding exercises covering a variety of topics, such as data exploration, data visualization, and more.


[Talk to us on Slack](https://rladies-chicago.slack.com/messages/C6A01PTU3/)


## What you need

- [R and RStudio](../R_Installation.md)

175 changes: 175 additions & 0 deletions subsettingR.Rmd
@@ -0,0 +1,175 @@
---
title: "R Notebook"
output: html_notebook
---

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the *Run* button within the chunk or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.

```{r}
plot(cars)
```

Add a new chunk by clicking the *Insert Chunk* button on the toolbar or by pressing *Ctrl+Alt+I*.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the *Preview* button or press *Ctrl+Shift+K* to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike *Knit*, *Preview* does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

So, I've just finished up the R Programming course that is a part of Coursera's Johns Hopkins Data Science specialization, and I must say that it did some judo Mortal Kombat moves on my mind. This course is not beginner friendly, but I've learned a lot, and I think it's safe to say that I am becoming a master at subsetting and filtering data in R. In retrospect, if you are planning to take this specialization, you should do the Getting and Cleaning Data course before you start the R Programming course.

A collection of notes on how to select different rows and columns within R.

R has four main data structures for storing and manipulating data: vectors, matrices, data frames, and lists. So far, in my on-again, off-again relationship with R, I have mostly worked with data frames. I will be using the airquality dataset that comes preinstalled with R.
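
As a minimal sketch of the four structures mentioned above, the chunk below builds one of each. The object names (`v`, `m`, `df`, `l`) and their values are made up for illustration and are not part of the airquality dataset.

```{r}
# Hypothetical example objects, one per data structure
v  <- c(1.5, 2.5, 3.5)                      # vector: elements share one type
m  <- matrix(1:6, nrow = 2)                 # matrix: 2 rows x 3 columns, one type
df <- data.frame(id = 1:3, score = v)       # data frame: columns may differ in type
l  <- list(numbers = v, label = "example")  # list: elements of any type or length

# Inspect what kind of object each one is
class(v)
class(m)
class(df)
class(l)
```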

Selecting
```{r}
data("airquality")


# Pass the data frame itself; names("airquality") on the quoted string returns NULL
names(airquality)
```
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
```{r}
head(airquality)
```
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6

Subsetting data by index.
When subsetting, the placement of the comma is important: rows go before the comma and columns go after it.

To extract a specific observation (row), make sure that you include a comma to the right of the row index. DON'T FORGET THE COMMA: without it, `airquality[1]` returns the first column instead.

```{r}
airquality[1,]
```
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
# First two rows and all columns

```{r}
airquality[1:2,]
```
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2


To extract a specific variable (column), make sure that you include a comma to the left of the column index.

# First column and all rows

```{r}
airquality[,1]

```
[1] 41 36 12 18 NA 28 23 19 8 NA 7 16 11 14 18 14 34 6 30 11 1 11 4 32
[25] NA NA NA 23 45 115 37 NA NA NA NA NA NA 29 NA 71 39 NA NA 23 NA NA 21 37
[49] 20 12 13 NA NA NA NA NA NA NA NA NA NA 135 49 32 NA 64 40 77 97 97 85 NA
[73] 10 27 NA 7 48 35 61 79 63 16 NA NA 80 108 20 52 82 50 64 59 39 9 16 78
[97] 35 66 122 89 110 NA NA 44 28 65 NA 22 59 23 31 44 21 9 NA 45 168 73 NA 76
[121] 118 84 85 96 78 73 91 47 32 20 23 21 24 44 21 28 9 13 46 18 13 24 16 13
[145] 23 36 7 14 30 NA 14 18 20
#First two columns and all rows

```{r}
airquality[,1:2]
```
Ozone Solar.R
1 41 190
2 36 118
3 12 149
4 18 313
5 NA NA
6 28 NA
7 23 299

You can also select a column from a data frame using the variable name with a dollar sign.

```{r}
airquality$Ozone
```

[1] 41 36 12 18 NA 28 23 19 8 NA 7 16 11 14 18 14 34 6 30 11 1 11 4 32
[25] NA NA NA 23 45 115 37 NA NA NA NA NA NA 29 NA 71 39 NA NA 23 NA NA 21 37
[49] 20 12 13 NA NA NA NA NA NA NA NA NA NA 135 49 32 NA 64 40 77 97 97 85 NA
[73] 10 27 NA 7 48 35 61 79 63 16 NA NA 80 108 20 52 82 50 64 59 39 9 16 78
[97] 35 66 122 89 110 NA NA 44 28 65 NA 22 59 23 31 44 21 9 NA 45 168 73 NA 76
[121] 118 84 85 96 78 73 91 47 32 20 23 21 24 44 21 28 9 13 46 18 13 24 16 13
[145] 23 36 7 14 30 NA 14 18 20
These are all the observations from the Ozone column.
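
The sketch below compares the ways of pulling out a single column. The `[[ ]]` form and the `drop = FALSE` argument are additions that the notes above do not cover; they are included only to show when the result is a plain vector and when it stays a data frame.

```{r}
data("airquality")

by_dollar <- airquality$Ozone               # vector
by_double <- airquality[["Ozone"]]          # vector, same values as $
by_index  <- airquality[, 1]                # vector: a single column is simplified
as_frame  <- airquality[, 1, drop = FALSE]  # stays a one-column data frame

# The three vector forms return the same thing
identical(by_dollar, by_double)
identical(by_dollar, by_index)

# drop = FALSE keeps the data frame structure
is.data.frame(as_frame)
```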

Extracting multiple columns from a data frame

`df[,c("A","B","E")]` (source: Stack Overflow)

```{r}
head(airquality[,c("Ozone","Temp")])
```
Ozone Temp
1 41 67
2 36 72
3 12 74
4 18 62
5 NA 56
6 28 66


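You can also filter data when you are making a selection by using basic logic statements.

Basic logic statements

| Operator | Description |
|----------|-------------|
| `==` | Equal |
| `!=` | Not equal |
| `>` | Greater than |
| `<` | Less than |
| `<=` | Less than or equal |
| `>=` | Greater than or equal |
| `!` | NOT |
| `&` | AND |
| `\|` | OR |
| `%in%` | Returns a logical vector indicating whether each element of its left operand has a match in its right operand. (The related function `match` returns the positions of the first matches instead.) |

| Function | Description |
|----------|-------------|
| `which.min` | Index of the minimum value |
| `which.max` | Index of the maximum value |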
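
To see these operators in action, here is a small sketch that applies them to the first six Temp values of airquality (67, 72, 74, 62, 56, 66, as shown by `head(airquality)` earlier). The variable name `temp` is only for illustration.

```{r}
data("airquality")
temp <- head(airquality$Temp)  # first six Temp values

temp == 72             # equal
temp != 72             # not equal
temp > 66 & temp < 74  # AND: both conditions must hold
temp < 60 | temp > 73  # OR: either condition may hold
!(temp > 66)           # NOT: flips TRUE and FALSE

# %in% returns TRUE/FALSE for each element on the left-hand side
airquality$Month[1:3] %in% c(5, 7, 8)

# Positions (not values) of the smallest and largest elements
which.min(temp)
which.max(temp)
```

Each comparison returns a logical vector of the same length as `temp`, which is what makes these expressions usable inside the square brackets when filtering rows.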

Extract the subset of rows of the data frame where Ozone values are above 31
and Temp values are above 90.
```{r}
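# Use the airquality data frame (data is a function, not the dataset); NA conditions return all-NA rows
myset <- airquality[airquality$Ozone > 31 & airquality$Temp > 90, ]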
```
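
Because Ozone contains missing values, a logical filter on it can evaluate to NA for some rows, and indexing with NA produces rows filled with NA. The sketch below shows two common ways to avoid that, `which()` and `subset()`; neither appears in the notes above, so treat them as one possible approach. The names `cond`, `with_na`, `no_na_1`, and `no_na_2` are made up for illustration.

```{r}
data("airquality")
cond <- airquality$Ozone > 31 & airquality$Temp > 90

with_na <- airquality[cond, ]         # NA conditions come back as all-NA rows
no_na_1 <- airquality[which(cond), ]  # which() keeps only the TRUE positions
no_na_2 <- subset(airquality, Ozone > 31 & Temp > 90)  # subset() also drops NA conditions

# Compare how many rows each approach returns
nrow(with_na)
nrow(no_na_1)
nrow(no_na_2)
```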
Extract the subset of rows of the data frame where Month equals 5, 7, or 8.
```{r}
airquality[airquality$Month %in% c(5,7,8),]
```

What is the mean of "Temp" when "Month" is equal to 6?
```{r}
june <-airquality[airquality$Month==6,]
mean(june$Temp, na.rm=TRUE)
```

What was the maximum ozone value in the month of May (i.e. Month is equal to 5)?
```{r}
may<-airquality[airquality$Month==5,]
may[which.max(may$Ozone),]
```
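
The chunk above returns the whole row that holds the maximum. If only the value itself is needed, a sketch along these lines should work; `na.rm = TRUE` is required because Ozone has missing values in May.

```{r}
data("airquality")
may <- airquality[airquality$Month == 5, ]

# The maximum Ozone value on its own, ignoring missing values
max(may$Ozone, na.rm = TRUE)

# The day of the month on which that maximum was recorded
may$Day[which.max(may$Ozone)]
```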



More Resources

Practice subsetting with R

- http://www.ats.ucla.edu/stat/r/modules/subsetting.htm

Subsetting vectors, lists, matrices

- http://adv-r.had.co.nz/Subsetting.html
- https://ramnathv.github.io/pycon2014-r/learn/subsetting.html

Subsetting by string

- http://rpackages.ianhowson.com/cran/DataCombine/man/grepl.sub.html
422 changes: 422 additions & 0 deletions subsettingRmd.nb.html

Large diffs are not rendered by default.