updates module5&8
amykwinter committed Jul 16, 2024
1 parent f2e7e75 commit ae919aa
Showing 7 changed files with 27 additions and 9 deletions.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
9 changes: 5 additions & 4 deletions modules/Module05-DataImportExport.qmd
@@ -30,15 +30,16 @@ A delimited file is a sequential file with column delimiters. Each delimited fil

## Mini exercise

1. Download Module 5 data from the website and save the data to your data subdirectory -- specifically `SISMID_IntroToR_RProject/data`
1. Download Module 5 data from the website and save the data to your data subdirectory -- specifically `SISMID_IntroToR_RProject/data`

1. Open the '.csv' and '.txt' data files in a text editor application and familiarize yourself with the data (i.e., Notepad for Windows and TextEdit for Mac)
1. Open the 'serodata.csv', 'serodata1.txt', and 'serodata2.txt' data files in a text editor application (e.g., Notepad for Windows or TextEdit for Mac) and familiarize yourself with the data

1. Open the '.xlsx' data file in excel and familiarize yourself with the data
1. Determine the delimiter of the two '.txt' files

1. Open the 'serodata.xlsx' data file in Excel and familiarize yourself with the data
    - if you use a Mac, **do not** open it in Numbers, as it can corrupt the file
    - if you do not have Excel, you can upload it to Google Sheets

1. Determine the delimiter of the two '.txt' files
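
The delimiter can also be checked from within R instead of by eye. The following is a minimal sketch, assuming a small stand-in file written to a temporary location; the column names and values are hypothetical, since the contents of 'serodata1.txt' and 'serodata2.txt' are not shown here.

```r
# Hypothetical stand-in for a delimited text file; the real serodata files may differ
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tage\tvalue", "1\t2\t0.3", "2\t5\t1.2"), tmp)

# Peek at the raw first lines, much as a text editor would show them
first_lines <- readLines(tmp, n = 2)
print(first_lines)

# Count how often each candidate delimiter appears in the header line
candidates <- c(comma = ",", tab = "\t", semicolon = ";", space = " ")
counts <- sapply(candidates, function(d) {
  lengths(regmatches(first_lines[1], gregexpr(d, first_lines[1], fixed = TRUE)))
})
print(counts)

# The most frequent candidate is a reasonable first guess at the delimiter
guess <- names(which.max(counts))

# Read the file using the guessed delimiter and check the structure
dat <- read.delim(tmp, sep = candidates[[guess]])
str(dat)
```

For a real file, `tmp` would be replaced by the file path, and the guess should still be confirmed by looking at `str(dat)`, because a header that contains spaces or commas inside values can mislead a simple count.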

## Mini exercise

27 changes: 22 additions & 5 deletions modules/Module08-DataMergeReshape.qmd
@@ -45,6 +45,21 @@ library(printr)
?merge
```

## Join Types

- Full join: includes all unique observations in df.x and df.y
    - `merged.df <- merge(df.x, df.y, all.x=TRUE, all.y=TRUE, by=merge_variable)`
    - the argument `all = TRUE` is the same as `all.x = TRUE, all.y = TRUE`
    - the number of rows in `merged.df` is >= max(nrow(df.x), nrow(df.y))
- Inner join: includes only observations that are in both df.x and df.y
    - `merged.df <- merge(df.x, df.y, all.x=FALSE, all.y=FALSE, by=merge_variable)`
    - the number of rows in `merged.df` is <= min(nrow(df.x), nrow(df.y)), provided the values of `merge_variable` are unique within each data frame
- Left join: joins on the first object (df.x), so it includes all observations that are in df.x
    - `merged.df <- merge(df.x, df.y, all.x=TRUE, all.y=FALSE, by=merge_variable)`
    - the number of rows in `merged.df` is nrow(df.x), provided the values of `merge_variable` are unique in df.y
- Right join: joins on the second object (df.y), so it includes all observations that are in df.y
    - `merged.df <- merge(df.x, df.y, all.x=FALSE, all.y=TRUE, by=merge_variable)`
    - the number of rows in `merged.df` is nrow(df.y), provided the values of `merge_variable` are unique in df.x
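
The four join types can be compared side by side on toy data. This is a sketch only: the data frames and the merge variable `id` are hypothetical, and `id` is deliberately unique within each data frame.

```r
# Toy data frames; 'id' is a hypothetical merge variable, unique within each data frame
df.x <- data.frame(id = c(1, 2, 3, 4), age = c(10, 20, 30, 40))
df.y <- data.frame(id = c(3, 4, 5), titer = c(0.5, 1.5, 2.5))

# all = TRUE keeps unmatched rows from both sides (full join)
full  <- merge(df.x, df.y, by = "id", all = TRUE)
# default all = FALSE keeps only ids present in both (inner join)
inner <- merge(df.x, df.y, by = "id")
# all.x = TRUE keeps every row of df.x (left join)
left  <- merge(df.x, df.y, by = "id", all.x = TRUE)
# all.y = TRUE keeps every row of df.y (right join)
right <- merge(df.x, df.y, by = "id", all.y = TRUE)

# Compare row counts across the four joins
row_counts <- sapply(list(full = full, inner = inner, left = left, right = right), nrow)
print(row_counts)
```

Here the counts are 5, 2, 4, and 3 for the full, inner, left, and right joins. If `id` were repeated within either data frame, the inner, left, and right joins could return more rows than the source data frames contain, which is one reason to compare `nrow()` before and after a merge.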

## Let's import the new data we want to merge and take a look

@@ -91,6 +106,7 @@ Now, lets merge.
```{r echo=TRUE}
df_all_wide <- merge(df, df_new, all.x=T, all.y=T, by=c('observation_id'))
str(df_all_wide)
head(df_all_wide)
```

## Merge the new data with the original data
@@ -112,6 +128,7 @@ head(df_new)
Now, let's merge. Note: "By default the data frames are merged on the columns with names they both have", so if the `by` argument is not specified, `merge()` will merge on all matching column names.
```{r echo=TRUE}
df_all_long <- merge(df, df_new, all.x=T, all.y=T)
str(df_all_long)
head(df_all_long)
```
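
The default choice of merge columns can be made visible with `intersect()`. A minimal sketch, using hypothetical data frames `df_a` and `df_b` rather than the course data:

```r
# Hypothetical data frames that share two column names: 'id' and 'visit'
df_a <- data.frame(id = c(1, 1, 2), visit = c(1, 2, 1), age = c(5, 5, 9))
df_b <- data.frame(id = c(1, 1, 2), visit = c(1, 2, 2), titer = c(0.3, 0.6, 1.1))

# Columns that merge() uses when 'by' is not given
shared <- intersect(names(df_a), names(df_b))
print(shared)

# Merging without 'by' and merging with the shared columns spelled out
merged_default  <- merge(df_a, df_b, all = TRUE)
merged_explicit <- merge(df_a, df_b, all = TRUE, by = shared)

# Both calls give the same result
print(identical(merged_default, merged_explicit))
print(merged_default)
```

Spelling out `by` is usually the safer habit, since two data frames may share a column name by coincidence and that column would silently become part of the merge key.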

@@ -310,17 +327,17 @@ df_back_to_wide <- reshape(df_wide_to_long)

## Let's get real

Use the `pivot_wider()` and `pivot_longer()` from the tidyr package!
We recommend checking out the `pivot_wider()` and `pivot_longer()` functions from the tidyr package!



## Summary

- the `merge()` function can be used to marge datasets.
- the `merge()` function can be used to merge datasets.
- pay close attention to the number of rows in your data set before and after a merge
- wide data has many columns and has many columns per observation
- long data has many rows and can have multiple rows per observation
- the `reshape()` function allows you to toggle between wide and long data. although we highly recommend using `pivot_wider()` and `pivot_longer()` from the tidyr package instead
- wide data has many columns per observation
- long data has many rows per observation
- the `reshape()` function allows you to toggle between wide and long data, although we highly recommend playing around with the `pivot_wider()` and `pivot_longer()` functions from the tidyr package instead
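
As a compact illustration of the wide and long points above, here is a sketch of a round trip with base R `reshape()`. The data frame and its column names (`id`, `IgG_time1`, `IgG_time2`) are hypothetical and are not the module's data.

```r
# Hypothetical wide data: one row per id, one column per time point
df_wide <- data.frame(id = c(1, 2),
                      IgG_time1 = c(0.3, 1.2),
                      IgG_time2 = c(0.5, 1.9))

# Wide to long: stack the two IgG columns into one, indexed by 'time'
df_long <- reshape(df_wide, direction = "long",
                   idvar = "id",
                   varying = c("IgG_time1", "IgG_time2"),
                   v.names = "IgG",
                   timevar = "time",
                   times = 1:2)
print(df_long)

# Long to wide: spread 'IgG' back out, one column per value of 'time'
df_back <- reshape(df_long, direction = "wide",
                   idvar = "id",
                   timevar = "time",
                   v.names = "IgG")
print(df_back)

# Row counts before and after each step
print(c(wide = nrow(df_wide), long = nrow(df_long), back = nrow(df_back)))
```

The long version has one row per id and time point (4 rows), while both wide versions have one row per id (2 rows).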


## Acknowledgements