updates module5&8

UGA-IDD · Jul 16, 2024 · ae919aa
1 parent f2e7e75
commit ae919aa
Showing 7 changed files with 27 additions and 9 deletions.
diff --git a/data/modules-data/serodata.csv → data/serodata.csv b/data/modules-data/serodata.csv → data/serodata.csv
diff --git a/data/modules-data/serodata.xlsx → data/serodata.xlsx b/data/modules-data/serodata.xlsx → data/serodata.xlsx
diff --git a/data/modules-data/serodata1.txt → data/serodata1.txt b/data/modules-data/serodata1.txt → data/serodata1.txt
diff --git a/data/modules-data/serodata2.txt → data/serodata2.txt b/data/modules-data/serodata2.txt → data/serodata2.txt
diff --git a/data/modules-data/serodata_new.csv → data/serodata_new.csv b/data/modules-data/serodata_new.csv → data/serodata_new.csv
diff --git a/modules/Module05-DataImportExport.qmd b/modules/Module05-DataImportExport.qmd
@@ -30,15 +30,16 @@ A delimited file is a sequential file with column delimiters. Each delimited fil
 
 ## Mini exercise
 
-1. Download Module 5 data from the website and save the data to your data subdirectory -- specifically `SISMID_IntroToR_RProject/data`
+1. Download the Module 5 data from the website and save the data to your data subdirectory -- specifically `SISMID_IntroToR_RProject/data`
 
-1. Open the '.csv' and '.txt' data files in a text editor application and familiarize yourself with the data (i.e., Notepad for Windows and TextEdit for Mac)
+1. Open the 'serodata.csv', 'serodata1.txt', and 'serodata2.txt' data files in a text editor application and familiarize yourself with the data (e.g., Notepad for Windows and TextEdit for Mac)
 
-1. Open the '.xlsx' data file in excel and familiarize yourself with the data
+1. Determine the delimiter of the two '.txt' files
+
+1. Open the 'serodata.xlsx' data file in Excel and familiarize yourself with the data
 		-		if you use a Mac **do not** open in Numbers, it can corrupt the file
 		-		if you do not have excel, you can upload it to Google Sheets
 
-1. Determine the delimiter of the two '.txt' files
 
 ## Mini exercise
 

diff --git a/modules/Module08-DataMergeReshape.qmd b/modules/Module08-DataMergeReshape.qmd
@@ -45,6 +45,21 @@ library(printr)
 ?merge
 ```
 
+## Join Types
+
+- Full join: includes all unique observations in object df.x and df.y
+    - `merged.df <- merge(df.x, df.y, all.x=T, all.y=T, by=merge_variable)`
+    - the argument `all = TRUE` is the same as `all.x = TRUE, all.y = TRUE`
+    -  the number of rows in `merged.df` is >= max(nrow(df.x), nrow(df.y))
+- Inner join: includes observations that are in both df.x and df.y
+    - `merged.df <- merge(df.x, df.y, all.x=F, all.y=F, by=merge_variable)`
+    - the number of rows in `merged.df` is <= min(nrow(df.x), nrow(df.y)), provided `merge_variable` is not duplicated within df.x or df.y
+- Left join: joining on the first object (df.x), so it includes all observations that are in df.x
+    - `merged.df <- merge(df.x, df.y, all.x=T, all.y=F, by=merge_variable)`
+    - the number of rows in `merged.df` is nrow(df.x), provided `merge_variable` is not duplicated within df.y (otherwise it can be larger)
+- Right join: joining on the second object (df.y), so it includes all observations that are in df.y
+    - `merged.df <- merge(df.x, df.y, all.x=F, all.y=T, by=merge_variable)`
+    - the number of rows in `merged.df` is nrow(df.y), provided `merge_variable` is not duplicated within df.x (otherwise it can be larger)
 
 ## Let's import the new data we want to merge and take a look
 
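The row-count statements in the "Join Types" hunk above can be checked with a small sketch. The data frames `df.x`, `df.y`, and `df.dup` and their columns are hypothetical toy data invented for illustration; only `merge()` and its `all`, `all.x`, `all.y`, and `by` arguments come from the module text.

```r
# Hypothetical toy data: ids 2 and 3 appear in both objects
df.x <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
df.y <- data.frame(id = c(2, 3, 4), y = c(TRUE, FALSE, TRUE))

# The four join types described above
full  <- merge(df.x, df.y, all = TRUE, by = "id")    # ids 1, 2, 3, 4
inner <- merge(df.x, df.y, by = "id")                # ids 2, 3
left  <- merge(df.x, df.y, all.x = TRUE, by = "id")  # ids 1, 2, 3
right <- merge(df.x, df.y, all.y = TRUE, by = "id")  # ids 2, 3, 4

# Row counts before and after each merge
row_counts <- c(full = nrow(full), inner = nrow(inner),
                left = nrow(left), right = nrow(right))
print(row_counts)

# Hypothetical second object in which id 2 is duplicated:
# a left join then returns more rows than nrow(df.x)
df.dup <- data.frame(id = c(2, 2), z = c("p", "q"))
left_dup <- merge(df.x, df.dup, all.x = TRUE, by = "id")
print(nrow(left_dup))
```

With unique ids the counts follow the rules in the list (4, 2, 3, and 3 rows). The last merge returns 4 rows from a 3-row `df.x`, which is why it is worth comparing `nrow()` before and after every merge.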
@@ -91,6 +106,7 @@ Now, lets merge.
 ```{r echo=TRUE}
 df_all_wide <- merge(df, df_new, all.x=T, all.y=T, by=c('observation_id'))
 str(df_all_wide)
+head(df_all_wide)
 ```
 
 ## Merge the new data with the original data
@@ -112,6 +128,7 @@ head(df_new)
 Now, let's merge. Note: "By default the data frames are merged on the columns with names they both have", so if I don't specify the `by` argument, it will merge on all matching variables.
 ```{r echo=TRUE}
 df_all_long <- merge(df, df_new, all.x=T, all.y=T)
+str(df_all_long)
 head(df_all_long)
 ```
 
@@ -310,17 +327,17 @@ df_back_to_wide <- reshape(df_wide_to_long)
 
 ## Let's get real
 
-Use the `pivot_wider()` and `pivot_longer()` from the tidyr package!
+We recommend checking out the `pivot_wider()` and `pivot_longer()` functions from the tidyr package!
 
 
 
 ## Summary
 
-- the `merge()` function can be used to marge datasets. 
+- the `merge()` function can be used to merge datasets. 
 - pay close attention to the number of rows in your data set before and after a merge
-- wide data has many columns and has many columns per observation
-- long data has many rows and can have multiple rows per observation
-- the `reshape()` function allows you to toggle between wide and long data. although we highly recommend using `pivot_wider()` and `pivot_longer()` from the tidyr package instead 
+- wide data has many columns per observation
+- long data has many rows per observation
+- the `reshape()` function allows you to toggle between wide and long data, although we highly recommend trying out `pivot_wider()` and `pivot_longer()` from the tidyr package instead
 
 
 ## Acknowledgements