changes

UGA-IDD · Jul 15, 2024 · 8c027aa
1 parent fda1c44
commit 8c027aa
Showing 86 changed files with 11,455 additions and 5,315 deletions.
diff --git a/SISMID-Module.bib b/SISMID-Module.bib
@@ -7,7 +7,8 @@ @book{Wickham2023
   month     =  jun,
   year      =  2023,
   address   = "Sebastopol, CA",
-  language  = "en"
+  language  = "en",
+  howpublished = "\url{https://r4ds.hadley.nz/}"
 }
 
 @BOOK{Matloff2011-gc,
@@ -20,3 +21,31 @@ @BOOK{Matloff2011-gc
   language  = "en"
 }
 
+@BOOK{Keyes2024-rg,
+  title     = "{R} for the Rest of Us: A statistics-free introduction",
+  author    = "Keyes, David",
+  publisher = "No Starch Press",
+  month     =  jun,
+  year      =  2024,
+  address   = "San Francisco, CA",
+  language  = "en"
+}
+
+@manual{Rintro,
+  title = "An introduction to {R}",
+  author = "{R Core Team}",
+  year = 2024,
+  howpublished = "\url{https://cran.r-project.org/doc/manuals/r-release/R-intro.html}"
+}
+
+@misc{Carchedi_Kross_2024, title={Learn R, in R.}, url={https://swirlstats.com/}, journal={swirl}, author={Carchedi, Nick and Kross, Sean}, year={2024}} 
+
+@book{epir,
+  author = {Batra, Neale and Spina, Alex and Blomquist, Paula and Campbell, Finlay and Laurenson-Schafer, Henry and Florence, Isaac and Fischer, Natalie and Ndiaye, Aminata and Coyer, Liza and Polonsky, Jonathan and Izawa, Yurie and Bailey, Chris and Molling, Daniel and Berry, Isha and Buajitti, Emma and Mousset, Mathilde and Hollis, Sara and Lin, Wen},
+  editor = {Batra, Neale},
+  title = {epiR Handbook},
+  publisher = {Applied Epi Incorporated},
+  year = {2021},
+  copyright = {Open Access},
+  howpublished = "\url{https://epirhandbook.com/}"
+}
diff --git a/_freeze/archive/CaseStudy01/execute-results/html.json b/_freeze/archive/CaseStudy01/execute-results/html.json
@@ -0,0 +1,18 @@
+{
+  "hash": "c061862e27098cb02ccfc71ba1c8ea30",
+  "result": {
+    "markdown": "---\ntitle: \"Algorithmic Thinking Case Study 1\"\nsubtitle: \"SISMID 2024 -- Introduction to R\"\nformat:\n  revealjs:\n    toc: false\nexecute: \n  echo: false\n---\n\n\n## Learning goals\n\n* Use logical operators, subsetting functions, and math calculations in R\n* Translate human-understandable problem descriptions into instructions that\nR can understand.\n\n# Remember, R always does EXACTLY what you tell it to do!\n\n## Instructions\n\n* Make a new R script for this case study, and save it to your code folder.\n* We'll use the diphtheria serosample data from Exercise 1 for this case study.\nLoad it into R and use the functions we've learned to look at it.\n\n## Instructions\n\n* Make a new R script for this case study, and save it to your code folder.\n* We'll use the diphtheria serosample data from Exercise 1 for this case study.\nLoad it into R and use the functions we've learned to look at it.\n* The `str()` of your dataset should look like this.\n\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\ntibble [250 × 5] (S3: tbl_df/tbl/data.frame)\n $ age_months  : num [1:250] 15 44 103 88 88 118 85 19 78 112 ...\n $ group       : chr [1:250] \"urban\" \"rural\" \"urban\" \"urban\" ...\n $ DP_antibody : num [1:250] 0.481 0.657 1.368 1.218 0.333 ...\n $ DP_infection: num [1:250] 1 1 1 1 1 1 1 1 1 1 ...\n $ DP_vacc     : num [1:250] 0 1 1 1 1 1 1 1 1 1 ...\n```\n:::\n:::\n\n\n## Q1: Was the overall prevalence higher in urban or rural areas?\n\n::: {.incremental}\n\n1. How do we calculate the prevalence from the data?\n1. How do we calculate the prevalence separately for urban and rural areas?\n1. 
How do we determine which prevalence is higher and if the difference is\nmeaningful?\n\n:::\n\n## Q1: How do we calculate the prevalence from the data?\n\n::: {.incremental}\n\n* The variable `DP_infection` in our dataset is binary / dichotomous.\n* The prevalence is the number or percent of people who had the disease over\nsome duration.\n* The average of a binary variable gives the prevalence!\n\n:::\n\n. . .\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(diph$DP_infection)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.8\n```\n:::\n:::\n\n\n## Q1: How do we calculate the prevalence separately for urban and rural areas?\n\n. . .\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(diph[diph$group == \"urban\", ]$DP_infection)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.8235294\n```\n:::\n\n```{.r .cell-code}\nmean(diph[diph$group == \"rural\", ]$DP_infection)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.778626\n```\n:::\n:::\n\n\n. . .\n\n* There are many ways you could write this code! 
You can use `subset()` or you\ncan write the indices many ways.\n* `tbl_df` objects, like those returned by `haven`, follow different `[[` rules than a base R\ndata frame.\n\n## Q1: How do we calculate the prevalence separately for urban and rural areas?\n\n* One easy way is to use the `aggregate()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\naggregate(DP_infection ~ group, data = diph, FUN = mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n  group DP_infection\n1 rural    0.7786260\n2 urban    0.8235294\n```\n:::\n:::\n\n\n## Q1: How do we determine which prevalence is higher and if the difference is meaningful?\n\n::: {.incremental}\n\n* We probably need to include a confidence interval in our calculation.\n* This is actually not so easy without more advanced tools that we will learn\nin upcoming modules.\n* Right now the best options are to do it by hand or google a function.\n\n:::\n\n## Q1: By hand\n\n\n::: {.cell}\n\n```{.r .cell-code}\np_urban <- mean(diph[diph$group == \"urban\", ]$DP_infection)\np_rural <- mean(diph[diph$group == \"rural\", ]$DP_infection)\nse_urban <- sqrt(p_urban * (1 - p_urban) / nrow(diph[diph$group == \"urban\", ]))\nse_rural <- sqrt(p_rural * (1 - p_rural) / nrow(diph[diph$group == \"rural\", ])) \n\nresult_urban <- paste0(\n\t\"Urban: \", round(p_urban, 2), \"; 95% CI: (\",\n\tround(p_urban - 1.96 * se_urban, 2), \", \",\n\tround(p_urban + 1.96 * se_urban, 2), \")\"\n)\n\nresult_rural <- paste0(\n\t\"Rural: \", round(p_rural, 2), \"; 95% CI: (\",\n\tround(p_rural - 1.96 * se_rural, 2), \", \",\n\tround(p_rural + 1.96 * se_rural, 2), \")\"\n)\n\ncat(result_urban, result_rural, sep = \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nUrban: 0.82; 95% CI: (0.76, 0.89)\nRural: 0.78; 95% CI: (0.71, 0.85)\n```\n:::\n:::\n\n\n## Q1: By hand\n\n* We can see that the 95% CIs overlap, so the groups are probably not that\ndifferent. **To be sure, we need to do a 2-sample test! 
But this is not a\nstatistics class.**\n* Some people will tell you that coding like this is \"bad\". **But 'bad' code\nthat gives you answers is better than broken code!** We will learn techniques for writing this with less work and less repetition\nin upcoming modules.\n\n## Q1: Googling a package\n\n. . .\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# install.packages(\"DescTools\")\nlibrary(DescTools)\n\naggregate(DP_infection ~ group, data = diph, FUN = DescTools::MeanCI)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n  group DP_infection.mean DP_infection.lwr.ci DP_infection.upr.ci\n1 rural         0.7786260           0.7065872           0.8506647\n2 urban         0.8235294           0.7540334           0.8930254\n```\n:::\n:::\n\n\n## You try it!\n\n* Using any of the approaches you can think of, answer this question!\n* **How many children under 5 were vaccinated? In children under 5, did\nvaccination lower the prevalence of infection?**\n\n## You try it!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# How many children under 5 were vaccinated\nsum(diph$DP_vacc[diph$age_months < 60])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 91\n```\n:::\n\n```{.r .cell-code}\n# Prevalence in both vaccine groups for children under 5\naggregate(\n\tDP_infection ~ DP_vacc,\n\tdata = subset(diph, age_months < 60),\n\tFUN = DescTools::MeanCI\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n  DP_vacc DP_infection.mean DP_infection.lwr.ci DP_infection.upr.ci\n1       0         0.4285714           0.1977457           0.6593972\n2       1         0.6373626           0.5366845           0.7380407\n```\n:::\n:::\n\n\nIt appears that prevalence was HIGHER in the vaccine group? 
That is\ncounterintuitive, but the sample size for the unvaccinated group is too small\nto be sure.\n\n## Congratulations on finishing the first case study!\n\n* What R functions and skills did you practice?\n* What other questions could you answer about the same dataset with the skills\nyou know now?\n",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {
+      "include-after-body": [
+        "\n<script>\n  // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n  // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n  // slide changes (different for each slide format).\n  (function () {\n    // dispatch for htmlwidgets\n    function fireSlideEnter() {\n      const event = window.document.createEvent(\"Event\");\n      event.initEvent(\"slideenter\", true, true);\n      window.document.dispatchEvent(event);\n    }\n\n    function fireSlideChanged(previousSlide, currentSlide) {\n      fireSlideEnter();\n\n      // dispatch for shiny\n      if (window.jQuery) {\n        if (previousSlide) {\n          window.jQuery(previousSlide).trigger(\"hidden\");\n        }\n        if (currentSlide) {\n          window.jQuery(currentSlide).trigger(\"shown\");\n        }\n      }\n    }\n\n    // hookup for slidy\n    if (window.w3c_slidy) {\n      window.w3c_slidy.add_observer(function (slide_num) {\n        // slide_num starts at position 1\n        fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n      });\n    }\n\n  })();\n</script>\n\n"
+      ]
+    },
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}