Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
fda1c44
commit 8c027aa
Showing
86 changed files
with
11,455 additions
and
5,315 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
"hash": "c061862e27098cb02ccfc71ba1c8ea30", | ||
"result": { | ||
"markdown": "---\ntitle: \"Algorithmic Thinking Case Study 1\"\nsubtitle: \"SISMID 2024 -- Introduction to R\"\nformat:\n revealjs:\n toc: false\nexecute: \n echo: false\n---\n\n\n## Learning goals\n\n* Use logical operators, subsetting functions, and math calculations in R\n* Translate human-understandable problem descriptions into instructions that\nR can understand.\n\n# Remember, R always does EXACTLY what you tell it to do!\n\n## Instructions\n\n* Make a new R script for this case study, and save it to your code folder.\n* We'll use the diphtheria serosample data from Exercise 1 for this case study.\nLoad it into R and use the functions we've learned to look at it.\n\n## Instructions\n\n* Make a new R script for this case study, and save it to your code folder.\n* We'll use the diphtheria serosample data from Exercise 1 for this case study.\nLoad it into R and use the functions we've learned to look at it.\n* The `str()` of your dataset should look like this.\n\n\n\n\n::: {.cell}\n::: {.cell-output .cell-output-stdout}\n```\ntibble [250 × 5] (S3: tbl_df/tbl/data.frame)\n $ age_months : num [1:250] 15 44 103 88 88 118 85 19 78 112 ...\n $ group : chr [1:250] \"urban\" \"rural\" \"urban\" \"urban\" ...\n $ DP_antibody : num [1:250] 0.481 0.657 1.368 1.218 0.333 ...\n $ DP_infection: num [1:250] 1 1 1 1 1 1 1 1 1 1 ...\n $ DP_vacc : num [1:250] 0 1 1 1 1 1 1 1 1 1 ...\n```\n:::\n:::\n\n\n## Q1: Was the overall prevalence higher in urban or rural areas?\n\n::: {.incremental}\n\n1. How do we calculate the prevalence from the data?\n1. How do we calculate the prevalence separately for urban and rural areas?\n1. 
How do we determine which prevalence is higher and if the difference is\nmeaningful?\n\n:::\n\n## Q1: How do we calculate the prevalence from the data?\n\n::: {.incremental}\n\n* The variable `DP_infection` in our dataset is binary / dichotomous.\n* The prevalence is the number or percent of people who had the disease over\nsome duration.\n* The average of a binary variable gives the prevalence!\n\n:::\n\n. . .\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(diph$DP_infection)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.8\n```\n:::\n:::\n\n\n## Q1: How do we calculate the prevalence separately for urban and rural areas?\n\n. . .\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(diph[diph$group == \"urban\", ]$DP_infection)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.8235294\n```\n:::\n\n```{.r .cell-code}\nmean(diph[diph$group == \"rural\", ]$DP_infection)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 0.778626\n```\n:::\n:::\n\n\n. . .\n\n* There are many ways you could write this code! 
You can use `subset()` or you\ncan write the indices in many ways.\n* `tbl_df` objects from `haven` use different `[[` rules than a base R\ndata frame.\n\n## Q1: How do we calculate the prevalence separately for urban and rural areas?\n\n* One easy way is to use the `aggregate()` function.\n\n\n::: {.cell}\n\n```{.r .cell-code}\naggregate(DP_infection ~ group, data = diph, FUN = mean)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n group DP_infection\n1 rural 0.7786260\n2 urban 0.8235294\n```\n:::\n:::\n\n\n## Q1: How do we determine which prevalence is higher and if the difference is meaningful?\n\n::: {.incremental}\n\n* We probably need to include a confidence interval in our calculation.\n* This is actually not so easy without more advanced tools that we will learn\nin upcoming modules.\n* Right now the best options are to do it by hand or google a function.\n\n:::\n\n## Q1: By hand\n\n\n::: {.cell}\n\n```{.r .cell-code}\np_urban <- mean(diph[diph$group == \"urban\", ]$DP_infection)\np_rural <- mean(diph[diph$group == \"rural\", ]$DP_infection)\nse_urban <- sqrt(p_urban * (1 - p_urban) / nrow(diph[diph$group == \"urban\", ]))\nse_rural <- sqrt(p_rural * (1 - p_rural) / nrow(diph[diph$group == \"rural\", ])) \n\nresult_urban <- paste0(\n\t\"Urban: \", round(p_urban, 2), \"; 95% CI: (\",\n\tround(p_urban - 1.96 * se_urban, 2), \", \",\n\tround(p_urban + 1.96 * se_urban, 2), \")\"\n)\n\nresult_rural <- paste0(\n\t\"Rural: \", round(p_rural, 2), \"; 95% CI: (\",\n\tround(p_rural - 1.96 * se_rural, 2), \", \",\n\tround(p_rural + 1.96 * se_rural, 2), \")\"\n)\n\ncat(result_urban, result_rural, sep = \"\\n\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nUrban: 0.82; 95% CI: (0.76, 0.89)\nRural: 0.78; 95% CI: (0.71, 0.85)\n```\n:::\n:::\n\n\n## Q1: By hand\n\n* We can see that the 95% CIs overlap, so the groups are probably not that\ndifferent. **To be sure, we need to do a 2-sample test! 
But this is not a\nstatistics class.**\n* Some people will tell you that coding like this is \"bad\". **But 'bad' code\nthat gives you answers is better than broken code!** We will learn techniques for writing this with less work and less repetition\nin upcoming modules.\n\n## Q1: Googling a package\n\n. . .\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# install.packages(\"DescTools\")\nlibrary(DescTools)\n\naggregate(DP_infection ~ group, data = diph, FUN = DescTools::MeanCI)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n group DP_infection.mean DP_infection.lwr.ci DP_infection.upr.ci\n1 rural 0.7786260 0.7065872 0.8506647\n2 urban 0.8235294 0.7540334 0.8930254\n```\n:::\n:::\n\n\n## You try it!\n\n* Using any of the approaches you can think of, answer this question!\n* **How many children under 5 were vaccinated? In children under 5, did\nvaccination lower the prevalence of infection?**\n\n## You try it!\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# How many children under 5 were vaccinated\nsum(diph$DP_vacc[diph$age_months < 60])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] 91\n```\n:::\n\n```{.r .cell-code}\n# Prevalence in both vaccine groups for children under 5\naggregate(\n\tDP_infection ~ DP_vacc,\n\tdata = subset(diph, age_months < 60),\n\tFUN = DescTools::MeanCI\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n DP_vacc DP_infection.mean DP_infection.lwr.ci DP_infection.upr.ci\n1 0 0.4285714 0.1977457 0.6593972\n2 1 0.6373626 0.5366845 0.7380407\n```\n:::\n:::\n\n\nIt appears that prevalence was HIGHER in the vaccine group? That is\ncounterintuitive, but the sample size for the unvaccinated group is too small\nto be sure.\n\n## Congratulations on finishing the first case study!\n\n* What R functions and skills did you practice?\n* What other questions could you answer about the same dataset with the skills\nyou know now?\n", | ||
"supporting": [], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": { | ||
"include-after-body": [ | ||
"\n<script>\n // htmlwidgets need to know to resize themselves when slides are shown/hidden.\n // Fire the \"slideenter\" event (handled by htmlwidgets.js) when the current\n // slide changes (different for each slide format).\n (function () {\n // dispatch for htmlwidgets\n function fireSlideEnter() {\n const event = window.document.createEvent(\"Event\");\n event.initEvent(\"slideenter\", true, true);\n window.document.dispatchEvent(event);\n }\n\n function fireSlideChanged(previousSlide, currentSlide) {\n fireSlideEnter();\n\n // dispatch for shiny\n if (window.jQuery) {\n if (previousSlide) {\n window.jQuery(previousSlide).trigger(\"hidden\");\n }\n if (currentSlide) {\n window.jQuery(currentSlide).trigger(\"shown\");\n }\n }\n }\n\n // hookup for slidy\n if (window.w3c_slidy) {\n window.w3c_slidy.add_observer(function (slide_num) {\n // slide_num starts at position 1\n fireSlideChanged(null, w3c_slidy.slides[slide_num - 1]);\n });\n }\n\n })();\n</script>\n\n" | ||
] | ||
}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |