Merge pull request #3 from seandavi/master

pulling in latest sean changes
seandavi · Apr 3, 2020 · 9d90ef3
2 parents beaf21f + cfe13bc
commit 9d90ef3

Showing 3 changed files with 204 additions and 79 deletions.
diff --git a/.github/workflows/update_excel_sheet_daily.yml b/.github/workflows/update_excel_sheet_daily.yml
@@ -15,12 +15,9 @@
 name: Publish artifacts
 
 on:
-  push:
-    branches:
-    - master
   schedule:
     # uses UTC
-    - cron:  '0 08 * * *'
+    - cron:  '0 */4 * * *'
 
 # Environment variables available to all jobs and steps in this workflow
 env:

diff --git a/README.md b/README.md
@@ -1,82 +1,106 @@
-# sars2pack
+sars2pack
+=========
 
 The sars2pack R package includes data resources, workflows, and data
-science tools to understand and interpret the COVID-19
-pandemic. Access to data resources is "real-time" to get the most
-up-to-date information. Use cases and introductory material are
-available in vignettes and in documentation.
-
-Contributions are welcome. Have an interesting analysis that you'd
-like to share, write up an R markdown document and contribute it as a
-vignette. If you are not used to working collaboratively in github,
-just post a [new
-issue](https://github.com/seandavi/sars2pack/issues/new) to ask for
-help.
-
-Thanks to the armies of people providing data for the rest of the
-world, often on a volunteer basis. Without their tireless work, we
-would not have these rapidly-developing resources.
-
-# Install
-
-```
-BiocManager::install('seandavi/sars2pack')
-# OR
-devtools::install_github('seandavi/sars2pack')
-```
-
-# Features
-
-## Data resources available
-
-- Johns Hopkins global pandemic time series
-- USAFacts county-level US epidemic data
-- NYTimes state and county-level US epidemic data
-- COVIDTracker.com US state- and county-level epidemic data that includes detailed positive, negative, and pending test data
-- US Healthcare Capacity dataset, including details on US hospital capacity and capabilities.
-- US County-level metadata, including geospatial data
-
-## Capabilities
-
-- [X] Access data from multiple sources: jhu_data(), usa_facts_data(), nytimes_county_data(), nytimes_state_data(), covidtracker_data(), etc.
-- [X] Estimate R0 for localities or countries.
-- [X] Visualize time series pandemic case growth
-- [X] Perform data mashups between COVID-19 pandemic data and
-      additional geographic, financial, and demographic datasets
-- [ ] Create static and interactive maps of COVID-19 data for states,
-      countries, or even counties.
-
-## Visualization
-
-![](man/figures/africa_geo.png)
-![](man/figures/cc_ts_plot_log-1.png)
+science tools to understand and interpret the COVID-19 pandemic. Access
+to data resources is “real-time” to get the most up-to-date information.
+Use cases and introductory material are available in vignettes and in
+documentation.
+
+Contributions are welcome. Have an interesting analysis that you’d like
+to share, write up an R markdown document and contribute it as a
+vignette. If you are not used to working collaboratively in github, just
+post a [new issue](https://github.com/seandavi/sars2pack/issues/new) to
+ask for help.
+
+Thanks to the armies of people providing data for the rest of the world,
+often on a volunteer basis. Without their tireless work, we would not
+have these rapidly-developing resources.
+
+Install
+=======
+
+    BiocManager::install('seandavi/sars2pack')
+    # OR
+    devtools::install_github('seandavi/sars2pack')
+
+Features
+========
+
+Data resources available
+------------------------
+
+-   Johns Hopkins global pandemic time series
+-   USAFacts county-level US epidemic data
+-   NYTimes state and county-level US epidemic data
+-   COVIDTracker.com US state- and county-level epidemic data that
+    includes detailed positive, negative, and pending test data
+-   US Healthcare Capacity dataset, including details on US hospital
+    capacity and capabilities.
+-   US County-level metadata, including geospatial data
+
+Capabilities
+------------
+
+-   Access data from multiple sources: jhu\_data(), usa\_facts\_data(),
+    nytimes\_county\_data(), nytimes\_state\_data(),
+    covidtracker\_data(), etc.
+-   Estimate R0 for localities or countries.
+-   Visualize time series pandemic case growth
+-   Perform data mashups between COVID-19 pandemic data and additional
+    geographic, financial, and demographic datasets
+-   Create static and interactive maps of COVID-19 data for states,
+    countries, or even counties.
+
+Visualization
+-------------
+
+![](man/figures/africa_geo.png) ![](man/figures/cc_ts_plot_log-1.png)
 ![](man/figures/epicurve_and_model.png)
 
-
-# Workflow status
+Workflow status
+===============
 
 Automated workflows and current status.
 
-| Workflow Status | Description |
-| --- | --- |
-| ![Publish artifacts](https://github.com/seandavi/sars2pack/workflows/Publish%20artifacts/badge.svg) | Produces regularly updated data resources and products |
-| ![pkgdown site](https://github.com/seandavi/sars2pack/workflows/pkgdown%20site/badge.svg) | Prepare and publish [pkgdown documentation](https://seandavi.github.io/sars2pack/) |
-
-
-
-# Contribute
-
-To contribute to this package please make a fork and then issue pull requests.
-
-# Resources
-
-## Similar work
-
-- https://github.com/emanuele-guidotti/COVID19
-- [Top 25 R resources on Novel COVID-19 Coronavirus](https://towardsdatascience.com/top-5-r-resources-on-covid-19-coronavirus-1d4c8df6d85f)
-- [COVID-19 epidemiology with R](https://rviews.rstudio.com/2020/03/05/covid-19-epidemiology-with-r/)
-- https://github.com/RamiKrispin/coronavirus
-- [Youtube: Using R to analyze COVID-19](https://www.youtube.com/watch?v=D_CNmYkGRUc)
-- [DataCamp: Visualize the rise of COVID-19 cases globally with ggplot2](https://www.datacamp.com/projects/870)
-
-
+<table>
+<thead>
+<tr class="header">
+<th>Workflow Status</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td><img src="https://github.com/seandavi/sars2pack/workflows/Publish%20artifacts/badge.svg" alt="Publish artifacts" /></td>
+<td>Produces regularly updated data resources and products</td>
+</tr>
+<tr class="even">
+<td><img src="https://github.com/seandavi/sars2pack/workflows/pkgdown%20site/badge.svg" alt="pkgdown site" /></td>
+<td>Prepare and publish <a href="https://seandavi.github.io/sars2pack/">pkgdown documentation</a></td>
+</tr>
+</tbody>
+</table>
+
+Contribute
+==========
+
+To contribute to this package please make a fork and then issue pull
+requests.
+
+Resources
+=========
+
+Similar work
+------------
+
+-   <https://github.com/emanuele-guidotti/COVID19>
+-   [Top 25 R resources on Novel COVID-19
+    Coronavirus](https://towardsdatascience.com/top-5-r-resources-on-covid-19-coronavirus-1d4c8df6d85f)
+-   [COVID-19 epidemiology with
+    R](https://rviews.rstudio.com/2020/03/05/covid-19-epidemiology-with-r/)
+-   <https://github.com/RamiKrispin/coronavirus>
+-   [Youtube: Using R to analyze
+    COVID-19](https://www.youtube.com/watch?v=D_CNmYkGRUc)
+-   [DataCamp: Visualize the rise of COVID-19 cases globally with
+    ggplot2](https://www.datacamp.com/projects/870)
diff --git a/inst/original/CountyEpiEstim.R b/inst/original/CountyEpiEstim.R
@@ -0,0 +1,104 @@
+# From Charles Morefield
+# Via biosecurity email list
+# April 2, 2020
+# 
+# 
+# Current status: a very simple downloader for covid-19 data and R estimator.
+# A work in progress (clm apr 1 2020)
+library(EpiEstim)
+library(tidyverse)
+library(lubridate)
+library(readxl)
+library(incidence)
+#library(ggplot2)
+#library(R0)
+library(kableExtra)
+library(knitr)
+
+# A row of the input spreadsheet may have zeros at the start,
+# which we eliminate using a little function called trimLeading.
+trimLeading <- function(x, value=0) {
+  w <- which.max(cummax(x != value))
+  x[seq.int(w, length(x))]
+  }
+
+# USAFacts URL where a COVID-19 dataset is stored and updated automatically every day
+dataURL <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
+# Select a destination path and destination file
+destfile <- "/Users/Chuck/Pandemic/rPandemicProject/PandemicData/covid_confirmed_usafacts.csv"
+# Download data
+download.file(dataURL, destfile)
+# Read the entire csv file into a tibble (read_csv already returns one)
+USAfactsData <- read_csv(destfile, col_names = TRUE)
+#NOTE: for NYC better to use NYC Health data instead
+#countyfips <- 51510 # "countyFIPS" designation for Alexandria City VA
+#countyfips <- 51013 # "countyFIPS" designation for Arlington VA
+#countyfips <- 33009 # "countyFIPS" designation for Grafton County (Dartmouth)
+#countyfips <- 9001 # "countyFIPS" designation for Fairfield County
+#countyfips <- 13121 # "countyFIPS" designation for Fulton County (Atlanta)
+#countyfips <- 12057 # "countyFIPS" designation for Hillsborough County (Tampa)
+#countyfips <- 22071 # "countyFIPS" designation for Orleans Parish (New Orleans)
+#countyfips <- 33017 # "countyFIPS" designation for Strafford County (Durham)
+#countyfips <- 25025 # "countyFIPS" designation for Suffolk County (Boston)
+#countyfips <- 11001 # "countyFIPS" designation for Washington DC
+us_counties <- c(51510,51013,33009,9001,13121,12057,22071,33017,25025,11001)
+names(us_counties) <- c("AlexandriaCity,VA", "Arlington,VA","Grafton(Hanover)","Fairfield(CT)","Fulton(Atlanta)",
+                   "Hillsborough(Tampa)","OrleansParish","Strafford(Durham)","Suffolk(Boston)","Washington,DC")
+MeanSI <- 3.96
+StdSI  <- 4.75
+
+# Define a custom R function compute_res_parametric_si using EpiEstim to compute R for a specific county and input SI
+compute_res_parametric_si <- function(USAfactsData,countyfips,MeanSI,StdSI) {
+# pull the designated row of the download (this ASSUMES only one row is pulled)
+localCovidData <- as.vector(USAfactsData %>% filter(countyFIPS == countyfips))
+localCovidData <- localCovidData[-c(1,2,3,4)] # Remove the first four metadata values leaving just the incidence values
+for (k in 2:length(localCovidData)) {
+   if (localCovidData[k] < localCovidData[k-1]) localCovidData[k-1] <- localCovidData[k]
+}
+# Change from cumulative to incidence count
+i <- diff(as.numeric(localCovidData))
+names(i) <- names(localCovidData[-1])
+i <- trimLeading(i, value = 0) # Remove leading zeros from the row we just pulled
+i[i < 0] <- 0 # Make sure there are no places where cum reports go down, day-by-day (=> DATA ERROR in database)
+if (is.na(i[length(i)])) i <- i[-length(i)] # Remove last element if it is NA
+#if (i[length(i)] == 0) i <- i[-length(i)] # If last raw report is unchanged then MAYBE data missing. Trim the last date??
+
+# Finalize the incidence matrix "inc":
+dates <- mdy(names(i))
+I <- as.numeric(i)
+inc <- tibble(dates,I)
+# Set up the data structure needed as input to EpiEstim.
+# We assume SI mean and s.d. will be input from the published literature, so si_distr is set to NULL.
+localCovidData  <- list(inc,NULL)
+names(localCovidData) <- c("incidence","si_distr")
+# plot(as.incidence(localCovidData$incidence$I, dates = localCovidData$incidence$dates))
+res_parametric_si <- estimate_R(localCovidData$incidence, method = "parametric_si",
+                        config = make_config(list(mean_si = MeanSI, std_si = StdSI)))
+res_parametric_si
+}  # End of the custom function compute_res_parametric_si
+county_res_parametric_si <- data.frame()
+meanR <- c()
+dateR <- c()
+cumCases <- c()
+modelR <- c()
+meanSerInt <- c()
+sdSerInt <- c()
+sourceR <- c()
+for (j in 1:length(us_counties)) {
+   res_parametric_si <- compute_res_parametric_si(USAfactsData,countyfips = us_counties[j],MeanSI,StdSI)
+   outputR <- res_parametric_si$R[,3]
+   meanR <- c(meanR, outputR[length(outputR)])
+   dateR <- c(dateR, res_parametric_si$dates[length(res_parametric_si$dates)])
+   cumCases <- c(cumCases,sum(res_parametric_si$I))
+   modelR <- c(modelR,"EpiEstim.parametric_si")
+   meanSerInt <- c(meanSerInt,MeanSI)
+   sdSerInt  <- c(sdSerInt,StdSI)
+   sourceR <- c(sourceR,"usafacts.org")
+#plot(county_res_parametric_si, legend = FALSE)
+}
+#names(meanR) <- names(us_counties)
+dateR <- as.Date(dateR, origin="1970-01-01") # Convert integer date to standard nomenclature
+outputTable_meanR <- as.data.frame(list(dateR,meanR,cumCases,modelR,meanSerInt,sdSerInt,sourceR))
+rownames(outputTable_meanR) <- names(us_counties)
+colnames(outputTable_meanR) <- c("Date","Mean R","cumCases","Package.Method (CRAN)","Mean SI","StdDev SI","Data Source")
+kable(outputTable_meanR,align = "c",digits = 2, "pandoc")
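
Review note: the cumulative-to-incidence step inside `compute_res_parametric_si` may be easier to follow on a toy vector. The sketch below reuses `trimLeading` and the pairwise correction loop from the script, but on a plain numeric vector. The `cum_cases` values are hypothetical and are not taken from the USAFacts file.

```r
# Hypothetical cumulative case counts: three leading zeros and one
# downward revision (5 followed by 4). Not real USAFacts data.
cum_cases <- c(0, 0, 0, 2, 5, 4, 9, 12)

# Same helper as in the script: drop every entry before the first one
# that differs from `value`.
trimLeading <- function(x, value = 0) {
  w <- which.max(cummax(x != value))
  x[seq.int(w, length(x))]
}

# Same correction as in the script: when a cumulative count drops,
# lower the previous day's count to match it.
for (k in 2:length(cum_cases)) {
  if (cum_cases[k] < cum_cases[k - 1]) cum_cases[k - 1] <- cum_cases[k]
}

# Convert cumulative counts to daily incidence, then trim leading zeros.
daily <- diff(cum_cases)
daily <- trimLeading(daily, value = 0)
print(daily)
```

The correction loop is a single forward pass, so it only repairs a drop relative to the immediately preceding day. That is presumably why the script also clamps any remaining negative increments to zero afterwards.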