Merge pull request #3 from seandavi/master
pulling in latest sean changes
vjcitn authored Apr 3, 2020
2 parents beaf21f + cfe13bc commit 9d90ef3
Showing 3 changed files with 204 additions and 79 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/update_excel_sheet_daily.yml
@@ -15,12 +15,9 @@
 name: Publish artifacts

 on:
-  push:
-    branches:
-      - master
   schedule:
     # uses UTC
-    - cron: '0 08 * * *'
+    - cron: '0 */4 * * *'

 # Environment variables available to all jobs and steps in this workflow
 env:
174 changes: 99 additions & 75 deletions README.md
@@ -1,82 +1,106 @@
# sars2pack
sars2pack
=========

The sars2pack R package includes data resources, workflows, and data
science tools to understand and interpret the COVID-19
pandemic. Access to data resources is "real-time" to get the most
up-to-date information. Use cases and introductory material are
available in vignettes and in documentation.

Contributions are welcome. Have an interesting analysis that you'd
like to share? Write it up as an R Markdown document and contribute it
as a vignette. If you are not used to working collaboratively on GitHub,
just post a [new
issue](https://github.com/seandavi/sars2pack/issues/new) to ask for
help.

Thanks to the armies of people providing data for the rest of the
world, often on a volunteer basis. Without their tireless work, we
would not have these rapidly-developing resources.

# Install

```
BiocManager::install('seandavi/sars2pack')
# OR
devtools::install_github('seandavi/sars2pack')
```

# Features

## Data resources available

- Johns Hopkins global pandemic time series
- USAFacts county-level US epidemic data
- NYTimes state and county-level US epidemic data
- COVIDTracker.com US state- and county-level epidemic data that includes detailed positive, negative, and pending test data
- US Healthcare Capacity dataset, including details on US hospital capacity and capabilities.
- US County-level metadata, including geospatial data

## Capabilities

- [X] Access data from multiple sources: jhu_data(), usa_facts_data(), nytimes_county_data(), nytimes_state_data(), covidtracker_data(), etc.
- [X] Estimate R0 for localities or countries.
- [X] Visualize time series pandemic case growth
- [X] Perform data mashups between COVID-19 pandemic data and
additional geographic, financial, and demographic datasets
- [ ] Create static and interactive maps of COVID-19 data for states,
countries, or even counties.

## Visualization

![](man/figures/africa_geo.png)
![](man/figures/cc_ts_plot_log-1.png)
science tools to understand and interpret the COVID-19 pandemic. Access
to data resources is “real-time” to get the most up-to-date information.
Use cases and introductory material are available in vignettes and in
documentation.

Contributions are welcome. Have an interesting analysis that you’d like
to share? Write it up as an R Markdown document and contribute it as a
vignette. If you are not used to working collaboratively on GitHub, just
post a [new issue](https://github.com/seandavi/sars2pack/issues/new) to
ask for help.

Thanks to the armies of people providing data for the rest of the world,
often on a volunteer basis. Without their tireless work, we would not
have these rapidly-developing resources.

Install
=======

BiocManager::install('seandavi/sars2pack')
# OR
devtools::install_github('seandavi/sars2pack')

Features
========

Data resources available
------------------------

- Johns Hopkins global pandemic time series
- USAFacts county-level US epidemic data
- NYTimes state and county-level US epidemic data
- COVIDTracker.com US state- and county-level epidemic data that
includes detailed positive, negative, and pending test data
- US Healthcare Capacity dataset, including details on US hospital
capacity and capabilities.
- US County-level metadata, including geospatial data

Capabilities
------------

- Access data from multiple sources: jhu\_data(), usa\_facts\_data(),
nytimes\_county\_data(), nytimes\_state\_data(),
covidtracker\_data(), etc.
- Estimate R0 for localities or countries.
- Visualize time series pandemic case growth
- Perform data mashups between COVID-19 pandemic data and additional
geographic, financial, and demographic datasets
- Create static and interactive maps of COVID-19 data for states,
countries, or even counties.

Visualization
-------------

![](man/figures/africa_geo.png) ![](man/figures/cc_ts_plot_log-1.png)
![](man/figures/epicurve_and_model.png)


# Workflow status
Workflow status
===============

Automated workflows and current status.

| Workflow Status | Description |
| --- | --- |
| ![Publish artifacts](https://github.com/seandavi/sars2pack/workflows/Publish%20artifacts/badge.svg) | Produces regularly updated data resources and products |
| ![pkgdown site](https://github.com/seandavi/sars2pack/workflows/pkgdown%20site/badge.svg) | Prepare and publish [pkgdown documentation](https://seandavi.github.io/sars2pack/) |



# Contribute

To contribute to this package please make a fork and then issue pull requests.

# Resources

## Similar work

- https://github.com/emanuele-guidotti/COVID19
- [Top 25 R resources on Novel COVID-19 Coronavirus](https://towardsdatascience.com/top-5-r-resources-on-covid-19-coronavirus-1d4c8df6d85f)
- [COVID-19 epidemiology with R](https://rviews.rstudio.com/2020/03/05/covid-19-epidemiology-with-r/)
- https://github.com/RamiKrispin/coronavirus
- [Youtube: Using R to analyze COVID-19](https://www.youtube.com/watch?v=D_CNmYkGRUc)
- [DataCamp: Visualize the rise of COVID-19 cases globally with ggplot2](https://www.datacamp.com/projects/870)


<table>
<thead>
<tr class="header">
<th>Workflow Status</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><img src="https://github.com/seandavi/sars2pack/workflows/Publish%20artifacts/badge.svg" alt="Publish artifacts" /></td>
<td>Produces regularly updated data resources and products</td>
</tr>
<tr class="even">
<td><img src="https://github.com/seandavi/sars2pack/workflows/pkgdown%20site/badge.svg" alt="pkgdown site" /></td>
<td>Prepare and publish <a href="https://seandavi.github.io/sars2pack/">pkgdown documentation</a></td>
</tr>
</tbody>
</table>

Contribute
==========

To contribute to this package please make a fork and then issue pull
requests.

Resources
=========

Similar work
------------

- <https://github.com/emanuele-guidotti/COVID19>
- [Top 25 R resources on Novel COVID-19
Coronavirus](https://towardsdatascience.com/top-5-r-resources-on-covid-19-coronavirus-1d4c8df6d85f)
- [COVID-19 epidemiology with
R](https://rviews.rstudio.com/2020/03/05/covid-19-epidemiology-with-r/)
- <https://github.com/RamiKrispin/coronavirus>
- [Youtube: Using R to analyze
COVID-19](https://www.youtube.com/watch?v=D_CNmYkGRUc)
- [DataCamp: Visualize the rise of COVID-19 cases globally with
ggplot2](https://www.datacamp.com/projects/870)
104 changes: 104 additions & 0 deletions inst/original/CountyEpiEstim.R
@@ -0,0 +1,104 @@
# From Charles Morefield
# Via biosecurity email list
# April 2, 2020
#
#
# Current status: a very simple downloader for covid-19 data and R estimator.
# A work in progress (clm apr 1 2020)
library(EpiEstim)
library(tidyverse)
library(lubridate)
library(readxl)
library(incidence)
#library(ggplot2)
#library(R0)
library(kableExtra)
library(knitr)

# A row of the input spreadsheet may have zeros at the start,
# which we eliminate using a small helper function called trimLeading.
trimLeading <- function(x, value=0) {
  # Index of the first element that differs from `value`
  w <- which.max(cummax(x != value))
  x[seq.int(w, length(x))]
}

# USAFacts URL of the confirmed-case dataset, which is updated automatically every day
dataURL <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
# Select a destination path and destination file (adjust to a writable location on your machine)
destfile <- "/Users/Chuck/Pandemic/rPandemicProject/PandemicData/covid_confirmed_usafacts.csv"
# Download data
download.file(dataURL, destfile)
# Read the entire csv file into a tibble (read_csv already returns one)
USAfactsData <- read_csv(destfile, col_names = TRUE)
#NOTE: for NYC better to use NYC Health data instead
#countyfips <- 51510 # "countyFIPS" designation for Alexandria City VA
#countyfips <- 51013 # "countyFIPS" designation for Arlington VA
#countyfips <- 33009 # "countyFIPS" designation for Grafton County (Dartmouth)
#countyfips <- 9001 # "countyFIPS" designation for Fairfield County
#countyfips <- 13121 # "countyFIPS" designation for Fulton County (Atlanta)
#countyfips <- 12057 # "countyFIPS" designation for Hillsborough County (Tampa)
#countyfips <- 22071 # "countyFIPS" designation for Orleans Parish (New Orleans)
#countyfips <- 33017 # "countyFIPS" designation for Strafford County (Durham)
#countyfips <- 25025 # "countyFIPS" designation for Suffolk County (Boston)
#countyfips <- 11001 # "countyFIPS" designation for Washington DC
us_counties <- c(51510,51013,33009,9001,13121,12057,22071,33017,25025,11001)
names(us_counties) <- c("AlexandriaCity,VA", "Arlington,VA","Grafton(Hanover)","Fairfield(CT)","Fulton(Atlanta)",
"Hillsborough(Tampa)","OrleansParish","Strafford(Durham)","Suffolk(Boston)","Washington,DC")
MeanSI <- 3.96
StdSI <- 4.75

# Define a custom R function compute_res_parametric_si that uses EpiEstim to compute R for a specific county and input SI
compute_res_parametric_si <- function(USAfactsData,countyfips,MeanSI,StdSI) {
  # pull the designated row of the download (this ASSUMES only one row is pulled)
  localCovidData <- as.vector(USAfactsData %>% filter(countyFIPS == countyfips))
  localCovidData <- localCovidData[-c(1,2,3,4)] # Remove the first four metadata values leaving just the cumulative case counts
  # Where a cumulative count drops from one day to the next, lower the previous day's value to match
  for (k in 2:length(localCovidData)) {
    if (localCovidData[k] < localCovidData[k-1]) localCovidData[k-1] <- localCovidData[k]
  }
  # Change from cumulative to incidence count
  i <- diff(as.numeric(localCovidData))
  names(i) <- names(localCovidData[-1])
  i <- trimLeading(i, value = 0) # Remove leading zeros from the row we just pulled
  i[i < 0] <- 0 # Make sure there are no places where cum reports go down, day-by-day (=> DATA ERROR in database)
  if (is.na(i[length(i)])) i <- i[-length(i)] # Remove last element if it is NA
  #if (i[length(i)] == 0) i <- i[-length(i)] # If last raw report is unchanged then MAYBE data missing. Trim the last date??

  # Finalize the incidence table "inc":
  dates <- mdy(names(i))
  I <- as.numeric(i)
  inc <- tibble(dates,I)
  # Set up the data structure needed as input to EpiEstim.
  # We assume SI mean and s.d. will be input from the published literature, so si_distr is set to NULL.
  localCovidData <- list(inc,NULL)
  names(localCovidData) <- c("incidence","si_distr")
  # plot(as.incidence(localCovidData$incidence$I, dates = localCovidData$incidence$dates))
  res_parametric_si <- estimate_R(localCovidData$incidence, method = "parametric_si",
                                  config = make_config(list(mean_si = MeanSI, std_si = StdSI)))
  res_parametric_si
} # End of the custom function compute_res_parametric_si
county_res_parametric_si <- data.frame()
meanR <- c()
dateR <- c()
cumCases <- c()
modelR <- c()
meanSerInt <- c()
sdSerInt <- c()
sourceR <- c()
for (j in seq_along(us_counties)) {
  res_parametric_si <- compute_res_parametric_si(USAfactsData,countyfips = us_counties[j],MeanSI,StdSI)
  outputR <- res_parametric_si$R[,3] # Column 3 of the EpiEstim output holds the mean R estimate
  meanR <- c(meanR, outputR[length(outputR)]) # Keep the most recent estimate
  dateR <- c(dateR, res_parametric_si$dates[length(res_parametric_si$dates)])
  cumCases <- c(cumCases,sum(res_parametric_si$I))
  modelR <- c(modelR,"EpiEstim.parametric_si")
  meanSerInt <- c(meanSerInt,MeanSI)
  sdSerInt <- c(sdSerInt,StdSI)
  sourceR <- c(sourceR,"usafacts.org")
  #plot(county_res_parametric_si, legend = FALSE)
}
#names(meanR) <- names(us_counties)
dateR <- as.Date(dateR, origin="1970-01-01") # c() dropped the Date class above, so convert the numeric values back to dates
outputTable_meanR <- as.data.frame(list(dateR,meanR,cumCases,modelR,meanSerInt,sdSerInt,sourceR))
rownames(outputTable_meanR) <- names(us_counties)
colnames(outputTable_meanR) <- c("Date","Mean R","cumCases","Package.Method (CRAN)","Mean SI","StdDev SI","Data Source")
kable(outputTable_meanR, format = "pandoc", align = "c", digits = 2)
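
The script's preprocessing turns a cumulative case series into daily incidence before handing it to EpiEstim. The following is a minimal base-R sketch of that idea, not the script's exact method: the names `trim_leading` and `cumulative_to_incidence` and the sample counts are hypothetical, and the sketch skips the script's extra pass that lowers the previous day's cumulative value when a count drops.

```r
# Illustrative sketch in base R only; the sample counts below are made up.

# Drop leading entries equal to `value` (same logic as trimLeading in the script).
trim_leading <- function(x, value = 0) {
  first_kept <- which.max(cummax(x != value))
  x[seq.int(first_kept, length(x))]
}

# Convert a cumulative case series into daily incidence.
cumulative_to_incidence <- function(cumulative) {
  daily <- diff(cumulative)   # day-over-day change
  daily[daily < 0] <- 0       # treat a drop in a cumulative count as a reporting error
  trim_leading(daily, value = 0)
}

# Hypothetical county series with one downward revision (5 -> 4)
cumulative <- c(0, 0, 0, 2, 5, 4, 9)
incidence <- cumulative_to_incidence(cumulative)
print(incidence)  # [1] 2 3 0 5
```

Note that if every entry equals `value`, `which.max` returns 1 and the vector comes back unchanged, so an all-zero series is not trimmed to length zero.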
