APPENDIX R-CODE

The goal of the Appendix of R-code is to provide interested readers in more information about the Bayesian model and modeling code.

This document contains more information on the Bayesian estimation and statistical analysis. The main analysis script (main_script.R) contains the R-code that was used to load and restructure the data. The Bayesian model as written in stan can be consulted in the stan_model.stan-file.

Bayesian methods

In this paper, our goal is to estimate the true underlying seroprevalence of the population ≥ 18 and ≤ 65 years old as measured every two weeks, w in Belgium, denoted p_w^* (w = 1, …, W = 20).

We start by estimating the probability that each person in the serosurvey is seropositive using a Bayesian logistic regression model that accounts for the sensitivity and specificity of the ELISA assay, each individual’s age, sex, and current province, iterated over each two weeks they were sampled:

where x_i is the result of the IgG ELISA (in primary analyses) for the ith person (i = 1, …, N= 129) in the serosurvey. The sensitivity, θ⁺, is determined using n⁺ RT-PCR positive controls from the lab validation study, of which x⁺ tested positive. The specificity, θ⁻ , is determined using n⁻ pre-pandemic negative controls, of which x⁻ tested positive.

The probability of observing a diagnostic positive is a function of the true positive rate and the false negative rate with regards to the true underlying probability of seropositivity p_i for that person. This probability itself is a function of covariates X, which consists of sex, age categories, and week of study, and their coefficients β, and a random effect for household, α_h (h = 1, …, H= 129), with variance σ². We used naive priors on all parameters to allow for an exploration of the parameter space. The priors on the sensitivity and specificity were flat from 0 to 1, equivalent to Uniform(0, 1) or Bet**a(1, 1). We used weak Norma**l(0, 3) priors for the logistic regression coefficients β. The prior on the standard deviation of the household effect, σ, was flat from 0 to infinity (we tested a positive half-Normal and it did not affect estimates).

Bayesian MCMC Diagnostics

Divergence

In a first step, the number of chains that diverged where evaluated using the rstan::check_divergences-function:

Trace plots

A first diagnostic plot is the trace plot, which is a time series plot of the Markov chains. That is, a trace plot shows the evolution of parameter vector over the iterations of one or many Markov chains

## check the model
posterior_fit <- as.array(fit) ## save as array to plot
pl_age <- mcmc_trace(posterior_fit, regex_pars = c("age"))
pl_sex <- mcmc_trace(posterior_fit, regex_pars = c("sex"))
pl_prov <- mcmc_trace(posterior_fit, regex_pars = c("prov"))

For age, this resulted in the following trace plot:

pl_age

For sex, this resulted in the following trace plot:

pl_sex

For province, this resulted in the following trace plot:

pl_prov

R-hat

Another way to evaluate chain convergence to an equilibrium distribution is to compare its behavior to other initialized chains. This is the motivation for the potential scale reduction statistic, split-R̂. The split-R̂ statistic measures the ratio of the average variance of draws within each chain to the variance of the pooled draws across chains; if all chains are at equilibrium, these will be the same and split-R̂ will be one. If the chains have not converged to a common distribution, the split-R̂ statistic will be greater than one (see Gelman et al. 2013, Stan Development Team 2018).

## RHAT STATISTIC
rhats <- rhat(fit, pars = c("beta_age", "beta_sex", "beta_prov", 
                                            "sigma_age", "sigma_prov", "sigma_sex"))
mcmc_rhat(rhats)

ESS

The effective sample size is an estimate of the number of independent draws from the posterior distribution of the estimand of interest. The n_eff metric used in Stan is based on the ability of the draws to estimate the true mean value of the parameter. Because the draws within a Markov chain are not independent if there is autocorrelation, the effective sample size, n_eff, is usually smaller than the total sample size, N. The larger the ratio of n_eff to N the better (see Gelman et al. 2013, Stan Development Team 2018 for more details) .

## ESS STATISTIC
ess <- neff_ratio(fit, pars = c("beta_age", "beta_sex", "beta_prov", 
                                                "sigma_age", "sigma_prov", "sigma_sex",
                                                "phi"))
mcmc_neff(ess, size = 2)

Bayesian Output

Bayesian parameter estimation was performed by calculating the median of the posterior draws of ϕ. The 95% CI was calculated based on the Highest (Posterior) Density Interval (HDI) of ϕ.

est <- tibble(median = median(fit_mcmc$phi),
              hdill = hdi(fit_mcmc$phi, credMass = 0.95)[1],
              hdiul = hdi(fit_mcmc$phi, credMass = 0.95)[2])

Risk ratio’s were calculated by dividing the posterior distribution of the Bayesian logistic regression model parameters over a reference category, respectively for each independent variable (i.e. age, prov, and sex). As an example, the R**R-calculation of sex is given by

$P(RR_{sex}) = \frac{P(\beta_{FEMALE})}{P(\beta_{MALE})}$,

where P(β_gende**r) equals the posterior distribution of the female and male sex respectively, and P(R**R_sex), the posterior distribution of the RR.

est_all$sex %>%
  summarise(RR = median(`1`/`2`),
            HDI_lower = HDInterval::hdi(`1`/`2`, credMass = 0.95)[1,1],
            HDI_upper = HDInterval::hdi(`1`/`2`, credMass = 0.95)[2,1]) %>%
  mutate(`Category` = "Female (reference Male)") %>%
  select(Category, RR, `HDI (lower)` = HDI_lower, `HDI (upper)` = HDI_lower) %>%
  flextable() %>%
  colformat_double(digits = 3) %>%
  autofit()

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
README_files/figure-gfm		README_files/figure-gfm
example_data		example_data
.gitignore		.gitignore
README.Rmd		README.Rmd
README.md		README.md
RR_est.R		RR_est.R
main_script.R		main_script.R
seroprevalence_paper.Rproj		seroprevalence_paper.Rproj
stan_model.stan		stan_model.stan

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

APPENDIX R-CODE

Bayesian methods

Bayesian MCMC Diagnostics

Divergence

Trace plots

R-hat

ESS

Bayesian Output

About

Releases

Packages

Languages

sciensanogit/seroprevalence_paper

Folders and files

Latest commit

History

Repository files navigation

APPENDIX R-CODE

Bayesian methods

Bayesian MCMC Diagnostics

Divergence

Trace plots

R-hat

ESS

Bayesian Output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages