-
Notifications
You must be signed in to change notification settings - Fork 0
/
code.Rmd
204 lines (153 loc) · 7.17 KB
/
code.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
---
title: "Analysis of PEF-values with bayesian methods"
author: "Johannes Rajala"
date: "`r Sys.Date()`"
output: github_document
---
### 1. Introduction
In this document, we will analyze the peak expiratory flow (PEF) values of a person in order to gather evidence whether the person might have asthma or not. These values alone are not enough to make a diagnosis, but they can be used as a part of the diagnostic process.
The data consists of PEF-values measured in the morning and in the evening. Six measurements are taken each time, three before and three after taking asthma medication. The data was gathered over a period of seven days.
```{r, warning=FALSE, message=FALSE, echo=FALSE}
library(ggplot2)
library(ggridges)
library(bayestools)
library(gridExtra)
library(dplyr)
library(rstan)
library(loo)
library(brms)
cores = parallel::detectCores() / 2
options(mc.cores = cores)
```
We consider the following to be evidence for asthma:
1. The relative effect of medication (variable name $\textit{treatment}$) is over 15$\%$ with at least 3/14 probability.
2. The relative effect of time of day (variable name $\textit{circadian}$) is over 20$\%$ with at least 3/14 probability.
And last, the reference value for the mean PEF-value for the person in question (variable name $\alpha$) is 442.
```{r, echo=FALSE}
# set current project folder as working directory
setwd(here::here())
pefdata <- read.csv("./pefdata.csv")
# encode morning and evening as 0 and 1
pefdata$Time_of_day <- ifelse(pefdata$Time_of_day == "Morning", 0, 1)
```
### 2. Model
The model we will use is a Bayesian linear model. We will model the PEF measurements as a linear combination of the medication effect, the time of day effect, and the day of the week effect. The model is defined as follows:
$y \sim N(\alpha + \beta_1 x + \beta_2 t + \beta_3 d, \sigma)$,
where: $\alpha$ is the intercept, $x$ is the medication indicator, $t$ is the time of day indicator, $d$ is the day of the week, and $\sigma$ is the standard deviation.
We will use weakly informative normal prior distributions for the parameters. The intercept $\alpha$ is given a normal prior with mean 400 and standard deviation 100. The effects of medication, time of day, and day of the week are given normal priors with mean 0 and standard deviation 100. The standard deviation $\sigma$ is given a half-normal prior with standard deviation 100.
### 3. Analysis
#### 3.1. Model in Stan
Let us define a Bayesian linear mode for the PEF measurements in Stan and draw samples from the posterior distribution.
```{stan, output.var="simple", echo=TRUE, results="hide"}
data {
int<lower=0> N; // number of observations
array[N] real<lower=0> y; // PEF measurements
array[N] int<lower=0,upper=1> x; // medication indicator (0=before, 1=after)
array[N] int<lower=0,upper=1> t; // time of day indicator (0=morning, 1=evening)
array[N] int<lower=1,upper=7> d; // day of the follow-up
}
parameters {
real alpha; // intercept
real treatment; // effect of medication
real circadian; // effect of time of day
real followup; // effect of day of the week
real<lower=0> sigma; // standard deviation
}
transformed parameters {
vector[N] mu;
for (n in 1:N) {
mu[n] = alpha + treatment * x[n] + circadian * t[n] + followup * d[n];
}
}
model {
// priors
alpha ~ normal(400, 100); // weakly informative prior for the intercept
treatment ~ normal(0, 100); // weakly informative prior for the effect of medication
circadian ~ normal(0, 100); // weakly informative prior for the effect of time of day
followup ~ normal(0, 100); // weakly informative prior for the effect of day of the week
sigma ~ normal(0, 100); // half-normal due to constraint for the standard deviation
// likelihood
y ~ normal(mu, sigma);
}
generated quantities {
vector[N] log_lik;
for (n in 1:N) {
log_lik[n] = normal_lpdf(y[n] | mu[n], sigma);
}
}
```
```{r, echo=FALSE, results="hide"}
stan_data <- list(
y = pefdata$Measurement,
x = pefdata$Medication,
t = pefdata$Time_of_day,
d = pefdata$Day,
N = nrow(pefdata)
)
# Draw samples
model_simple <- rstan::sampling(simple, stan_data, chains = cores, iter = 10000)
draws_simple <- rstan::extract(model_simple)
```
#### 3.2. Diagnostics
Let's check diagnostics.
```{r, echo=FALSE}
monitor = rstan::monitor(model_simple, print = FALSE)
monitor[seq(1,5), c(1,2,3,10)]
```
$\hat R$ diagnostic values are below 1.01, which indicates that the chains have converged. Let's move onto the analysis.
#### 3.3. Results
Lets plot the posterior distribution of $\alpha$ (the mean value) with a 95$\%$ credible interval, with the reference value of 442.
```{r, echo=FALSE}
p = mean(draws_simple$alpha > 442)
ci = quantile(draws_simple$alpha, c(0.025, 0.975))
# plot alpha and its ci
p1 = ggplot(as.data.frame(draws_simple)) +
geom_density(aes(x=alpha), fill="blue", alpha=0.5) +
geom_vline(xintercept=442, linetype="dashed", color="purple") +
geom_vline(xintercept=ci[1], linetype="dashed", color="red") +
geom_vline(xintercept=ci[2], linetype="dashed", color="red") +
annotate("text", x=450, y=0.11, label=paste("P(mean PEF value > 442) = ", round(p, 4))) +
xlab("Mean PEF value") +
ggtitle("Posterior distribution of alpha", subtitle = "With 95% credible interval and the reference value") +
theme_minimal()
p1
```
The probability that $\alpha > 442$ is approximately 0.19$\%$.
```{r, echo=FALSE}
# 95$\%$ credible intervals for medication on time of day
medication_effect <- draws_simple$treatment
time_of_day_effect <- draws_simple$circadian
day_of_week_effect <- draws_simple$followup
# Lets calculate the relative effects of medication and time of day
# alpha plus time
alpha_time <- draws_simple$alpha + draws_simple$circadian
# alpha + medication
alpha_med <- draws_simple$alpha + draws_simple$treatment
# alpha_time / alpha
alpha_time_alpha <- alpha_time / draws_simple$alpha
# alpha_med / alpha
alpha_med_alpha <- alpha_med / draws_simple$alpha
# Probability that the relative effect of medication is over 15$%
p_med <- mean(alpha_med_alpha < 1.15)
# Probability that the relative effect of time of day is over 20$%
p_time <- mean(alpha_time_alpha < 1.20)
# Plot the relative effects
p2 = ggplot() +
geom_density(aes(x=alpha_med_alpha), fill="blue", alpha=0.5) +
geom_vline(xintercept=1.15, linetype="dashed", color="purple") +
annotate("text", x=1.1, y=45, label=paste("P(relative effect of medication < 15%) = ", round(p_med, 4))) +
xlab("Relative effect of medication") +
ggtitle("Posterior distribution of the relative effect of medication", subtitle = "With 15% threshold") +
theme_minimal()
p3 = ggplot() +
geom_density(aes(x=alpha_time_alpha), fill="blue", alpha=0.5) +
geom_vline(xintercept=1.20, linetype="dashed", color="purple") +
annotate("text", x=1.1, y=45, label=paste("P(relative effect of time of day < 20%) = ", round(p_time, 4))) +
xlab("Relative effect of time of day") +
ggtitle("Posterior distribution of the relative effect of time of day", subtitle = "With 20% threshold") +
theme_minimal()
p2
p3
```
Both the relative effect of medication and time of day are less than the thresholds of 15$\%$ and 20$\%$, respectively, with probabilities of 1 and 1.
No strong evidence for the person having asthma is present.