-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.qmd
281 lines (224 loc) · 13.4 KB
/
README.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
---
title: "README.md"
author: Sam A. Welch
date: 2024.05.09
format:
gfm:
toc: true
code-fold: true
code-summary: "Show code"
editor_options:
chunk_output_type: inline
---
This README.md file is a rendering of the .qmd document of the same name, written as part of the coding task for a postdoctoral position at NIVA. All code is my own work. The (indirect) assistance of the authors of the below packages in completing this project is gratefully acknowledged. This project was created using R version 4.4.0, and may not run properly under other versions.
## Setup
```{r packages, warning=FALSE, message=FALSE}
library(tidyverse) # tidyverse packages for data cleaning, graphs, etc.
library(webchem) # access to chemical database APIs
library(readxl) # read Excel files
library(sf) # GIS functions
library(giscoR) # Eurostat maps
library(cowplot) # plots in grids
library(ggthemes) # colour palettes
options(knitr.kable.NA = '-')
knitr::opts_chunk$set(dev = "ragg_png")
```
### Data
Data provided as part of the assignment is loaded and processed below. In addition, a .csv of abbreviated stressor names (author's own work, and not based on any standardised practice) created for this assignment is loaded in.
Code chunks and tables have been placed in collapsible elements to improve document readability. Click the "▶ Show code" (or similar) button to show the section.
```{r data, echo=FALSE, warning=FALSE}
exposure <- read_excel(path = "data/Exposure_data_AEP.xlsx", sheet = 2)
sites <- read_excel(path = "data/Exposure_data_AEP.xlsx", sheet = 3) |>
# No samples included in the dataset were taken from Hotranelva.
filter(SITE_NAME != "Hotranelva") |>
# Order sites left to right
arrange(LONGITUDE) |>
mutate(SITE_NAME = fct_inorder(SITE_NAME))
all_data <- left_join(exposure, sites, by = "SITE_CODE")
mcpa_exposure <- all_data %>% filter(STRESSOR_ID == 21)
all_stressors <- all_data %>%
select(STRESSOR_ID, STRESSOR_NAME, CAS, INCHIKEY) %>%
unique() |>
# Abbreviate chemical names for neater graphs
left_join(y = read_csv("data/stressor_acronyms.csv", show_col_types = FALSE), by = "STRESSOR_NAME") |>
select(STRESSOR_NAME, STRESSOR_ACRONYM)
# Verify/validate data
# Is the same unit used throughout?
if (length(all_data$MEASURED_UNIT %>% unique()) != 1) {
print("Multiple units.")
}
# Set ggplot theme
theme_set(new = theme_few())
```
Next, maps of Europe from the `giscoR` package are loaded and cropped to the relevant area, and the provided site data is converted to an `sf` object for compatibility with `sf`'s GIS functions.
```{r gis_setup, warning=FALSE}
# Get a map of Europe
gisco_Europe <- gisco_get_countries(epsg = 4326, region = "Europe")
sites_sf <- st_as_sf(sites, coords = c("LONGITUDE", "LATITUDE"), crs = 4326) |>
select(-COORDINATE_SYSTEM, -SITE_CODE)
```
## Task A: Import and Visualise Spatiotemporal Exposure Data
Measured concentrations for 40 stressors across 5 sites were imported. Concentrations of 2-methyl-4-chlorophenoxyacetic acid (MCPA), a phenoyx herbicide, were measured over a 5-month period in 2019, 1-8 times, site depending. Data were plotted as a scatter plot/line graph to show spatial (site) and temporal (sampling date) variation.
```{r task_a, fig.width = 10, warning=FALSE, message=FALSE}
# Sampling Map
plot_a0 <- ggplot(gisco_Europe) +
geom_sf() +
geom_sf(data = sites_sf) +
geom_sf_text(aes(label = NAME_ENGL), colour = "darkgrey") +
geom_sf(data = sites_sf, aes(size = 6, colour = SITE_NAME)) +
geom_sf_text(data = sites_sf, aes(label = c("T", "V", "H", "S", "M"))) +
theme(legend.position = "none", axis.title = element_blank(), axis.ticks = element_blank(), axis.text = element_blank()) +
scale_color_few() +
coord_sf(xlim = c(5, 13), ylim = c(57, 61))
plot_a1 <- mcpa_exposure %>%
ggplot(mapping = aes(x = SAMPLE_DATE, y = MEASURED_VALUE, colour = SITE_NAME)) +
geom_line(size = 1.2) +
geom_point(size = 7) +
geom_text(aes(label = SITE_NAME |> substr(start = 1, stop = 1)), colour = "black") +
scale_y_log10() +
scale_color_few() +
labs(
x = "Sampling Date (2019)",
y = "MCPA Concentration (μg/L)",
colour = "Sampling Site",
shape = "Sampling Site"
) +
theme(legend.position = "none") +
annotation_logticks(sides = "l")
task_a_plot <- ggdraw(plot = plot_a1) +
draw_plot(plot_a0, x = 0.77, y = 0.6, width = 0.2, height = 0.3, scale = 1.5)
task_a_plot
```
Figure 1: Line graph/scatter plot of measured concentrations of 2-methyl-4-chlorophenoxyacetic acid (MCPA) in 5 freshwater sampling sites in Norway, between May 5th and October 14th, 2019. Each point represents a single sample. Inset map shows locations of sampling sites (T)imebekken, (V)asshaglona, (H)eiabekken, (S)kuterudbekken, and (M)ørdrebekken.
## Task B: Predict tissue concentrations of chemicals in fish
Summary statistics (mean, standard deviation, maximum, minimum, and 5th/95th percentiles) were calculated for each combination of site and stressor across the sampling period.
```{r task_b, warning=FALSE, message=FALSE}
all_data_summary <-
all_data %>%
group_by(SITE_NAME, STRESSOR_NAME) %>%
summarise(n_MEASURED_VALUE = n(),
Mean_MEASURED_VALUE = mean(MEASURED_VALUE),
SD_MEASURED_VALUE = sd(MEASURED_VALUE),
Max_MEASURED_VALUE = max(MEASURED_VALUE),
Min_MEASURED_VALUE = min(MEASURED_VALUE),
perc_5_MEASURED_VALUE = quantile(MEASURED_VALUE, probs = 0.05),
perc_95_MEASURED_VALUE = quantile(MEASURED_VALUE, probs = 0.95)) |>
transmute(SITE_NAME,
STRESSOR_NAME,
n_MEASURED_VALUE,
Mean_SD = case_when(is.na(SD_MEASURED_VALUE) ~
paste0(Mean_MEASURED_VALUE |> round(2), "*"),
TRUE ~
paste0(Mean_MEASURED_VALUE |> round(2), " ± ", SD_MEASURED_VALUE |> round(2))
),
Max_MEASURED_VALUE,
Min_MEASURED_VALUE,
perc_5_MEASURED_VALUE,
perc_95_MEASURED_VALUE
)
# Make a table using knitr
task_b_table <- knitr::kable(all_data_summary, digits = 2,
col.names = c("Site", "Stressor", "n", "Mean ± SD", "Min.", "Max.", "5th Percentile", "95th Percentile"))
```
```{=html}
<details>
<summary>Show Table 1: Summary statistics of measured stressor concentrations in water</summary>
```
```{r task_b_table, echo=FALSE}
task_b_table
```
</details>
Table 1: Table of mean, standard deviation, maximum, minimum and percentile values of measured concentrations of 48 chemical stressors across 6 freshwater sampling sites in Norway, May 6th to October 28th 2019. *: <i>n</i> too small to calculate standard deviation. All values in μg/L, rounded to 2 d.p.
Tissue concentrations of stressors in fish were calculated as follows. First, the R package `webchem` was used to assign PubChem IDs to the stressors based on InChIKeys. These IDs were subsequently used to look up LogKOW (a.k.a XLogP) values on PubChem, which were available as predicted values for 39 substances, but not for spinosad. These values were then used to predict tissue concentrations following the equation:
$$C_f = C_w \times 10^{0.76 \times logP - 0.23}$$
Tissue concentration data is subsequently saved to `/data/Cf_summary_table.csv` and displayed below.
```{r concentration_fish, warning=FALSE, message=FALSE}
# Try and import the (already downloaded and saved) chem properties, try importing if it doesn't work
try_import_webchem <- try(webchem_chemicals <- read_csv(file = "data/webchem_chemical_data.csv"))
if (inherits(x = try_import_webchem, what = "try-error")) {
print("Chemical data not found, importing from Pubchem via Webchem.")
# Get the relevant CIDs from InChiKeys, then look up LogKOW/XLogP on Pubchem
webchem_chemicals <- all_stressors |>
mutate(CID = get_cid(INCHIKEY, from = "inchikey", match = "first")$cid,
XLogP = pc_prop(CID, properties = "XLogP")$XLogP)
# Save to data to avoid unecessary API calls
write_csv(x = webchem_chemicals, file = "data/webchem_chemical_data.csv")
}
# Predict tissue concentration in fish for all chemicals and sites
Cf_all_stressors <- all_data |>
left_join(webchem_chemicals |> select(-STRESSOR_ID, -INCHIKEY, -CAS), by = "STRESSOR_NAME") |>
mutate(FISH_CONC_uGKG = MEASURED_VALUE * 10 ^ (0.76 * XLogP - 0.23))
# Make a table summarising mean concentrations in fish by site and stressor
Cf_summary <- Cf_all_stressors |>
group_by(SITE_NAME, STRESSOR_NAME) %>%
summarise(Mean_MEASURED_VALUE = mean(MEASURED_VALUE),
SD_MEASURED_VALUE = sd(MEASURED_VALUE)) |>
transmute(SITE_NAME,
STRESSOR_NAME,
Mean_SD = case_when(is.na(SD_MEASURED_VALUE) ~
paste0(Mean_MEASURED_VALUE |> round(2), "*"),
TRUE ~
paste0(Mean_MEASURED_VALUE |> round(2), " ± ", SD_MEASURED_VALUE |> round(2))
)
) |>
pivot_wider(values_from = Mean_SD, names_from = SITE_NAME)
# Make a pretty table using knitr
Cf_summary_table <- knitr::kable(Cf_summary, digits = 2,
col.names = c("Stressor", "Heiabekken", "Mørdrebekken", "Skuterudbekken", "Timebekken", "Vasshaglona"))
write_csv(x = Cf_summary, file = "data/Cf_summary_table.csv")
```
```{=html}
<details>
<summary>Show Table 1: Summary statistics of predicted stressor concentrations in fish tissue</summary>
```
```{r Cf_summary_table, echo=FALSE}
Cf_summary_table
```
</details>
Table 2: Table of mean and standard deviation of predicted concentrations in fish tissues of 39 chemical stressors across 6 freshwater sampling sites in Norway, May 6th to October 28th 2019. *: n too small to calculate standard deviation. All values in μg/kg, calculated from measured water concentrations and predicted LogKOW, following the equation $C_f = C_w \times 10^{0.76 \times logP - 0.23}$.
## Task C: Visualize the values of Cw and Cf on a map
A purely map-based approach to this task was considered. However, given the number of stressors assessed, and the inability (with available data) of determining which concentrations were most environmentally relevant, it was decided that box plots with an accompanying map of sampling sites was a more appropriate approach.
```{r task_c, warning=FALSE, message=FALSE, fig.width = 10, fig.height=12}
#| fig-cap:
#| - "Speed and Stopping Distances of Cars"
task_c_data <- Cf_all_stressors |>
left_join(sites_sf, by = "SITE_NAME") |>
left_join(all_stressors, by = "STRESSOR_NAME") |>
pivot_longer(cols = c(MEASURED_VALUE, FISH_CONC_uGKG), names_to = "Media", values_to = "Stressor_ug") |>
mutate(Media = case_when(
Media == "MEASURED_VALUE" ~ "Water (μg/L)",
TRUE ~ "Fish (μg/kg)"
) |> factor(levels = c("Water (μg/L)", "Fish (μg/kg)"))
)
task_c_map <-
ggplot(data = gisco_Europe) +
geom_sf() +
geom_sf(data = sites_sf) +
geom_sf_text(aes(label = NAME_ENGL), colour = "darkgrey") +
geom_sf(data = sites_sf, aes(size = 7, colour = SITE_NAME)) +
geom_sf_text(data = sites_sf, aes(label = c("T", "V", "H", "S", "M"))) +
theme(legend.position = "none", axis.title = element_blank(), axis.ticks = element_blank(), axis.text = element_blank()) +
coord_sf(xlim = c(5, 12), ylim = c(58, 64)) +
scale_color_few()
task_c_worldmap <- ggplot(data = gisco_countries) +
geom_sf() +
geom_rect(aes(xmin = 5, xmax = 12, ymin = 58, ymax = 64), color = "red", fill = NA) +
coord_sf(expand = FALSE) +
theme(axis.text = element_blank(), axis.ticks = element_blank())
task_c_boxplots <- ggplot(data = task_c_data, mapping = aes(x = Stressor_ug, y = STRESSOR_ACRONYM, fill = SITE_NAME)) +
geom_boxplot() +
scale_x_log10() +
scale_fill_few() +
labs(
x = "Measured Concentration",
y = "Stressor (Abbreviated)") +
annotation_logticks(sides = "b") +
theme(legend.position = "none") +
facet_grid(SITE_NAME ~ Media, scales = "free", space = "free", drop = TRUE, switch = "y") +
theme(plot.margin = unit(c(0, 5, 0, 0), "cm"))
task_c_plot <- ggdraw(plot = task_c_boxplots) +
draw_plot(task_c_map, x = 0.82, y = 0.715, width = 0.17, height = 0.3, scale = 1) +
draw_plot(task_c_worldmap, x = 0.82, y = 0.55, width = 0.17, height = 0.3, scale = 1)
task_c_plot
```
Figure 2: Box plot of concentrations, in water (measured, μg/L) and fish tissue (predicted from water and LogKow, μg/kg), for 40 chemical stressors sampled April - October 2019 across 5 freshwater sites in southern Norway. Inset map shows locations of (T)imebekken, (V)asshaglona, (H)eiabekken, (S)kuterudbekken, and (M)ørdrebekken, as well as rough area of the world. Acronyms: p,p'-DDE: 1,1'-(2,2-Dichloro-1,1-ethenediyl)bis(4-chlorobenzene), BAM: 2,6-dichlorobenzamide (BAM), MCPA: 2-methyl-4-chlorophenoxyacetic acid (MCPA), ACNF: aclonifen, AXSB: azoxystrobin, BTZN: bentazone, BXFN: bixafen, BSCL: boscalid, CBDZ: carbendazim, CFVP: chlorfenvinphos, CLMZ: clomazone, CPRL: clopyralid, CZFM: cyazofamid, CPDN: cyprodinil, DCHP: dichlorprop, DFNC: diflufenican, FNMD: fenamidone, FPPM: fenpropimorph, FLSL: florasulam, FDXN: fludioxonil, FLXP: fluroxypyr, IMDC: imidacloprid, MDPM: mandipropamid, MCPP: mecoprop, MTLX: metalaxyl, MTMT: metamitron, METZ: metribuzin, PCCR: pencycuron, PNXD: pinoxaden, PPMC: propamocarb, PCAZ: propiconazole, PPCZ: propoxycarbazone, PCZ-D: prothioconazole-desthio, PCSB: pyraclostrobin, PRDF: pyridafol, PRSL: pyroxsulam, SPNS: spinosad, TBDZ: thiabendazole, THCP: thiacloprid, TOSB: trifloxystrobin.