Norwegian-CDI-WG-comp.Rmd

---
title: "Norwegian CDI:WG comprehension analysis"
author: "Grzegorz Krajewski"
date: "6 10 2023"
output: pdf_document
---

**Based on P. Król's excellent work on Polish data :)**

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
```

As a preparation we first load required libraries and functions:

```{r libraries, warning=FALSE, message=FALSE, echo=FALSE}
library(beepr)
library(plyr) # We need it for laply()
library(tidyverse) # We use it to process the data from the csv
library(mirt)
library(mirtCAT)
library(parallel)
library(ggpubr)
source(paste0(getwd(),"/Functions_new/simulations.R"))
# source(paste0(getwd(),"/Functions/get_first_item_responses.R")) Not sure what we need it for
source(paste0(getwd(),"/Functions_new/fit_models.R"))
source(paste0(getwd(),"/Functions_new/prepare_input_files.R"))
source(paste0(getwd(),"/Functions_new/get_start_thetas.R"))
```

# Data preparation

We import responses, items, and participant data (demo: age & sex) from local files,
pick the *comprehension* data, exclude longitudinal data,
and perform a check (our sums equal to imported scores).

```{r data}
# We get responses and item data. More item data available, if needed: modify select():
read_csv("Data/norwegian_wb_wg.csv") %>%
  filter(item_kind == "word") %>%
  select(child_id, item_id, item_definition, english_gloss, value) -> responses

# We get items (Norwegian, English glosses, and IDs):
responses %>% distinct(item_definition, english_gloss, item_id) -> cdi

# We take comprehension data (modify here to get production):
if_else(is.na(responses$value), 0, 1) -> responses$value

# We get scores and demo data. More demo data available, if needed: modify select():
read_csv("Data/norwegian_wb_participants.csv") %>%
  filter(form == "WG") %>%
  select(child_id, age, sex, comprehension) -> scores
if_else(scores$sex == "Female", "girl",
        if_else(scores$sex == "Male", "boy", scores$sex)) -> scores$sex

# We filter out longitudinal data:
scores %>% group_by(child_id) %>% count() %>% filter(n == 1) %>% pull(child_id) -> single_ids
scores %>% filter(child_id %in% single_ids) -> scores
responses %>% filter(child_id %in% single_ids) -> responses

# We check the sums (should return zero tibble):
responses %>% group_by(child_id) %>% summarise(score = sum(value)) %>%
  full_join(scores) %>% filter(score != comprehension)

```

We prepare the responses matrix
(eliminating all 0's and all 1's items).

Mirt documentation suggests however removing items with diffulties > .95 and < .05.
**Check and explore in the future.**

```{r response_matrix}
responses %>% group_by(item_id) %>% filter(var(value) > 0) %>%
  pivot_wider(names_from = "item_id", values_from = "value", id_cols = "child_id") -> responses
responses <- as.matrix(select(responses, starts_with("item_")))
print(paste0("There are ", nrow(responses), " parents, ", ncol(responses),
             " items (out of ", nrow(cdi), ") with non-zero variance, and ",
             length(which(is.na(responses))), " missing responses."))
```

```{r non_zero_variance_items, eval=FALSE}
cdi %>% filter(item_id %in% colnames(responses)) -> cdi
```

# Model creation

For the model creation
we first decide on the optimal number of quadrature points.
We do it iteratively by fitting models with increasing numbers and
comparing their fit. We start with the default and increase it in
a number of arbitrary, but hopefully not unreasonable, steps.

We also compare the parameter estimates for the first item,
just out of curosity: as the goodness of fit stabilises, so
should the estimates.

*It may take some time! Switch eval on only if you really need to recalculate.*

```{r quadpts_comp, eval=FALSE}
quadpts <- quadpts_comp(responses, quadpts = c(61, 151, 301, 451, 601, 751), NCYCLES = 100000,
             output_file = "Data/norwegian_wg_comp_quadpts.RData")
```

We load the comparison of model fits for different numbers of quadrature points
from a file. Here there are:

```{r quadpts_comp1}
load("Data/norwegian_wg_comp_quadpts.RData")
output -> quadpts # temporarily...
quadpts[1:3,]
```

And here are the parameter estimates for the first item:

```{r quadpts_comp2}
sapply(1:ncol(quadpts), function(x) quadpts[,x]$item1pars)
```

Based on the above we set the number of quadrature points to 451.

```{r quadpts_set}
quadtp <- 451
```

We then iteratively fit the model and remove any misfits until we have fits for all items.

*It may take some time!*

We save the model to a file to save time in the future
(if you want to refit the model, switch the eval on):

```{r misfits_removal, eval=FALSE}
output <- misfits_removal(responses=responses, quadtps=quadtp, NCYCLES=100000, p=0.001,
                          output_file = "Data/norwegian_wg_comp.RData")
```

We load the model from a file (comment out if just fitted above)
and print a basic summary:

```{r model_summary}
load("Data/norwegian_wg_comp.RData")
mod <- output[[1]]
items_removed <- output[[2]]
rm(output) # to save space

mod
```

## Items fit

No items were misfits in the process of model fitting.

```{r items_removed}
items_removed
```

No need to remove anything from the response matrix or the list of items.

```{r cdi_items_removed}
# Code below most certainly needs updating!
items_to_remove_ids <- as.numeric(laply(unlist(items_removed), function(x) substr(x, 5, nchar(x))))
cdi[which(cdi$number.ws %in% items_to_remove_ids), ]
```

```{r items_removed2}
# Code below most certainly needs updating (first part)!
if(length(items_to_remove_ids)) {
  responses <- responses[, -which((colnames(responses) %in% unlist(items_removed)))]
  cdi <- cdi[-which(cdi$number.ws %in% items_to_remove_ids), ]
  if (nrow(cdi) == ncol(responses)) print(paste0(ncol(responses), " items left after removal."))
}
```

## Local dependence

*Check and explore the following code in the future (also compare with Kachergis et al., 2022)*

```{r local_dependence, eval=FALSE}
# I can't make the following code hide the results...
residuals <- residuals(mod, df.p=FALSE) # TODO: explore other parameters?
save(residuals, file = "Data/norwegian_wg_comp_residuals.RData")
```

```{r local_dependence_histogram}
load("Data/norwegian_wg_comp_residuals.RData")
residuals_up <- residuals
residuals_up[lower.tri(residuals_up)] <- NA # upper diagonal contains signed Cramers V coefficients
hist(residuals_up, main="Histogram of signed Cramer's V coefficients", xlab= "Cramer's V")
```

```{r local_dependence_summary}
summary(as.vector(residuals_up))[-7]
# -7 to avoid reporting NAs, which would be confusing as lower triangle is filled with them
```

To display the most highly mutually dependent items
we use V > .3 as the threshold:

```{r local_dependence_high}
indexes <- arrayInd(which(residuals_up > 0.3), dim(residuals))
data.frame(item1 = pull(cdi[indexes[, 1], "item_definition"]),
           item2 = pull(cdi[indexes[, 2], "item_definition"]), CramersV = residuals_up[indexes])
```

There are a few but following Kachergis et al. we set the threshold for removing items to V > .5
and we remove the following items:

```{r local_dependence_remove}
indexes <- arrayInd(which(residuals_up > 0.5), dim(residuals))
data.frame(item1 = pull(cdi[indexes[, 1], "item_definition"]),
           item2 = pull(cdi[indexes[, 2], "item_definition"]), CramersV = residuals_up[indexes])
unique(as.vector(indexes)) -> items_to_remove_ids
responses[, -items_to_remove_ids] -> responses
cdi[-items_to_remove_ids,] -> cdi
```

Then we repeat the iterative model fit with misfits removal.

```{r misfits_removal_2, eval=FALSE}
output <- misfits_removal(responses=responses, quadtps=quadtp, NCYCLES=100000, p=0.001,
                          output_file = "Data/norwegian_wg_comp_2.RData")
```

We load the model from a file (comment out if just fitted above)
and print a basic summary:

```{r model_summary_2}
load("Data/norwegian_wg_comp_2.RData")
mod <- output[[1]]
items_removed <- output[[2]]
rm(output) # to save space

mod
```


There is one misfit to remove:

```{r items_removed_2}
cdi[cdi$item_id %in% items_removed, "item_definition"]
responses <- responses[, -which((colnames(responses) %in% unlist(items_removed)))]
cdi <- cdi[! cdi$item_id %in% items_removed, ]
```

*Seems like we don't need to repeat the local dependence procedure but better check it!*


# Model exploration

## IRT items parameteres

```{r item_parameters}
params_IRT <- as.data.frame(coef(mod, simplify = TRUE, IRTpars = TRUE)$items)[1:2]
cdi <- cbind(params_IRT, cdi)
```

Scatterplot of difficulty by discrimination
(*we use English glosses;
also, if we had categories we could add them as colours*):

```{r item_parameters_plot, fig.width=12, fig.height=8}
# ggplot(params_cdi, aes(b, a, colour=category, label=position)) +
ggplot(cdi, aes(b, a, label=english_gloss)) +
  geom_text(check_overlap = T) +
  geom_point(alpha = 0.3) +
  xlab("Difficulty") +
  ylab("Discrimination") +
#  labs(colour = "Category") +
  labs(title = "Norwegian CDI:WG Comprehension") +
  theme(
  panel.background = element_rect(fill = NA),
  panel.grid.major = element_line(colour = "grey"),
  legend.position = "top"
)
```

## Ability vs. raw scores

We estimate ability levels for children from the sample based on the model.

```{r ability}
scores <- cbind(scores, fscores(mod, method = "MAP", full.scores.SE = TRUE))
```

*Explore the failed convergence? Does it have something to do with
the chosen MAP method?*

Plot the estimated ability level against raw scores:

```{r ability_scores_plot, fig.width=12, fig.height=8}
ggplot(scores, aes(comprehension, F1)) +
  geom_point() +
  ylab("Ability level (theta)") +
  xlab("Raw score (sum of checked items)") +
  labs(title = "Norwegian CDI:WG Comprehension") +
  theme(
  panel.background = element_rect(fill = NA),
  panel.grid.major = element_line(colour = "lightgrey"),
  text = element_text(size=16)
)
```

The scatterplot above shows that IRT estimates get "normalised",
compared to raw scores, as can be seen on the following histograms
(flat distribution for raw scores vs. bell-like distribution for *theta* estimates).

```{r, eval=FALSE}
# Should be in two separate chunks perhaps?
hist(scores$comprehension, xlab = "Raw sums", main = "")
hist(scores$F1, xlab = "Estimated ability levels", main = "")
```


## Ability vs. SE

Plot the ability level against SE:

```{r ability_se_plot, fig.width=12, fig.height=8}
ggplot(scores, aes(F1, SE_F1)) +
  geom_point(alpha = 0.3) +
  xlab("Ability level (theta)") +
  ylab("Standard error (SE)") +
  labs(title = "Norwegian CDI:WG Comprehension") +
  theme(
  panel.background = element_rect(fill = NA),
  panel.grid.major = element_line(colour = "lightgrey"),
  text = element_text(size=16)
)
```

The scatterplot above shows increasing errors at both extremes of the ability level.
Perhaps due to floor and ceiling effects. Errors higher than 0.1 highly unfrequent though:

```{r se_histogram, fig.width=12, fig.height=8}
hist(scores$SE_F1, main = "Norwegian CDI:WG Comprehension", sub = "Histogram of SE", xlab = "SE")
```

## SE and ability level vs. age

The graph below shows that higher error rates are at and close to age range limits
(which is sort of obvious, given the relation between theta and age).
But it also shows that some outliers are at every age
(though in the middle they are not as high).

```{r se_age_plot, fig.width=12, fig.height=8}
ggplot(data.frame(age = as.factor(scores$age), SE = scores$SE_F1), aes(age, SE)) +
  geom_boxplot() +
  xlab("Age (months)") +
  ylab("Standard error of ability estimations") +
  labs(title = "Norwegian CDI:WG Comprehension") +
  theme(
  panel.background = element_rect(fill = NA),
  panel.grid.major = element_line(colour = "lightgrey"),
  text = element_text(size=16)
  )
```

So let's see the relation between ability level and age (and first between raw scores and age).
We add SE as colour.

```{r score_age_plot, fig.width=12, fig.height=8}
ggplot(scores) +
  geom_point(aes(age, comprehension))
ggplot(scores) +
  geom_point(aes(age, F1, col = SE_F1)) +
  scale_fill_gradient(low="green", high="red", aesthetics = "col") +
  labs(y = "ability level", col = "SE")
```


# Simulations

```{r prepare_sim}
params <- as.data.frame(coef(mod, simplify = T)$items)[1:2]
mo <- generate.mirt_object(params, '2PL')
cl <- makeCluster(detectCores())
```

## Whole pool for everyone

```{r sim_all, eval=FALSE}
message(paste("Starting simulation @ ", Sys.time()))
results_all <- mirtCAT(mo = mo, method = "MAP", criteria = "MI", start_item = "MI",
                       local_pattern = responses, cl = cl, design = list(min_items = ncol(responses)))
# results_all1 <- mirtCAT(mo = mod, method = "MAP", criteria = "MI", start_item = "MI",
#                       local_pattern = responses_r, cl = cl, design = list(min_items = ncol(responses_r)))
save(results_all, file = "Data/norwegian_wg_comp_sim_all.Rdata")
message(paste("Finished @ ", Sys.time()))
beep()
```


```{r sim_all_report}
load("Data/norwegian_wg_comp_sim_all.Rdata")
report_sim_results(results_all, select(scores, contains("F1")))
```

Based on the whole pool simulation we can examine decreasing average SE with each item
(we plot it up to 100 items: no need for more).

```{r se_length_plot, fig.width=12, fig.height=8, warning=FALSE}
SE_history <- laply(results_all, function(x) x$thetas_SE_history)
meanSE_item <- as.data.frame(colMeans(SE_history))
meanSE_item$item <- row.names(meanSE_item)
colnames(meanSE_item) <- c("mean_SE", "item")
meanSE_item$item <- as.numeric(meanSE_item$item)

ggline(meanSE_item, x = "item", y = "mean_SE", numeric.x.axis = TRUE) +
  xlab ("Test length") +
  ylab ("Mean SE") +
  labs(title = "Norwegian CDI:WG Comp") +
  theme(
    panel.background = element_rect(fill = NA),
    panel.grid.major = element_line(colour = "lightgrey"),
    text = element_text(size=16)
  ) + 
  scale_y_continuous(breaks = c(0.1, 0.15, 0.2, 0.25, 0.5, 0.75, 1)) + 
  scale_x_continuous(breaks = seq(0, nrow(params), by=10), limits = c(1, 100))
```

## SE = 0.1 stop criterion

The plot above shows that we need quite a number of items (ca. 80) to reach SE=.1 on average.
Let's try a simulation.

```{r se1, eval=FALSE}
results_SE1 <- sim_se(0.1, file = "Data/norwegian_wg_comp_sim_se1.RData")
```


```{r se1_report}
load("Data/norwegian_wg_comp_sim_se1.Rdata")
results_SE1 <- results
report_sim_results(results_SE1, select(scores, contains("F1")))
```


```{r se1_plot, fig.width=12, fig.height=8}
plot_length(results_SE1, "Norwegian CDI:WG Comp", 0.1)
```

## SE = 0.15 stop criterion

```{r se15, eval=FALSE}
results_SE15 <- sim_se(0.15, file = "Data/norwegian_wg_comp_sim_se15.RData")
```


```{r se15_report}
load("Data/norwegian_wg_comp_sim_se15.Rdata")
results_SE15 <- results
report_sim_results(results_SE15, select(scores, contains("F1")))
```


```{r se15_plot, fig.width=12, fig.height=8}
plot_length(results_SE15, "Norwegian CDI:WG Comp", 0.15)
```


## Different start thetas

```{r different_start_thetas, eval=FALSE, message=FALSE, echo=FALSE}
start_thetas <- get_start_thetas(scores, "age", "sex")
fscores_aggr <- start_thetas[[1]]
start_thetas <- start_thetas[[2]]
```

TODO: Simulations with different start thetas when stop criterion would be specified

# Input files preparation

```{r eval=FALSE}
prepare_input_files(params, cdi$item_definition, fscores_aggr,
                    question = "CATEGORY LABEL here", group = "comp", id = cdi$item_id,
                    file1 = "items-Norwegian-WG-comp.csv", file2 = "startThetas-Norwegian-WG-comp.csv")
```