Matching strategy for clusters that don't include both treatment groups - multilevel matching #188

leonALIVE · 2024-01-28T03:55:55Z

Is there a workaround to get matchit to match preferentially within clusters and to look for a match outside the cluster only when none exists within it? (Something similar to the "hybrid matching" in Cannas and Arpino's (2019) CMatching package, which is no longer supported.)

My example:
I'm evaluating the effect of an intervention (treatment) applied to patients (subjects) in hospitals (clusters or groups). In more than a third of hospitals, either all or none of the patients were exposed to the intervention. Strict within-cluster matching would therefore force me to subset (= exclude) a large share of the study population.

I could group hospitals by hospital-level covariates to increase cluster size, but I was hoping there might be a more elegant approach to this problem, which is common in my field.
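To quantify the problem before choosing a strategy, one can cross-tabulate cluster against treatment and flag clusters that contain only one treatment group. Below is a minimal base-R sketch on a hypothetical toy data frame; the column names `hospital` and `treat` are placeholders for the real cluster and treatment variables.

```r
# Hypothetical toy data: 4 hospitals, binary treatment indicator
toy <- data.frame(
  hospital = c("H1", "H1", "H1", "H2", "H2", "H3", "H3", "H4", "H4", "H4"),
  treat    = c(1,    0,    1,    1,    1,    0,    0,    1,    0,    0)
)

# Rows = hospitals, columns = treatment level ("0" = control, "1" = treated)
tab <- table(toy$hospital, factor(toy$treat, levels = c(0, 1)))

# A hospital supports within-cluster matching only if it has both groups
single_arm <- rownames(tab)[tab[, "0"] == 0 | tab[, "1"] == 0]
single_arm

# Share of patients who sit in single-arm hospitals
mean(toy$hospital %in% single_arm)
```

Here H2 (treated only) and H3 (controls only) are flagged, and 40% of the toy patients would be lost under strict within-hospital matching.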

ngreifer · 2024-01-28T04:52:28Z

That's a great question. I can think of an ad-hoc workaround that would be fairly straightforward to implement but would require some manual coding. Essentially, you do regular matching but put a large penalty on any between-cluster matches. The way you could implement this penalty would be by adding a large positive number to the distance between units in different clusters in a distance matrix. That way, between-cluster matches would only occur if the within-cluster match was impossible (e.g., because there were no units left or all remaining units were banned due to a caliper or other constraint). You would also need to match in order of closeness, i.e., by setting m.order = "closest", which would ensure every unit that can get a within-cluster match gets one before any between-cluster matches are sought.
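To see why a large between-cluster penalty combined with closest-first matching yields within-cluster matches first, here is a small base-R sketch. It uses a hypothetical hand-rolled greedy matcher, `greedy_closest()` (not a MatchIt function), on a toy distance matrix; MatchIt's own `m.order = "closest"` implementation may differ in details, so this only illustrates the idea.

```r
# Hypothetical toy example: 3 treated (rows) and 3 control (columns) units
treat_cluster   <- c(t1 = "A", t2 = "A", t3 = "B")
control_cluster <- c(c1 = "A", c2 = "C", c3 = "C")

# Raw (e.g., propensity score) distances
d <- matrix(c(0.30, 0.05, 0.10,
              0.20, 0.02, 0.40,
              0.50, 0.15, 0.60),
            nrow = 3, byrow = TRUE,
            dimnames = list(names(treat_cluster), names(control_cluster)))

# Penalize every between-cluster pair by more than the largest raw distance
between <- outer(treat_cluster, control_cluster, FUN = "!=")
d_pen <- d
d_pen[between] <- d_pen[between] + 100 * max(d)

# Greedy 1:1 matching without replacement that always takes the closest
# remaining treated-control pair (a simplified stand-in for m.order = "closest")
greedy_closest <- function(dm) {
  matches <- setNames(rep(NA_character_, nrow(dm)), rownames(dm))
  while (any(is.finite(dm))) {
    idx <- which(dm == min(dm), arr.ind = TRUE)[1, ]
    matches[rownames(dm)[idx[1]]] <- colnames(dm)[idx[2]]
    dm[idx[1], ] <- Inf  # this treated unit is used up
    dm[, idx[2]] <- Inf  # this control unit is used up
  }
  matches
}

greedy_closest(d)      # unpenalized
greedy_closest(d_pen)  # penalized
```

Without the penalty, t2 takes control c2 from another cluster because it is raw-closest. With the penalty, t2 gets the only cluster-A control (c1), and only the units that cannot be matched within their cluster (t1, t3) fall back to between-cluster controls.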

Here is how you might implement this using propensity score matching.

```r
library(MatchIt)

#Compute PS
ps <- glm(A ~ X1 + X2 + cluster, data = data, family = binomial)$fitted.values

#Compute PS distance (treated units in rows, control units in columns)
dist <- euclidean_dist(A ~ ps, data = data)

#Create penalty matrix: nonzero wherever the two units are in different clusters
cluster_dist <- euclidean_dist(A ~ cluster, data = data)

#Apply penalty matrix
dist[cluster_dist > 0] <- dist[cluster_dist > 0] + 100 * max(dist)

#Do matching
m <- matchit(A ~ X1 + X2 + cluster, data = data,
             distance = dist, m.order = "closest")

#Find which treated units received matches outside their cluster
#(which() drops unmatched treated units, whose match.matrix entry is NA)
mm <- m$match.matrix
rownames(mm)[which(cluster_dist[cbind(rownames(mm), mm[, 1])] > 0)]
```

Setting the penalty to Inf is equivalent to doing exact matching on cluster; setting the penalty to anything larger than the largest distance will prioritize within-cluster matching and do between-cluster matching only for the units that require a match outside their cluster, still prioritizing otherwise close matches. You can modify the penalty to penalize different clusters different amounts. The great thing about being able to supply a distance matrix is that you can implement whatever penalty or restriction you want.
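As one way to penalize different cluster pairs by different amounts, the sketch below makes the surcharge grow with how dissimilar two hospitals are on a hospital-level covariate, so a patient with no within-hospital match is steered toward a similar hospital. The `hosp_beds` covariate, the toy data, and the helper `nearest()` are hypothetical; a matrix built this way would be passed to `matchit()` through `distance =`, like `dist` in the code above.

```r
# Hypothetical hospital-level covariate: number of beds per hospital (cluster)
hosp_beds <- c(A = 200, B = 220, C = 900)

# Treated unit t2 sits in hospital B, which has no control patients
treat_cluster   <- c(t1 = "A", t2 = "B")
control_cluster <- c(c1 = "A", c2 = "C", c3 = "A")

# Raw unit-level distances (e.g., absolute propensity score differences)
d <- matrix(c(0.10, 0.05, 0.20,
              0.15, 0.01, 0.25),
            nrow = 2, byrow = TRUE,
            dimnames = list(names(treat_cluster), names(control_cluster)))

between  <- outer(treat_cluster, control_cluster, "!=")
base_pen <- 100 * max(d)  # exceeds every raw distance

# Uniform penalty: every between-cluster pair gets the same surcharge
d_uniform <- d
d_uniform[between] <- d[between] + base_pen

# Graded penalty: surcharge grows with the gap in hospital size
# (max(..., 1) guards against dividing by zero if all hospitals are equal)
beds_gap <- abs(outer(hosp_beds[treat_cluster], hosp_beds[control_cluster], "-"))
d_graded <- d
d_graded[between] <- d[between] +
  base_pen * (1 + beds_gap[between] / max(beds_gap, 1))

# Nearest control for each treated unit (i.e., 1:1 matching with replacement)
nearest <- function(dm) setNames(colnames(dm)[apply(dm, 1, which.min)], rownames(dm))
nearest(d_uniform)
nearest(d_graded)
```

Under the uniform penalty, t2 goes to c2 in the very large hospital C because it is raw-closest; under the graded penalty it goes to c1 in the similar-sized hospital A. Because every surcharge is still at least `base_pen`, within-cluster matches keep their priority.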

leonALIVE · 2024-01-28T17:37:56Z

Thank you for this beautiful solution, Noah!

Note that, for some reason, the code to find which treated units received matches outside their cluster does not work: it just produces a matrix of NAs, regardless of whether I run it on lalonde or on my own test dataset.

The rest of it works perfectly.

Here is the test data I'm using:
https://github.com/leonALIVE/fake_data/blob/main/dtax.csv

And here is your code, using the variable names in the test dataset. The covariates in the model below are just for testing purposes. The cluster variable indicating hospital is called 'DAG'.

```r
library(MatchIt)

dtax <- read.csv("~/dtax.csv")

ps <- glm(surg_checklist ~ age + gender + Hb + chronic_comorbid___1 + anes_techniq + Specialists + DAG,
          data = dtax, family = binomial)$fitted

dist <- euclidean_dist(surg_checklist ~ ps, data = dtax)

cluster_dist <- euclidean_dist(surg_checklist ~ DAG, data = dtax)

dist[cluster_dist > 0] <- dist[cluster_dist > 0] + 100 * max(dist)

m.out <- matchit(surg_checklist ~ DAG + age + gender + Hb + chronic_comorbid___1 + anes_techniq + Specialists,
                 data = dtax, distance = dist, m.order = "closest", replace = TRUE)
```

ngreifer · 2024-01-28T23:31:19Z

Glad it worked! Change the rbind() to cbind() and it should work correctly. I'll make that edit above in case someone else wants to use the code.
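The all-NA result comes from how R indexes a matrix with another matrix. A character matrix with exactly one column per dimension (what `cbind()` produces from a vector of row names and a vector of column names) is read as one (row, column) pair per row. The 2 × n matrix from `rbind()` does not have that shape, so R treats it as a plain vector of element names instead; a distance matrix has no element names, so every lookup returns NA. A toy base-R illustration, with a hypothetical `toy_dist` standing in for `cluster_dist`:

```r
# Toy matrix with row and column names, like cluster_dist
toy_dist <- matrix(1:6, nrow = 2,
                   dimnames = list(c("t1", "t2"), c("c1", "c2", "c3")))

rows <- c("t1", "t2", "t1")
cols <- c("c3", "c1", "c2")

# cbind(): a 3 x 2 matrix, one (row, column) pair per row -> 3 elements
toy_dist[cbind(rows, cols)]

# rbind(): a 2 x 3 matrix, so it is treated as a vector of element names;
# toy_dist has no names attribute, so every lookup is NA
toy_dist[rbind(rows, cols)]
```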
