Using matchit on data with long format #128

yusheng0104 · 2022-08-29T19:14:53Z

Hi,

Could matchit also be used to match data in a long format?
If yes, could you help with how to set up the parameters?

Thank you.

ngreifer · 2022-08-29T20:23:53Z

Can you be more specific about what you are trying to accomplish? If you only want to match on the baseline (time 1) treatment and covariates, then just subset your data to time 1, perform the matching, and merge the matched dataset back into the original dataset, using participant ID as the merging variable.
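As a rough illustration of the subset, match, and merge workflow described above, here is a minimal sketch in R. The data are simulated, and the column names (`id`, `time`, `treat`, `age`, `bplevel`) are assumptions made for the example rather than anything taken from the original dataset. The matching step runs only if MatchIt is installed.

```r
# Hypothetical long-format data: one row per visit, 2 to 4 visits per person.
# `treat` is assumed to be fixed within each participant.
set.seed(1)
n_id   <- 40
visits <- sample(2:4, n_id, replace = TRUE)
long <- data.frame(
  id   = rep(seq_len(n_id), times = visits),
  time = sequence(visits)                      # 1, 2, ... within each id
)
treat_by_id  <- rep(c(1, 0), times = c(10, 30))  # 10 treated, 30 controls
long$treat   <- rep(treat_by_id, times = visits)
long$age     <- rep(round(rnorm(n_id, 30, 5)), times = visits) + long$time - 1
long$bplevel <- round(rnorm(nrow(long), 3.5, 1), 1)

# Step 1: keep only the baseline (time 1) row of each participant.
baseline <- long[long$time == 1, ]

if (requireNamespace("MatchIt", quietly = TRUE)) {
  # Step 2: match on baseline treatment and covariates (1:1 nearest neighbor
  # on a logistic-regression propensity score, which is matchit()'s default).
  m <- MatchIt::matchit(treat ~ age + bplevel, data = baseline,
                        method = "nearest")
  matched <- MatchIt::match.data(m)   # adds distance, weights, subclass

  # Step 3: merge the matching results back into the long data by id.
  # Unmatched participants drop out because merge() is an inner join.
  long_matched <- merge(long, matched[, c("id", "weights", "subclass")],
                        by = "id")
  long_matched <- long_matched[order(long_matched$id, long_matched$time), ]
}
```

After the merge, every follow-up row of a matched participant carries that participant's `subclass` (pair membership) and `weights`, so later analyses on the long data can account for the matching.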

yusheng0104 · 2022-08-29T20:37:23Z

Thanks for your response.
Here is what I would like to accomplish.
Say I have a patient who received several follow-up treatments and eventually died. The data are organized in long format:
id  status  age  bplevel
1   0       31   4.2
1   0       31   4.0
1   1       33   12.0

The control data are also in long format, but with different follow-up times. I am trying to find a good match for patient 1. Three example controls (5, 6, and 7) are shown below.
id  status  age  bplevel
5   0       27   3.2
5   0       28   2.9
5   0       32   4.0
5   0       35   3.5
6   0       23   2.2
6   0       23   3.9
7   0       30   2.0
7   0       33   3.3
7   0       36   3.9

Thanks

ngreifer · 2022-08-29T21:39:07Z

I see. Please see this link, which asks the same question.

yusheng0104 · 2022-08-29T23:23:06Z

Hi Noah,

My question is actually different. I would like to include all the observations of the patient. If it only matches the first observation, the problem would be super simple.

Thanks

ngreifer · 2022-08-29T23:27:54Z

I'm still not sure what you mean. If you want to match each treated row to a control row, you don't need to do anything. Just matching on the dataset as it is will work (though statistically that wouldn't make much sense). If not, can you please explain in detail what you want to do? Maybe a way to think of this is, how do you want to define the distance between two units? Once that has been decided, the matching is straightforward.

yusheng0104 · 2022-08-29T23:59:27Z

The patient 1 I posted earlier has 3 rows, but the control patients could have 2, 3, or 4 rows. To find the best match for patient 1, all 3 of that patient's rows should be taken into consideration. That's a good question about defining the distance. In fact, I am not sure which distance could be used in my case, because the number of rows differs between patients. It would be straightforward if all patients had the same number of rows. I have checked some of the literature, and some authors have calculated propensity scores using ML models. I still couldn't figure out how to calculate the scores and was curious whether MatchIt could make the matching simple.

ngreifer · 2022-08-30T14:44:41Z

I have never seen anyone perform matching in this way and have no idea how it would work, so unfortunately I can't help you until you find a way to compute the distance between two units. It's not even clear to me how you would compute a propensity score from this data. If you have a reference that analyzes data in the way you want, please send it along and I will take a look. Your best bet is to create a dataset with a row for each unit with variables that summarize the longitudinal nature of the original data, e.g., by using the mean value of the covariate. You can use aggregate() or related functions in dplyr to do that before running a simple match.
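A minimal sketch of the "one row per unit" idea, using base R's `aggregate()` on the four example patients posted earlier in this thread. The summary choices here (mean of each covariate, a `case` flag equal to 1 if `status` is ever 1, and a visit count `n_rows`) are assumptions made for illustration, not a recommendation about which summaries suit a given study.

```r
# The example rows from this thread (the "32." for patient 5 is read as 32).
long <- data.frame(
  id      = c(1, 1, 1, 5, 5, 5, 5, 6, 6, 7, 7, 7),
  status  = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  age     = c(31, 31, 33, 27, 28, 32, 35, 23, 23, 30, 33, 36),
  bplevel = c(4.2, 4.0, 12.0, 3.2, 2.9, 4.0, 3.5, 2.2, 3.9, 2.0, 3.3, 3.9)
)

# Collapse to one row per patient: the mean of each covariate over all rows.
summ <- aggregate(cbind(age, bplevel) ~ id, data = long, FUN = mean)
names(summ)[-1] <- c("mean_age", "mean_bplevel")

# Assumed group definition: a patient is a case if status is ever 1.
key <- as.character(summ$id)
summ$case   <- as.integer(tapply(long$status, long$id, max)[key])
# Number of rows per patient, kept as a possible extra matching variable.
summ$n_rows <- as.integer(table(long$id)[key])

print(summ)
```

With a realistically sized dataset, a one-row-per-patient data frame like `summ` is what would be passed to `matchit()`, for example with a formula such as `case ~ mean_age + mean_bplevel`. Four patients are far too few for a meaningful match, so the call is left out here.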

yusheng0104 · 2022-08-30T15:09:14Z

I've sent a paper to you through my email. Please make sure you've received it.
Your suggestion on using a mean row is a good idea for me. Thanks.

ngreifer · 2022-08-30T15:51:33Z

I didn't receive it. You can just send the doi here.

yusheng0104 · 2022-08-30T16:14:47Z

Here is a paper that uses a Cox-PH model to calculate propensity scores.
https://atm.amegroups.com/article/view/36411/pdf

yusheng0104 · 2022-10-11T07:19:21Z

Please check this paper. The Cox-PH model was used to calculate the propensity scores. https://atm.amegroups.com/article/view/36411/pdf


ngreifer · 2022-10-11T15:12:54Z

That is an interesting paper, but it is outside the scope of MatchIt's current capabilities, so I can't offer you any help, sorry. Code is given in the paper, so perhaps you can just follow it and see what happens. That is not a mainstream methodology and it has not been rigorously investigated, so I would be hesitant to use it.
