Using matchit on data with long format #128

Open
yusheng0104 opened this issue Aug 29, 2022 · 12 comments

@yusheng0104

Hi,

Could matchit also be used to match data in a long format?
If yes, could you help with how to set up the parameters?

Thank you.

@ngreifer
Collaborator

Can you be more specific about what you are trying to accomplish? If you only want to match on the baseline (time 1) treatment and covariates, then just subset your data to time 1, perform the matching, and merge the matched dataset back into the original dataset, using participant ID as the merging variable.
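
The subset, match, and merge steps above can be sketched as follows. This is only a minimal illustration: the data are simulated, and the column names (`id`, `time`, `treat`, `age`, `bplevel`) and the assumption that treatment is fixed per participant are made up for the example, not taken from the poster's dataset.

```r
library(MatchIt)

# Simulated long-format data (hypothetical): 3 rows per participant.
set.seed(1)
n_id <- 40
long <- data.frame(
  id   = rep(seq_len(n_id), each = 3),
  time = rep(1:3, times = n_id)
)
# Assumption: treatment is constant within participant (12 treated, 28 controls).
long$treat   <- rep(rep(c(1, 0), times = c(12, 28)), each = 3)
long$age     <- rep(round(rnorm(n_id, 30, 5)), each = 3) + long$time - 1
long$bplevel <- round(rnorm(nrow(long), 4, 1), 1)

# 1. Keep only the baseline (time 1) row of each participant.
baseline <- long[long$time == 1, ]

# 2. Match on baseline covariates (1:1 nearest neighbor on a propensity score).
m <- matchit(treat ~ age + bplevel, data = baseline, method = "nearest")

# 3. Keep the matching output per participant and merge it back by ID.
matched <- match.data(m)[, c("id", "subclass", "weights")]
long_matched <- merge(long, matched, by = "id")
```

`long_matched` then holds every follow-up row of the matched participants, with `subclass` identifying the matched pair each participant belongs to.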

@yusheng0104
Author

yusheng0104 commented Aug 29, 2022

Thanks for your response.
Here is what I would like to accomplish.
Say I have a patient with several follow-up treatments who finally died. The data is organized in a long format.
id  status  age  bplevel
1   0       31   4.2
1   0       31   4.0
1   1       33   12.0

The control data is also in a long format but with different follow-up times. I am trying to find a good match for patient 1. Three control examples (5, 6, 7) are shown below.
id  status  age  bplevel
5   0       27   3.2
5   0       28   2.9
5   0       32   4.0
5   0       35   3.5
6   0       23   2.2
6   0       23   3.9
7   0       30   2.0
7   0       33   3.3
7   0       36   3.9

Thanks

@ngreifer
Collaborator

I see. Please see this link, which asks the same question.

@yusheng0104
Author

Hi Noah,

My question is actually different. I would like to include all the observations of the patient. If it only matches the first observation, the problem would be super simple.

Thanks

@ngreifer
Collaborator

I'm still not sure what you mean. If you want to match each treated row to a control row, you don't need to do anything. Just matching on the dataset as it is will work (though statistically that wouldn't make much sense). If not, can you please explain in detail what you want to do? Maybe a way to think of this is, how do you want to define the distance between two units? Once that has been decided, the matching is straightforward.
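
To illustrate the point about defining a distance first, here is a hedged sketch in which the distance between two units is chosen by the analyst and then handed to the matching step. It assumes one row per unit already exists, uses simulated data with hypothetical column names (`mean_age`, `mean_bplevel`), picks Euclidean distance on standardized summaries purely as an example, and assumes the installed version of MatchIt accepts a user-supplied distance matrix through the `distance` argument.

```r
library(MatchIt)

# Simulated unit-level data (hypothetical): one row per unit.
set.seed(2)
units <- data.frame(
  id           = 1:30,
  treat        = rep(c(1, 0), times = c(10, 20)),
  mean_age     = rnorm(30, 30, 5),
  mean_bplevel = rnorm(30, 4, 1)
)

# An example distance definition (an assumption, not a recommendation):
# Euclidean distance between standardized per-unit summaries.
d <- as.matrix(dist(scale(units[, c("mean_age", "mean_bplevel")])))

# With a distance matrix supplied, nearest neighbor matching pairs units
# on that distance; the formula identifies the treatment and the covariates
# used for balance checks.
m <- matchit(treat ~ mean_age + mean_bplevel, data = units,
             method = "nearest", distance = d)
matched <- match.data(m)
```

Any other unit-level distance could be substituted for `d`; the hard part is deciding what it should be when units have different numbers of rows.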

@yusheng0104
Author

yusheng0104 commented Aug 29, 2022

Patient 1, which I posted earlier, has 3 rows, but the control patients can have 2, 3, or 4 rows. To find the best match for patient 1, all 3 of its rows should be taken into consideration. Defining the distance is a good question. In fact, I am not sure which distance could be used in my case, because the number of rows differs between patients; it would be straightforward if all patients had the same number of rows. I have checked some literature, and some authors calculate propensity scores using ML models. I still couldn't figure out how to calculate the scores and was curious whether MatchIt could make the matching simple.

@ngreifer
Collaborator

I have never seen anyone perform matching in this way and have no idea how it would work, so unfortunately I can't help you until you find a way to compute the distance between two units. It's not even clear to me how you would compute a propensity score from this data. If you have a reference that analyzes data in the way you want, please send it along and I will take a look. Your best bet is to create a dataset with a row for each unit with variables that summarize the longitudinal nature of the original data, e.g., by using the mean value of the covariate. You can use aggregate() or related functions in dplyr to do that before running a simple match.
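
A minimal sketch of that summarizing step with `aggregate()`, using the example rows posted earlier in this thread. Taking the mean of each covariate and defining a unit-level indicator `treat` as the maximum of `status` are assumptions made for illustration; other summaries (baseline value, last value, slope) may be more appropriate.

```r
# Example rows from this thread, in long format.
long <- data.frame(
  id      = c(1, 1, 1, 5, 5, 5, 5, 6, 6, 7, 7, 7),
  status  = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  age     = c(31, 31, 33, 27, 28, 32, 35, 23, 23, 30, 33, 36),
  bplevel = c(4.2, 4.0, 12.0, 3.2, 2.9, 4.0, 3.5, 2.2, 3.9, 2.0, 3.3, 3.9)
)

# One row per unit: mean of each covariate across that unit's rows.
covs <- aggregate(cbind(age, bplevel) ~ id, data = long, FUN = mean)

# Hypothetical unit-level group indicator: 1 if status was ever 1.
grp <- aggregate(status ~ id, data = long, FUN = max)
names(grp)[names(grp) == "status"] <- "treat"

units <- merge(covs, grp, by = "id")

# With a realistically sized dataset, a simple match could then be run, e.g.:
# m <- MatchIt::matchit(treat ~ age + bplevel, data = units)
```

The resulting `units` data frame has the same number of columns for every patient regardless of how many follow-up rows each one had, which is what makes a standard match possible.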

@yusheng0104
Author

I've sent a paper to you through my email. Please make sure you've received it.
Your suggestion on using a mean row is a good idea for me. Thanks.

@ngreifer
Collaborator

I didn't receive it. You can just send the doi here.

@yusheng0104
Author

Here is a paper that uses a Cox PH model to calculate propensity scores:
https://atm.amegroups.com/article/view/36411/pdf

@yusheng0104
Author

yusheng0104 commented Oct 11, 2022 via email

@ngreifer
Collaborator

That is an interesting paper, but it is outside the scope of MatchIt's current capabilities, so I can't offer you any help, sorry. Code is given in the paper, so perhaps you can just follow it and see what happens. That is not a mainstream methodology and it has not been rigorously investigated, so I would be hesitant to use it.
