
Multicore Parallel Processing for Large Datasets #23

Closed
chinthanishanth opened this issue Feb 2, 2018 · 3 comments

@chinthanishanth

I am using MatchIt to calculate propensity scores for a large dataset (approximately 500,000 records), and it takes about 2 hours to get results. It would be great if this package supported multicore parallel processing, i.e., using all available cores of the processor, so that computation time could be reduced significantly.
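For context, the call in question is a standard nearest-neighbor matchit() run; a minimal sketch is below. The data frame `df` and the columns `treat`, `age`, and `income` are hypothetical placeholders, not details from this thread.

```r
library(MatchIt)

# Hypothetical example: 'treat' is a binary treatment indicator and
# 'age'/'income' are covariates for the propensity score model.
# 1:1 nearest-neighbor matching like this is what becomes slow at
# ~500,000 rows, because treated units are matched one at a time.
m.out <- matchit(treat ~ age + income,
                 data = df,
                 method = "nearest")

summary(m.out)
```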

@ngreifer
Collaborator

ngreifer commented Nov 4, 2020

This is something we will look into, but it is not straightforward to make nearest-neighbor matching without replacement run in parallel. It is an iterative process in which each step depends on the steps taken before it, which makes it not "embarrassingly parallel". This may be possible with matching with replacement, however. We will continue to examine this.
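To illustrate why matching with replacement is the easier case: with replacement, each treated unit's nearest control is chosen independently of every other treated unit, so the loop over treated units can be split across cores. Below is a minimal sketch using only base R and the parallel package; it is not MatchIt's internal code, and the 'treat' column name and two-core default are assumptions made for the example.

```r
library(parallel)

# Sketch: propensity-score nearest-neighbor matching *with replacement*,
# parallelized over treated units. Not MatchIt's implementation.
match_with_replacement <- function(formula, data, n_cores = 2) {
  # Estimate propensity scores with a logistic regression
  ps <- predict(glm(formula, data = data, family = binomial()),
                type = "response")

  treated_idx <- which(data$treat == 1)  # assumes a 0/1 'treat' column
  control_idx <- which(data$treat == 0)
  ps_control  <- ps[control_idx]

  # With replacement, each treated unit's match is independent of the
  # others, so the work splits cleanly across cores.
  matches <- mclapply(treated_idx, function(i) {
    control_idx[which.min(abs(ps_control - ps[i]))]
  }, mc.cores = n_cores)

  data.frame(treated = treated_idx, control = unlist(matches))
}
```

Note that mclapply() relies on forking, which is not available on Windows; there, a parLapply() cluster would be the equivalent. Matching without replacement cannot be split this way, because removing a matched control changes the candidate pool for every subsequent treated unit.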

@ginnydang

I have the same problem working with a dataset of up to 400,000 entries. I wonder whether you have come up with any ideas yet. Thank you so much!

@buhtz

buhtz commented Jan 11, 2024

Some things need to be serial. 2 hours is not much. Take a longer lunch break.
