
Multicore Parallel Processing for Large Datasets #23

Closed
chinthanishanth opened this issue Feb 2, 2018 · 3 comments

@chinthanishanth

I am using MatchIt to calculate propensity scores for a large dataset (approximately 500,000 records), and it takes about 2 hours to get results. It would be great if this package supported multicore parallel processing, i.e., using all available cores of the processor, so that computation time could be reduced significantly.
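For context, the call in question is a standard nearest-neighbor matchit() run; a minimal sketch is below. The data frame `df` and the columns `treat`, `age`, and `income` are hypothetical placeholders, not details from this thread.

```r
library(MatchIt)

# Hypothetical example: 'treat' is a binary treatment indicator and
# 'age'/'income' are covariates for the propensity score model.
# 1:1 nearest-neighbor matching like this is what becomes slow at
# ~500,000 rows, because treated units are matched one at a time.
m.out <- matchit(treat ~ age + income,
                 data = df,
                 method = "nearest")

summary(m.out)
```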

@ngreifer
Collaborator

ngreifer commented Nov 4, 2020

This is something we will look into, but it is not straightforward to make nearest-neighbor matching without replacement run in parallel. It is an iterative process in which each step depends on the steps taken before it, which makes it not "embarrassingly parallel". This may be possible with matching with replacement, however. We will continue to examine this.
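To illustrate why matching with replacement is the easier case: with replacement, each treated unit's nearest control is chosen independently of every other treated unit, so the loop over treated units can be split across cores. Below is a minimal sketch using only base R and the parallel package; it is not MatchIt's internal code, and the 'treat' column name and two-core default are assumptions made for the example.

```r
library(parallel)

# Sketch: propensity-score nearest-neighbor matching *with replacement*,
# parallelized over treated units. Not MatchIt's implementation.
match_with_replacement <- function(formula, data, n_cores = 2) {
  # Estimate propensity scores with a logistic regression
  ps <- predict(glm(formula, data = data, family = binomial()),
                type = "response")

  treated_idx <- which(data$treat == 1)  # assumes a 0/1 'treat' column
  control_idx <- which(data$treat == 0)
  ps_control  <- ps[control_idx]

  # With replacement, each treated unit's match is independent of the
  # others, so the work splits cleanly across cores.
  matches <- mclapply(treated_idx, function(i) {
    control_idx[which.min(abs(ps_control - ps[i]))]
  }, mc.cores = n_cores)

  data.frame(treated = treated_idx, control = unlist(matches))
}
```

Note that mclapply() relies on forking, which is not available on Windows; there, a parLapply() cluster would be the equivalent. Matching without replacement cannot be split this way, because removing a matched control changes the candidate pool for every subsequent treated unit.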

@ginnydang

I have the same problem working with a dataset of up to 400,000 entries. I wonder whether you have come up with any ideas yet. Thank you so much!

@buhtz

buhtz commented Jan 11, 2024

Some things need to be serial. 2 hours is not much. Take a longer lunch break.
