This is related to the `better_parallel` branch I started.
Right now, with smallish data, the parallelism is great: the data gets copied over to a process, the comparison runs, and the result comes back. However, this gets really bogged down with a large number of comparisons (say 2000 things to compare to each other).
I've tried using the `chunksize` argument to `pool.starmap` to put more comparisons in each process, but because of the large number of comparisons, what ends up happening is that multiple copies of the data get shipped over to each process (see here).
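Roughly, the current pattern looks like the sketch below (function and variable names are placeholders, not the actual code): every argument tuple carries its own copy of the data, so `chunksize` only groups tasks per batch, it doesn't reduce how many copies get pickled.

```python
from itertools import combinations
from multiprocessing import Pool

def compare(data, i, j):
    # hypothetical pairwise comparison between items i and j of `data`
    return (i, j, data[i] == data[j])

def run_current(data, n_procs=4):
    # each (data, i, j) tuple serializes its own copy of `data`;
    # with ~2000 items that's ~2,000,000 pairs, and just as many
    # copies of the full input crossing process boundaries,
    # regardless of the chunksize value
    pairs = [(data, i, j) for i, j in combinations(range(len(data)), 2)]
    with Pool(n_procs) as pool:
        return pool.starmap(compare, pairs, chunksize=1000)
```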
What really should happen to enable this kind of large-scale parallelism is that we split the comparisons across processes ourselves, based on the number of workers available; each process then gets a single copy of the input data, generates its subset of the full results, and the subsets get put together at the end (see the sketch below).
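A minimal sketch of that idea, assuming hypothetical helper names (`compare_chunk`, `run_split`) rather than anything in the actual codebase: the pair list is divided into one chunk per worker, so the data is sent to each process exactly once, and the partial result lists are concatenated at the end.

```python
from itertools import combinations
from multiprocessing import Pool

def compare_chunk(data, pairs):
    # one worker: a single copy of `data`, a subset of the pair list,
    # returning that subset of the full results
    return [(i, j, data[i] == data[j]) for i, j in pairs]

def run_split(data, n_procs=4):
    all_pairs = list(combinations(range(len(data)), 2))
    # split the comparison list into n_procs roughly equal chunks,
    # so the data only gets copied once per process
    chunks = [all_pairs[k::n_procs] for k in range(n_procs)]
    with Pool(n_procs) as pool:
        partials = pool.starmap(compare_chunk, [(data, chunk) for chunk in chunks])
    # stitch the per-process subsets back together
    return [result for partial in partials for result in partial]
```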
I've previously done this in the R version, and it works well, except for needing 2X the memory that it should. I think it will work even better in Python with its memory handling.