CoreNEURON calculation of load balance for mpi simulation. #676
Comments
In lieu of directly calculating the computation time used by each cell, it may be sufficient to know the number of instances and the computation time of each mechanism (nrn_cur and nrn_solve) in use, as well as the number of compartments and the setup, Gaussian elimination, update, and nonvint times. From that data, it may be possible to calculate a reasonable value for how much time a given distribution of cells will take.
This makes a lot of sense, and it is a good reason to start measuring load imbalance. I would say it would be relatively low-overhead to add a few timers tracking the runtime of these compute routines and to print the load (im)balance at the end.
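A minimal sketch of the kind of estimate described above, assuming a simple linear model in which each mechanism instance and each compartment contributes a fixed time. The function name `estimate_cell_cost`, the per-instance timings, and the per-compartment time are hypothetical placeholders, not measured CoreNEURON values:

```python
from typing import Dict


def estimate_cell_cost(
    instance_counts: Dict[str, int],
    time_per_instance: Dict[str, float],
    n_compartments: int,
    time_per_compartment: float,
) -> float:
    """Estimate the per-step compute time of one cell (linear model, an assumption).

    instance_counts: number of instances of each mechanism in the cell.
    time_per_instance: measured time per instance of each mechanism
        (nrn_cur and nrn_solve combined), in seconds.
    time_per_compartment: time per compartment for setup, Gaussian
        elimination, and update combined, in seconds.
    """
    mechanism_time = sum(
        count * time_per_instance[name] for name, count in instance_counts.items()
    )
    return mechanism_time + n_compartments * time_per_compartment


# Hypothetical timings in seconds per instance; real values would come from profiling.
time_per_instance = {"hh": 2.0e-7, "pas": 5.0e-8}
cell_instances = {"hh": 10, "pas": 100}

cost = estimate_cell_cost(
    cell_instances,
    time_per_instance,
    n_compartments=100,
    time_per_compartment=1.0e-7,
)
print(f"estimated cost per step: {cost:.2e} s")
```

Summing such estimates over the cells assigned to a rank gives a predicted time for that rank, which is what a candidate distribution would be judged on.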
There was some discussion on Slack with Ivan and @pramodk about load balance for Ivan's full dentate gyrus model, and Pramod showed some Caliper profiling timings for a 1000 ms simulation with CoreNEURON on 64 nodes of BB5 (each node has 40 Cascade Lake cores and 384 GB DRAM). In part:
I noted (barrier refers to spike-exchange imbalance):
A precise answer to that last question is still unclear to me :) Also, there were detailed Caliper timings for each mechanism, e.g.:
And it seemed to me that combining this with the instance counts of each mechanism would allow calculation (back in NEURON) of a pretty good proxy for cell complexity, and therefore an LPT (least processing time) cell distribution on threads and ranks. In particular, with a range of (instance count, time) pairs for min, max, and average for each mechanism, one might have not just a time per instance but a function f(i) of the time for i instances that could take into account some otherwise vexing memory latency/bandwidth effects. (Speculative, but possibly a nice paper there :)
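For illustration, here is a sketch of the greedy heuristic usually meant by LPT (in the scheduling literature the acronym is commonly expanded as "longest processing time first"): sort cells by descending estimated cost and repeatedly give the next cell to the currently least-loaded rank. The function name `lpt_distribute` and the example costs are hypothetical, and this is not necessarily how NEURON implements it:

```python
import heapq
from typing import Dict, List


def lpt_distribute(cell_costs: Dict[int, float], n_ranks: int) -> Dict[int, List[int]]:
    """Assign cells (gid -> estimated cost) to ranks with the greedy LPT heuristic."""
    # Min-heap of (accumulated load, rank) so the least-loaded rank pops first.
    heap = [(0.0, rank) for rank in range(n_ranks)]
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {rank: [] for rank in range(n_ranks)}

    # Place the most expensive cells first.
    for gid, cost in sorted(cell_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(gid)
        heapq.heappush(heap, (load + cost, rank))
    return assignment


# Hypothetical per-cell costs in arbitrary time units.
costs = {0: 5.0, 1: 4.0, 2: 3.0, 3: 3.0, 4: 1.0}
assignment = lpt_distribute(costs, n_ranks=2)
print(assignment)  # {0: [0, 3], 1: [1, 2, 4]}
```

In this example both ranks end up with a total cost of 8.0. The same routine could be applied a second time within a rank to spread cells over threads.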
The actual measure of load balance during simulation is an important statistic to gauge whether performance might be improved by better distribution of cells on mpi ranks.
During a simulation, CoreNEURON should keep track of computation time used by each rank (as opposed to spike exchange time) and indicate the average and maximum computation time.
loadbalance = average/maximum
with a value of 1.0 being ideal.
If possible, it would be very useful to determine the computation time used by each cell. With that data, one may be able to use the lpt (least processing time) algorithm to find a distribution of cells on ranks that is better than the default round-robin distribution, and use the lpt distribution for subsequent simulation runs.
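As a small worked example of the metric defined above, the following sketch computes average/maximum from a list of per-rank computation times. The function name `load_balance` and the timings are made up for illustration; in an MPI run the per-rank values would first have to be gathered or reduced across ranks:

```python
from typing import Sequence


def load_balance(rank_times: Sequence[float]) -> float:
    """Return average/maximum of per-rank computation times (1.0 is ideal)."""
    if not rank_times:
        raise ValueError("need at least one rank time")
    maximum = max(rank_times)
    if maximum <= 0.0:
        raise ValueError("maximum computation time must be positive")
    average = sum(rank_times) / len(rank_times)
    return average / maximum


# Hypothetical computation times in seconds for four ranks,
# excluding time spent in spike exchange.
times = [8.0, 10.0, 9.0, 9.0]
balance = load_balance(times)
print(f"loadbalance = {balance:.3f}")  # loadbalance = 0.900
```

A value of 0.9 here means that, on average, ranks spend 10% of the slowest rank's compute time idle.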