Performance Testing with LBPM
For performance testing purposes it can be useful to quickly generate synthetic test cases to assess the performance of LBPM, without needing to rely on large image sizes to construct subdomains. A simple case can be constructed by using a random close pack of 1896 spheres, which is available from the example/Sph1896 directory. To set up a simulation from the example, copy the contents of the example directory to some working location, e.g.
cp -r $LBPM_DIR/example/Sph1896 ./
An example input file input.db is included. The GenerateSphereTest executable will convert the provided sphere packing into a binary format, discretizing the sphere system into a single 400x400x400 sub-domain. To generate the input data, run the command
[mcclurej@thor Sph1896]$ mpirun -np 1 $LBPM_DIR/bin/GenerateSphereTest input.db
********************************************************
Running Sphere Packing pre-processor for LBPM-WIA
********************************************************
voxel length = 0.003125 micron
Reading the sphere packing
Reading the packing file...
Number of spheres extracted is: 1896
Domain set.
Sauter Mean Diameter (computed from sphere packing) = 34.151490
Media porosity = 0.359970
MorphOpen: Initializing with saturation 0.500000
Media Porosity: 0.366707
Maximum pore size: 14.075796
Performing morphological opening with target saturation 0.500000
Final saturation=0.503602
Final critical radius=4.793676
This will create a single input file ID.00000. The sphere pack is fully periodic, which means that it can be repeated arbitrarily many times in any direction and each process should perform identical computations. If we want to run this example with a process grid that is 4x4x4, we can create 64 copies of this input file as follows
export NRANKS=64
export BASEID="ID.000"
for i in `seq -w 1 $NRANKS`; do idfile="$BASEID$i"; echo $idfile; cp ID.00000 $idfile; done
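An equivalent loop that makes the zero padding explicit with printf is sketched below; it assumes the same ID.xxxxx naming convention and produces five-digit suffixes for any value of NRANKS.
for i in $(seq 1 $NRANKS); do idfile=$(printf "ID.%05d" $i); echo $idfile; cp ID.00000 $idfile; done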
Note that the value provided for BASEID should ensure that the width of the number appended to the files ID.xxxxx is five (i.e. it should be consistent with the seq -w output). This provides an identical input geometry for each MPI sub-domain. The input.db input file can then be updated to reflect the desired domain structure by altering the process grid. Within the Domain section of the file, specify the 4 x 4 x 4 process grid as follows
nproc = 4, 4, 4 // Number of processors (Npx,Npy,Npz)
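For context, the nproc line sits inside the Domain block of input.db. A minimal sketch of how that block might look for this case is shown below; the n entry and the comments are illustrative assumptions, and any other keys already present in the example input.db should be left as provided.
Domain {
   nproc = 4, 4, 4       // Number of processors (Npx,Npy,Npz)
   n = 400, 400, 400     // Size of each rank's sub-domain (Nx,Ny,Nz)
   // keep the remaining keys from the example input.db unchanged
}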
You can then launch the simulation, e.g.
MPI_LAUNCHER=mpirun
MPI_NUMPROCS_FLAG=-n
MPI_FLAGS="--bind-to core"
$MPI_LAUNCHER $MPI_NUMPROCS_FLAG 64 $MPI_FLAGS $LBPM_DIR/bin/lbpm_color_simulator input.db
The output below is from a run on an IBM Power8 Minsky node with NVLink and four NVIDIA P100 GPUs.
mpirun -np 4 $MPIARGS $LBPM_BIN/lbpm_color_simulator input.db
********************************************************
Running Color LBM
********************************************************
MPI rank=0 will use GPU ID 0 / 4
MPI rank=2 will use GPU ID 2 / 4
MPI rank=3 will use GPU ID 3 / 4
MPI rank=1 will use GPU ID 1 / 4
voxel length = 0.001563 micron
voxel length = 0.001563 micron
Read input media...
Initialize from segmented data: solid=0, NWP=1, WP=2
Media porosity = 0.359970
Initialized solid phase -- Converting to Signed Distance function
Domain set.
Create ScaLBL_Communicator
Set up memory efficient layout, 11795503 | 11795520 | 33386248
Allocating distributions
Setting up device map and neighbor list
Component labels: 1
label=0, affinity=-1.000000, volume fraction==0.652189
Initializing distributions
Initializing phase field
********************************************************
No. of timesteps: 3000
Affinities - rank 0:
Main: 0
Thread 1: 1
-------------------------------------------------------------------
********************************************************
CPU time = 0.034209
Lattice update rate (per core)= 344.810959 MLUPS
Lattice update rate (total)= 1379.243836 MLUPS
********************************************************
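As a rough sanity check on these numbers, the per-rank update rate should be approximately the number of active lattice sites per rank divided by the time per timestep (interpreting the reported CPU time as the average wall time per timestep, which is an assumption). A minimal sketch using the values printed above:
awk 'BEGIN { sites=11795520; t=0.034209; printf "%.1f MLUPS per rank\n", sites/(t*1.0e6) }'
Multiplying the per-rank rate by the four ranks recovers the reported total (4 x 344.8 = 1379.2 MLUPS).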
Example arguments for GPU-based Open MPI:
export MPIARGS="--bind-to core --mca pml ob1 --mca btl vader,self,smcuda,openib --mca btl_openib_warn_default_gid_prefix 0 --mca btl_smcuda_use_cuda_ipc_same_gpu 0 --mca btl_openib_want_cuda_gdr 0 --mca btl_openib_cuda_async_recv false --mca btl_smcuda_use_cuda_ipc 0 --mca btl_openib_allow_ib true --mca btl_openib_cuda_rdma_limit 1000 -x LD_LIBRARY_PATH"