From 969a76d812a1e8479276a9964c0e0fd0341641d0 Mon Sep 17 00:00:00 2001 From: Tony Craig Date: Thu, 16 May 2024 17:39:04 -0700 Subject: [PATCH] Update the automated max_blocks calculation (#954) Update support for max_blocks=-1. This update computes the blocks required on each MPI task and then sets that as max_blocks if max_blocks=-1 in namelist. This is done in ice_distribution and is a function of the decomposition among other things. Refactor the decomposition computation to defer usage of max_blocks and eliminate the blockIndex array. Update some indentation formatting in ice_distribution.F90. Modify cice.setup and cice_decomp.csh to set max_blocks=-1 unless it's explicitly defined by the cice.setup -p setting. Fix a bug in ice_gather_scatter related to zero-ing out of the halo with the field_loc_noupdate setting. This was zero-ing out the blocks extra times and there were no problems as long as max_blocks was the same value on all MPI tasks. With the new implementation of max_blocks=-1, max_blocks can be different values on different MPI tasks. An error was generated and then the implementation was fixed so each block on each task is now zeroed out exactly once. Update diagnostics related to max_block information. Write out the min and max max_blocks values across MPI tasks. Add extra allocation/deallocation checks in ice_distribution.F90 and add a function, ice_memusage_allocErr, to ice_memusage.F90 that checks the alloc/dealloc return code, writes an error message, and aborts. This function could be used in other parts of the code as well. Fix a bug in the io_binary restart output where each task was writing some output when it should have just been the master task. Update test cases Update documentation --- cice.setup | 8 +- .../comm/mpi/ice_gather_scatter.F90 | 36 +- .../comm/serial/ice_gather_scatter.F90 | 32 +- .../cicedyn/infrastructure/ice_domain.F90 | 96 +- cicecore/cicedyn/infrastructure/ice_grid.F90 | 14 +- .../cicedyn/infrastructure/ice_memusage.F90 | 34 +- .../io/io_binary/ice_restart.F90 | 11 +- cicecore/shared/ice_distribution.F90 | 1083 +++++++---------- cicecore/shared/ice_domain_size.F90 | 11 +- configuration/scripts/cice_decomp.csh | 3 +- configuration/scripts/tests/decomp_suite.ts | 51 +- configuration/scripts/tests/first_suite.ts | 18 +- configuration/scripts/tests/gridsys_suite.ts | 54 +- configuration/scripts/tests/perf_suite.ts | 48 +- configuration/scripts/tests/unittest_suite.ts | 13 + doc/source/user_guide/ug_case_settings.rst | 1 + doc/source/user_guide/ug_implementation.rst | 22 +- 17 files changed, 686 insertions(+), 849 deletions(-) diff --git a/cice.setup b/cice.setup index 4c7a222ff..2fd68cd18 100755 --- a/cice.setup +++ b/cice.setup @@ -684,7 +684,7 @@ EOF set thrd = `echo ${pesx} | cut -d x -f 2` set blckx = `echo ${pesx} | cut -d x -f 3` set blcky = `echo ${pesx} | cut -d x -f 4` - set mblck = 0 + set mblck = -1 if (${task} == 0 || ${thrd} == 0 || ${blckx} == 0 || ${blcky} == 0) then echo "${0}: ERROR in -p argument, cannot have zeros" exit -1 @@ -696,7 +696,7 @@ EOF set thrd = `echo ${pesx} | cut -d x -f 2` set blckx = 0 set blcky = 0 - set mblck = 0 + set mblck = -1 if (${task} == 0 || ${thrd} == 0) then echo "${0}: ERROR in -p argument, cannot have zeros" exit -1 @@ -708,7 +708,7 @@ EOF set thrd = 1 set blckx = 0 set blcky = 0 - set mblck = 0 + set mblck = -1 if (${task} == 0) then echo "${0}: ERROR in -p argument, cannot have zeros" exit -1 @@ -757,7 +757,7 @@ EOF # update pesx based on use defined settings and machine limits to reflect actual 
value set pesx = ${task}x${thrd}x${blckx}x${blcky}x${mblck} - if (${mblck} == 0) then + if (${mblck} <= 0) then set pesx = ${task}x${thrd}x${blckx}x${blcky} endif if (${blckx} == 0 || ${blcky} == 0) then diff --git a/cicecore/cicedyn/infrastructure/comm/mpi/ice_gather_scatter.F90 b/cicecore/cicedyn/infrastructure/comm/mpi/ice_gather_scatter.F90 index 030deabca..cfb98befe 100644 --- a/cicecore/cicedyn/infrastructure/comm/mpi/ice_gather_scatter.F90 +++ b/cicecore/cicedyn/infrastructure/comm/mpi/ice_gather_scatter.F90 @@ -1836,12 +1836,12 @@ subroutine scatter_global_dbl(ARRAY, ARRAY_G, src_task, dst_dist, & !----------------------------------------------------------------- if (field_loc == field_loc_noupdate) then - do n=1,nblocks_tot + do n=1,nblocks_tot + if (dst_dist%blockLocation(n) == my_task+1 .and. & + dst_dist%blockLocalID(n) > 0) then + dst_block = dst_dist%blockLocalID(n) this_block = get_block(n,n) - - if (dst_block > 0) then - ! north edge do j = this_block%jhi+1,ny_block do i = 1, nx_block @@ -1867,8 +1867,8 @@ subroutine scatter_global_dbl(ARRAY, ARRAY_G, src_task, dst_dist, & enddo enddo - endif - enddo + endif + enddo endif if (add_mpi_barriers) then @@ -2222,12 +2222,12 @@ subroutine scatter_global_real(ARRAY, ARRAY_G, src_task, dst_dist, & !----------------------------------------------------------------- if (field_loc == field_loc_noupdate) then - do n=1,nblocks_tot + do n=1,nblocks_tot + if (dst_dist%blockLocation(n) == my_task+1 .and. & + dst_dist%blockLocalID(n) > 0) then + dst_block = dst_dist%blockLocalID(n) this_block = get_block(n,n) - - if (dst_block > 0) then - ! north edge do j = this_block%jhi+1,ny_block do i = 1, nx_block @@ -2253,8 +2253,8 @@ subroutine scatter_global_real(ARRAY, ARRAY_G, src_task, dst_dist, & enddo enddo - endif - enddo + endif + enddo endif if (add_mpi_barriers) then @@ -2608,12 +2608,12 @@ subroutine scatter_global_int(ARRAY, ARRAY_G, src_task, dst_dist, & !----------------------------------------------------------------- if (field_loc == field_loc_noupdate) then - do n=1,nblocks_tot + do n=1,nblocks_tot + if (dst_dist%blockLocation(n) == my_task+1 .and. & + dst_dist%blockLocalID(n) > 0) then + dst_block = dst_dist%blockLocalID(n) this_block = get_block(n,n) - - if (dst_block > 0) then - ! north edge do j = this_block%jhi+1,ny_block do i = 1, nx_block @@ -2639,8 +2639,8 @@ subroutine scatter_global_int(ARRAY, ARRAY_G, src_task, dst_dist, & enddo enddo - endif - enddo + endif + enddo endif if (add_mpi_barriers) then diff --git a/cicecore/cicedyn/infrastructure/comm/serial/ice_gather_scatter.F90 b/cicecore/cicedyn/infrastructure/comm/serial/ice_gather_scatter.F90 index 34cca2d03..5f4938281 100644 --- a/cicecore/cicedyn/infrastructure/comm/serial/ice_gather_scatter.F90 +++ b/cicecore/cicedyn/infrastructure/comm/serial/ice_gather_scatter.F90 @@ -1002,12 +1002,12 @@ subroutine scatter_global_dbl(ARRAY, ARRAY_G, src_task, dst_dist, & !----------------------------------------------------------------- if (field_loc == field_loc_noupdate) then - do n=1,nblocks_tot + do n=1,nblocks_tot + if (dst_dist%blockLocation(n) /= 0 .and. & + dst_dist%blockLocalID(n) > 0) then + dst_block = dst_dist%blockLocalID(n) this_block = get_block(n,n) - - if (dst_block > 0) then - ! 
north edge do j = this_block%jhi+1,ny_block do i = 1, nx_block @@ -1033,8 +1033,8 @@ subroutine scatter_global_dbl(ARRAY, ARRAY_G, src_task, dst_dist, & enddo enddo - endif - enddo + endif + enddo endif !----------------------------------------------------------------------- @@ -1250,12 +1250,12 @@ subroutine scatter_global_real(ARRAY, ARRAY_G, src_task, dst_dist, & !----------------------------------------------------------------- if (field_loc == field_loc_noupdate) then - do n=1,nblocks_tot + do n=1,nblocks_tot + if (dst_dist%blockLocation(n) /= 0 .and. & + dst_dist%blockLocalID(n) > 0) then + dst_block = dst_dist%blockLocalID(n) this_block = get_block(n,n) - - if (dst_block > 0) then - ! north edge do j = this_block%jhi+1,ny_block do i = 1, nx_block @@ -1281,8 +1281,8 @@ subroutine scatter_global_real(ARRAY, ARRAY_G, src_task, dst_dist, & enddo enddo - endif - enddo + endif + enddo endif !----------------------------------------------------------------------- @@ -1498,12 +1498,12 @@ subroutine scatter_global_int(ARRAY, ARRAY_G, src_task, dst_dist, & !----------------------------------------------------------------- if (field_loc == field_loc_noupdate) then - do n=1,nblocks_tot + do n=1,nblocks_tot + if (dst_dist%blockLocation(n) /= 0 .and. & + dst_dist%blockLocalID(n) > 0) then + dst_block = dst_dist%blockLocalID(n) this_block = get_block(n,n) - - if (dst_block > 0) then - ! north edge do j = this_block%jhi+1,ny_block do i = 1, nx_block diff --git a/cicecore/cicedyn/infrastructure/ice_domain.F90 b/cicecore/cicedyn/infrastructure/ice_domain.F90 index df112eb50..91af49947 100644 --- a/cicecore/cicedyn/infrastructure/ice_domain.F90 +++ b/cicecore/cicedyn/infrastructure/ice_domain.F90 @@ -223,7 +223,12 @@ subroutine init_domain_blocks call broadcast_scalar(nx_global, master_task) call broadcast_scalar(ny_global, master_task) - ! Set nprocs if not set in namelist +!---------------------------------------------------------------------- +! +! Set nprocs if not explicitly set to valid value in namelist +! +!---------------------------------------------------------------------- + #ifdef CESMCOUPLED nprocs = get_num_procs() #else @@ -235,18 +240,6 @@ subroutine init_domain_blocks endif #endif - ! Determine max_blocks if not set - if (max_blocks < 1) then - call proc_decomposition(nprocs, nprocs_x, nprocs_y) - max_blocks=((nx_global-1)/block_size_x/nprocs_x+1) * & - ((ny_global-1)/block_size_y/nprocs_y+1) - max_blocks=max(1,max_blocks) - if (my_task == master_task) then - write(nu_diag,'(/,a52,i6,/)') & - '(ice_domain): max_block < 1: max_block estimated to ',max_blocks - endif - endif - !---------------------------------------------------------------------- ! ! perform some basic checks on domain @@ -321,6 +314,7 @@ subroutine init_domain_distribution(KMTG,ULATG,grid_ice) use ice_boundary, only: ice_HaloCreate use ice_distribution, only: create_distribution, create_local_block_ids, ice_distributionGet use ice_domain_size, only: max_blocks, nx_global, ny_global + use ice_global_reductions, only: global_sum, global_maxval real (dbl_kind), dimension(nx_global,ny_global), intent(in) :: & KMTG ,&! 
global topography @@ -608,9 +602,9 @@ subroutine init_domain_distribution(KMTG,ULATG,grid_ice) work_per_block = 0 end where if (my_task == master_task) then - write(nu_diag,*) 'ice_domain work_unit, max_work_unit = ',work_unit, max_work_unit - write(nu_diag,*) 'ice_domain nocn = ',minval(nocn),maxval(nocn),sum(nocn) - write(nu_diag,*) 'ice_domain work_per_block = ',minval(work_per_block),maxval(work_per_block),sum(work_per_block) + write(nu_diag,'(2a,4i9)') subname,' work_unit = ',work_unit, max_work_unit + write(nu_diag,'(2a,4i9)') subname,' nocn = ',minval(nocn),maxval(nocn),sum(nocn) + write(nu_diag,'(2a,4i9)') subname,' work_per_block = ',minval(work_per_block),maxval(work_per_block),sum(work_per_block) endif deallocate(nocn) @@ -634,8 +628,42 @@ subroutine init_domain_distribution(KMTG,ULATG,grid_ice) call create_local_block_ids(blocks_ice, distrb_info) - ! write out block distribution - ! internal check of icedistributionGet as part of verification process +!---------------------------------------------------------------------- +! +! check block sizes and max_blocks +! +!---------------------------------------------------------------------- + + if (associated(blocks_ice)) then + nblocks = size(blocks_ice) + else + nblocks = 0 + endif + + tblocks_tmp = global_sum(nblocks, distrb_info) + nblocks_max = global_maxval(nblocks, distrb_info) + + if (my_task == master_task) then + write(nu_diag,'(2a,i8)') subname,' total number of blocks is', tblocks_tmp + endif + + if (nblocks > max_blocks) then + write(nu_diag,'(2a,2i6)') subname,' ERROR: nblocks, max_blocks = ',nblocks,max_blocks + write(nu_diag,'(2a,2i6)') subname,' ERROR: max_blocks too small: increase to', nblocks_max + call abort_ice(subname//' ERROR max_blocks too small', file=__FILE__, line=__LINE__) + else if (nblocks_max < max_blocks) then + if (my_task == master_task) then + write(nu_diag,'(2a,2i6)') subname,' NOTE: max_blocks too large: decrease to', nblocks_max + endif + endif + +!---------------------------------------------------------------------- +! +! write out block distribution +! internal check of icedistributionGet as part of verification process +! +!---------------------------------------------------------------------- + if (debug_blocks) then call flush_fileunit(nu_diag) @@ -713,38 +741,6 @@ subroutine init_domain_distribution(KMTG,ULATG,grid_ice) endif endif - if (associated(blocks_ice)) then - nblocks = size(blocks_ice) - else - nblocks = 0 - endif - nblocks_max = 0 - tblocks_tmp = 0 - do n=0,distrb_info%nprocs - 1 - nblocks_tmp = nblocks - call broadcast_scalar(nblocks_tmp, n) - nblocks_max = max(nblocks_max,nblocks_tmp) - tblocks_tmp = tblocks_tmp + nblocks_tmp - end do - - if (my_task == master_task) then - write(nu_diag,*) & - 'ice: total number of blocks is', tblocks_tmp - endif - - if (nblocks_max > max_blocks) then - write(outstring,*) ' ERROR: num blocks exceed max: increase max to', nblocks_max - call abort_ice(subname//trim(outstring), file=__FILE__, line=__LINE__) - else if (nblocks_max < max_blocks) then - write(outstring,*) 'WARNING: ice no. blocks too large: decrease max to', nblocks_max - if (my_task == master_task) then - write(nu_diag,*) ' ********WARNING***********' - write(nu_diag,*) subname,trim(outstring) - write(nu_diag,*) ' **************************' - write(nu_diag,*) ' ' - endif - endif - !---------------------------------------------------------------------- ! ! Set up ghost cell updates for each distribution. 
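As an illustrative aside (not part of the patch itself), the automated max_blocks calculation described in the commit message amounts to counting the blocks actually assigned to each MPI task and adopting that count whenever the namelist requests max_blocks=-1. The minimal standalone sketch below assumes a hypothetical round-robin assignment with made-up grid and task counts; the names workPerBlock and blockCnt echo the patch, but the program is illustrative only.

! Illustrative sketch (not from the patch): derive max_blocks from the
! number of blocks actually assigned to this task, rather than from an
! a priori estimate. Grid/task sizes and the round-robin rule here are
! hypothetical; only the counting idea mirrors the patch.
program max_blocks_sketch
   implicit none
   integer, parameter :: nblocks_x = 8, nblocks_y = 6  ! hypothetical block grid
   integer, parameter :: nprocs = 5                    ! hypothetical MPI task count
   integer :: workPerBlock(nblocks_x*nblocks_y)        ! 0 = land-only block, skipped
   integer :: blockCnt(nprocs)                         ! blocks assigned per task
   integer :: n, processor, my_task, max_blocks

   workPerBlock = 1
   workPerBlock(1:5) = 0           ! pretend a few blocks have no work

   ! assign blocks with work to tasks round-robin and count them per task
   blockCnt = 0
   processor = 0
   do n = 1, nblocks_x*nblocks_y
      if (workPerBlock(n) /= 0) then
         processor = mod(processor, nprocs) + 1
         blockCnt(processor) = blockCnt(processor) + 1
      endif
   end do

   ! with max_blocks = -1 in the namelist, each task would adopt its own count,
   ! so max_blocks can differ from task to task
   do my_task = 0, nprocs-1
      max_blocks = blockCnt(my_task+1)
      print '(a,i3,a,i4)', ' task ', my_task, ': max_blocks = ', max_blocks
   end do
end program max_blocks_sketch

This per-task count is what the refactored create_distrb_* routines store in newDistrb%blockCnt, copy into newDistrb%numLocalBlocks, and then assign to max_blocks when it is negative.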
diff --git a/cicecore/cicedyn/infrastructure/ice_grid.F90 b/cicecore/cicedyn/infrastructure/ice_grid.F90 index c43b7989c..54bc3ad92 100644 --- a/cicecore/cicedyn/infrastructure/ice_grid.F90 +++ b/cicecore/cicedyn/infrastructure/ice_grid.F90 @@ -301,6 +301,10 @@ subroutine init_grid1 real (kind=dbl_kind), dimension(:,:), allocatable :: & work_g1, work_g2 + integer (kind=int_kind) :: & + max_blocks_min, & ! min value of max_blocks across procs + max_blocks_max ! max value of max_blocks across procs + real (kind=dbl_kind) :: & rad_to_deg @@ -390,9 +394,15 @@ subroutine init_grid1 ! write additional domain information !----------------------------------------------------------------- + max_blocks_min = global_minval(max_blocks, distrb_info) + max_blocks_max = global_maxval(max_blocks, distrb_info) if (my_task == master_task) then - write(nu_diag,'(a26,i6)') ' Block size: nx_block = ',nx_block - write(nu_diag,'(a26,i6)') ' ny_block = ',ny_block + write(nu_diag,* ) '' + write(nu_diag,'(2a)' ) subname,' Block size:' + write(nu_diag,'(2a,i8)') subname,' nx_block = ',nx_block + write(nu_diag,'(2a,i8)') subname,' ny_block = ',ny_block + write(nu_diag,'(2a,i8)') subname,' min(max_blocks) = ',max_blocks_min + write(nu_diag,'(2a,i8)') subname,' max(max_blocks) = ',max_blocks_max endif end subroutine init_grid1 diff --git a/cicecore/cicedyn/infrastructure/ice_memusage.F90 b/cicecore/cicedyn/infrastructure/ice_memusage.F90 index 323a9074e..45b882879 100644 --- a/cicecore/cicedyn/infrastructure/ice_memusage.F90 +++ b/cicecore/cicedyn/infrastructure/ice_memusage.F90 @@ -8,13 +8,16 @@ MODULE ice_memusage !------------------------------------------------------------------------------- use ice_kinds_mod, only : dbl_kind, log_kind + use ice_fileunits, only : nu_diag + use ice_exit, only : abort_ice implicit none private ! PUBLIC: Public interfaces - public :: ice_memusage_getusage, & + public :: ice_memusage_allocErr, & + ice_memusage_getusage, & ice_memusage_init, & ice_memusage_print @@ -29,6 +32,35 @@ MODULE ice_memusage contains +!=============================================================================== +! check memory alloc/dealloc return code + +logical function ice_memusage_allocErr(istat, errstr) + + implicit none + + !----- arguments ----- + + integer :: istat !< input error code + + character(len=*), optional :: errstr !< error string + + !----- local ----- + + character(*),parameter :: subname = '(ice_memusage_allocErr)' + + ice_memusage_allocErr = .false. + if (istat /= 0) then + ice_memusage_allocErr = .true. + if (present(errstr)) then + write(nu_diag,*) 'ERROR: '//trim(errstr) + endif + call abort_ice(subname//'ERROR: alloc/dealloc', file=__FILE__, line=__LINE__) + return + endif + +end function ice_memusage_allocErr + !=============================================================================== ! 
Initialize memory conversion to MB diff --git a/cicecore/cicedyn/infrastructure/io/io_binary/ice_restart.F90 b/cicecore/cicedyn/infrastructure/io/io_binary/ice_restart.F90 index 606f0d46b..5866d7130 100644 --- a/cicecore/cicedyn/infrastructure/io/io_binary/ice_restart.F90 +++ b/cicecore/cicedyn/infrastructure/io/io_binary/ice_restart.F90 @@ -12,6 +12,7 @@ module ice_restart use ice_restart_shared, only: & restart, restart_ext, restart_dir, restart_file, pointer_file, & runid, runtype, use_restart_time, lenstr + use ice_communicate, only: my_task, master_task use ice_fileunits, only: nu_diag, nu_rst_pointer use ice_fileunits, only: nu_dump, nu_dump_eap, nu_dump_FY, nu_dump_age use ice_fileunits, only: nu_dump_lvl, nu_dump_pond, nu_dump_hbrine @@ -48,7 +49,6 @@ subroutine init_restart_read(ice_ic) use ice_calendar, only: istep0, istep1, timesecs, npt, myear, & set_date_from_timesecs - use ice_communicate, only: my_task, master_task use ice_dyn_shared, only: kdyn use ice_read_write, only: ice_open, ice_open_ext @@ -381,7 +381,6 @@ subroutine init_restart_write(filename_spec) use ice_calendar, only: msec, mmonth, mday, myear, istep1, & timesecs - use ice_communicate, only: my_task, master_task use ice_dyn_shared, only: kdyn use ice_read_write, only: ice_open, ice_open_ext @@ -721,7 +720,9 @@ subroutine read_restart_field(nu,nrec,work,atype,vname,ndim3, & character(len=*), parameter :: subname = '(read_restart_field)' - write(nu_diag,*) 'vname ',trim(vname) + if (my_task == master_task) then + write(nu_diag,*) subname,' read vname ',trim(vname) + endif if (present(field_loc)) then do n=1,ndim3 if (restart_ext) then @@ -782,6 +783,9 @@ subroutine write_restart_field(nu,nrec,work,atype,vname,ndim3,diag) character(len=*), parameter :: subname = '(write_restart_field)' + if (my_task == master_task) then + write(nu_diag,*) subname,' write vname ',trim(vname) + endif do n=1,ndim3 work2(:,:,:) = work(:,:,n,:) if (restart_ext) then @@ -801,7 +805,6 @@ end subroutine write_restart_field subroutine final_restart() use ice_calendar, only: istep1, timesecs - use ice_communicate, only: my_task, master_task logical (kind=log_kind) :: & tr_iage, tr_FY, tr_lvl, tr_iso, tr_aero, & diff --git a/cicecore/shared/ice_distribution.F90 b/cicecore/shared/ice_distribution.F90 index 6e06069ab..d0768fc5a 100644 --- a/cicecore/shared/ice_distribution.F90 +++ b/cicecore/shared/ice_distribution.F90 @@ -15,6 +15,7 @@ module ice_distribution use ice_blocks, only: nblocks_x, nblocks_y, nblocks_tot, debug_blocks use ice_exit, only: abort_ice use ice_fileunits, only: nu_diag + use ice_memusage, only: ice_memusage_allocErr implicit none private @@ -33,8 +34,6 @@ module ice_distribution blockGlobalID ! global block id for each local block integer (int_kind), dimension(:), pointer :: blockCnt - integer (int_kind), dimension(:,:), pointer :: blockIndex - end type public :: create_distribution, & @@ -123,7 +122,8 @@ function create_distribution(dist_type, nprocs, work_per_block) case default - call abort_ice(subname//'ERROR: ice distribution: unknown distribution type') + call abort_ice(subname//'ERROR: ice distribution: unknown distribution type', & + file=__FILE__, line=__LINE__) end select @@ -153,7 +153,8 @@ subroutine create_local_block_ids(block_ids, distribution) !----------------------------------------------------------------------- integer (int_kind) :: & - n, bcount ! dummy counters + n, bcount, &! dummy counters + istat ! 
status flag for deallocate character(len=*),parameter :: subname='(create_local_block_ids)' @@ -168,9 +169,6 @@ subroutine create_local_block_ids(block_ids, distribution) if (distribution%blockLocation(n) == my_task+1) bcount = bcount + 1 end do - - if (bcount > 0) allocate(block_ids(bcount)) - !----------------------------------------------------------------------- ! ! now fill array with proper block ids @@ -178,6 +176,8 @@ subroutine create_local_block_ids(block_ids, distribution) !----------------------------------------------------------------------- if (bcount > 0) then + allocate(block_ids(bcount), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc block_ids')) return do n=1,size(distribution%blockLocation) if (distribution%blockLocation(n) == my_task+1) then block_ids(distribution%blockLocalID(n)) = n @@ -315,7 +315,8 @@ subroutine proc_decomposition(nprocs, nprocs_x, nprocs_y) end do proc_loop if (nprocs_x == 0) then - call abort_ice(subname//'ERROR: Unable to find 2d processor config') + call abort_ice(subname//'ERROR: Unable to find 2d processor config', & + file=__FILE__, line=__LINE__) endif if (my_task == master_task) then @@ -364,11 +365,16 @@ subroutine ice_distributionDestroy(distribution) !---------------------------------------------------------------------- deallocate(distribution%blockLocation, stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc blockLocation')) return + deallocate(distribution%blockLocalID , stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc blockLocalID')) return + deallocate(distribution%blockGlobalID, stat=istat) - deallocate(distribution%blockCnt , stat=istat) - deallocate(distribution%blockindex , stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc blockGlobalID')) return + deallocate(distribution%blockCnt , stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc blockCnt')) return !----------------------------------------------------------------------- @@ -383,19 +389,19 @@ subroutine ice_distributionGet(distribution,& ! This routine extracts information from a distribution. type (distrb), intent(in) :: & - distribution ! input distribution for which information - ! is requested + distribution ! input distribution for which information + ! is requested - integer (int_kind), intent(out), optional :: & - nprocs ,&! number of processors in this dist - communicator ,&! communicator to use in this dist - numLocalBlocks ! number of blocks distributed to this - ! local processor + integer (int_kind), intent(out), optional :: & + nprocs ,&! number of processors in this dist + communicator ,&! communicator to use in this dist + numLocalBlocks ! number of blocks distributed to this + ! local processor - integer (int_kind), dimension(:), optional :: & - blockLocation ,&! processor location for all blocks - blockLocalID ,&! local block id for all blocks - blockGlobalID ! global block id for each local block + integer (int_kind), dimension(:), optional :: & + blockLocation ,&! processor location for all blocks + blockLocalID ,&! local block id for all blocks + blockGlobalID ! 
global block id for each local block character(len=*),parameter :: subname='(ice_distributionGet)' @@ -414,7 +420,8 @@ subroutine ice_distributionGet(distribution,& if (associated(distribution%blockLocation)) then blockLocation = distribution%blockLocation else - call abort_ice(subname//'ERROR: blockLocation not allocated') + call abort_ice(subname//'ERROR: blockLocation not allocated', & + file=__FILE__, line=__LINE__) return endif endif @@ -423,7 +430,8 @@ subroutine ice_distributionGet(distribution,& if (associated(distribution%blockLocalID)) then blockLocalID = distribution%blockLocalID else - call abort_ice(subname//'ERROR: blockLocalID not allocated') + call abort_ice(subname//'ERROR: blockLocalID not allocated', & + file=__FILE__, line=__LINE__) return endif endif @@ -432,7 +440,8 @@ subroutine ice_distributionGet(distribution,& if (associated(distribution%blockGlobalID)) then blockGlobalID = distribution%blockGlobalID else - call abort_ice(subname//'ERROR: blockGlobalID not allocated') + call abort_ice(subname//'ERROR: blockGlobalID not allocated', & + file=__FILE__, line=__LINE__) return endif endif @@ -471,7 +480,8 @@ subroutine ice_distributionGetBlockLoc(distribution, blockID, & !----------------------------------------------------------------------- if (blockID < 0 .or. blockID > nblocks_tot) then - call abort_ice(subname//'ERROR: invalid block id') + call abort_ice(subname//'ERROR: invalid block id', & + file=__FILE__, line=__LINE__) return endif @@ -515,7 +525,8 @@ subroutine ice_distributionGetBlockID(distribution, localID, & !----------------------------------------------------------------------- if (localID < 0 .or. localID > distribution%numLocalBlocks) then - call abort_ice(subname//'ERROR: invalid local id') + call abort_ice(subname//'ERROR: invalid local id', & + file=__FILE__, line=__LINE__) return endif @@ -533,7 +544,7 @@ end subroutine ice_distributionGetBlockID !*********************************************************************** - function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) + function create_distrb_cart(nprocs, workPerBlock, max_blocks_calc) result(newDistrb) ! This function creates a distribution of blocks across processors ! using a 2-d Cartesian distribution. @@ -542,11 +553,14 @@ function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & - workPerBlock ! amount of work per block + workPerBlock ! amount of work per block + + logical (log_kind), optional :: & + max_blocks_calc ! compute max_blocks (default true) type (distrb) :: & - newDistrb ! resulting structure describing Cartesian - ! distribution of blocks + newDistrb ! resulting structure describing Cartesian + ! distribution of blocks !---------------------------------------------------------------------- ! @@ -555,24 +569,31 @@ function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) !---------------------------------------------------------------------- integer (int_kind) :: & - i, j, &! dummy loop indices + i, j, n, &! dummy loop indices istat, &! status flag for allocation iblock, jblock, &! is, ie, js, je, &! start, end block indices for each proc processor, &! processor position in cartesian decomp globalID, &! global block ID localID, &! block location on this processor - nprocsX, &! num of procs in x for global domain - nprocsY, &! num of procs in y for global domain + nprocsX, &! num of procs in x for global domain + nprocsY, &! 
num of procs in y for global domain numBlocksXPerProc, &! num of blocks per processor in x numBlocksYPerProc, &! num of blocks per processor in y numBlocksPerProc ! required number of blocks per processor - character(len=char_len) :: & - numBlocksPerProc_str ! required number of blocks per processor (as string) + logical (log_kind) :: & + lmax_blocks_calc ! local max_blocks_calc setting character(len=*),parameter :: subname='(create_distrb_cart)' +!---------------------------------------------------------------------- + + lmax_blocks_calc = .true. + if (present(max_blocks_calc)) then + lmax_blocks_calc = max_blocks_calc + endif + !---------------------------------------------------------------------- ! ! create communicator for this distribution @@ -591,27 +612,18 @@ function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) call proc_decomposition(nprocs, nprocsX, nprocsY) - !---------------------------------------------------------------------- ! ! allocate space for decomposition ! !---------------------------------------------------------------------- - allocate (newDistrb%blockLocation(nblocks_tot), & - newDistrb%blockLocalID (nblocks_tot), stat=istat) - - if (istat > 0) then - call abort_ice( & - 'create_distrb_cart: error allocating blockLocation or blockLocalID') - return - endif - - allocate (newDistrb%blockCnt(nprocs)) - newDistrb%blockCnt(:) = 0 + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return !---------------------------------------------------------------------- ! @@ -622,17 +634,10 @@ function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) numBlocksXPerProc = (nblocks_x-1)/nprocsX + 1 numBlocksYPerProc = (nblocks_y-1)/nprocsY + 1 - ! Check if max_blocks is too small - numBlocksPerProc = numBlocksXPerProc * numBlocksYPerProc - if (numBlocksPerProc > max_blocks) then - write(numBlocksPerProc_str, '(i2)') numBlocksPerProc - call abort_ice(subname//'ERROR: max_blocks too small (need at least '//trim(numBlocksPerProc_str)//')') - return - endif - + newDistrb%blockCnt(:) = 0 do j=1,nprocsY do i=1,nprocsX - processor = (j-1)*nprocsX + i ! number the processors + processor = (j-1)*nprocsX + i ! number the processors ! left to right, bot to top is = (i-1)*numBlocksXPerProc + 1 ! starting block in i @@ -642,16 +647,14 @@ function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) je = j *numBlocksYPerProc ! ending block in j if (je > nblocks_y) je = nblocks_y - localID = 0 ! initialize counter for local index do jblock = js,je do iblock = is,ie globalID = (jblock - 1)*nblocks_x + iblock if (workPerBlock(globalID) /= 0) then - localID = localID + 1 + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) newDistrb%blockLocation(globalID) = processor newDistrb%blockLocalID (globalID) = localID - newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 - newDistrb%blockIndex(processor,localID) = globalID else ! no work - eliminate block from distribution newDistrb%blockLocation(globalID) = 0 newDistrb%blockLocalID (globalID) = 0 @@ -659,64 +662,25 @@ function create_distrb_cart(nprocs, workPerBlock) result(newDistrb) end do end do - ! 
if this is the local processor, set number of local blocks - if (my_task == processor - 1) then - newDistrb%numLocalBlocks = localID - endif - end do end do -!---------------------------------------------------------------------- -! -! now store the local info -! -!---------------------------------------------------------------------- + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) - if (newDistrb%numLocalBlocks > 0) then - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_cart: error allocating blockGlobalID') - return + ! set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n endif + enddo - do j=1,nprocsY - do i=1,nprocsX - processor = (j-1)*nprocsX + i - - if (processor == my_task + 1) then - is = (i-1)*numBlocksXPerProc + 1 ! starting block in i - ie = i *numBlocksXPerProc ! ending block in i - if (ie > nblocks_x) ie = nblocks_x - js = (j-1)*numBlocksYPerProc + 1 ! starting block in j - je = j *numBlocksYPerProc ! ending block in j - if (je > nblocks_y) je = nblocks_y - - localID = 0 ! initialize counter for local index - do jblock = js,je - do iblock = is,ie - globalID = (jblock - 1)*nblocks_x + iblock - if (workPerBlock(globalID) /= 0) then - localID = localID + 1 - newDistrb%blockGlobalID (localID) = globalID - endif - end do - end do - - endif - - end do - end do - - else - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_cart: error allocating blockGlobalID') - return + ! set/check max_blocks + if (lmax_blocks_calc) then + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks endif endif @@ -750,22 +714,23 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - integer (int_kind) :: & - i,j,n ,&! dummy loop indices - pid ,&! dummy for processor id - istat ,&! status flag for allocates - localBlock ,&! local block position on processor - numOcnBlocks ,&! number of ocean blocks - maxWork ,&! max amount of work in any block - nprocsX ,&! num of procs in x for global domain - nprocsY ! num of procs in y for global domain + integer (int_kind) :: & + i, j, n, &! dummy loop indices + processor, &! dummy for processor id + istat, &! status flag for allocates + globalID, &! global block ID + localID, &! block location on this processor + numOcnBlocks, &! number of ocean blocks + maxWork, &! max amount of work in any block + nprocsX, &! num of procs in x for global domain + nprocsY ! num of procs in y for global domain integer (int_kind), dimension(:), allocatable :: & - priority ,&! priority for moving blocks - workTmp ,&! work per row or column for rake algrthm + priority, &! priority for moving blocks + workTmp, &! work per row or column for rake algrthm procTmp ! temp processor id for rake algrthm - type (distrb) :: dist ! temp hold distribution + type (distrb) :: dist ! temp hold distribution character(len=*),parameter :: subname='(create_distrb_rake)' @@ -775,7 +740,8 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) ! 
!---------------------------------------------------------------------- - dist = create_distrb_cart(nprocs, workPerBlock) + ! ignore max_block calc in create_distrb_cart and recompute below + dist = create_distrb_cart(nprocs, workPerBlock, max_blocks_calc=.false.) !---------------------------------------------------------------------- ! @@ -792,11 +758,7 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) write(nu_diag,*) subname,' 1d rake on entire distribution' allocate(priority(nblocks_tot), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating priority') - return - endif + if (ice_memusage_allocErr(istat,subname//'alloc priority')) return !*** initialize priority array @@ -812,11 +774,7 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) end do allocate(workTmp(nprocs), procTmp(nprocs), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating procTmp') - return - endif + if (ice_memusage_allocErr(istat,subname//'alloc procTmp')) return workTmp(:) = 0 do i=1,nprocs @@ -832,11 +790,7 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) priority, dist) deallocate(workTmp, procTmp, stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error deallocating procTmp') - return - endif + if (ice_memusage_allocErr(istat,subname//'dealloc procTmp')) return !---------------------------------------------------------------------- ! @@ -857,11 +811,7 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) !---------------------------------------------------------------------- allocate(priority(nblocks_tot), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating priority') - return - endif + if (ice_memusage_allocErr(istat,subname//'alloc priority')) return !*** set highest priority such that eastern-most blocks !*** and blocks with the least amount of work are @@ -880,20 +830,16 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) end do allocate(workTmp(nprocsX), procTmp(nprocsX), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating procTmp') - return - endif + if (ice_memusage_allocErr(istat,subname//'alloc procTmp')) return do j=1,nprocsY workTmp(:) = 0 do i=1,nprocsX - pid = (j-1)*nprocsX + i - procTmp(i) = pid + processor = (j-1)*nprocsX + i + procTmp(i) = processor do n=1,nblocks_tot - if (dist%blockLocation(n) == pid) then + if (dist%blockLocation(n) == processor) then workTmp(i) = workTmp(i) + workPerBlock(n) endif end do @@ -904,11 +850,7 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) end do deallocate(workTmp, procTmp, stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error deallocating procTmp') - return - endif + if (ice_memusage_allocErr(istat,subname//'dealloc procTmp')) return !---------------------------------------------------------------------- ! 
@@ -931,20 +873,16 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) end do allocate(workTmp(nprocsY), procTmp(nprocsY), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating procTmp') - return - endif + if (ice_memusage_allocErr(istat,subname//'alloc procTmp')) return do i=1,nprocsX workTmp(:) = 0 do j=1,nprocsY - pid = (j-1)*nprocsX + i - procTmp(j) = pid + processor = (j-1)*nprocsX + i + procTmp(j) = processor do n=1,nblocks_tot - if (dist%blockLocation(n) == pid) then + if (dist%blockLocation(n) == processor) then workTmp(j) = workTmp(j) + workPerBlock(n) endif end do @@ -956,11 +894,7 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) end do deallocate(workTmp, procTmp, priority, stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error deallocating procTmp') - return - endif + if (ice_memusage_allocErr(istat,subname//'dealloc procTmp')) return endif ! 1d or 2d rake @@ -976,76 +910,46 @@ function create_distrb_rake(nprocs, workPerBlock) result(newDistrb) allocate(newDistrb%blockLocation(nblocks_tot), & newDistrb%blockLocalID(nblocks_tot), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating blockLocation or blockLocalID') - return - endif + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return - allocate (newDistrb%blockCnt(nprocs)) + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return newDistrb%blockCnt(:) = 0 - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 - - allocate(procTmp(nprocs), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_rake: error allocating procTmp') - return - endif - - procTmp = 0 do n=1,nblocks_tot - pid = dist%blockLocation(n) ! processor id - newDistrb%blockLocation(n) = pid - - if (pid > 0) then - procTmp(pid) = procTmp(pid) + 1 - if (procTmp(pid) > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif - newDistrb%blockLocalID (n) = procTmp(pid) - newDistrb%blockIndex(pid,procTmp(pid)) = n + globalID = n + processor = dist%blockLocation(globalID) ! processor id + newDistrb%blockLocation(globalID) = processor + + if (processor > 0) then + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) + newDistrb%blockLocation(globalID) = processor + newDistrb%blockLocalID (globalID) = localID else - newDistrb%blockLocalID (n) = 0 + newDistrb%blockLocation(globalID) = 0 + newDistrb%blockLocalID (globalID) = 0 endif end do - newDistrb%blockCnt(:) = procTmp(:) - newDistrb%numLocalBlocks = procTmp(my_task+1) - - if (minval(procTmp) < 1) then - call abort_ice(subname//'ERROR: processors left with no blocks') - return - endif - - deallocate(procTmp, stat=istat) - - if (istat > 0) then - call abort_ice(subname//'ERROR: allocating last procTmp') - return - endif + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) - allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - - if (istat > 0) then - call abort_ice(subname//'ERROR: allocating blockGlobalID') - return - endif - - localBlock = 0 - do n=1,nblocks_tot - if (newDistrb%blockLocation(n) == my_task+1) then - localBlock = localBlock + 1 - newDistrb%blockGlobalID(localBlock) = n + ! 
set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n endif - end do + enddo -!---------------------------------------------------------------------- + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks + endif + ! destroy cart distribution call ice_distributionDestroy(dist) !---------------------------------------------------------------------- @@ -1061,7 +965,7 @@ function create_distrb_roundrobin(nprocs, workPerBlock) result(newDistrb) ! standalone CAM mode. integer (int_kind), intent(in) :: & - nprocs ! number of processors in this distribution + nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & workPerBlock ! amount of work per block @@ -1077,15 +981,12 @@ function create_distrb_roundrobin(nprocs, workPerBlock) result(newDistrb) !---------------------------------------------------------------------- integer (int_kind) :: & - i, j, &! dummy loop indices + i, j, n, &! dummy loop indices istat, &! status flag for allocation processor, &! processor position in cartesian decomp globalID, &! global block ID localID ! block location on this processor - integer (int_kind), dimension(:), allocatable :: & - proc_tmp ! temp processor id - character(len=*),parameter :: subname='(create_distrb_roundrobin)' !---------------------------------------------------------------------- @@ -1110,15 +1011,12 @@ function create_distrb_roundrobin(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate (newDistrb%blockLocation(nblocks_tot), & - newDistrb%blockLocalID (nblocks_tot), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_roundrobin: error allocating blockLocation or blockLocalID') - return - endif + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return - allocate (newDistrb%blockCnt(nprocs)) + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return !---------------------------------------------------------------------- ! @@ -1126,67 +1024,42 @@ function create_distrb_roundrobin(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate(proc_tmp(nprocs)) processor = 0 globalID = 0 - proc_tmp = 0 - - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 + newDistrb%numLocalBlocks = 0 + newDistrb%blockCnt(:) = 0 + ! compute decomposition do j=1,nblocks_y do i=1,nblocks_x - globalID = globalID + 1 - if (workPerBlock(globalID) /= 0) then processor = mod(processor,nprocs) + 1 - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) newDistrb%blockLocation(globalID) = processor newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID else ! 
no work - eliminate block from distribution newDistrb%blockLocation(globalID) = 0 newDistrb%blockLocalID (globalID) = 0 endif + enddo + enddo + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) + + ! set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n + endif + enddo - end do - end do - - newDistrb%numLocalBlocks = proc_tmp(my_task+1) - newDistrb%blockCnt(:) = proc_tmp(:) - deallocate(proc_tmp) - -! write(nu_diag,*) 'my_task,newDistrb%numLocalBlocks',& -! my_task,newDistrb%numLocalBlocks - -!---------------------------------------------------------------------- -! -! now store the local info -! -!---------------------------------------------------------------------- - - globalID = 0 - - if (newDistrb%numLocalBlocks > 0) then - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_roundrobin: error allocating numLocalBlocks') - return - endif - - processor = my_task + 1 - do localID = 1,newDistrb%numLocalBlocks - newDistrb%blockGlobalID (localID) = newDistrb%blockIndex(processor,& - localID) - enddo + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks endif !---------------------------------------------------------------------- @@ -1202,7 +1075,7 @@ function create_distrb_spiralcenter(nprocs, workPerBlock) result(newDistrb) ! standalone CAM mode. integer (int_kind), intent(in) :: & - nprocs ! number of processors in this distribution + nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & workPerBlock ! amount of work per block @@ -1219,14 +1092,13 @@ function create_distrb_spiralcenter(nprocs, workPerBlock) result(newDistrb) integer (int_kind) :: & n, i, j, ic, jc, id, jd, cnt, &! dummy loop indices - istat, &! status flag for allocation - processor, &! processor position in cartesian decomp - nblocklist, &! number of blocks in blocklist - globalID, &! global block ID - localID ! block location on this processor + istat, &! status flag for allocation + processor, &! processor position in cartesian decomp + nblocklist, &! number of blocks in blocklist + globalID, &! global block ID + localID ! block location on this processor integer (int_kind), dimension(:), allocatable :: & - proc_tmp, &! temp processor id blocklist ! temp block ordered list integer (int_kind), dimension(:,:), allocatable :: & blockchk ! temp block check array @@ -1255,10 +1127,12 @@ function create_distrb_spiralcenter(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate (newDistrb%blockLocation(nblocks_tot), & - newDistrb%blockLocalID (nblocks_tot), stat=istat) + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return - allocate (newDistrb%blockCnt(nprocs)) + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return !---------------------------------------------------------------------- ! @@ -1271,18 +1145,15 @@ function create_distrb_spiralcenter(nprocs, workPerBlock) result(newDistrb) ! 
!---------------------------------------------------------------------- - allocate(proc_tmp(nprocs)) - allocate(blocklist(nblocks_tot)) - allocate(blockchk(nblocks_x,nblocks_y)) + allocate(blocklist(nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blocklist')) return + allocate(blockchk(nblocks_x,nblocks_y), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockchk')) return nblocklist = 0 blocklist = 0 blockchk = 0 processor = 0 globalID = 0 - proc_tmp = 0 - - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 jc = nblocks_y/2 ic = nblocks_x/2 @@ -1354,10 +1225,12 @@ function create_distrb_spiralcenter(nprocs, workPerBlock) result(newDistrb) if (nblocklist /= nblocks_x*nblocks_y .or. & maxval(blockchk) /= 1 .or. minval(blockchk) /= 1) then - call abort_ice(subname//'ERROR: blockchk invalid') + call abort_ice(subname//'ERROR: blockchk invalid', & + file=__FILE__, line=__LINE__) return endif - deallocate(blockchk) + deallocate(blockchk, stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc blockchk')) return !---------------------------------------------------------------------- ! @@ -1365,55 +1238,42 @@ function create_distrb_spiralcenter(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - do n = 1,nblocklist - - globalID = blocklist(n) - - if (workPerBlock(globalID) /= 0) then - processor = mod(processor,nprocs) + 1 - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif - newDistrb%blockLocation(globalID) = processor - newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID - else ! no work - eliminate block from distribution - newDistrb%blockLocation(globalID) = 0 - newDistrb%blockLocalID (globalID) = 0 - endif + newDistrb%numLocalBlocks = 0 + newDistrb%blockCnt(:) = 0 + do n = 1,nblocklist + globalID = blocklist(n) + if (workPerBlock(globalID) /= 0) then + processor = mod(processor,nprocs) + 1 + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) + newDistrb%blockLocation(globalID) = processor + newDistrb%blockLocalID (globalID) = localID + else ! no work - eliminate block from distribution + newDistrb%blockLocation(globalID) = 0 + newDistrb%blockLocalID (globalID) = 0 + endif end do + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) + + ! set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n + endif + enddo - newDistrb%numLocalBlocks = proc_tmp(my_task+1) - newDistrb%blockCnt(:) = proc_tmp(:) - deallocate(proc_tmp) - deallocate(blocklist) - -! write(nu_diag,*) 'my_task,newDistrb%numLocalBlocks',& -! my_task,newDistrb%numLocalBlocks - -!---------------------------------------------------------------------- -! -! now store the local info -! 
-!---------------------------------------------------------------------- - - globalID = 0 - - if (newDistrb%numLocalBlocks > 0) then - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - - processor = my_task + 1 - do localID = 1,newDistrb%numLocalBlocks - newDistrb%blockGlobalID (localID) = newDistrb%blockIndex(processor,& - localID) - enddo + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks endif + deallocate(blocklist, stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc blocklist')) return + !---------------------------------------------------------------------- end function create_distrb_spiralcenter @@ -1427,7 +1287,7 @@ function create_distrb_wghtfile(nprocs, workPerBlock) result(newDistrb) ! standalone CAM mode. integer (int_kind), intent(in) :: & - nprocs ! number of processors in this distribution + nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & workPerBlock ! amount of work per block @@ -1450,9 +1310,6 @@ function create_distrb_wghtfile(nprocs, workPerBlock) result(newDistrb) globalID, &! global block ID localID ! block location on this processor - integer (int_kind), dimension(:), allocatable :: & - proc_tmp ! temp processor id - logical (log_kind) :: up ! direction of pe counting character(len=*),parameter :: subname='(create_distrb_wghtfile)' @@ -1479,10 +1336,12 @@ function create_distrb_wghtfile(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate (newDistrb%blockLocation(nblocks_tot), & - newDistrb%blockLocalID (nblocks_tot), stat=istat) + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return - allocate (newDistrb%blockCnt(nprocs)) + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return !---------------------------------------------------------------------- ! @@ -1492,94 +1351,76 @@ function create_distrb_wghtfile(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate(proc_tmp(nprocs)) processor = 0 - proc_tmp = 0 + newDistrb%numLocalBlocks = 0 + newDistrb%blockCnt(:) = 0 up = .true. - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 - if (my_task == master_task) & write(nu_diag,*) subname,' workPerBlock = ',minval(workPerBlock),maxval(workPerBlock) if (minval(workPerBlock) < 0 .or. maxval(workPerBlock) > 12) then write(nu_diag,*) subname,' workPerBlock = ',minval(workPerBlock),maxval(workPerBlock) - call abort_ice(subname//'ERROR: workPerBlock incorrect') + call abort_ice(subname//'ERROR: workPerBlock incorrect', & + file=__FILE__, line=__LINE__) return endif ! do not distribution blocks with work=0 - do n=maxval(workPerBlock),1,-1 - cnt = 0 - do j=1,nblocks_y - do i=1,nblocks_x - - if (mod(j,2) == 1) then - globalID = (j-1)*nblocks_x + i - else - globalID = (j-1)*nblocks_x + nblocks_x - i + 1 - endif - - if (workPerBlock(globalID) == 0) then ! no work - eliminate block from distribution - newDistrb%blockLocation(globalID) = 0 - newDistrb%blockLocalID (globalID) = 0 - elseif (workPerBlock(globalID) == n) then - cnt = cnt + 1 -! 
processor = mod(processor,nprocs) + 1 - if (up) then - processor = processor + 1 + do n = maxval(workPerBlock),1,-1 + cnt = 0 + do j=1,nblocks_y + do i=1,nblocks_x + if (mod(j,2) == 1) then + globalID = (j-1)*nblocks_x + i else - processor = processor - 1 - endif - if (processor > nprocs) then - up = .false. - processor = nprocs - elseif (processor < 1) then - up = .true. - processor = 1 + globalID = (j-1)*nblocks_x + nblocks_x - i + 1 endif - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return + if (workPerBlock(globalID) == 0) then ! no work - eliminate block from distribution + newDistrb%blockLocation(globalID) = 0 + newDistrb%blockLocalID (globalID) = 0 + elseif (workPerBlock(globalID) == n) then + cnt = cnt + 1 + if (up) then + processor = processor + 1 + else + processor = processor - 1 + endif + if (processor > nprocs) then + up = .false. + processor = nprocs + elseif (processor < 1) then + up = .true. + processor = 1 + endif + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) + newDistrb%blockLocation(globalID) = processor + newDistrb%blockLocalID (globalID) = localID endif - newDistrb%blockLocation(globalID) = processor - newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID - endif - - end do - end do -! write(nu_diag,*) 'create_distrb_wghtfile n cnt = ',n,cnt + end do + end do +! write(nu_diag,*) subname,'n cnt = ',n,cnt end do + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) + + ! set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n + endif + enddo - newDistrb%numLocalBlocks = proc_tmp(my_task+1) - newDistrb%blockCnt(:) = proc_tmp(:) - deallocate(proc_tmp) + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks + endif -! write(nu_diag,*) 'my_task,newDistrb%numLocalBlocks',& +! write(nu_diag,*) subname,'my_task,newDistrb%numLocalBlocks',& ! my_task,newDistrb%numLocalBlocks -!---------------------------------------------------------------------- -! -! now store the local info -! -!---------------------------------------------------------------------- - - globalID = 0 - - if (newDistrb%numLocalBlocks > 0) then - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - - processor = my_task + 1 - do localID = 1,newDistrb%numLocalBlocks - newDistrb%blockGlobalID (localID) = newDistrb%blockIndex(processor,& - localID) - enddo - endif - !---------------------------------------------------------------------- end function create_distrb_wghtfile @@ -1593,7 +1434,7 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) ! standalone CAM mode. integer (int_kind), intent(in) :: & - nprocs ! number of processors in this distribution + nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & workPerBlock ! amount of work per block @@ -1609,18 +1450,15 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) !---------------------------------------------------------------------- integer (int_kind) :: & - i, j, &! dummy loop indices + i, j, n, &! 
dummy loop indices istat, &! status flag for allocation mblocks, &! estimate of max blocks per pe processor, &! processor position in cartesian decomp globalID, &! global block ID localID ! block location on this processor - integer (int_kind), dimension(:), allocatable :: & - proc_tmp ! temp processor id - logical (log_kind), dimension(:), allocatable :: & - bfree ! map of assigned blocks + bfree ! map of assigned blocks, true = free integer (int_kind) :: cnt, blktogether, i2 integer (int_kind) :: totblocks, nchunks @@ -1650,15 +1488,12 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate (newDistrb%blockLocation(nblocks_tot), & - newDistrb%blockLocalID (nblocks_tot), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_sectrobin: error allocating blockLocation or blockLocalID') - return - endif + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return - allocate (newDistrb%blockCnt(nprocs)) + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return !---------------------------------------------------------------------- ! @@ -1666,15 +1501,12 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate(proc_tmp(nprocs)) processor = 0 globalID = 0 - proc_tmp = 0 - - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 - - allocate(bfree(nblocks_x*nblocks_y)) + newDistrb%numLocalBlocks = 0 + newDistrb%blockCnt(:) = 0 + allocate(bfree(nblocks_x*nblocks_y), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc bfree')) return bfree=.true. totblocks = 0 @@ -1696,12 +1528,13 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) blktogether = max(1,nint(float(totblocks)/float(6*nprocs))) -! write(nu_diag,*) 'ice_distrb_sectrobin totblocks = ',totblocks,nblocks_y*nblocks_x +! write(nu_diag,*) subname,'totblocks = ',totblocks,nblocks_y*nblocks_x !------------------------------ ! southern group of blocks ! weave back and forth in i vs j ! go south to north, low - high pes + ! keepgoing to false to stop distribution !------------------------------ processor=1 @@ -1720,24 +1553,18 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) cnt = 0 if (processor == 1) keepgoing = .false. endif -! write(nu_diag,'(a,6i7,l2)') 'tcx ',i,j,globalID,cnt,blktogether,processor,keepgoing +! write(nu_diag,'(a,6i7,l2)') subname,i,j,globalID,cnt,blktogether,processor,keepgoing if (keepgoing) then if (bfree(globalID)) then if (workPerBlock(globalID) /= 0) then - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) newDistrb%blockLocation(globalID) = processor newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID cnt = cnt + 1 totblocks = totblocks-1 bfree(globalID) = .false. - else ! no work - eliminate block from distribution bfree(globalID) = .false. 
newDistrb%blockLocation(globalID) = 0 @@ -1748,12 +1575,13 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) end do end do -! write(nu_diag,*) 'ice_distrb_sectrobin totblocks left after southern = ',totblocks +! write(nu_diag,*) subname,'totblocks left after southern = ',totblocks !------------------------------ ! northern group of blocks ! weave back and forth in i vs j ! go north to south, high - low pes + ! keepgoing to false to stop distribution !------------------------------ processor=nprocs @@ -1776,19 +1604,13 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) if (keepgoing) then if (bfree(globalID)) then if (workPerBlock(globalID) /= 0) then - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) newDistrb%blockLocation(globalID) = processor newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID cnt = cnt + 1 - totblocks = totblocks - 1 + totblocks = totblocks-1 bfree(globalID) = .false. - else ! no work - eliminate block from distribution bfree(globalID) = .false. newDistrb%blockLocation(globalID) = 0 @@ -1799,12 +1621,13 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) end do end do -! write(nu_diag,*) 'ice_distrb_sectrobin totblocks left after northern = ',totblocks +! write(nu_diag,*) subname,'totblocks left after northern = ',totblocks !------------------------------ ! central group of blocks ! weave back and forth in i vs j ! go north to south, low - high / low - high pes + ! distribute rest of blocks in 2 chunks per proc !------------------------------ nchunks = 2*nprocs @@ -1820,35 +1643,29 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) endif globalID = (j-1)*nblocks_x + i2 if (totblocks > 0) then - do while (proc_tmp(processor) >= mblocks .or. cnt >= blktogether) - nchunks = nchunks - 1 - if (nchunks == 0) then - blktogether = 1 - else - blktogether = max(1,nint(float(totblocks)/float(nchunks))) - endif - cnt = 0 - processor = mod(processor,nprocs) + 1 - enddo + do while (newDistrb%blockCnt(processor) >= mblocks .or. cnt >= blktogether) + nchunks = nchunks - 1 + if (nchunks == 0) then + blktogether = 1 + else + blktogether = max(1,nint(float(totblocks)/float(nchunks))) + endif + cnt = 0 + processor = mod(processor,nprocs) + 1 + enddo endif -! write(nu_diag,*) 'ice_distrb_sectrobin central ',i,j,totblocks,cnt,nchunks,blktogether,processor +! write(nu_diag,*) subname,'central ',i,j,totblocks,cnt,nchunks,blktogether,processor if (bfree(globalID)) then if (workPerBlock(globalID) /= 0) then - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) newDistrb%blockLocation(globalID) = processor newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID cnt = cnt + 1 totblocks = totblocks-1 bfree(globalID) = .false. - else ! no work - eliminate block from distribution bfree(globalID) = .false. 
newDistrb%blockLocation(globalID) = 0 @@ -1858,34 +1675,25 @@ function create_distrb_sectrobin(nprocs, workPerBlock) result(newDistrb) end do end do - newDistrb%numLocalBlocks = proc_tmp(my_task+1) - newDistrb%blockCnt(:) = proc_tmp(:) - deallocate(proc_tmp) - deallocate(bfree) + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) -!---------------------------------------------------------------------- -! -! now store the local info -! -!---------------------------------------------------------------------- - - globalID = 0 + ! set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n + endif + enddo - if (newDistrb%numLocalBlocks > 0) then - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_sectrobin: error allocating numLocalBlocks') - return + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks endif - processor = my_task + 1 - do localID = 1,newDistrb%numLocalBlocks - newDistrb%blockGlobalID (localID) = newDistrb%blockIndex(processor,& - localID) - enddo - endif + deallocate(bfree, stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc bfree')) return !---------------------------------------------------------------------- @@ -1900,7 +1708,7 @@ function create_distrb_sectcart(nprocs, workPerBlock) result(newDistrb) ! standalone CAM mode. integer (int_kind), intent(in) :: & - nprocs ! number of processors in this distribution + nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & workPerBlock ! amount of work per block @@ -1924,9 +1732,6 @@ function create_distrb_sectcart(nprocs, workPerBlock) result(newDistrb) blktogether, &! number of blocks together cnt ! counter - integer (int_kind), dimension(:), allocatable :: & - proc_tmp ! temp processor id - integer (int_kind) :: n character(len=*),parameter :: subname='(create_distrb_sectcart)' @@ -1953,27 +1758,19 @@ function create_distrb_sectcart(nprocs, workPerBlock) result(newDistrb) ! !---------------------------------------------------------------------- - allocate (newDistrb%blockLocation(nblocks_tot), & - newDistrb%blockLocalID (nblocks_tot), stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_sectcart: error allocating blockLocation or blockLocalID') - return - endif + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return + + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return - allocate (newDistrb%blockCnt(nprocs)) !---------------------------------------------------------------------- ! ! distribute blocks linearly across processors in quadrants ! !---------------------------------------------------------------------- - allocate(proc_tmp(nprocs)) - proc_tmp = 0 - - allocate(newDistrb%blockIndex(nprocs,max_blocks)) - newDistrb%blockIndex(:,:) = 0 - blktogether = max(1,nint(float(nblocks_x*nblocks_y)/float(4*nprocs))) ! 
--- two phases, reset processor and cnt for each phase @@ -1981,10 +1778,14 @@ function create_distrb_sectcart(nprocs, workPerBlock) result(newDistrb) ! --- phase 2 is north to south, east to west on the right half of the domain if (mod(nblocks_x,2) /= 0) then - call abort_ice(subname//'ERROR: nblocks_x not divisible by 2') + call abort_ice(subname//'ERROR: nblocks_x not divisible by 2', & + file=__FILE__, line=__LINE__) return endif + newDistrb%numLocalBlocks = 0 + newDistrb%blockCnt(:) = 0 + do n=1,2 processor = 1 cnt = 0 @@ -2007,15 +1808,10 @@ function create_distrb_sectcart(nprocs, workPerBlock) result(newDistrb) cnt = cnt + 1 if (workPerBlock(globalID) /= 0) then - proc_tmp(processor) = proc_tmp(processor) + 1 - localID = proc_tmp(processor) - if (localID > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) newDistrb%blockLocation(globalID) = processor newDistrb%blockLocalID (globalID) = localID - newDistrb%blockIndex(processor,localID) = globalID else ! no work - eliminate block from distribution newDistrb%blockLocation(globalID) = 0 newDistrb%blockLocalID (globalID) = 0 @@ -2024,36 +1820,21 @@ function create_distrb_sectcart(nprocs, workPerBlock) result(newDistrb) end do end do end do + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) + + ! set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n + endif + enddo - newDistrb%numLocalBlocks = proc_tmp(my_task+1) - newDistrb%blockCnt(:) = proc_tmp(:) - deallocate(proc_tmp) - -! write(nu_diag,*) 'my_task,newDistrb%numLocalBlocks',& -! my_task,newDistrb%numLocalBlocks - -!---------------------------------------------------------------------- -! -! now store the local info -! -!---------------------------------------------------------------------- - - globalID = 0 - - if (newDistrb%numLocalBlocks > 0) then - allocate (newDistrb%blockGlobalID(newDistrb%numLocalBlocks), & - stat=istat) - if (istat > 0) then - call abort_ice( & - 'create_distrb_sectcart: error allocating numLocalBlocks') - return - endif - - processor = my_task + 1 - do localID = 1,newDistrb%numLocalBlocks - newDistrb%blockGlobalID (localID) = newDistrb%blockIndex(processor,& - localID) - enddo + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks endif !---------------------------------------------------------------------- @@ -2062,7 +1843,7 @@ end function create_distrb_sectcart !********************************************************************** - function create_distrb_spacecurve(nprocs,work_per_block) + function create_distrb_spacecurve(nprocs,work_per_block) result(newDistrb) ! This function distributes blocks across processors in a ! load-balanced manner using space-filling curves @@ -2071,14 +1852,14 @@ function create_distrb_spacecurve(nprocs,work_per_block) use ice_spacecurve integer (int_kind), intent(in) :: & - nprocs ! number of processors in this distribution + nprocs ! number of processors in this distribution integer (int_kind), dimension(:), intent(in) :: & - work_per_block ! amount of work per block + work_per_block ! amount of work per block type (distrb) :: & - create_distrb_spacecurve ! 
resulting structure describing - ! load-balanced distribution of blocks + newDistrb ! resulting structure describing Cartesian + ! distribution of blocks !---------------------------------------------------------------------- ! @@ -2087,16 +1868,18 @@ function create_distrb_spacecurve(nprocs,work_per_block) !---------------------------------------------------------------------- integer (int_kind) :: & - i,j,n ,&! dummy loop indices - pid ,&! dummy for processor id + i, j, n, &! dummy loop indices + istat, &! status flag for allocation + processor, &! processor position in cartesian decomp + globalID, &! global block ID localID ! local block position on processor integer (int_kind), dimension(:),allocatable :: & idxT_i,idxT_j ! Temporary indices for SFC integer (int_kind), dimension(:,:),allocatable :: & - Mesh ,&! !arrays to hold Space-filling curve - Mesh2 ,&! + Mesh, &! !arrays to hold Space-filling curve + Mesh2, &! Mesh3 ! integer (int_kind) :: & @@ -2111,11 +1894,6 @@ function create_distrb_spacecurve(nprocs,work_per_block) integer (int_kind) :: subNum, sfcNum logical :: foundx - integer (int_kind), dimension(:), allocatable :: & - proc_tmp ! temp processor id for rake algrthm - - type (distrb) :: dist ! temp hold distribution - character(len=*),parameter :: subname='(create_distrb_spacecurve)' !------------------------------------------------------ @@ -2126,10 +1904,39 @@ function create_distrb_spacecurve(nprocs,work_per_block) !------------------------------------------------------ if((.not. IsFactorable(nblocks_y)) .or. (.not. IsFactorable(nblocks_x))) then - create_distrb_spacecurve = create_distrb_cart(nprocs, work_per_block) + newDistrb = create_distrb_cart(nprocs, work_per_block) return endif +!---------------------------------------------------------------------- +! +! create communicator for this distribution +! +!---------------------------------------------------------------------- + + call create_communicator(newDistrb%communicator, nprocs) + +!---------------------------------------------------------------------- +! +! try to find best processor arrangement +! +!---------------------------------------------------------------------- + + newDistrb%nprocs = nprocs + +!---------------------------------------------------------------------- +! +! allocate space for decomposition +! +!---------------------------------------------------------------------- + + allocate(newDistrb%blockLocation(nblocks_tot), & + newDistrb%blockLocalID (nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockLocation or blockLocalID')) return + + allocate(newDistrb%blockCnt(nprocs), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc blockCnt')) return + !----------------------------------------------- ! Factor the numbers of blocks in each dimension !----------------------------------------------- @@ -2156,36 +1963,16 @@ function create_distrb_spacecurve(nprocs,work_per_block) sb_x = ProdFactor(xdim) sb_y = ProdFactor(ydim) - call create_communicator(dist%communicator, nprocs) - - dist%nprocs = nprocs - - !---------------------------------------------------------------------- - ! - ! allocate space for decomposition - ! 
- !---------------------------------------------------------------------- - - allocate (dist%blockLocation(nblocks_tot), & - dist%blockLocalID (nblocks_tot)) - - dist%blockLocation=0 - dist%blockLocalID =0 - - allocate (dist%blockCnt(nprocs)) - dist%blockCnt(:) = 0 - - allocate(dist%blockIndex(nprocs,max_blocks)) - dist%blockIndex(:,:) = 0 - !---------------------------------------------------------------------- ! Create the array to hold the SFC and indices into it !---------------------------------------------------------------------- - allocate(Mesh(curveSize,curveSize)) - allocate(Mesh2(nblocks_x,nblocks_y)) - allocate(Mesh3(nblocks_x,nblocks_y)) - allocate(idxT_i(nblocks_tot),idxT_j(nblocks_tot)) + allocate(Mesh(curveSize,curveSize), & + Mesh2(nblocks_x,nblocks_y), & + Mesh3(nblocks_x,nblocks_y), & + idxT_i(nblocks_tot), & + idxT_j(nblocks_tot), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc meshes')) return Mesh = 0 Mesh2 = 0 @@ -2266,7 +2053,7 @@ function create_distrb_spacecurve(nprocs,work_per_block) ! ! First region gets nblocksL+1 blocks per partition ! Second region gets nblocksL blocks per partition -! if(debug_blocks) write(nu_diag,*) 'nprocs,extra,nblocks,nblocksL,s1: ', & +! if(debug_blocks) write(nu_diag,*) subname,'nprocs,extra,nblocks,nblocksL,s1: ', & ! nprocs,extra,nblocks,nblocksL,s1 !----------------------------------------------------------- @@ -2285,7 +2072,7 @@ function create_distrb_spacecurve(nprocs,work_per_block) ! ------------------------------------ ii=ii-1 tmp1 = ii/(nblocksL+1) - dist%blockLocation(n) = tmp1+1 + newDistrb%blockLocation(n) = tmp1+1 else ! ------------------------------------ ! If on the second region of curve @@ -2293,7 +2080,7 @@ function create_distrb_spacecurve(nprocs,work_per_block) ! ------------------------------------ ii=ii-s1-1 tmp1 = ii/nblocksL - dist%blockLocation(n) = extra + tmp1 + 1 + newDistrb%blockLocation(n) = extra + tmp1 + 1 endif endif enddo @@ -2303,54 +2090,52 @@ function create_distrb_spacecurve(nprocs,work_per_block) ! Reset the dist data structure !---------------------------------------------------------------------- - allocate(proc_tmp(nprocs)) - proc_tmp = 0 + globalID = 0 + newDistrb%numLocalBlocks = 0 + newDistrb%blockCnt(:) = 0 do n=1,nblocks_tot - pid = dist%blockLocation(n) - !!!dist%blockLocation(n) = pid - - if(pid>0) then - proc_tmp(pid) = proc_tmp(pid) + 1 - if (proc_tmp(pid) > max_blocks) then - call abort_ice(subname//'ERROR: max_blocks too small') - return - endif - dist%blockLocalID(n) = proc_tmp(pid) - dist%blockIndex(pid,proc_tmp(pid)) = n - else - dist%blockLocalID(n) = 0 + globalID = n + processor = newDistrb%blockLocation(globalID) + if (processor > 0) then + newDistrb%blockCnt(processor) = newDistrb%blockCnt(processor) + 1 + localID = newDistrb%blockCnt(processor) + newDistrb%blockLocalID (globalID) = localID + else ! no work - eliminate block from distribution + newDistrb%blockLocation(globalID) = 0 + newDistrb%blockLocalID (globalID) = 0 endif enddo - dist%numLocalBlocks = proc_tmp(my_task+1) - dist%blockCnt(:) = proc_tmp(:) + newDistrb%numLocalBlocks = newDistrb%blockCnt(my_task+1) - if (dist%numLocalBlocks > 0) then - allocate (dist%blockGlobalID(dist%numLocalBlocks)) - dist%blockGlobalID = 0 - endif - localID = 0 - do n=1,nblocks_tot - if (dist%blockLocation(n) == my_task+1) then - localID = localID + 1 - dist%blockGlobalID(localID) = n + ! 
set local blockGlobalID array + allocate(newDistrb%blockGlobalID(newDistrb%numLocalBlocks), stat=istat) + if (ice_memusage_allocErr(istat,subname//'alloc numLocalBlocks')) return + do n = 1,nblocks_tot + if (my_task+1 == newDistrb%blockLocation(n)) then + localID = newDistrb%blockLocalID(n) + newDistrb%blockGlobalID (localID) = n endif enddo + ! set/check max_blocks + if (max_blocks < 0) then + max_blocks = newDistrb%numLocalBlocks + endif + ! if (debug_blocks) then -! if (my_task == master_task) write(nu_diag,*) 'dist%blockLocation:= ',dist%blockLocation -! write(nu_diag,*) 'IAM: ',my_task,' SpaceCurve: Number of blocks {total,local} :=', & -! nblocks_tot,nblocks,proc_tmp(my_task+1) +! if (my_task == master_task) write(nu_diag,*) subname,'dist%blockLocation:= ',dist%blockLocation +! write(nu_diag,*) subname,'IAM: ',my_task,' SpaceCurve: Number of blocks {total,local} :=', & +! nblocks_tot,nblocks,newDistrb%numLocalBlocks ! endif + !--------------------------------- ! Deallocate temporary arrays !--------------------------------- - deallocate(proc_tmp) - deallocate(Mesh,Mesh2,Mesh3) - deallocate(idxT_i,idxT_j) - create_distrb_spacecurve = dist ! return the result + deallocate(Mesh,Mesh2,Mesh3,idxT_i,idxT_j, stat=istat) + if (ice_memusage_allocErr(istat,subname//'dealloc meshes')) return !---------------------------------------------------------------------- @@ -2374,11 +2159,11 @@ subroutine ice_distributionRake (procWork, procID, blockWork, & ! ensure a block does not stray too far from its neighbors. integer (int_kind), intent(in), dimension(:) :: & - blockWork ,&! amount of work per block + blockWork, &! amount of work per block procID ! global processor number integer (int_kind), intent(inout), dimension(:) :: & - procWork ,&! amount of work per processor + procWork, &! amount of work per processor priority ! priority for moving a given block type (distrb), intent(inout) :: & @@ -2394,7 +2179,7 @@ subroutine ice_distributionRake (procWork, procID, blockWork, & i, n, &! dummy loop indices np1, &! n+1 corrected for cyclical wrap iproc, inext, &! processor ids for current and next - nprocs, numBlocks, &! number of blocks, processors + nprocs, numBlocks, &! number of blocks, processors lastPriority, &! priority for most recent block minPriority, &! minimum priority lastLoc, &! location for most recent block diff --git a/cicecore/shared/ice_domain_size.F90 b/cicecore/shared/ice_domain_size.F90 index 999a35f48..b0ac9b036 100644 --- a/cicecore/shared/ice_domain_size.F90 +++ b/cicecore/shared/ice_domain_size.F90 @@ -21,7 +21,7 @@ module ice_domain_size ! namelist integer (kind=int_kind), public :: & - max_blocks , & ! max number of blocks per processor + max_blocks , & ! number of blocks allocated per task block_size_x, & ! size of block in first horiz dimension block_size_y, & ! size of block in second horiz dimension nx_global , & ! i-axis size @@ -47,15 +47,6 @@ module ice_domain_size integer (kind=int_kind), public, parameter :: & max_nstrm = 5 ! max number of history output streams - !*** The model will inform the user of the correct - !*** values for the parameter below. A value higher than - !*** necessary will not cause the code to fail, but will - !*** allocate more memory than is necessary. A value that - !*** is too low will cause the code to exit. 
- !*** A good initial guess is found using - !*** max_blocks = (nx_global/block_size_x)*(ny_global/block_size_y)/ - !*** num_procs - !======================================================================= end module ice_domain_size diff --git a/configuration/scripts/cice_decomp.csh b/configuration/scripts/cice_decomp.csh index bcf27beee..d990c628f 100755 --- a/configuration/scripts/cice_decomp.csh +++ b/configuration/scripts/cice_decomp.csh @@ -167,7 +167,8 @@ setenv ICE_DECOMP_NXGLOB $nxglob setenv ICE_DECOMP_NYGLOB $nyglob setenv ICE_DECOMP_BLCKX $blckx setenv ICE_DECOMP_BLCKY $blcky -setenv ICE_DECOMP_MXBLCKS $mxblcks +# tcraig, do not override max blocks value of -1 +#setenv ICE_DECOMP_MXBLCKS $mxblcks setenv ICE_DECOMP_DECOMP $decomp setenv ICE_DECOMP_DSHAPE $dshape diff --git a/configuration/scripts/tests/decomp_suite.ts b/configuration/scripts/tests/decomp_suite.ts index 8d47506d6..d33572f0b 100644 --- a/configuration/scripts/tests/decomp_suite.ts +++ b/configuration/scripts/tests/decomp_suite.ts @@ -1,8 +1,10 @@ # Test Grid PEs Sets BFB-compare restart gx3 4x2x25x29x4 dslenderX2 restart gx1 64x1x16x16x10 dwghtfile +restart gx1 32x2x10x12x32 dsectcart,short restart gbox180 16x1x6x6x60 dspacecurve,debugblocks decomp gx3 4x2x25x29x5 none +decomp gx3 4x2x25x29 none decomp gx3 4x2x25x29x5 dynpicard,reprosum decomp gx3 4x2x25x29x5 dyneap restart gx3 1x1x50x58x4 droundrobin,thread restart_gx3_4x2x25x29x4_dslenderX2 @@ -13,7 +15,7 @@ restart gx3 6x2x50x58x1 droundrobin restart_gx3_4x2x25x2 restart gx3 5x2x33x23x4 droundrobin restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 4x2x19x19x10 droundrobin restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 20x2x5x4x30 dsectrobin,short restart_gx3_4x2x25x29x4_dslenderX2 -restart gx3 16x2x5x10x20 drakeX2 restart_gx3_4x2x25x29x4_dslenderX2 +restart gx3 16x2x5x10 drakeX2 restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 8x2x8x10x20 droundrobin,maskhalo restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 1x4x25x29x16 droundrobin restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 1x8x30x20x32 droundrobin restart_gx3_4x2x25x29x4_dslenderX2 @@ -23,28 +25,29 @@ restart gx3 16x2x2x2x200 droundrobin restart_gx3_4x2x25x2 restart gx3 16x2x3x3x100 droundrobin restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 16x2x8x8x80 dspiralcenter restart_gx3_4x2x25x29x4_dslenderX2 restart gx3 10x1x10x29x4 dsquarepop,thread restart_gx3_4x2x25x29x4_dslenderX2 -restart gx3 8x1x25x29x4 drakeX2,thread restart_gx3_4x2x25x29x4_dslenderX2 +restart gx3 8x1x25x29 drakeX2,thread restart_gx3_4x2x25x29x4_dslenderX2 -smoke gx3 4x2x25x29x4 debug,run2day,dslenderX2 -smoke gx1 64x1x16x16x10 debug,run2day,dwghtfile -smoke gbox180 16x1x6x6x60 debug,run2day,dspacecurve,debugblocks -smoke gx3 1x1x25x58x8 debug,run2day,droundrobin,thread smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 20x1x5x116x1 debug,run2day,dslenderX1,thread smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 6x2x4x29x18 debug,run2day,dspacecurve smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 8x2x10x12x16 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 6x2x50x58x1 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 5x2x33x23x4 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 4x2x19x19x10 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 20x2x5x4x30 debug,run2day,dsectrobin,short smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 16x2x5x10x20 debug,run2day,drakeX2 
smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 8x2x8x10x20 debug,run2day,droundrobin,maskhalo smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 1x6x25x29x16 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 1x8x30x20x32 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 1x1x120x125x1 debug,run2day,droundrobin,thread smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 16x2x1x1x800 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 16x2x2x2x200 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 16x2x3x3x100 debug,run2day,droundrobin smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 16x2x8x8x80 debug,run2day,dspiralcenter smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 10x1x10x29x4 debug,run2day,dsquarepop,thread smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day -smoke gx3 8x1x25x29x4 debug,run2day,drakeX2,thread smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day +smoke gx3 4x2x25x29 debug,run2day,dslenderX2 +smoke gx1 64x1x16x16 debug,run2day,dwghtfile +smoke gx1 32x2x10x12 debug,run2day,dsectcart +smoke gbox180 16x1x6x6 debug,run2day,dspacecurve,debugblocks +smoke gx3 1x1x25x58 debug,run2day,droundrobin,thread smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 20x1x5x116 debug,run2day,dslenderX1,thread smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 6x2x4x29 debug,run2day,dspacecurve smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 8x2x10x12x18 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 6x2x50x58 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 5x2x33x23 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 4x2x19x19x10 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 20x2x5x4 debug,run2day,dsectrobin,short smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 16x2x5x10 debug,run2day,drakeX2 smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 8x2x8x10x20 debug,run2day,droundrobin,maskhalo smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 1x6x25x29x16 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 1x8x30x20x32 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 1x1x120x125x1 debug,run2day,droundrobin,thread smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 16x2x1x1x800 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 16x2x2x2x200 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 16x2x3x3x100 debug,run2day,droundrobin smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 16x2x8x8 debug,run2day,dspiralcenter smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 10x1x10x29 debug,run2day,dsquarepop,thread smoke_gx3_4x2x25x29_debug_dslenderX2_run2day +smoke gx3 8x1x25x29 debug,run2day,drakeX2,thread smoke_gx3_4x2x25x29_debug_dslenderX2_run2day diff --git a/configuration/scripts/tests/first_suite.ts b/configuration/scripts/tests/first_suite.ts index bef24d9eb..208c786f8 100644 --- a/configuration/scripts/tests/first_suite.ts +++ b/configuration/scripts/tests/first_suite.ts @@ -2,18 +2,18 @@ smoke gx3 8x2 diag1,run5day # decomp_suite restart gx3 4x2x25x29x4 dslenderX2 -smoke gx3 4x2x25x29x4 debug,run2day,dslenderX2 +smoke gx3 4x2x25x29 debug,run2day,dslenderX2 # reprosum_suite smoke gx3 4x2x25x29x4 dslenderX2,diag1,reprosum # 
travis_suite smoke gx3 1x2 run2day # gridsys_suite -smoke gx3 1x1x100x116x1 reprosum,run10day -smoke gx1 32x1x16x16x32 reprosum,run10day -smoke gx3 1x1x100x116x1 reprosum,run10day,gridcd -smoke gx1 32x1x16x16x32 reprosum,run10day,gridcd -smoke gx3 1x1x100x116x1 reprosum,run10day,gridc -smoke gx1 32x1x16x16x32 reprosum,run10day,gridc +smoke gx3 1x1x100x116 reprosum,run10day +smoke gx1 32x1x16x16 reprosum,run10day +smoke gx3 1x1x100x116 reprosum,run10day,gridcd +smoke gx1 32x1x16x16 reprosum,run10day,gridcd +smoke gx3 1x1x100x116 reprosum,run10day,gridc +smoke gx1 32x1x16x16 reprosum,run10day,gridc # perf_suite -smoke gx1 32x1x16x16x15 run2day,droundrobin -smoke gx1 64x1x16x16x8 run2day,droundrobin,thread +smoke gx1 32x1x16x16 run2day,droundrobin +smoke gx1 64x1x16x16 run2day,droundrobin,thread diff --git a/configuration/scripts/tests/gridsys_suite.ts b/configuration/scripts/tests/gridsys_suite.ts index e2731dd39..eca6497a4 100644 --- a/configuration/scripts/tests/gridsys_suite.ts +++ b/configuration/scripts/tests/gridsys_suite.ts @@ -1,17 +1,17 @@ # Test Grid PEs Sets BFB-compare -smoke gx3 1x1x100x116x1 reprosum,run10day -smoke gx1 32x1x16x16x32 reprosum,run10day -smoke gx3 1x1x100x116x1 reprosum,run10day,gridcd -smoke gx1 32x1x16x16x32 reprosum,run10day,gridcd -smoke gx3 1x1x100x116x1 reprosum,run10day,gridc -smoke gx1 32x1x16x16x32 reprosum,run10day,gridc +smoke gx3 1x1x100x116 reprosum,run10day +smoke gx1 32x1x16x16 reprosum,run10day +smoke gx3 1x1x100x116 reprosum,run10day,gridcd +smoke gx1 32x1x16x16 reprosum,run10day,gridcd +smoke gx3 1x1x100x116 reprosum,run10day,gridc +smoke gx1 32x1x16x16 reprosum,run10day,gridc smoke gx3 8x2 diag1,run5day smoke gx3 8x4 diag1,run5day,debug restart gx3 4x2 debug,diag1 restart2 gx1 16x2 debug,diag1 restart tx1 40x2 diag1 -smoke gbox12 1x1x12x12x1 boxchan +smoke gbox12 1x1x12x12 boxchan smoke gbox80 4x2 boxchan1e,debug smoke gbox80 8x1 boxchan1n smoke gbox80 1x1 box2001 @@ -22,19 +22,19 @@ smoke gbox80 4x2 boxclosed,boxforcee,run1day smoke gbox80 4x1 boxclosed,boxforcene,run1day,kmtislands smoke gbox80 4x2 boxopen,kmtislands,boxforcee,run1day smoke gbox80 2x2 boxclosed,boxforcen,run1day,vargrid -smoke gx3 1x1x25x29x16 reprosum,run10day,dwblockall smoke_gx3_1x1x100x116x1_reprosum_run10day -smoke gx3 1x1x5x4x580 reprosum,run10day,dwblockall smoke_gx3_1x1x100x116x1_reprosum_run10day -smoke gx3 1x1x5x4x580 reprosum,run10day smoke_gx3_1x1x100x116x1_reprosum_run10day -smoke gx1 32x1x16x16x32 reprosum,run10day,cmplogrest,dwblockall smoke_gx1_32x1x16x16x32_reprosum_run10day -smoke gx1 32x1x16x12x40 reprosum,run10day,cmplogrest,dwblockall smoke_gx1_32x1x16x16x32_reprosum_run10day -smoke gx1 32x1x16x12x40 reprosum,run10day,cmplogrest smoke_gx1_32x1x16x16x32_reprosum_run10day +smoke gx3 1x1x25x29 reprosum,run10day,dwblockall smoke_gx3_1x1x100x116_reprosum_run10day +smoke gx3 1x1x5x4 reprosum,run10day,dwblockall smoke_gx3_1x1x100x116_reprosum_run10day +smoke gx3 1x1x5x4 reprosum,run10day smoke_gx3_1x1x100x116_reprosum_run10day +smoke gx1 32x1x16x16 reprosum,run10day,cmplogrest,dwblockall smoke_gx1_32x1x16x16_reprosum_run10day +smoke gx1 32x1x16x12 reprosum,run10day,cmplogrest,dwblockall smoke_gx1_32x1x16x16_reprosum_run10day +smoke gx1 32x1x16x12 reprosum,run10day,cmplogrest smoke_gx1_32x1x16x16_reprosum_run10day smoke gx3 8x2 diag1,run5day,gridcd smoke gx3 8x4 diag1,run5day,debug,gridcd restart gx3 4x2 debug,diag1,gridcd restart2 gx1 16x2 debug,diag1,gridcd restart tx1 40x2 diag1,gridcd -smoke gbox12 1x1x12x12x1 boxchan,gridcd +smoke gbox12 1x1x12x12 
boxchan,gridcd smoke gbox80 4x2 boxchan1e,debug,gridcd smoke gbox80 8x1 boxchan1n,gridcd smoke gbox80 1x1 box2001,gridcd @@ -45,19 +45,19 @@ smoke gbox80 4x2 boxclosed,boxforcee,run1day,gridcd smoke gbox80 4x1 boxclosed,boxforcene,run1day,kmtislands,gridcd smoke gbox80 4x2 boxopen,kmtislands,boxforcee,run1day,gridcd smoke gbox80 2x2 boxclosed,boxforcen,run1day,vargrid,gridcd -smoke gx3 1x1x25x29x16 reprosum,run10day,dwblockall,gridcd smoke_gx3_1x1x100x116x1_gridcd_reprosum_run10day -smoke gx3 1x1x5x4x580 reprosum,run10day,dwblockall,gridcd smoke_gx3_1x1x100x116x1_gridcd_reprosum_run10day -smoke gx3 1x1x5x4x580 reprosum,run10day,gridcd smoke_gx3_1x1x100x116x1_gridcd_reprosum_run10day -smoke gx1 32x1x16x16x32 reprosum,run10day,cmplogrest,dwblockall,gridcd smoke_gx1_32x1x16x16x32_gridcd_reprosum_run10day -smoke gx1 32x1x16x12x40 reprosum,run10day,cmplogrest,dwblockall,gridcd smoke_gx1_32x1x16x16x32_gridcd_reprosum_run10day -smoke gx1 32x1x16x12x40 reprosum,run10day,cmplogrest,gridcd smoke_gx1_32x1x16x16x32_gridcd_reprosum_run10day +smoke gx3 1x1x25x29 reprosum,run10day,dwblockall,gridcd smoke_gx3_1x1x100x116_gridcd_reprosum_run10day +smoke gx3 1x1x5x4 reprosum,run10day,dwblockall,gridcd smoke_gx3_1x1x100x116_gridcd_reprosum_run10day +smoke gx3 1x1x5x4 reprosum,run10day,gridcd smoke_gx3_1x1x100x116_gridcd_reprosum_run10day +smoke gx1 32x1x16x16 reprosum,run10day,cmplogrest,dwblockall,gridcd smoke_gx1_32x1x16x16_gridcd_reprosum_run10day +smoke gx1 32x1x16x12 reprosum,run10day,cmplogrest,dwblockall,gridcd smoke_gx1_32x1x16x16_gridcd_reprosum_run10day +smoke gx1 32x1x16x12 reprosum,run10day,cmplogrest,gridcd smoke_gx1_32x1x16x16_gridcd_reprosum_run10day smoke gx3 8x2 diag1,run5day,gridc smoke gx3 8x4 diag1,run5day,debug,gridc restart gx3 4x2 debug,diag1,gridc restart2 gx1 16x2 debug,diag1,gridc restart tx1 40x2 diag1,gridc -smoke gbox12 1x1x12x12x1 boxchan,gridc +smoke gbox12 1x1x12x12 boxchan,gridc smoke gbox80 4x2 boxchan1e,debug,gridc smoke gbox80 8x1 boxchan1n,gridc smoke gbox80 1x1 box2001,gridc @@ -68,9 +68,9 @@ smoke gbox80 4x2 boxclosed,boxforcee,run1day,gridc smoke gbox80 4x1 boxclosed,boxforcene,run1day,kmtislands,gridc smoke gbox80 4x2 boxopen,kmtislands,boxforcee,run1day,gridc smoke gbox80 2x2 boxclosed,boxforcen,run1day,vargrid,gridc -smoke gx3 1x1x25x29x16 reprosum,run10day,dwblockall,gridc smoke_gx3_1x1x100x116x1_gridc_reprosum_run10day -smoke gx3 1x1x5x4x580 reprosum,run10day,dwblockall,gridc smoke_gx3_1x1x100x116x1_gridc_reprosum_run10day -smoke gx3 1x1x5x4x580 reprosum,run10day,gridc smoke_gx3_1x1x100x116x1_gridc_reprosum_run10day -smoke gx1 32x1x16x16x32 reprosum,run10day,cmplogrest,dwblockall,gridc smoke_gx1_32x1x16x16x32_gridc_reprosum_run10day -smoke gx1 32x1x16x12x40 reprosum,run10day,cmplogrest,dwblockall,gridc smoke_gx1_32x1x16x16x32_gridc_reprosum_run10day -smoke gx1 32x1x16x12x40 reprosum,run10day,cmplogrest,gridc smoke_gx1_32x1x16x16x32_gridc_reprosum_run10day +smoke gx3 1x1x25x29 reprosum,run10day,dwblockall,gridc smoke_gx3_1x1x100x116_gridc_reprosum_run10day +smoke gx3 1x1x5x4 reprosum,run10day,dwblockall,gridc smoke_gx3_1x1x100x116_gridc_reprosum_run10day +smoke gx3 1x1x5x4 reprosum,run10day,gridc smoke_gx3_1x1x100x116_gridc_reprosum_run10day +smoke gx1 32x1x16x16 reprosum,run10day,cmplogrest,dwblockall,gridc smoke_gx1_32x1x16x16_gridc_reprosum_run10day +smoke gx1 32x1x16x12 reprosum,run10day,cmplogrest,dwblockall,gridc smoke_gx1_32x1x16x16_gridc_reprosum_run10day +smoke gx1 32x1x16x12 reprosum,run10day,cmplogrest,gridc smoke_gx1_32x1x16x16_gridc_reprosum_run10day 
diff --git a/configuration/scripts/tests/perf_suite.ts b/configuration/scripts/tests/perf_suite.ts index a4d8ef588..a7da95390 100644 --- a/configuration/scripts/tests/perf_suite.ts +++ b/configuration/scripts/tests/perf_suite.ts @@ -1,29 +1,29 @@ # Test Grid PEs Sets BFB-compare -smoke gx1 32x1x16x16x15 run2day,droundrobin -smoke gx1 64x1x16x16x8 run2day,droundrobin,thread +smoke gx1 32x1x16x16 run2day,droundrobin +smoke gx1 64x1x16x16 run2day,droundrobin,thread # -smoke gx1 1x1x320x384x1 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 1x1x160x192x4 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 1x1x80x96x16 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 1x1x40x48x64 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 1x1x20x24x256 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day +smoke gx1 1x1x320x384 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 1x1x160x192 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 1x1x80x96 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 1x1x40x48 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 1x1x20x24 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day # -smoke gx1 1x1x16x16x480 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 2x1x16x16x240 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 4x1x16x16x120 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 8x1x16x16x60 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 16x1x16x16x30 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -#smoke gx1 32x1x16x16x15 run2day,droundrobin -smoke gx1 64x1x16x16x8 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -smoke gx1 128x1x16x16x4 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day +smoke gx1 1x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 2x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 4x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 8x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 16x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +#smoke gx1 32x1x16x16 run2day,droundrobin +smoke gx1 64x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +smoke gx1 128x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day # -smoke gx1 64x1x16x16x8 run2day,droundrobin smoke_gx1_32x1x16x16x15_droundrobin_run2day -#smoke gx1 64x1x16x16x8 run2day,droundrobin,thread -smoke gx1 32x2x16x16x16 run2day,droundrobin smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread -smoke gx1 16x4x16x16x32 run2day,droundrobin smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread -smoke gx1 8x8x16x16x64 run2day,droundrobin smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread -smoke gx1 4x16x16x16x128 run2day,droundrobin smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread -smoke gx1 32x2x16x16x16 run2day,droundrobin,ompscheds smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread -smoke gx1 32x2x16x16x16 run2day,droundrobin,ompschedd1 smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread -smoke gx1 32x2x16x16x16 run2day,droundrobin,ompscheds1 smoke_gx1_64x1x16x16x8_droundrobin_run2day_thread +smoke gx1 64x1x16x16 run2day,droundrobin smoke_gx1_32x1x16x16_droundrobin_run2day +#smoke gx1 
64x1x16x16 run2day,droundrobin,thread +smoke gx1 32x2x16x16 run2day,droundrobin smoke_gx1_64x1x16x16_droundrobin_run2day_thread +smoke gx1 16x4x16x16 run2day,droundrobin smoke_gx1_64x1x16x16_droundrobin_run2day_thread +smoke gx1 8x8x16x16 run2day,droundrobin smoke_gx1_64x1x16x16_droundrobin_run2day_thread +smoke gx1 4x16x16x16 run2day,droundrobin smoke_gx1_64x1x16x16_droundrobin_run2day_thread +smoke gx1 32x2x16x16 run2day,droundrobin,ompscheds smoke_gx1_64x1x16x16_droundrobin_run2day_thread +smoke gx1 32x2x16x16 run2day,droundrobin,ompschedd1 smoke_gx1_64x1x16x16_droundrobin_run2day_thread +smoke gx1 32x2x16x16 run2day,droundrobin,ompscheds1 smoke_gx1_64x1x16x16_droundrobin_run2day_thread # diff --git a/configuration/scripts/tests/unittest_suite.ts b/configuration/scripts/tests/unittest_suite.ts index 840fc822e..779e218ff 100644 --- a/configuration/scripts/tests/unittest_suite.ts +++ b/configuration/scripts/tests/unittest_suite.ts @@ -15,21 +15,34 @@ unittest gx1 28x1 gridavgchk,dwblockall unittest gx1 16x2 gridavgchk unittest gbox128 8x2 gridavgchk unittest gbox80 1x1x10x10x80 halochk,cyclic,debug +unittest gbox80 1x1x10x10 halochk,cyclic,debug unittest gbox80 1x1x24x23x16 halochk +unittest gbox80 1x1x24x23 halochk unittest gbox80 1x1x23x24x16 halochk,cyclic +unittest gbox80 1x1x23x24 halochk,cyclic unittest gbox80 1x1x23x23x16 halochk,open +unittest gbox80 1x1x23x23 halochk,open unittest tx1 1x1x90x60x16 halochk,dwblockall +unittest tx1 1x1x90x60 halochk,dwblockall unittest tx1 1x1x90x60x16 halochk,dwblockall,tripolet +unittest tx1 1x1x90x60 halochk,dwblockall,tripolet unittest tx1 1x1x95x65x16 halochk,dwblockall +unittest tx1 1x1x95x65 halochk,dwblockall unittest tx1 1x1x95x65x16 halochk,dwblockall,tripolet +unittest tx1 1x1x95x65 halochk,dwblockall,tripolet unittest gx3 4x2 halochk,dwblockall,debug unittest gx3 8x2x16x12x10 halochk,cyclic,dwblockall +unittest gx3 8x2x16x12 halochk,cyclic,dwblockall unittest gx3 17x1x16x12x10 halochk,open,dwblockall +unittest gx3 17x1x16x12 halochk,open,dwblockall unittest tx1 4x2 halochk,dwblockall unittest tx1 4x2 halochk,dwblockall,tripolet unittest tx1 4x2x65x45x10 halochk,dwblockall +unittest tx1 4x2x65x45 halochk,dwblockall unittest tx1 4x2x57x43x12 halochk,dwblockall,tripolet +unittest tx1 4x2x57x43 halochk,dwblockall,tripolet unittest gx3 1x1 optargs unittest gx3 1x1 opticep unittest gx3 4x2x25x29x4 debug,run2day,dslenderX2,opticep,cmplog smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day +unittest gx3 4x2x25x29 debug,run2day,dslenderX2,opticep,cmplog smoke_gx3_4x2x25x29x4_debug_dslenderX2_run2day unittest gx3 8x2 diag1,run5day,opticep,cmplog smoke_gx3_8x2_diag1_run5day diff --git a/doc/source/user_guide/ug_case_settings.rst b/doc/source/user_guide/ug_case_settings.rst index 6deab8c11..405e64dc1 100644 --- a/doc/source/user_guide/ug_case_settings.rst +++ b/doc/source/user_guide/ug_case_settings.rst @@ -369,6 +369,7 @@ domain_nml "``maskhalo_remap``", "logical", "mask unused halo cells for transport", "``.false.``" "``maskhalo_bound``", "logical", "mask unused halo cells for boundary updates", "``.false.``" "``max_blocks``", "integer", "maximum number of blocks per MPI task for memory allocation", "-1" + "", "``-1``", "find number of blocks per MPI task automatically", "" "``nprocs``", "integer", "number of MPI tasks to use", "-1" "", "``-1``", "find number of MPI tasks automatically", "" "``ns_boundary_type``", "``cyclic``", "periodic boundary conditions in y-direction", "``open``" diff --git a/doc/source/user_guide/ug_implementation.rst 
b/doc/source/user_guide/ug_implementation.rst
index 7d172e91d..4f349c264 100644
--- a/doc/source/user_guide/ug_implementation.rst
+++ b/doc/source/user_guide/ug_implementation.rst
@@ -213,22 +213,24 @@ ghost cells, and the same numbering system is applied to each of the
 four subdomains.
 
 The user sets the ``NTASKS`` and ``NTHRDS`` settings in **cice.settings**
-and chooses a block size ``block_size_x`` :math:`\times`\ ``block_size_y``,
-``max_blocks``, and decomposition information ``distribution_type``, ``processor_shape``,
-and ``distribution_type`` in **ice_in**. That information is used to
-determine how the blocks are
-distributed across the processors, and how the processors are
-distributed across the grid domain. The model is parallelized over blocks
+and chooses a block size, ``block_size_x`` :math:`\times`\ ``block_size_y``,
+and decomposition information ``distribution_type``, ``processor_shape``,
+and ``distribution_wgt`` in **ice_in**.
+This information is used to determine how the blocks are
+distributed across the processors. The model is parallelized over blocks
 for both MPI and OpenMP. Some suggested combinations for these
 parameters for best performance are given in Section :ref:`performance`.
 The script **cice.setup** computes some default decompositions and layouts
-but the user can overwrite the defaults by manually changing the values in
-`ice_in`. At runtime, the model will print decomposition
+but the user can override the defaults by manually changing the values in
+`ice_in`. The number of blocks per processor can vary, and it is computed
+internally when the namelist variable ``max_blocks`` is set to ``-1``. ``max_blocks``
+can also be set explicitly by the user, although this may use extra memory and the
+model will abort if ``max_blocks`` is set too small for the decomposition.
+At runtime, the model will print decomposition
 information to the log file, and if the block size or max blocks is
 inconsistent with the task and thread size, the model will abort. The
 code will also print a warning if the maximum number of blocks is too large.
-Although this is not fatal, it does use extra memory. If ``max_blocks`` is
-set to -1, the code will compute a tentative ``max_blocks`` on the fly.
+Although this is not fatal, it does use extra memory.
 A loop at the end of routine *create_blocks* in module **ice_blocks.F90**
 will print the locations for all of the blocks on