
Replace MPI calls with ESMF calls #100

Open
mvertens opened this issue Jun 7, 2024 · 8 comments

Comments

@mvertens
Contributor

mvertens commented Jun 7, 2024

In MOSART, the calls using shr_mpi_sum, shr_mpi_max, shr_mpi_min, mpi_barrier, and mpi_bcast could be replaced by calls to
ESMF_VMAllReduce and ESMF_VMBroadcast.
However, the following must be taken into account:

  • ESMF_VMAllReduce only accepts one-dimensional arrays, but tmp_in and tmp_glob are 2D arrays. So the data would have to be copied into 1D buffers, reduced with 6 separate calls, and then copied back out (see the sketch below).
  • ESMF_VMBroadcast likewise only accepts 1D arrays. That means an array of length 1 would have to be created, finidat copied into it, broadcast, and then copied back. I think this will make the code more complicated and harder to read.

ESMF currently provides interfaces to only a small subset of MPI communications, and those interfaces only support arrays. This needs to be taken into account when deciding whether to replace the MPI calls with ESMF calls.
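
To make the copy-in/copy-out cost concrete, here is a minimal Fortran sketch of the pattern the two bullets above describe. It is illustrative only, not MOSART code: the shapes of tmp_in/tmp_glob are assumed, and the 1D-character overload of ESMF_VMBroadcast used for finidat is assumed to exist as the second bullet implies.

```fortran
! Illustrative only -- not MOSART code. Shows the extra copies forced by
! the 1D-only ESMF_VM interfaces discussed above.
subroutine sketch_esmf_allreduce_2d(tmp_in, tmp_glob, rc)
   use ESMF, only : ESMF_VM, ESMF_VMGetCurrent, ESMF_VMAllReduce, &
                    ESMF_REDUCE_SUM, ESMF_KIND_R8
   implicit none
   real(ESMF_KIND_R8), intent(in)  :: tmp_in(:,:)    ! assumed 2D, e.g. (npts,6)
   real(ESMF_KIND_R8), intent(out) :: tmp_glob(:,:)
   integer,            intent(out) :: rc

   type(ESMF_VM)                   :: vm
   real(ESMF_KIND_R8), allocatable :: send1d(:), recv1d(:)
   integer                         :: n

   call ESMF_VMGetCurrent(vm, rc=rc)
   allocate(send1d(size(tmp_in,1)), recv1d(size(tmp_in,1)))
   do n = 1, size(tmp_in,2)
      send1d = tmp_in(:,n)                            ! copy 2D slice into 1D buffer
      call ESMF_VMAllReduce(vm, send1d, recv1d, size(send1d), &
                            ESMF_REDUCE_SUM, rc=rc)   ! one call per second-dim index
      tmp_glob(:,n) = recv1d                          ! copy the result back
   end do
   deallocate(send1d, recv1d)
end subroutine sketch_esmf_allreduce_2d

subroutine sketch_esmf_bcast_finidat(finidat, rootPet, rc)
   use ESMF, only : ESMF_VM, ESMF_VMGetCurrent, ESMF_VMBroadcast
   implicit none
   character(len=*), intent(inout) :: finidat        ! scalar string to broadcast
   integer,          intent(in)    :: rootPet
   integer,          intent(out)   :: rc

   type(ESMF_VM)               :: vm
   character(len=len(finidat)) :: tmp(1)             ! length-1 array workaround

   call ESMF_VMGetCurrent(vm, rc=rc)
   tmp(1) = finidat                                   ! copy scalar into the array
   ! Assumes a 1D character overload of ESMF_VMBroadcast, as implied above.
   call ESMF_VMBroadcast(vm, tmp, 1, rootPet, rc=rc)
   finidat = tmp(1)                                   ! copy back on every task
end subroutine sketch_esmf_bcast_finidat
```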

@ekluzek
Contributor

ekluzek commented Jun 7, 2024

@mvertens thanks for the analysis. It sounds like, at least for MOSART, this wouldn't be a good thing to do right now anyway. @billsacks, as our ESMF rep, do you think this would be worth doing in the future?

@jedwards4b made the following point about doing this here:

#94 (review)

Since ESMF VM objects abstract the distributed-memory communication package, they provide a general interface in which MPI could be swapped for another protocol entirely under the covers, without touching CESM code. That's a worthwhile thing to do even if we don't do it immediately.

@billsacks
Member

I'd like to understand this better before weighing in on it myself. It sounds like it may complicate the code somewhat, but it buys us the potential for more abstraction and long-term maintainability.

Things I'd like to understand - and this probably warrants a discussion with others on the ESMF core team - are:

  • What is the likelihood of needing to move to something other than MPI? I assume that, in the near future, this would be for GPUs?
  • Is it reasonable for ESMF to support whatever alternative(s) CESM might want? That is, will ESMF_VMAllReduce and ESMF_VMBroadcast actually support the potentially desired underlying calls?
  • How much work would it be for ESMF to extend its wrappers to keep the CESM code simpler? If feasible, it seems like that would be better than adding complexity in the CESM code due to limitations in the ESMF wrappers.

@jedwards4b
Contributor

Exactly the discussion you should have. We may need to move away from MPI, or to some combination of MPI with something else; ideally we would keep that complexity isolated.
I think that ESMF_VM* will need to be extended to work with scalars and multidimensional arrays.
I don't think it would be too complicated, and I think this embraces a vision of what the ESMF interface should be capable of.
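
As a purely hypothetical illustration of keeping that complexity isolated, a thin wrapper (whether it eventually lives in ESMF itself or in a shared CESM utility module) could hide the scalar-to-array packing from callers. The module and subroutine names below are made up for the sketch.

```fortran
! Hypothetical sketch only -- not an existing ESMF or CESM module.
module vm_scalar_utils
   use ESMF, only : ESMF_VM, ESMF_VMBroadcast, ESMF_KIND_R8
   implicit none
   private
   public :: vm_broadcast_scalar

contains

   subroutine vm_broadcast_scalar(vm, value, rootPet, rc)
      ! Broadcast a single real(r8) value by packing it into a length-1
      ! array, so callers never see the 1D-only restriction.
      type(ESMF_VM),      intent(in)    :: vm
      real(ESMF_KIND_R8), intent(inout) :: value
      integer,            intent(in)    :: rootPet
      integer,            intent(out)   :: rc

      real(ESMF_KIND_R8) :: buf(1)

      buf(1) = value
      call ESMF_VMBroadcast(vm, buf, 1, rootPet, rc=rc)
      value = buf(1)
   end subroutine vm_broadcast_scalar

end module vm_scalar_utils
```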

@billsacks
Member

Sounds great, @jedwards4b. I'll add this to the agenda for an upcoming ESMF core team call. The team is stretched thin right now, but we can discuss whether it makes sense to put this on our roadmap.

@billsacks
Member

@jedwards4b - one thing that could help for this discussion: do you have any examples of alternative calls? For example, for mpi_bcast, what specific alternative could be wrapped by the ESMF calls?

@jedwards4b
Contributor

The CMEPS mediator is an example: there are no calls to MPI there; only ESMF is used.

@billsacks
Member

Sorry, let me clarify what I meant: what would be the underlying library call, taking the place of mpi_bcast, that ESMF_VMBroadcast might wrap in the future?

@jedwards4b
Contributor

I don't have any examples - perhaps Gerhard does.
