libpmi: add support for PMI_PORT #2156
Comments
Great idea, and this might help with auto-start of a Flux instance under Slurm. For this bootstrap mode, two new options would be needed, e.g., so that a flux session could be started simply. Once flux-start has opened the PMI port, it could then launch the configured …
I got part way done implementing this in #2172 and realized that, for the multi-thread case (e.g. PAMI with PMIx calls intercepted + openmpi in spectrum MPI), if both threads are doing a put/barrier/get pattern, there was no way to prevent the barrier calls from becoming interspersed, since the barriers are "anonymous" (unnamed). For example, thread 0 might enter the barrier first on one rank, and thread 1 might enter it first on another rank, and the barrier count might be reached before either barrier is complete, causing premature barrier exit. However, there may be a way to distinguish the two threads: when PMI_PORT is used, an additional ID accompanies each client connection.

In the mpich code it looked like the ID was passed to the client in a PMI_ID environment variable, and I think this is used to identify which rank is connecting on the common listen port. But if, say, the PAMI interceptor were to create a unique ID (like rank + (size * tid), where tid=1 for pami and 0 for openmpi), maybe we could use this to keep the barriers separate?
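A minimal sketch of how such an interceptor might derive and announce a per-thread ID. The helper name `pmi_unique_id` is hypothetical, and the `cmd=initack pmiid=N` handshake is how the mpich simple PMI client appears to identify itself over PMI_PORT; both should be verified against the mpich source:

```c
/* Hypothetical sketch: derive a distinct PMI ID per client thread so a
 * PMI_PORT provider could tell concurrent connections apart.
 * tid = 0 for openmpi, tid = 1 for the PAMI interceptor, as suggested above. */
int pmi_unique_id (int rank, int size, int tid)
{
    /* rank + (size * tid) yields disjoint ID ranges per thread:
     * [0, size) for tid = 0 and [size, 2 * size) for tid = 1.  */
    return rank + (size * tid);
}

/* After connecting to the hostname:port named by PMI_PORT, each thread
 * would announce its ID in the initial handshake, e.g. (mpich-style):
 *
 *   dprintf (fd, "cmd=initack pmiid=%d\n", pmi_unique_id (rank, size, tid));
 */
```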
This cannot be done using the PMI-1 API, however, since PMI-1 doesn't provide a way to set the ID via the API. So I guess it would need to be done via direct use of the flux `pmi_simple_client` class (not currently exported). This behavior is also not defined for PMI-1 as far as I know, so we may be straying into a bad place that increases complexity to introduce unexpected behavior.
This is probably not an advisable change to make, despite its existence in the mpich reference implementation. We now have flux-pmix to deal with PAMI/spectrum MPI (or at least that is the preferred path to a solution).
Currently the Flux simple PMI implementation exclusively uses file descriptor passing to establish a connection between a PMI provider such as flux-start or wrexecd (job shell) and a program rank. The PMI provider creates a socketpair, then passes the client end of it to the program rank via the PMI_FD environment variable. This mechanism is used in mpich and is sort of a de facto standard for the PMI-1 wire protocol. I documented it in Flux RFC 13.
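A minimal provider-side sketch of this fd-passing bootstrap (error handling elided; `./rank-program` is a stand-in for the real program rank, not Flux code):

```c
/* Sketch of PMI_FD bootstrap: the provider creates a socketpair, keeps
 * one end for itself, and passes the other end's file descriptor number
 * to the program rank via the PMI_FD environment variable. */
#include <sys/socket.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main (void)
{
    int pfd[2];
    char val[16];

    if (socketpair (AF_UNIX, SOCK_STREAM, 0, pfd) < 0)
        return 1;
    if (fork () == 0) {                 /* child becomes the program rank */
        close (pfd[0]);                 /* keep only the client end */
        snprintf (val, sizeof (val), "%d", pfd[1]);
        setenv ("PMI_FD", val, 1);      /* PMI client library reads this */
        execlp ("./rank-program", "rank-program", (char *) NULL);
        _exit (1);
    }
    close (pfd[1]);
    /* ... provider now services PMI-1 wire protocol requests on pfd[0] ... */
    return 0;
}
```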
It may be useful to allow program ranks to connect remotely to a PMI provider, or to allow multiple threads within a rank to establish independent connections. There exists in the MPICH code base another option for establishing a PMI-1 wire protocol connection that is less commonly used (and configured off by default IIRC). If one sets an environment variable PMI_PORT to a hostname:port tuple, a program rank can connect to a PMI provider over TCP.
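On the client side, this might look roughly like the sketch below (`pmi_port_connect` is a hypothetical helper; error handling is minimal):

```c
/* Sketch: parse PMI_PORT ("hostname:port") and connect to the PMI
 * provider over TCP, returning a connected fd or -1 on failure. */
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int pmi_port_connect (void)
{
    struct addrinfo hints, *res;
    char *env, *port;
    int fd = -1;

    if (!(env = getenv ("PMI_PORT")) || !(env = strdup (env)))
        return -1;
    if (!(port = strchr (env, ':')))    /* split "hostname:port" */
        goto done;
    *port++ = '\0';

    memset (&hints, 0, sizeof (hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo (env, port, &hints, &res) != 0)
        goto done;
    fd = socket (res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd >= 0 && connect (fd, res->ai_addr, res->ai_addrlen) < 0) {
        close (fd);
        fd = -1;
    }
    freeaddrinfo (res);
done:
    free (env);
    return fd;  /* caller speaks the PMI-1 wire protocol on fd */
}
```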
Supporting this mechanism in the PMI implementation in flux-start could enable an instance to be started with pdsh or similar. flux-start on rank 0 would need to do something like spawn a script that could obtain the allocated port number (see the sketch below), and then run a script that calls something like …

Supporting it in wrexecd (or job shell) could help with #1789.

Security and scalability concerns apply, of course.
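For the rank 0 side mentioned above, obtaining the kernel-allocated port number might look like this sketch (`pmi_port_listen` is hypothetical; the resulting host:port string would be advertised to remote ranks as PMI_PORT):

```c
/* Sketch: listen on an ephemeral TCP port and recover the port number
 * the kernel assigned, formatting a "host:port" string for PMI_PORT. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int pmi_port_listen (char *buf, size_t len, const char *host)
{
    struct sockaddr_in sin;
    socklen_t slen = sizeof (sin);
    int fd;

    if ((fd = socket (AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    memset (&sin, 0, sizeof (sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl (INADDR_ANY);
    sin.sin_port = htons (0);           /* 0 = let the kernel pick a port */
    if (bind (fd, (struct sockaddr *) &sin, sizeof (sin)) < 0
        || listen (fd, 128) < 0
        || getsockname (fd, (struct sockaddr *) &sin, &slen) < 0) {
        close (fd);
        return -1;
    }
    snprintf (buf, len, "%s:%u", host, (unsigned) ntohs (sin.sin_port));
    return fd;  /* e.g. buf = "node0:45231", exported as PMI_PORT */
}
```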