
libpmi: add support for PMI_PORT #2156

Closed
garlick opened this issue May 10, 2019 · 4 comments

Comments

@garlick
Member

garlick commented May 10, 2019

Currently the Flux simple PMI implementation exclusively uses file descriptor passing to establish a connection between a PMI provider such as flux-start or wrexecd (job shell) and a program rank. The PMI provider creates a socketpair, then passes the client end of it to the program rank via the PMI_FD environment variable. This mechanism is used in mpich and is sort of a de facto standard for the PMI-1 wire protocol. I documented it in Flux RFC 13.

It may be useful to allow program ranks to connect remotely to a PMI provider, or to allow multiple threads within a rank to establish independent connections.

There exists in the MPICH code base another option for establishing a PMI-1 wire protocol connection that is less commonly used (and configured off by default, IIRC). If one sets an environment variable PMI_PORT to a hostname:port tuple, a program rank can connect to a PMI provider over TCP.

Supporting this mechanism in the PMI implementation in flux-start could enable an instance to be started with pdsh or similar. flux-start on rank 0 would need to obtain the allocated port number, then run a script that calls something like

pdsh -w hostlist flux-start --pmi-port=hostname:port --pmi-rank=%n --pmi-size=size

Supporting it in wrexecd (or job shell) could help with #1789

Security and scalability concerns apply of course.

@grondo
Contributor

grondo commented May 10, 2019

Great idea, and this might help with auto-start of a Flux instance under Slurm.

For this bootstrap mode, two new options would be needed:

  • The hostlist of target systems
  • The path to the script used to run the remaining (non rank 0) flux-start commands

e.g., a flux session could be started with simply:

$ flux start --bootstrap=rsh --hostlist=foo[0-12] --rsh-command=/path/to/script

Once flux-start has opened the PMI port, it could then launch the configured rsh-command (perhaps with a sensible default), substituting $PMI_HOST, $PMI_PORT, $PMI_SIZE, and $PMI_HOSTLIST in the environment of the script.

@garlick
Member Author

garlick commented Jun 5, 2019

I got part way done implementing this in #2172 and realized that, for the multi-thread case (e.g. PAMI with PMIx calls intercepted + openmpi in spectrum MPI), if both threads are doing a put / barrier / get pattern, there was no way to prevent the barrier calls from becoming interspersed, since the barriers are "anonymous" (unnamed). For example, thread 0 might enter the barrier first on one rank, and thread 1 might enter it first on another rank, and the barrier count might be reached before either barrier is complete, causing premature barrier exit.

However there may be a way to distinguish the two threads. When PMI_PORT is used, an additional initack handshake is performed in which the client presents itself with an ID in addition to the rank. The ABNF wire protocol looks like this:

    C:initack = "cmd=initack" SP "pmiid=" int LF
    S:initack = "cmd=initack" LF
    S:initack = "cmd=set" SP "size=" int LF
    S:initack = "cmd=set" SP "rank=" int LF
    S:initack = "cmd=set" SP "debug=" int LF

In the mpich code it looked like the ID was passed to the client in a PMI_ID environment variable, and I think this is used to identify which rank is connecting on the common listen port. But if, say, the PAMI interceptor were to create a unique ID (like rank + (size * tid), where tid=1 for pami and 0 for openmpi), maybe we could use this to keep the barriers separate?

@garlick
Member Author

garlick commented Jun 5, 2019

This cannot be done using the PMI-1 API, however, since PMI-1 doesn't provide a way to set the ID via the API. So I guess it would need to be done via direct use of the flux `pmi_simple_client` class (not currently exported). This behavior is also not defined for PMI-1 as far as I know, so we may be straying into a bad place: adding complexity in order to introduce unexpected behavior.

@garlick
Member Author

garlick commented Aug 17, 2022

This is probably not an advisable change to make, despite its existence in the mpich reference implementation. We now have flux-pmix to deal with PAMI/spectrum MPI (or at least that is the preferred path to a solution).

@garlick garlick closed this as completed Aug 17, 2022