
libpmi: move some complexity to broker #2172

Merged: 12 commits merged into flux-framework:master from the pmi_port2 branch on Jun 5, 2019

Conversation

@garlick (Member) commented May 29, 2019

(This needs some more tests but wanted to park it publicly since I'll be out until June 10)

This is a refactoring of the Flux PMI-1 code after the realization that it would be easier to reason about our libpmi.so if it were only used by jobs launched by Flux, not by the Flux broker. The Flux broker needs to bootstrap not only under Flux, but also under SLURM, potentially other resource managers, and standalone. Jobs launched by Flux, however, just need to use Flux's preferred method (the PMI-1 wire protocol).

As such, libpmi.so is now only a simple PMI-1 wire protocol client. If PMI_FD, PMI_RANK, and PMI_SIZE are not set, it fails. The code formerly in libpmi.so that also tried dlopen()ing SLURM's libpmi.so or falling back to singleton operation is now in the broker. Since it doesn't need to be a faithful PMI-1 ABI, it is quite a bit simpler.
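
As an illustration only (this is not the flux-core source, and the helper name is hypothetical), a minimal PMI-1 "simple" client bootstrap that insists on those three environment variables might look like this:

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Hypothetical helper: fetch PMI_FD, PMI_RANK, and PMI_SIZE from the
 * environment.  Returns 0 on success, -1 if any is missing or malformed. */
static int simple_client_getenv (int *fd, int *rank, int *size)
{
    const char *names[] = { "PMI_FD", "PMI_RANK", "PMI_SIZE" };
    int *out[] = { fd, rank, size };

    for (int i = 0; i < 3; i++) {
        const char *s = getenv (names[i]);
        char *end;
        long val;

        if (!s || *s == '\0')
            return -1;
        errno = 0;
        val = strtol (s, &end, 10);
        if (errno != 0 || *end != '\0' || val < 0)
            return -1;
        *out[i] = (int)val;
    }
    return 0;
}

int main (void)
{
    int fd, rank, size;

    if (simple_client_getenv (&fd, &rank, &size) < 0) {
        /* No simple PMI server launched us - this client has nothing
         * to fall back to (no dlopen, no singleton). */
        fprintf (stderr, "PMI_FD/PMI_RANK/PMI_SIZE not set\n");
        return 1;
    }
    printf ("rank %d of %d, wire protocol on fd %d\n", rank, size, fd);
    return 0;
}
```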

Edit: removing this, see comment below:

Finally, libpmi.so is now thread safe, in the coarsest way possible - by spinlock. This, in combination with a usecount incremented by PMI_Init() and decremented by PMI_Finalize(), should allow e.g. a PAMI thread and an OpenMPI thread in one process to independently make PMI-1 calls without scrambling the wire protocol. Fixes #1789.

I have a client implementation of the PMI_PORT (TCP) connection method that I didn't add to this PR. It had a ways to go on the server side, and I think the changes proposed above ought to be sufficient for the spectrum use case described. (We might have other uses for PMI_PORT but it may not need to be in this PR).

@garlick (Member, Author) commented May 30, 2019

Just thought of a couple of problems that this lock won't fix.

If thread 1 calls PMI_Init() ... PMI_Finalize() then thread 2 tries to call PMI_Init(), the finalize will already have decremented the usecount to zero and closed the file descriptor. OK, we could just leave the file descriptor open indefinitely since we didn't open it, but is that reasonable?

If both threads attempt barriers, since the barriers are "anonymous" (unnamed), the barrier entries from the two threads could be co-mingled, causing a premature barrier exit. This is a more serious problem, and we know that it will happen in the PAMI use case.
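
To make the hazard concrete, here is a hypothetical interleaving (the cmd=barrier_in / cmd=barrier_out exchange sketched below follows the PMI-1 simple wire protocol, but the trace itself is illustrative, not a captured log):

```
rank 0, thread A  ->  cmd=barrier_in     # A enters its barrier
rank 0, thread B  ->  cmd=barrier_in     # B enters a logically different barrier,
                                         #   but the command carries no name
rank 0            <-  cmd=barrier_out    # whichever thread reads this first
rank 0            <-  cmd=barrier_out    #   may exit a barrier that isn't complete
```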

I think the PMI_PORT solution is required in order to solve #1789 after all. I'll go ahead and back out the lock and usecount on the global PMI context. Sigh.

@garlick changed the title from "libpmi: add thread safety, move some complexity to broker" to "libpmi: move some complexity to broker" on May 30, 2019
@garlick (Member, Author) commented May 30, 2019

> If both threads attempt barriers, since the barriers are "anonymous" (unnamed), the barrier entries from the two threads could be co-mingled, causing a premature barrier exit. This is a more serious problem, and we know that it will happen in the PAMI use case.

Argh. PMI_PORT won't fix this. Barriers are still anonymous. There may not be a way to map that PMIx usage onto PMI-1.

@garlick (Member, Author) commented May 30, 2019

I'll try to get a little more test coverage going this evening (when I get to my hotel) and then try to wrap this up as general PMI cleanup. As evidenced by issues like #2170, anything that makes what's happening easier to understand (in code and/or behavior) is probably useful.

@garlick force-pushed the pmi_port2 branch 3 times, most recently from 6b85635 to 5f6a993 on May 31, 2019 15:49
@garlick (Member, Author) commented May 31, 2019

I think this might be close to ready now that the coverage is decent.

I did try to test it on ipa before I left town, but due to a lack of free nodes, I was only able to check that an instance did dlopen the slurm libpmi.so (in singleton mode), not that an instance could be launched under slurm (ran out of time). If someone has a chance to give that a quick try, I'd appreciate it.

Some squashing is needed before this can be merged.

@grondo (Contributor) commented May 31, 2019

I can try it on fluke, unless there is something special about ipa vs our dedicated test cluster? (If so we should get that fixed.)

@garlick (Member, Author) commented May 31, 2019

I didn't try - sorry, was hurrying up to leave and thought of ipa first.

Hmm, I wonder if it would be feasible to add a conditionally run sharness test for this and then bring up a slurm instance in one of our docker containers for travis?

@grondo (Contributor) commented May 31, 2019

> Hmm, I wonder if it would be feasible to add a conditionally run sharness test for this and then bring up a slurm instance in one of our docker containers for travis?

That is a great idea. I can look into that.

@dongahn (Member) commented May 31, 2019

> Hmm, I wonder if it would be feasible to add a conditionally run sharness test for this and then bring up a slurm instance in one of our docker containers for travis?

Being able to bring up a Slurm instance in Travis CI would help our debugging tools get tested better as well. This could be a multi-purpose item. We have a long-standing issue, LLNL/LaunchMON#25. Tagging @lee218llnl.

@grondo (Contributor) commented May 31, 2019

@garlick, what am I looking for in testing with Slurm? I can launch a Flux instance under Slurm on your branch. With FLUX_PMI_DEBUG set, the brokers report lt-flux-broker: using PMI-1 wire protocol.
Is that sufficient?

fluke is running slurm 18.08.6-2

@garlick (Member, Author) commented May 31, 2019 via email

@garlick (Member, Author) commented Jun 2, 2019

Squashed. I'd be OK if this went in, if it looks OK.

@grondo (Contributor) commented Jun 3, 2019

This looks great! Further testing showed some leaks in the test pmi server thread. Not at all critical, but this patch closes the leaks:

```diff
diff --git a/src/common/libpmi/test/server_thread.c b/src/common/libpmi/test/server_thread.c
index 8485e3f..2a10cd1 100644
--- a/src/common/libpmi/test/server_thread.c
+++ b/src/common/libpmi/test/server_thread.c
@@ -147,6 +147,7 @@ static void *server_thread (void *arg)
         BAIL_OUT ("flux_reactor_run failed");
     for (i = 0; i < ctx->size; i++)
         flux_watcher_destroy (w[i]);
+    free (w);
     flux_reactor_destroy (reactor);
     return NULL;
 }
@@ -205,6 +206,7 @@ void pmi_server_destroy (struct pmi_server_context *ctx)
     pmi_simple_server_destroy (ctx->pmi);
     zhashx_destroy (&ctx->kvs);
     ctx->magic = ~MAGIC_VALUE;
+    free (ctx->sfd);
     free (ctx);
 }
```

@garlick (Member, Author) commented Jun 3, 2019 via email

@garlick (Member, Author) commented Jun 3, 2019 via email

The commit messages for the squashed series:

Problem: our PMI-2 library is incomplete and is an
attractive nuisance at this point.

It was added to allow OpenMPI jobs to be launched under
flux when OpenMPI, through its MCA plugins, determines that
it should dlopen slurm's libpmi2.so.  OpenMPI could be
tricked via LD_LIBRARY_PATH into opening a Flux libpmi2.so
that provides the minimal API needed by OpenMPI, and maps
it to the simple PMI-1 wire protocol.  For more background
see flux-framework#746

Since we submitted a flux MCA module that helps OpenMPI
choose the right PMI in the flux environment, and shipping
this libpmi2.so with flux is misleading as it doesn't implement
the full PMI-2 "spec", let's get rid of it.

Fixes flux-framework#2082
Problem: function for converting array of integers
to comma-separated string exists in both test/pminfo.c
and test/clique.c.

Refactor to pmi_cliquetostr() in clique.[ch].
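
For reference only, a helper like that can be a few lines of snprintf() bookkeeping (this sketch is illustrative; the real pmi_cliquetostr() in clique.[ch] defines its own signature):

```c
#include <stdio.h>

/* Illustrative sketch: format 'n' integer ranks into 'buf' as a
 * comma-separated list, e.g. "0,1,4".  Returns 0 on success, -1 if
 * the buffer is too small. */
static int cliquetostr (char *buf, int bufsz, const int *ranks, int n)
{
    int used = 0;

    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int len = snprintf (buf + used, bufsz - used, "%s%d",
                            i > 0 ? "," : "", ranks[i]);
        if (len < 0 || len >= bufsz - used)
            return -1;
        used += len;
    }
    return 0;
}

int main (void)
{
    int clique[] = { 0, 1, 4 };
    char buf[64];

    if (cliquetostr (buf, sizeof (buf), clique, 3) == 0)
        printf ("%s\n", buf);   /* prints: 0,1,4 */
    return 0;
}
```
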
Problem: the PMI implementation in libpmi is a source of
confusion because it's used by the flux broker to bootstrap
in non-Flux environments, AND it is the PMI library provided
to users for launching under Flux.

We can significantly dumb this thing down if it is assumed
to be used only by users running under Flux, and it will then
be more understandable when looked at from only that perspective.
Drop singleton and dlopen (wrap) methods, and in fact drop the
method abstraction and just assume the "simple client" is the
only method.

Update pmi unit tests.
Problem: PMI library doesn't consistently protect against
NULL parameters used to return values (segfault).

Return PMI_ERR_INVALID_ARG in those cases.
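
The guard is just an upfront pointer check; a standalone sketch (the error code value is assumed here only so the example compiles, not quoted from pmi.h):

```c
#include <stdio.h>

#define PMI_SUCCESS          0
#define PMI_ERR_INVALID_ARG  3   /* assumed value, for illustration only */

static int cached_rank = 3;      /* stand-in for the real client state */

/* Sketch of the pattern: refuse NULL output pointers instead of
 * segfaulting when storing through them. */
int PMI_Get_rank (int *rank)
{
    if (rank == NULL)
        return PMI_ERR_INVALID_ARG;
    *rank = cached_rank;
    return PMI_SUCCESS;
}

int main (void)
{
    printf ("NULL arg -> %d\n", PMI_Get_rank (NULL));   /* error, no crash */
    return 0;
}
```
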
Problem: strtoul() and related functions are used without
proper error handling.

As recommended in strtoul(3), one must set errno = 0 before
the call and check for errno != 0 after the call.
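
A standalone sketch of that strtoul(3) pattern (not the flux-core code; the function name here is illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Parse an unsigned decimal string strictly: reject empty input,
 * trailing garbage, and overflow, clearing errno first as strtoul(3)
 * requires. */
static int parse_ulong (const char *s, unsigned long *result)
{
    char *end;
    unsigned long val;

    if (!s || *s == '\0')
        return -1;
    errno = 0;                       /* must be cleared before the call */
    val = strtoul (s, &end, 10);
    if (errno != 0 || *end != '\0')  /* ERANGE overflow or junk suffix */
        return -1;
    *result = val;
    return 0;
}

int main (void)
{
    unsigned long size;

    if (parse_ulong (getenv ("PMI_SIZE"), &size) < 0) {
        fprintf (stderr, "PMI_SIZE unset or malformed\n");
        return 1;
    }
    printf ("size = %lu\n", size);
    return 0;
}
```
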
Problem: the code for obtaining the "clique" information
from the 'PMI_process_mapping' key hardwires canonical
PMI API functions, but a user (internal for now) might want
to use the handle-based simple_client functions instead.

Move this code to the simple_client class.  Currently this is
used behind the canonical clique functions in the Flux libpmi.so,
but not internally.  In the past it was used to assist with
PGM wireup.

Update test/pminfo.
Problem: support for dlopen()ing libpmi.so and for
falling back to singleton operation was removed
from the Flux libpmi.so, but the broker needs
this capability.

Re-add this capability, with significantly reduced
complexity given that it need not be a faithful
PMI-1 API interface.
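
In outline, the broker-side fallback can be a thin dlopen()/dlsym() wrapper, roughly like the sketch below (the library path and the simplified flow are assumptions for illustration; the actual broker code is more involved):

```c
#include <stdio.h>
#include <dlfcn.h>

typedef int (*pmi_init_f)(int *spawned);

/* Try to bootstrap through an external PMI-1 library (e.g. SLURM's
 * libpmi.so) without being a faithful PMI-1 ABI ourselves. */
static void *try_dlopen_pmi (const char *path, pmi_init_f *init)
{
    void *dso = dlopen (path, RTLD_NOW | RTLD_LOCAL);

    if (!dso)
        return NULL;
    *init = (pmi_init_f)dlsym (dso, "PMI_Init");
    if (!*init) {
        dlclose (dso);
        return NULL;
    }
    return dso;
}

int main (void)
{
    pmi_init_f init;
    int spawned;
    void *dso = try_dlopen_pmi ("/usr/lib64/libpmi.so", &init);  /* assumed path */

    if (!dso) {
        printf ("no external PMI found - fall back to singleton\n");
        return 0;
    }
    if (init (&spawned) == 0)   /* PMI_SUCCESS is 0 */
        printf ("bootstrapped via dlopen()ed PMI\n");
    dlclose (dso);
    return 0;
}
```
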
Problem: libpmi/test/pminfo.c didn't call PMI_Get_universe_size().

Add a call.
Pull the server thread out into its own reusable module,
and change it slightly so that it can support multiple
clients.
Problem: the PMI_*() API functions don't have unit
tests per se.

Add a unit test for these interfaces.
@garlick (Member, Author) commented Jun 5, 2019

Pulled in @grondo's fix, rebased on current master, and squashed.

@codecov-io

Codecov Report

Merging #2172 into master will increase coverage by 0.47%.
The diff coverage is 80.91%.

```
@@            Coverage Diff             @@
##           master    #2172      +/-   ##
==========================================
+ Coverage   80.45%   80.92%   +0.47%
==========================================
  Files         200      199       -1
  Lines       31899    31597     -302
==========================================
- Hits        25665    25571      -94
+ Misses       6234     6026     -208
```
| Impacted Files | Coverage Δ |
| --- | --- |
| src/common/libpmi/keyval.c | 100% <100%> (ø) ⬆️ |
| src/common/libpmi/clique.c | 100% <100%> (+42.06%) ⬆️ |
| src/broker/boot_pmi.c | 63.86% <53.7%> (+0.43%) ⬆️ |
| src/broker/pmiutil.c | 70.73% <70.73%> (ø) |
| src/broker/liblist.c | 85.07% <85.07%> (ø) |
| src/common/libpmi/pmi.c | 94.24% <89.7%> (+27.19%) ⬆️ |
| src/common/libpmi/simple_client.c | 91.71% <98.8%> (-4.86%) ⬇️ |

... and 4 more

@grondo (Contributor) commented Jun 5, 2019

Great! Merging.

@grondo merged commit e6614e1 into flux-framework:master on Jun 5, 2019
@garlick deleted the pmi_port2 branch on June 5, 2019 14:30
@garlick (Member, Author) commented Jun 5, 2019

Thanks!
