libpmi: move some complexity to broker #2172
Conversation
Just thought of a couple of problems that this lock won't fix. If thread 1 calls […]. If both threads attempt barriers, since the barriers are "anonymous" (unnamed), the barrier entries from the two threads could be co-mingled, causing a premature barrier exit. This is a more serious problem, and we know that it will happen in the PAMI use case. I think the PMI_PORT solution is required in order to solve #1789 after all. I'll go ahead and back out the lock and usecount on the global PMI context. Sigh.
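To make the failure mode concrete: a toy counting-server sketch (not Flux code; SIZE, barrier_in, and the thread labels are all invented for illustration) shows how anonymous entries from two threads of one rank can release a barrier before the other rank ever enters it:

#include <stdio.h>

#define SIZE 2                 /* two ranks in the job */

static int count = 0;

/* server side: a simple-protocol server just counts barrier entries
 * and releases everyone once SIZE arrive; it has no way to tell
 * which logical barrier an entry belongs to */
static int barrier_in (const char *who)
{
    printf ("barrier_in from %s (count=%d)\n", who, count + 1);
    if (++count == SIZE) {
        count = 0;
        return 1;              /* barrier released */
    }
    return 0;
}

int main (void)
{
    /* rank 0 has two threads, each entering its own barrier;
     * their anonymous entries co-mingle and satisfy the count
     * before rank 1 ever enters */
    (void)barrier_in ("rank0/thread1");
    if (barrier_in ("rank0/thread2"))
        printf ("premature release: rank 1 never entered!\n");
    return 0;
}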
Argh. PMI_PORT won't fix this. Barriers are still anonymous. There may not be a way to map that PMIx usage onto PMI-1.
I'll try to get a little more test coverage going this evening (when I get to my hotel) and then try to wrap this up as general PMI cleanup. As evidenced by issues like #2170, anything that makes what's happening easier to understand (in code and/or behavior) is probably useful.
Force-pushed from 6b85635 to 5f6a993
I think this might be close to ready now that the coverage is decent. I did try to test it on ipa before I left town, but due to lack of free nodes, was only able to check that an instance did dlopen the slurm libpmi.so (in singleton mode), not that an instance could be launched under slurm (ran out of time). If someone has a chance to give that a quick try I'd appreciate it. Some squashing is needed before this can be merged.
I can try it on fluke, unless there is something special about ipa vs our dedicated test cluster? (If so we should get that fixed)
I didn't try - sorry, was hurrying to leave and thought of ipa first. Hmm, I wonder if it would be feasible to add a conditionally run sharness test for this and then bring up a slurm instance in one of our docker containers for travis?
That is a great idea. I can look into that.
Being able to bring up a slurmy instance in Travis CI would also help us test our debugging tools better. This can be a multi-purpose item. We have a long-standing issue LLNL/LaunchMON#25. Tagging @lee218llnl.
@garlick, what am I looking for in testing with Slurm? I can launch a Flux instance under Slurm on your branch. With FLUX_PMI_DEBUG set, the brokers report lt-flux-broker: using PMI-1 wire protocol. Is that sufficient? fluke is running slurm 18.08.6-2
Good enough. Thanks!
Squashed. I'd be OK if this went in, if it looks OK.
This looks great! Further testing showed some leaks in the test pmi server thread. Not at all critical, but this patch closes the leaks:

diff --git a/src/common/libpmi/test/server_thread.c b/src/common/libpmi/test/server_thread.c
index 8485e3f..2a10cd1 100644
--- a/src/common/libpmi/test/server_thread.c
+++ b/src/common/libpmi/test/server_thread.c
@@ -147,6 +147,7 @@ static void *server_thread (void *arg)
BAIL_OUT ("flux_reactor_run failed");
for (i = 0; i < ctx->size; i++)
flux_watcher_destroy (w[i]);
+ free (w);
flux_reactor_destroy (reactor);
return NULL;
}
@@ -205,6 +206,7 @@ void pmi_server_destroy (struct pmi_server_context *ctx)
pmi_simple_server_destroy (ctx->pmi);
zhashx_destroy (&ctx->kvs);
ctx->magic = ~MAGIC_VALUE;
+ free (ctx->sfd);
free (ctx);
}
Good catch! If you want to push that to my branch and then merge that
would be great!
(or I will incorporate that this evening when I get back to my laptop)
Problem: our PMI-2 library is incomplete and is an attractive nuisance at this point. It was added to allow OpenMPI jobs to be launched under flux when OpenMPI, through its MCA plugins, determines that it should dlopen slurm's libpmi2.so. OpenMPI could be tricked via LD_LIBRARY_PATH into opening a Flux libpmi2.so that provides the minimal API needed by OpenMPI, and maps it to the simple PMI-1 wire protocol. For more background see flux-framework#746 Since we submitted a flux MCA module that helps OpenMPI choose the right PMI in the flux environment, and shipping this libpmi2.so with flux is misleading as it doesn't implement the full PMI-2 "spec", let's get rid of it. Fixes flux-framework#2082
Problem: function for converting array of integers to comma-separated string exists in both test/pminfo.c and test/clique.c. Refactor to pmi_cliquetostr() in clique.[ch].
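For context, a helper of this shape renders an integer array as a comma-separated string (illustrative sketch only; the PR's pmi_cliquetostr() signature may differ):

#include <stdio.h>

/* render n ints as "a,b,c" into buf; 0 on success, -1 on overflow */
static int cliquetostr (char *buf, int bufsz, const int *ranks, int n)
{
    int i, used = 0;

    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int rc = snprintf (buf + used, bufsz - used, "%s%d",
                           i > 0 ? "," : "", ranks[i]);
        if (rc < 0 || rc >= bufsz - used)
            return -1;          /* output truncated */
        used += rc;
    }
    return 0;
}

int main (void)
{
    int clique[] = { 0, 1, 2 };
    char buf[64];

    if (cliquetostr (buf, sizeof (buf), clique, 3) == 0)
        printf ("%s\n", buf);   /* prints: 0,1,2 */
    return 0;
}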
Problem: the PMI implementation in libpmi is a source of confusion because it is used by the flux broker to bootstrap in non-Flux environments, AND it is the PMI library provided to users for launching under Flux. We can significantly dumb this thing down if it is assumed to be used only by programs running under Flux; it is then much easier to understand from that single perspective. Drop the singleton and dlopen (wrap) methods, and in fact drop the method abstraction entirely and assume the "simple client" is the only method. Update pmi unit tests.
Problem: PMI library doesn't consistently protect against NULL parameters used to return values (segfault). Return PMI_ERR_INVALID_ARG in those cases.
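The guard pattern, sketched on one real PMI-1 entry point (assumes pmi.h is on the include path; the function body here is a placeholder, not the PR's code):

#include "pmi.h"

int PMI_Get_universe_size (int *size)
{
    if (size == NULL)
        return PMI_ERR_INVALID_ARG;  /* was: segfault on NULL */
    *size = 1;                       /* placeholder for the real lookup */
    return PMI_SUCCESS;
}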
Problem: strtoul() and related functions are used without proper error handling. As recommended in strtoul(3), one must set errno = 0 before the call and check for errno != 0 after the call.
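A sketch of the recommended pattern (the helper name is invented; this is not the PR's exact code):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* parse an unsigned decimal string; 0 on success, -1 on any error */
static int parse_ul (const char *s, unsigned long *result)
{
    char *endptr;
    unsigned long val;

    errno = 0;                           /* clear errno before the call */
    val = strtoul (s, &endptr, 10);
    if (errno != 0)                      /* e.g. ERANGE on overflow */
        return -1;
    if (endptr == s || *endptr != '\0')  /* no digits, or trailing junk */
        return -1;
    *result = val;
    return 0;
}

int main (void)
{
    unsigned long val;

    if (parse_ul ("42", &val) == 0)
        printf ("parsed %lu\n", val);
    if (parse_ul ("42x", &val) < 0)
        printf ("rejected trailing junk\n");
    return 0;
}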
Problem: the code for obtaining the "clique" information from the 'PMI_process_mapping' key hardwires the canonical PMI API functions, but a user (internal for now) might want to use the handle-based simple_client functions instead. Move this code to the simple_client class. Currently it is used behind the canonical clique functions in the Flux libpmi.so, but not internally. In the past it was used to assist with PGM wireup. Update test/pminfo.
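For reference, this is how a user reaches that code through the canonical clique functions in the Flux libpmi.so (hedged sketch; error checks elided, and pmi.h is assumed to be on the include path):

#include <stdio.h>
#include <stdlib.h>
#include "pmi.h"

int main (void)
{
    int spawned, rank, clique_size, i;
    int *ranks;

    PMI_Init (&spawned);
    PMI_Get_rank (&rank);

    /* ranks co-located on this node, per PMI_process_mapping */
    PMI_Get_clique_size (&clique_size);
    ranks = calloc (clique_size, sizeof (int));
    PMI_Get_clique_ranks (ranks, clique_size);

    for (i = 0; i < clique_size; i++)
        printf ("rank %d: local peer %d\n", rank, ranks[i]);

    free (ranks);
    PMI_Finalize ();
    return 0;
}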
Problem: support for dlopen()ing libpmi.so and for falling back to singleton operation was removed from the Flux libpmi.so, but the broker needs this capability. Re-add it, with significantly reduced complexity given that it need not be a faithful PMI-1 API implementation.
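The broker-side fallback is roughly this shape (a sketch under assumptions, not the broker's actual code: PMI_LIBRARY is an invented override, real code checks many more errors; link with -ldl):

#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    void *dso;
    int (*init) (int *);
    int spawned;
    const char *path;

    /* 1. simple wire protocol, if a server gave us a file descriptor */
    if (getenv ("PMI_FD")) {
        printf ("using PMI-1 wire protocol\n");
        return 0;
    }
    /* 2. dlopen a foreign libpmi.so, e.g. SLURM's */
    if ((path = getenv ("PMI_LIBRARY"))
        && (dso = dlopen (path, RTLD_NOW | RTLD_LOCAL))) {
        init = (int (*)(int *))dlsym (dso, "PMI_Init");
        if (init && init (&spawned) == 0) {
            printf ("using dlopened %s\n", path);
            return 0;
        }
        dlclose (dso);
    }
    /* 3. no PMI at all: run as a singleton */
    printf ("falling back to singleton operation\n");
    return 0;
}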
Problem: libpmi/test/pminfo.c didn't call PMI_Get_universe_size(). Add a call.
Pull out the server thread into its own reusable module, and change it slightly so that it can support multiple clients.
Problem: the PMI_*() API functions don't have unit tests per se. Add a unit test for these interfaces.
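A unit test for these interfaces can be as small as this (a sketch in the style of flux-core's libtap tests; assumes tap.h, pmi.h, and a PMI server environment):

#include "pmi.h"
#include "tap.h"

int main (void)
{
    int spawned = -1, rank = -1;

    plan (NO_PLAN);

    ok (PMI_Init (&spawned) == PMI_SUCCESS,
        "PMI_Init works");
    ok (PMI_Get_rank (NULL) == PMI_ERR_INVALID_ARG,
        "PMI_Get_rank (NULL) returns PMI_ERR_INVALID_ARG");
    ok (PMI_Get_rank (&rank) == PMI_SUCCESS && rank >= 0,
        "PMI_Get_rank returns a sane rank");
    ok (PMI_Finalize () == PMI_SUCCESS,
        "PMI_Finalize works");

    done_testing ();
    return 0;
}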
Pulled in @grondo's fix, rebased on current master, and squashed.
Codecov Report
@@ Coverage Diff @@
## master #2172 +/- ##
==========================================
+ Coverage 80.45% 80.92% +0.47%
==========================================
Files 200 199 -1
Lines 31899 31597 -302
==========================================
- Hits 25665 25571 -94
+ Misses 6234 6026 -208
Great! Merging.
Thanks! |
(This needs some more tests but wanted to park it publicly since I'll be out until June 10)

This is a refactoring of the Flux PMI-1 code after the realization that it would be easier to reason about our libpmi.so if it were only used by jobs launched by Flux, not by the Flux broker. The Flux broker needs to bootstrap not only under Flux, but under SLURM, potentially other resource managers, and standalone. Jobs launched by Flux, however, just need to use Flux's preferred method (the PMI-1 wire protocol).

As such, libpmi.so is now only a simple PMI-1 wire protocol client. If PMI_FD, PMI_RANK, and PMI_SIZE are not set, it fails. The code formerly in libpmi.so to also try dlopen()ing SLURM's libpmi.so, or to fall back to singleton operation, is now in the broker. Since it need not be a faithful PMI-1 ABI, it is quite a bit simpler.
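In other words, the new libpmi.so expects only the simple-protocol environment. A quick way to see what a client sees (standalone sketch, not Flux code):

#include <stdio.h>
#include <stdlib.h>

int main (void)
{
    const char *fd   = getenv ("PMI_FD");
    const char *rank = getenv ("PMI_RANK");
    const char *size = getenv ("PMI_SIZE");

    if (!fd || !rank || !size) {
        fprintf (stderr, "no simple-protocol PMI server detected\n");
        return 1;
    }
    printf ("PMI_FD=%s PMI_RANK=%s PMI_SIZE=%s\n", fd, rank, size);
    return 0;
}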
Edit: removing this, see comment below:
I have a client implementation of the PMI_PORT (TCP) connection method that I didn't add to this PR. It had a ways to go on the server side, and I think the changes proposed above ought to be sufficient for the spectrum use case described. (We might have other uses for PMI_PORT, but it may not need to be in this PR.)