-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SCREAMv1 test to e3sm_gpucxx suite #5495
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested on Summit using both cpu and gpu, and it works fine, thanks.
with latest master on Summit runs into bld errors:
On Crusher with
Does this look familiar in runs of this case with SCREAM repo? |
Thanks @amametjanov this actually passes for me on the SCREAM repo, and confirmed it fails with a local master merge into E3SM on summit. I'll try to figure out what's going on with the master merge. |
Okay, I see what happened. It looks like #5481 changed some things in hommexx that haven't been fixed upstream in E3SM yet. The fixes are on the SCREAM repo, so I will need to open another PR to pull those into E3SM I think, or pull those into this PR (looks like the diffs only affect sources in the eamxx directory, plus shoc). |
979b436
to
e2caf84
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should not be 356 commits for such a simple addition.
@rljacob I had to pull in a scream->e3sm merge because there were conflicts between the two repos since this PR was opened that prevented SCREAMv1 from building on the E3SM repo. I can move this to a separate PR, or rename this one to reflect this. Was waiting on my tests to update this PR (which just passed). |
Please make a separate PR. |
If you need to resync SCREAM with E3SM and its 300+ commits, that should be its own PR. |
9d34608
to
5111a4f
Compare
This requires #5582 to fix SCREAMv1 build errors. |
Done. My reasoning here was to consider this PR the upstream merge, and as a side effect add the test to make sure it doesn’t break again, rather than considering the upstream merge the effect. In any event, #5582 is opened to bring EAMxx up to date, then this PR will just add the test. |
(PR #5495) Add SCREAMv1 test to e3sm_gpucxx suite Add F2010-SCREAMv1 test to e3sm_gpucxx suite to get test coverage on GPU for EAMxx codebase in main E3SM repo. [BFB]
Do we need to add machine-specific cmake in
Similar error on Crusher: https://my.cdash.org/viewTest.php?buildid=2341352 Works on Ascent: https://my.cdash.org/viewTest.php?buildid=2341401 |
notes: needs some syncing and restesting. @brhillman will do. |
5111a4f
to
13d1962
Compare
@brhillman is this ready now? |
If I copy
|
@rljacob , currently waiting on a couple CIME PRs and then I will do a CIME update. Once that is done, I will do an upstream merge to SCREAM, which will force me to sort out the remaining build system issues. |
@brhillman I believe this modified suite will now build and run after merging to next. Can you confirm? |
@amametjanov can you verify the new version of this suite will build on a system running gpucxx ? |
It probably needs to be rebased to get Jim's recent build system changes. |
I'm trying on pm-gpu on latest E3SM
but getting
|
@amametjanov , I've fixed that on the EAMXX fork. I will do a downstream merge soon. |
… (PR #5495) Add SCREAMv1 test to e3sm_gpucxx suite Add F2010-SCREAMv1 test to e3sm_gpucxx suite to get test coverage on GPU for EAMxx codebase in main E3SM repo. [BFB]
The downstream merge is done, merged to next. |
The only place we run this test is pm-gpu and looks like SCREAM has a runtime error: At least its not build time! |
Rob that job looks like it ran out of walltime |
@jgfouca how do we increase the walltime for the scream test? |
@rljacob , there are a few ways. This new test in the suite |
Lets do the second option but wait until pm comes back online. |
Note that when I try this same test on muller-gpu (generally same as pm-gpu), I get an error and a hang. We would not expect any ne4 test to take much time, so maybe don't increase the walltime for the test.
Checking out scream instead (again on muller-gpu) and these tests work fine. So must be something we are doing differently. |
Should I merge to master or hold off? |
@ndkeen is the one SCREAM test in this suite still running out time? |
If I recall, I think we found some issue with the test hanging. When I tried it just now, I get a build error:
|
@bartgol since Jim is on vacation, could you take a look at the above build error on pm-gpu? |
It seems like the ekat submodule is not recent enough. However, it is recent enough in e3sm/master. Perhaps the branch is old, and is forcing a roll back on the ekat submodule? I'll try merging into next locally now, to see what happens to the submodule. |
Oh, wait, this has the new ekat, but an old EAMxx. EAMxx is still using code from an old ekat. We need to update eamxx first. Perhaps the cleanest thing is to remove the branch from next, hard reset the branch to track an up-to-date eamxx, and then remerge to next. |
I'd rather have an eamxx update in a separate PR. Then when that is on next, this should work. |
That's fine with me. And now, let's draw straws! :) Jokes aside, how soon do we want this to be integrated? |
I'm really sick of seeing this open PR so an eamxx downstream merge soon would be good. But I think there is an upstream still pending over in scream. @jgfouca can you push that along? |
@rljacob the downstream merge in the scream repo should move along today. There was a failing test due to a bad auto-resolution of a conflict. Should be fixed now, and tests should be passing on the next AT round (mappy is a bit backed up due to it being offline for a few days). |
Add F2010-SCREAMv1 test to e3sm_gpucxx suite to get test coverage on GPU for EAMxx codebase in main E3SM repo.
[BFB]