fsurdat: PCT_SAND, PCT_CLAY, ORGANIC differ with different PE layouts on derecho #2502
Comments
I have submitted a 4-node job and an 8-node job in
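As a rough illustration only (not the actual job scripts used here), the two layouts on derecho's PBS scheduler might differ only in the select line; the project code, queue, walltime, and script name below are placeholder assumptions:

# hypothetical 4-node layout (derecho has 128 cores per node)
qsub -A PROJECT -q main -l walltime=01:00:00 -l select=4:ncpus=128:mpiprocs=128 mksurfdata_job.sh
# hypothetical 8-node layout
qsub -A PROJECT -q main -l walltime=01:00:00 -l select=8:ncpus=128:mpiprocs=128 mksurfdata_job.sh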
First I compare two files that I expect (hope) to be identical because derecho generated them on the same number of nodes. I'm relieved to find that they are indeed identical:
Next I compare the two files that I generated today:
and find diffs as shown in the following sample ncview images.
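For reference, a minimal sketch of one way to difference and inspect two such files with NCO's ncdiff and ncview; the file names are placeholders:

ncdiff surfdata_4node.nc surfdata_8node.nc diff_4v8.nc   # elementwise file1 minus file2
ncview diff_4v8.nc                                       # inspect PCT_SAND, PCT_CLAY, ORGANIC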
My assessment of this visual examination:
This seems like an unlikely thing to be able to work on and resolve by ctsm5.3.0. Since the number of gridcells affected is small, that might be OK, but the fact that the differences are large is concerning.
I wonder if it's related to my "ambiguous nearest neighbors" issue, ESMF issue #276: For nearest-neighbor remapping, ensure results are independent of processor count if there are equidistant source points. You can test by shifting the input datasets by a tiny amount (I used 1e-6°).
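A hedged sketch of the kind of shift described above, using NCO's ncap2; the coordinate variable names and file names are assumptions and would need to match the actual raw soil-texture dataset:

ncap2 -s 'lon=lon+1.0e-6; lat=lat+1.0e-6' mksrf_soitex.nc mksrf_soitex_shifted.nc   # shift source coordinates by 1e-6 degrees

The shifted file would then be pointed to from the mksurfdata_esmf namelist in place of the original.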
This would be nice to fix, but is likely related to the ESMF issue @samsrabin noted about nearest-neighbor issues with different PE counts. Falls in the quality-of-life category (for now), but should be addressed by the CESM3 release. If this is a quick fix, it would let us create more accurate 5.3 surface data. Let's not spend more than half a day of active time (roughly) testing this to see if it works and then implementing it.
I likely deleted earlier samples of this problem, so I have generated new ones in
My latest test still fails, unfortunately. I generated an fsurdat file four times as follows:
where suffix 1 used 512 tasks and suffix 2 used 256 tasks, and
I used ncdiff and found that the tweaked files differ similarly to the way that the default files differ. @samsrabin, thank you for the time you put into trying out your hypothesis. I don't know whether this result rules out your hypothesis or whether there is more experimentation that could be done. What are your thoughts? Either way, we will probably need to follow up post ctsm5.3.
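With placeholder file names (suffix 1 = 512 tasks, suffix 2 = 256 tasks), the comparison presumably took a form like:

ncdiff surfdata_default_1.nc surfdata_default_2.nc diff_default.nc   # unshifted inputs: 512-task vs 256-task run
ncdiff surfdata_tweaked_1.nc surfdata_tweaked_2.nc diff_tweaked.nc   # shifted inputs: 512-task vs 256-task run

If the coordinate-shift hypothesis explained the problem, diff_tweaked.nc would show no differences in PCT_SAND, PCT_CLAY, and ORGANIC while diff_default.nc would; instead, both showed similar differences.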
Thanks for checking, @slevis-lmwg. Let's plan to do another test once the ESMF bug is fixed; I think your latest test shows that's not the issue, but maybe worth a shot.
As @wwieder pointed out in #2744 (comment), the fix in slevis-lmwg#9 is likely to resolve this issue.
Thanks @billsacks, I was hoping that might be the case. I'll redo the testing that @slevis-lmwg did and see if that's correct.
Hurray! I tried out slevis-lmwg#9 for f09-1850 with 256 processors and with 128, and am now getting identical results between the two. So this is really good news!
Brief summary of bug
I ran mksurfdata_esmf on derecho to generate fsurdat/landuse files for the VR grids ne0np4CONUS, ne0np4.ARCTIC, and ne0np4.ARCTICGRIS (PR #2490, issue #2487). By accident, I tried two PE layouts:
Possibly related to issue #2430.
General bug information
CTSM version you are using: ctsm5.2.001
Does this bug cause significantly incorrect results in the model's science? Maybe
Configurations affected: All ctsm5.2.0 and newer, as well as hacked simulations that use 5.2 fsurdat files
Details of bug
I used
/glade/campaign/cesm/cesmdata/cseg/tools/cime/tools/cprnc/cprnc -m <file1> <file2>
to get info like this:
@ekluzek proposed this follow-up:
Perform testing with f09 to make it easier to visualize (VR grids are unstructured and difficult to view); see the sketch below.
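One possible shape for that follow-up, with illustrative task counts, executable name, and file names rather than a tested recipe:

mpiexec -n 256 ./mksurfdata < mksurfdata_f09.namelist   # rename the output, e.g. to surfdata_f09_256pe.nc
mpiexec -n 128 ./mksurfdata < mksurfdata_f09.namelist   # rename the output, e.g. to surfdata_f09_128pe.nc
ncdiff surfdata_f09_256pe.nc surfdata_f09_128pe.nc diff_f09.nc
ncview diff_f09.nc   # on the regular f09 grid, PCT_SAND, PCT_CLAY, and ORGANIC are easy to inspect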