For nearest-neighbor remapping, ensure results are independent of processor count if there are equidistant source points #276

Open
billsacks opened this issue Jul 31, 2024 Discussed in #261 · 3 comments
Labels: bug, source: discussions, who: NCAR

@billsacks
Member

billsacks commented Jul 31, 2024

For nearest-neighbor remapping, when several source points are equidistant from a destination point, existing logic breaks the tie arbitrarily by using the point with the smallest ID. But, according to @oehmke, that tie-breaking isn't applied in the multi-processor case, because the IDs currently aren't sent between processors. As a result, nearest-neighbor mapping can give different results with different processor counts when there are equidistant source points. @oehmke proposes adding a send of the IDs so that the multi-processor case can break ties by ID, as the single-processor case does.
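To make the proposal concrete, here is a minimal toy sketch in Python. It is not ESMF code, and every name in it (`SrcPoint`, `local_nearest`, `global_nearest`, `global_nearest_distance_only`) is hypothetical. Each inner list stands in for the source points held by one processor. The sketch assumes that a reduction comparing distances only keeps whichever candidate it sees first; the actual multi-processor behavior in ESMF may differ in detail.

```python
from typing import List, NamedTuple, Tuple


class SrcPoint(NamedTuple):
    point_id: int  # global ID of the source point (hypothetical field name)
    x: float
    y: float


def sq_dist(p: SrcPoint, x: float, y: float) -> float:
    """Squared Euclidean distance from source point p to (x, y)."""
    return (p.x - x) ** 2 + (p.y - y) ** 2


def local_nearest(points: List[SrcPoint], x: float, y: float) -> Tuple[float, int]:
    """Best candidate on one 'processor' as (squared distance, id).

    Tuples compare element by element, so equal distances fall back to the
    smallest id.
    """
    return min((sq_dist(p, x, y), p.point_id) for p in points)


def global_nearest(partitions: List[List[SrcPoint]], x: float, y: float) -> int:
    """Reduce local candidates with the id sent along with the distance."""
    return min(local_nearest(part, x, y) for part in partitions if part)[1]


def global_nearest_distance_only(partitions: List[List[SrcPoint]], x: float, y: float) -> int:
    """Reduce on distance alone: on a tie, the first partition seen wins."""
    best = None
    for part in partitions:
        if not part:
            continue
        cand = local_nearest(part, x, y)
        if best is None or cand[0] < best[0]:
            best = cand
    return best[1]


# Four source points exactly equidistant from the destination at (0, 0).
sources = [
    SrcPoint(1, -0.25, -0.25),
    SrcPoint(2, -0.25, 0.25),
    SrcPoint(3, 0.25, -0.25),
    SrcPoint(4, 0.25, 0.25),
]
one_proc = [sources]
split_a = [sources[:2], sources[2:]]
split_b = [sources[2:], sources[:2]]

for layout in (one_proc, split_a, split_b):
    print(global_nearest(layout, 0.0, 0.0),
          global_nearest_distance_only(layout, 0.0, 0.0))
```

With the ID included in the comparison, all three layouts pick the same source point. The distance-only reduction picks a different point for `split_a` and `split_b`, which is the kind of processor-count dependence described above.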

Discussed in https://github.com/orgs/esmf-org/discussions/261

Originally posted by samsrabin July 10, 2024

Requirements

Affiliation(s)

NSF-NCAR

ESMF Version

No response

Issue

In CTSM, we use ESMF to read some input files. One particular pair of input files, specifying crop sowing window start and end dates, is at half-degree resolution. We tell ESMF to do nearest-neighbor[1] spatial interpolation as necessary to match the simulation grid.

When I do a run at 10°x15° resolution, some of the simulation gridcell centers are located exactly at the "corners" of four half-degree input pixels, meaning that those four neighbors are equally near. It doesn't matter to me which of those ESMF chooses as the "nearest neighbor," as long as it's consistent.

Unfortunately, it's not: At least one gridcell has a different "nearest neighbor" chosen depending on how many processors the job is split across.

As an example, I've made a figure based on two cases that are identical in setup except that Case 1 used 128 processors and Case 2 used 64. Due to this issue, a certain crop in the gridcell centered at latitude 0, longitude 30°E[2] gets a sowing window of days 7-82 in Case 1 and 336-46 in Case 2.

The white/gray/black in this figure represents the half-degree sowing window files. Gray pixels match the values in Case 1, black pixels match Case 2, and white pixels match neither. The red lines intersect at the center of the 10x15 CTSM gridcell.
[Figure: screenshot_1104]
It looks like Case 1 reads from the pixel to the southwest, whereas Case 2 reads from the pixel to the northwest.

Some notes:

  • I'm not 100% certain this is an ESMF issue as opposed to something weird that CTSM is doing, but I'm at the point where I've done all the troubleshooting I can within CTSM.
  • This reproduces every time, over dozens of tests.

Tagging @ekluzek, @billsacks, and @briandobbins, who have expressed interest in this. By the way, I think I mentioned to y'all that I was having an ERP test pass but the equivalent PEM test fail—this is why! The read of sowing windows only happens at the very beginning of the test, so changing processor count halfway through makes no difference.

Autotag

@oehmke

Footnotes

  1. It needs to be nearest-neighbor because dates are modulo: interpolating between Jan. 2 (day 2) and Dec. 31 (day 365) should give Jan. 1 (day 1), not July 2-3 (day [2+365]/2 = 183.5). That's not something ESMF can do, to my knowledge.

  2. There are other crops in this gridcell that also get different sowing windows. There are no crops in any other gridcell that get different sowing windows, but that doesn't necessarily mean different "nearest" neighbors are getting chosen. That might be happening, just with input pixels that don't differ.
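The arithmetic in footnote 1 can be checked with a short Python sketch. It assumes a 365-day year with days numbered 1 to 365, and the function names are hypothetical; it only illustrates why a plain average is wrong for day-of-year values, and is not something ESMF provides.

```python
DAYS_PER_YEAR = 365  # assumption: non-leap year, days numbered 1..365


def naive_midpoint(day_a: float, day_b: float) -> float:
    """Plain arithmetic mean, ignoring that day-of-year wraps around."""
    return (day_a + day_b) / 2


def circular_midpoint(day_a: float, day_b: float, period: int = DAYS_PER_YEAR) -> float:
    """Midpoint along the shorter way around the yearly cycle."""
    half = period / 2
    # Signed shortest offset from day_a to day_b, in [-half, half).
    diff = (day_b - day_a + half) % period - half
    # Step halfway along that offset, then wrap back into 1..period.
    return (day_a + diff / 2 - 1) % period + 1


print(naive_midpoint(2, 365))     # 183.5, mid-year
print(circular_midpoint(2, 365))  # 1.0, i.e. Jan. 1
```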

@samsrabin

Following up: Is this something that's on the roadmap to be in the ESMF version used in the CESM3 release? No worries if not, but in that case I'll need to make some of my tooling more robust and official.

@oehmke
Contributor

oehmke commented Aug 30, 2024

Yep, it's on the roadmap for ESMF 8.8.0, which is what we're targeting for CESM3. I'm hoping to get it done soon-ish, so we can make sure that it works for a while before the release.

@samsrabin

Excellent, thanks!
