Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On read, change order of dealing with redist vs undist #169

Merged
merged 3 commits into from
Aug 10, 2023

Conversation

billsacks
Copy link
Member

The code has not properly handled the read of a Field or Array with both ungridded / undistributed dimensions and > 1 DE per PET. It appears that this is due to an incorrect ordering of dealing with these two situations in IO::read. This commit changes that ordering to resolve this issue.

Note that I also needed to reassign temp_array_p to temp_array_undist_p before doing the redistStore and redist; without this, the redistStore call generated an error with this trace-back:

20230724 170325.547 ERROR            PET3 ESMCI_Array.C:6238 ESMCI::Array::tRedistStore() Invalid argument  - srcArray and dstArray must provide identical number of exclusive elements
20230724 170325.547 ERROR            PET3 ESMCI_Array.C:6018 ESMCI::Array::redistStore() Invalid argument  - Internal subroutine call returned Error
20230724 170325.547 ERROR            PET3 ESMCI_IO.C:387 ESMCI::IO::read() Invalid argument  - Internal subroutine call returned Error
20230724 170325.547 ERROR            PET3 ESMCI_IO.C:282 ESMCI::IO::read() Invalid argument  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 ESMCI_IO_F.C:210 c_esmc_ioread() Unable to read from file  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 ESMF_IO.F90:397 ESMF_IOAddArray() Unable to read from file  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 ESMF_FieldPr.F90:336 ESMF_FieldRead Unable to read from file  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 esmApp.F90:185 Unable to read from file  - Passing error in return code

Resolves #150

I have verified that a round-trip write-then-read of a single-tile Field with both ungridded dimensions and 0 or > 1 DE per PET on some PETs gives identical results to the original Field.

@theurich and/or @oehmke - I would like your input on two things here (I am assigning you both as reviewers, but input from one of you is probably sufficient):

(1) I would like a second set of eyes / brain on the logic here to confirm that this flow of logic looks reasonable. Specifically, I want confirmation that the logic around the redistStore call looks reasonable. At a high level, for an Array with both undistributed dimensions and > 1 DE per PET, we (a) create a temp Array with 1 DE per PET; (b) create an aliased version of this Array that treats all dimensions as distributed, saving the result of (a) as temp_array_undist_p; (c) Read into the Array created in (b); (d) do a redistStore and then redist from temp_array_undist_p into the original (user) Array. I think we can rely on temp_array_undist_p being filled properly with data by the Read into temp_array_p because of the reference sharing between temp_array_undist_p and temp_array_p (and my testing gives me more confidence in this), but I want to check to make sure you don't see any edge cases where this logic will fail. (I had originally hoped that I could do the redistStore and redist directly from temp_array_p back to the original (user) Array, but I wasn't too surprised to find that this failed, with the message given in the above traceback.)

(2) I did this testing with a standalone program (https://github.com/billsacks/esmfprojects-multi_tile_io/blob/main/esmApp.F90), not a unit test. I plan to add unit testing of this complex case (Array / Field with both ungridded dimensions and 0 or > 1 DE per PET on some PETs) in my multi-tile IO unit tests. My question is: do you feel I should also add unit tests of this combination on single-tile Arrays / Fields – probably in ESMF_ArrayIOUTest and ESMF_FieldIOUTest? It seems like we'll get the necessary code / logic coverage through the multi-tile tests, but the advantage of testing this combination with single-tile Arrays / Fields is that, if the test fails in the future, it would allow distinguishing between an issue with multi-tile vs. an issue that also impacts single-tile. The downside is that I think setting up these tests will be a bit complex, so adding single-tile versions of these tests will mean more test code to write and maintain. I can go either way on this myself; do you have feelings?

The code has not properly handled the read of a Field or Array with both
ungridded / undistributed dimensions and > 1 DE per PET. It appears that
this is due to an incorrect ordering of dealing with these two
situations in IO::read. This commit changes that ordering to resolve
this issue.

Note that I also needed to reassign temp_array_p to temp_array_undist_p
before doing the redistStore and redist; without this, the redistStore
call generated an error with this trace-back:

20230724 170325.547 ERROR            PET3 ESMCI_Array.C:6238 ESMCI::Array::tRedistStore() Invalid argument  - srcArray and dstArray must provide identical number of exclusive elements
20230724 170325.547 ERROR            PET3 ESMCI_Array.C:6018 ESMCI::Array::redistStore() Invalid argument  - Internal subroutine call returned Error
20230724 170325.547 ERROR            PET3 ESMCI_IO.C:387 ESMCI::IO::read() Invalid argument  - Internal subroutine call returned Error
20230724 170325.547 ERROR            PET3 ESMCI_IO.C:282 ESMCI::IO::read() Invalid argument  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 ESMCI_IO_F.C:210 c_esmc_ioread() Unable to read from file  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 ESMF_IO.F90:397 ESMF_IOAddArray() Unable to read from file  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 ESMF_FieldPr.F90:336 ESMF_FieldRead Unable to read from file  - Internal subroutine call returned Error
20230724 170325.548 ERROR            PET3 esmApp.F90:185 Unable to read from file  - Passing error in return code

Resolves #150
@billsacks billsacks self-assigned this Jul 25, 2023
@theurich
Copy link
Member

@billsacks - Just a quick question with regards of this PR and 8.5.0. I see that this is work on develop, and that makes perfect sense. Also I do not feel any urgency to squeeze this into 8.5.0 (i.e. no user reports of this issue that we know of). But I want to confirm that this aligns with your thinking, too. Thanks.

@billsacks
Copy link
Member Author

Right - I do not intend for this to go into 8.5.0. Thanks for confirming.

This tests a case that was failing until the work on this branch: each
of these situations was tested individually, but the combination wasn't
tested and wasn't working.
@billsacks
Copy link
Member Author

I just pushed a commit that adds unit tests for this previously-failing case. This feels like a lot of test code for testing a single obscure scenario, but it kind of feels like that's just the way it has to be... though I'm open to feedback. (I could have shortened it a bit by looping over the two DEs per PET, but I kept it this way for consistency with some other code in this test module.)

@theurich
Copy link
Member

theurich commented Aug 2, 2023

@billsacks - commenting on your point (1) from above. Your outline of steps taken does appear correct to me. Your hope to Redist() directly from temp_array_p to the original (user) provided Array wasn't even that far off. You can do this with ESMF, but then you do have to provide the srcToDstTransposeMap argument to allow mixing distributed and undistributed dims in a redist. Not very convenient here, especially because the information is already available within the arrays held by temp_array_undist_p.
The only suggestion I have perhaps wrt the code change is to move line 365, which sets temp_array_undist_p above the if (has_undist) { of line 364. Doing this will set temp_array_undist_p for all cases. I think then you can simply replace all of the use of temp_array_p with temp_array_undist_p in the entire block of line 383-417, and eliminate the extra if (has_undist){ block 387-395. In my mind this follows more the style of the existing code where the code above the read() sets a number of pointer variables that can then be used consistently below, without further if-logic.

@theurich
Copy link
Member

theurich commented Aug 2, 2023

As for the new test code, the main thing is that the multi-DE + undistr dim issue is being tested. I did not do a thorough review, but from what I looked at it seem good to me. Test code in my experience often does get bulky, but it's not necessarily a bad thing.

@billsacks
Copy link
Member Author

Thanks a lot for looking at this @theurich ! I like your suggestion; I'll make that change.

@billsacks billsacks merged commit b423b22 into develop Aug 10, 2023
2 checks passed
@billsacks billsacks deleted the fix/read_ungridded_multiple_des branch August 10, 2023 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problems reading a field with ungridded dimensions and > 1 DE per PET
2 participants