Fix bug in getdxdyc for parallel runs #238

Open. jcphill wants to merge 1 commit into Develop.

jcphill commented Mar 29, 2022

linearpart<datatype>::getdxdyc() would silently fail to return values
for neighbor cells from other ranks, resulting in bad AreaDinf output.
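
A minimal sketch of the failure mode described above, assuming the partition keeps one cell-size entry per row and that rows -1 and ny are halo rows belonging to neighboring ranks. `RowCellSizes`, `getOwnedOnly`, and `getChecked` are hypothetical names, not TauDEM's API, and the checked variant is only one possible remedy, not necessarily the change made in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one rank's slice of a raster; NOT TauDEM's linearpart.
// Rows 0..ny-1 are owned by this rank; rows -1 and ny are halo rows from neighbors.
struct RowCellSizes {
    long ny;                  // number of rows owned by this rank
    std::vector<double> dxc;  // per-row cell width, ny + 2 entries (halo, owned..., halo)
    std::vector<double> dyc;  // per-row cell height, same layout

    RowCellSizes(std::vector<double> dx, std::vector<double> dy)
        : ny(static_cast<long>(dx.size()) - 2), dxc(std::move(dx)), dyc(std::move(dy)) {
        assert(dxc.size() >= 2 && dxc.size() == dyc.size());
    }

    // Failure mode: halo rows fall outside [0, ny), so dx and dy are left untouched
    // and the caller silently keeps whatever values they held before the call.
    void getOwnedOnly(long row, double &dx, double &dy) const {
        if (row >= 0 && row < ny) {
            dx = dxc[static_cast<std::size_t>(row + 1)];
            dy = dyc[static_cast<std::size_t>(row + 1)];
        }
    }

    // One possible remedy (an assumption): accept halo rows and report bad rows.
    bool getChecked(long row, double &dx, double &dy) const {
        if (row < -1 || row > ny) return false;
        dx = dxc[static_cast<std::size_t>(row + 1)];
        dy = dyc[static_cast<std::size_t>(row + 1)];
        return true;
    }
};
```

With a single rank no caller ever asks for a row owned by another rank, which would be consistent with the problem only appearing in parallel runs.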

dtarb (Owner) commented Mar 29, 2022

Do you have an example where this actually causes a problem? I have not investigated this specifically yet, but I am skeptical: AreaDinf has been tested a lot with multiple processes and ranks, and I think the approach of keeping a buffer row at the bounds of each rank and swapping after each pass likely prevents an actual error.
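
The buffer-row approach mentioned here can be sketched roughly as follows. This is a single-process model for illustration only: a real run would exchange rows between ranks with MPI, and `Stripe` and `swapHalos` are hypothetical names rather than TauDEM functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one rank's row stripe. Layout, each row nx values wide:
// [halo above][owned row 1 .. owned row ny][halo below]
struct Stripe {
    std::size_t nx, ny;
    std::vector<float> data;  // (ny + 2) * nx values

    Stripe(std::size_t nx_, std::size_t ny_, float fill)
        : nx(nx_), ny(ny_), data((ny_ + 2) * nx_, fill) {}

    // Row index r runs from 0 (halo above) to ny + 1 (halo below).
    float *row(std::size_t r) { return data.data() + r * nx; }
};

// Copy each stripe's edge rows into the halo rows of its vertical neighbors.
// Assumes every stripe owns at least one row.
void swapHalos(std::vector<Stripe> &stripes) {
    for (std::size_t i = 0; i + 1 < stripes.size(); ++i) {
        Stripe &upper = stripes[i];
        Stripe &lower = stripes[i + 1];
        assert(upper.nx == lower.nx);
        // Last owned row of the upper stripe becomes the lower stripe's top halo.
        std::copy(upper.row(upper.ny), upper.row(upper.ny) + upper.nx, lower.row(0));
        // First owned row of the lower stripe becomes the upper stripe's bottom halo.
        std::copy(lower.row(1), lower.row(1) + lower.nx, upper.row(upper.ny + 1));
    }
}
```

A swap like this one refreshes only the cell values stored in the halo rows; any per-row lookup that is indexed by owned rows alone is not affected by it.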

jcphill (Author) commented Mar 29, 2022

Yes. In TauDEM-Test-Data/Input/Geographic, running "mpiexec -np XXX AreaDinf enogeo.tif" with different rank counts gives enogeosca.tif files for which gdalcompare.py reports pixel differences (thousands of pixels differ, but the maximum difference is only about 2).

dtarb (Owner) commented Mar 29, 2022

Thanks. I'll check it out.

jcphill (Author) commented Mar 29, 2022

Example output (from original, unfixed version):

$ mpiexec -n 1 ../../../build/areadinf -ang enogeoang.tif -sca enogeosca1.tif
AreaDinf version 5.3.9
Input file enogeoang.tif has geographic coordinate system.
This run may take on the order of 1 minutes to complete.
This estimate is very approximate.
Run time is highly uncertain as it depends on the complexity of the input data
and speed and memory of the computer. This estimate is based on our testing on
a dual quad core Dell Xeon E5405 2.0GHz PC with 16GB RAM.
Nodata value input to create partition from file: -340282346638528859811704183484516925440.000000
Nodata value recast to float used in partition raster: -340282346638528859811704183484516925440.000000
Processors: 1
Read time: 0.137529
Compute time: 1.248470
Write time: 0.041544
Total time: 1.427543
$ mpiexec -n 12 ../../../build/areadinf -ang enogeoang.tif -sca enogeosca12.tif
AreaDinf version 5.3.9
Input file enogeoang.tif has geographic coordinate system.
Nodata value input to create partition from file: -340282346638528859811704183484516925440.000000
Nodata value recast to float used in partition raster: -340282346638528859811704183484516925440.000000
This run may take on the order of 1 minutes to complete.
This estimate is very approximate.
Run time is highly uncertain as it depends on the complexity of the input data
and speed and memory of the computer. This estimate is based on our testing on
a dual quad core Dell Xeon E5405 2.0GHz PC with 16GB RAM.
Processors: 12
Read time: 0.028167
Compute time: 0.140669
Write time: 0.033118
Total time: 0.201954
$ gdalcompare.py enogeosca1.tif enogeosca12.tif
Files differ at the binary level.
Band 1 checksum difference:
Golden: 5380
New: 5431
Pixels Differing: 31527
Maximum Pixel Difference: 1.875
Differences Found: 2
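
For reference, the two per-band figures reported above (pixels differing and maximum pixel difference) can be computed along these lines once both rasters are loaded into memory. This is a hedged sketch, not gdalcompare.py's implementation: `DiffStats`, `compareBands`, and the `tol` parameter are assumptions added for illustration, and reading the GeoTIFFs is left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary of a cell-by-cell comparison of two bands.
struct DiffStats {
    std::size_t differing = 0;  // cells whose absolute difference exceeds tol
    double maxAbsDiff = 0.0;    // largest absolute difference over all cells
};

// Compare two bands of equal size. With tol = 0 every non-identical cell counts;
// a small positive tol ignores differences at the level of rounding noise.
DiffStats compareBands(const std::vector<float> &a, const std::vector<float> &b,
                       double tol = 0.0) {
    assert(a.size() == b.size());
    DiffStats stats;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = std::fabs(static_cast<double>(a[i]) - static_cast<double>(b[i]));
        if (d > tol) ++stats.differing;
        if (d > stats.maxAbsDiff) stats.maxAbsDiff = d;
    }
    return stats;
}
```

A tolerance helps separate floating-point summation-order effects from larger discrepancies such as the 1.875 reported here.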

jcphill (Author) commented Jun 14, 2022

Do you have time to look at this?

dtarb (Owner) commented Jun 15, 2022

Sorry - I have not had time yet.
