Problem with information transfer between SWAN and ROMS #328

Open
Tucumeu opened this issue Oct 22, 2024 · 9 comments

Comments

@Tucumeu

Tucumeu commented Oct 22, 2024

Dear all,

I am using the 3.8 version to try to replicate a case that worked fine with the previous one (3.7), but I am running into an unexpected problem.

I have a smaller domain B nested into a larger A grid, and I want to run a coupled ROMS+SWAN two-way nesting simulation. All the input files are from the successful v3.7 run, so I assume they are ok. However, when I run the v3.8 model it works for a short while and then crashes due to a segmentation fault. I presume this happens when SWAN is trying to send wave data to ROMS, because I get the following onscreen message:
MCT::m_AttrVect::indexRA_:: FATAL--attribute not found: "DISBOT" Traceback:
|X|MCT::m_AttrVect::indexRA_

When I run the coupled models in each domain separately, i.e., ROMS+SWAN in domain A, and the same in grid B, both simulations work well, so the issue appears only when I combine both models and both domains. I have re-made the connectivity and scrip files again and again, but the problem is still there. Any idea why this could happen?

The cluster I used for the v3.7 is not the same as the one I am using now for v3.8, but on the latter the Inlet_test/Refined case runs fine so I presume the problem is not related to the COAWST installation.

Thanks

@jcwarner-usgs
Collaborator

when the simulation starts, one of the first things is to do a coupling exchange. Did this happen? did you see the DISBOT exchange, something like
SWANtoROMS Min/Max DISBOT (Wm-2): 0.000000E+00 3.278870E-05
...
if so, then the disbot field is active.
At some later time during that same simulation, it would be strange to have an mct call saying the disbot attr is not found.
can you send the full stdout of that run?
-j
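
One way to check for the DISBOT exchange is to scan the stdout for the Min/Max report lines. This is only a minimal sketch, assuming the log lines follow the format quoted above; the helper name `find_exchanges` and the sample lines are illustrative and not part of COAWST.

```python
import re

# Matches report lines such as:
#   SWANtoROMS Min/Max DISBOT (Wm-2): 0.000000E+00 3.278870E-05
# The pattern is an assumption based on the line format quoted in this thread.
EXCHANGE_RE = re.compile(
    r"^\s*(?P<direction>\w+to\w+)\s+Min/Max\s+(?P<field>\S+)\s+"
    r"\((?P<units>[^)]*)\):\s+"
    r"(?P<min>[-+]?\d+\.\d+E[-+]\d+)\s+(?P<max>[-+]?\d+\.\d+E[-+]\d+)"
)

def find_exchanges(lines, field="DISBOT"):
    """Return (line_number, min, max) for every exchange report of `field`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = EXCHANGE_RE.match(line)
        if match and match.group("field") == field:
            hits.append((number, float(match.group("min")), float(match.group("max"))))
    return hits

# Illustrative sample lines, copied from the format shown in this thread.
sample = [
    "SWANtoROMS Min/Max DISBOT (Wm-2): 0.000000E+00 3.278870E-05",
    "SWANtoROMS Min/Max DISSURF (Wm-2): 0.000000E+00 0.000000E+00",
]
print(find_exchanges(sample))
```

For a real run, the same function can be given an open file, e.g. `find_exchanges(open("test.log"))`. An empty result would suggest the field was never exchanged.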

@Tucumeu
Author

Tucumeu commented Oct 22, 2024

Hi John,
Yes, there is an initial exchange between both SWAN grids to both ROMS grids.

I am attaching the log file for one of the failed runs.
input_C.txt

@jcwarner-usgs
Collaborator

can you set NINFO =1 and rerun that?
there is a lot of info that is not being printed to that file.
is there also an error out file?
the error you report is not in that file.
i really think that roms may have blown up, and you are not seeing that written to the screen.

@Tucumeu
Author

Tucumeu commented Oct 25, 2024 via email

@jcwarner-usgs
Collaborator

oh. yes you need to have dt roms divide evenly into the coupling interval.
also need to have dt of swan divide evenly into the coupling interval.
how do you submit the job? what is the command line?
mpirun -np X ./coawstM input.file &> output_file

also, the log file was not attached
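
The divisibility requirement for the ROMS and SWAN time steps can be checked before submitting a job. The following is a minimal sketch; the function name `divides_evenly` and the numbers (in seconds) are illustrative assumptions, not values taken from the attached input files.

```python
from fractions import Fraction

def divides_evenly(dt, coupling_interval):
    """Return True if coupling_interval is a whole number of steps of size dt.

    Both values are in the same units (e.g. seconds). Fractions built from the
    decimal text avoid floating-point round-off for steps such as 0.5 s.
    """
    step = Fraction(str(dt))
    interval = Fraction(str(coupling_interval))
    if step <= 0:
        raise ValueError("dt must be positive")
    return (interval / step).denominator == 1

# Illustrative values only: an 1800 s coupling interval with candidate time steps.
coupling_interval = 1800
for name, dt in [("ROMS", 18), ("SWAN", 600), ("bad example", 35)]:
    ok = divides_evenly(dt, coupling_interval)
    print(f"{name}: dt={dt} s divides {coupling_interval} s evenly: {ok}")
```

With nested grids, each grid's time step would need to pass the same check against its coupling interval.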

@Tucumeu
Author

Tucumeu commented Oct 28, 2024

True, sorry. I attach it now.
The command I use is mpirun -np 30 coawstM coupling.in > test.log, assigning 6 mpi nodes to SWAN and 24 to ROMS.

test.log

slurm-10688509.log

@jcwarner-usgs
Collaborator

this is strange. at the beginning all the models exchange:

== SWAN grid 1 sent wave data to ROMS grid 1
** ROMS grid 1 recv data from SWAN grid 1
SWANtoROMS Min/Max DISBOT (Wm-2): 0.000000E+00 0.000000E+00
SWANtoROMS Min/Max DISSURF (Wm-2): 0.000000E+00 0.000000E+00
...

then
roms goes to 30 minutes
100 2022-01-01 00:30:00.00 2.199223E-03 3.167227E+02 3.167249E+02 8.086213E+10 01
(081,082,20) 0.000000E+00 2.834716E-03 2.372127E+00 1.497969E-01

and then swan to 30 minutes
+time 20220101.003000 , step 3; iteration 12; sweep 4 grid 2
== SWAN grid 1 sent wave data to ROMS grid 1

then you get that error
MCT::m_AttrVect::indexRA_:: FATAL--attribute not found: "DISBOT" Traceback:
|X|MCT::m_AttrVect::indexRA_
01B.MCT(MPEU)::die.: from MCT::m_AttrVect::indexRA_()
[gs30r3b04:3453727:0:3453727] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))

but disbot already existed.

Can i see your swan.in? i am not sure why you have so many iterations per step.

when you run it, try
mpirun -np 30 coawstM coupling.in &> test.log
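
In bash, `>` captures only stdout while `&>` captures both stdout and stderr, which would explain an error message showing on screen but not in the log file. The difference can be illustrated with a small sketch; the child process here is a hypothetical stand-in for a model run, not COAWST itself.

```python
import subprocess
import sys

# Hypothetical child process that writes one line to each stream, standing in
# for a run that prints normal output to stdout and a fatal error to stderr.
child = [
    sys.executable,
    "-c",
    "import sys; print('normal output'); print('fatal error', file=sys.stderr)",
]

# Like `cmd > log`: only stdout is captured. (stderr is discarded here to keep
# the demo quiet; in a shell it would go to the terminal or scheduler log.)
only_stdout = subprocess.run(
    child, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True
).stdout

# Like `cmd &> log`: stderr is merged into the same captured stream.
both = subprocess.run(
    child, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
).stdout

print("error captured with stdout only:", "fatal error" in only_stdout)
print("error captured with both streams:", "fatal error" in both)
```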

can you look in the roms his file?
can you change the coupling to be every 10 min?
does it always stop at the first coupling exchange (after init)?
-j

can i see your

@Tucumeu
Author

Tucumeu commented Nov 12, 2024 via email

@Tucumeu
Author

Tucumeu commented Nov 12, 2024

Here come the files, with the SWAN *.in files renamed to *.txt
input_B_AB.txt
test.log
input_A_AB.txt
