UWM failed on HERA ROCKY #2211

Closed
jiandewang opened this issue Mar 27, 2024 · 16 comments

@jiandewang
Collaborator

Description

I was testing updated MOM6 code in UWM and hit an unexpected failure, so I switched back to the develop branch with no changes and got the same error:

EXTCDE MPI_ABORT, IEXIT= 52

see error information at /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_73092/cpld_control_p8_mixedmode_intel

To Reproduce:

Clone today's UWM (hash # c54e986).
Run one of the S2S jobs; in my case I ran "cpld_control_p8_mixedmode_intel".

Additional context

Output

@jiandewang added the bug label Mar 27, 2024

@jiandewang
Collaborator Author

I am repeating the test on Orion. The jobs are running now, and so far none of them has died.

@DeniseWorthen
Collaborator

@jiandewang That error message is coming from WW3 I believe. Can you check what you have in log.ww3?

@jkbk2004
Collaborator

@jiandewang can you re-run? WW3_input_data_20220624 has been recovered on Hera.

@DusanJovic-NOAA
Collaborator

I also see this error when I run cpld_control_p8 with the current develop branch:

180:  *** WAVEWATCH III ERROR IN W3IOGR : 
180:      ERROR IN READING FROM mod_def.ww3 FILE
180:      IOSTAT =   67     MOD DEF FILE WAS GENERATED WITH A DIFFERENT            
180:      WW3 VERSION OR USING A DIFFERENT SWITCH FILE.          
180:      MAKE SURE WW3_GRID IS COMPILED WITH SAME SWITCH        
180:      AS WW3_SHEL OR WW3_MULTI, RUN WW3_GRID AGAIN           
180:      AND THEN TRY AGAIN THE PROGRAM YOU JUST USED.          
180: 
180: 
180: 
180: EXTCDE MPI_ABORT, IEXIT=    52
180: 

/scratch1/NCEPDEV/stmp2/Dusan.Jovic/FV3_RT/rt_1182930/cpld_control_p8_intel
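
An error block like the one above can be pulled out of a large output file without reading it by hand. The following is only a minimal sketch: it assumes the output is a plain-text file in which the WW3 error header and the `EXTCDE MPI_ABORT` line appear as in the excerpt, and the function name `find_ww3_errors` is hypothetical, not part of rt.sh or WW3.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rt.sh): print everything from a
# "WAVEWATCH III ERROR" header through the following "EXTCDE MPI_ABORT" line.
# Returns 0 if an error block was found, 1 if not, 2 if the file is unreadable.
find_ww3_errors() {
    local logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    awk '
        BEGIN { rc = 1 }
        /WAVEWATCH III ERROR/ { printing = 1; rc = 0 }   # start of an error block
        printing { print }
        /EXTCDE MPI_ABORT/ { printing = 0 }              # end of the block
        END { exit rc }
    ' "$logfile"
}
```

Usage would be along the lines of `find_ww3_errors <rundir>/out`. If several ranks write interleaved output, lines from other ranks that fall between the header and the abort line are printed as well.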

@DeniseWorthen
Collaborator

@jiandewang @DusanJovic-NOAA That is why the original WW3-input data needs to be retained. Input data should never be overwritten. Only adding is allowable.
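
One way to notice an overwritten input directory before a test fails is to keep a checksum manifest beside it. This is a sketch under stated assumptions, not an existing UWM tool: the function names `write_manifest` and `verify_manifest` are hypothetical, the manifest is assumed to live outside the data directory, and GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`) are assumed to be available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: record and later re-check SHA-256 hashes of an
# input-data directory. The manifest file must live outside that directory.

write_manifest() {
    local dir="$1" manifest="$2"
    # Hash every regular file using paths relative to the directory,
    # sorted so that the manifest is reproducible.
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$manifest"
}

verify_manifest() {
    local dir="$1" manifest="$2"
    # Read the manifest from stdin; --quiet prints only files that fail.
    # Returns non-zero if any recorded file changed or is missing.
    (cd "$dir" && sha256sum --quiet -c -) < "$manifest"
}
```

Files added after the manifest was written are not flagged, which matches the rule that adding is allowed while changing existing files is not.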

@DeniseWorthen
Collaborator

> @jiandewang can you re-run? WW3_input_data_20220624 has been recovered on Hera.

@jkbk2004 Please make sure that everyone on your team understands the importance of NOT overwriting input data.

@jkbk2004
Collaborator

> > @jiandewang can you re-run? WW3_input_data_20220624 has been recovered on Hera.
>
> @jkbk2004 Please make sure that everyone on your team understands the importance of NOT overwriting input data.

We always keep backups. @zach1221 @FernandoAndrade-NOAA FYI

@JessicaMeixner-NOAA
Collaborator

@jiandewang Just confirming that it is an input error. Was there a reason that the WW3 input data was overwritten? We add a specific date/time stamp so that we can version-control the input and never overwrite it.
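
The date-stamp convention can also be enforced when new input data is staged. The sketch below is illustrative only: `stage_input_data` and its arguments are hypothetical names, and the `<name>_<YYYYMMDD>` layout is assumed from directory names such as WW3_input_data_20220624 in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a new input set into <root>/<name>_<YYYYMMDD>
# and refuse to touch a dated directory that already exists.
stage_input_data() {
    local src="$1" root="$2" name="$3"
    local stamp="${4:-$(date +%Y%m%d)}"    # default to today's date
    local dest="$root/${name}_${stamp}"

    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing $dest" >&2
        return 1
    fi
    mkdir "$dest" || return 1
    cp -R "$src"/. "$dest"/ || return 1
    # Drop write permission so later accidental edits fail loudly.
    chmod -R a-w "$dest"
    echo "$dest"
}
```

A call such as `stage_input_data ./new_ww3_inputs "$INPUT_ROOT" WW3_input_data` (both paths are placeholders) creates a new dated directory and leaves every earlier one untouched.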

@jkbk2004
Collaborator

> @jiandewang Just confirming that it is an input error. Was there a reason that the WW3 input data was overwritten? We add a specific date/time stamp so that we can version-control the input and never overwrite it.

My fault! The input directory names were switched back and forth.

@jiandewang
Collaborator Author

Jobs are running normally now. Closing this issue.

@jiandewang
Collaborator Author

The same problem happened on c5; the same fix is needed there.

@jiandewang reopened this Mar 28, 2024

@DeniseWorthen
Collaborator

@jiandewang It looks to me like the files are OK on Gaea. Are you sure your rt didn't fail on Gaea because of #2198? The fix for that will be coming in with the WW3 PR today, but until then you need to modify this part of rt.sh:

STMP=/gpfs/f5/epic/scratch
PTMP=/gpfs/f5/epic/scratch

@jiandewang
Collaborator Author

jiandewang commented Mar 28, 2024

@DeniseWorthen Yes, I changed those two lines; otherwise my job could not be submitted.
See my rundir: /gpfs/f5/nggps_emc/scratch/Jiande.Wang/ptmp/Jiande.Wang/FV3_RT/rt_235629/cpld_control_p8_intel/out
180: *** WAVEWATCH III ERROR IN W3IOGR :
180: ERROR IN READING FROM mod_def.ww3 FILE
180: IOSTAT = 67 MOD DEF FILE WAS GENERATED WITH A DIFFERENT
180: WW3 VERSION OR USING A DIFFERENT SWITCH FILE.
180: MAKE SURE WW3_GRID IS COMPILED WITH SAME SWITCH
180: AS WW3_SHEL OR WW3_MULTI, RUN WW3_GRID AGAIN
180: AND THEN TRY AGAIN THE PROGRAM YOU JUST USED.
180:
180:
180:
180: EXTCDE MPI_ABORT, IEXIT= 52

The UWM is based on yesterday's commit.
My UWM: /gpfs/f5/nggps_emc/scratch/Jiande.Wang/MOM6-update/NCAR-20230913/ufs-weather-model

@jkbk2004
Collaborator

@jiandewang Sorry about the interruption. WW3_input_data_20220624 is restored, and the test looks like it is running OK. Can you check again?

PASS -- COMPILE 's2swa_32bit_intel' [22:16, 20:32]
PASS -- TEST 'cpld_control_p8_mixedmode_intel' [11:23, 07:30](3070 MB)

@jiandewang
Collaborator Author

> @jiandewang Sorry about the interruption. WW3_input_data_20220624 is restored, and the test looks like it is running OK. Can you check again?
>
> PASS -- COMPILE 's2swa_32bit_intel' [22:16, 20:32]
> PASS -- TEST 'cpld_control_p8_mixedmode_intel' [11:23, 07:30](3070 MB)

Thanks for the quick action; let me re-launch my job.
I have an NCAR MOM6 PR, and I really want to give them a black-and-white answer before my A/L.

@jiandewang
Collaborator Author

Works fine now. Closing.
