Hera specific paths in parm/config/gfs/yaml/defaults.yaml #2683

Closed
RussTreadon-NOAA opened this issue Jun 12, 2024 · 17 comments · Fixed by #2920

@RussTreadon-NOAA
Contributor

What is wrong?

The ocnanal and prepoceanobs sections of parm/config/gfs/yaml/defaults.yaml contain Hera specific paths

ocnanal:
  SOCA_INPUT_FIX_DIR: "/scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca"  # TODO: These need to go to glopara fix space.
prepoceanobs:
  SOCA_INPUT_FIX_DIR: "/scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca"  # TODO: These need to go to glopara fix space.
  DMPDIR: "/scratch1/NCEPDEV/global/glopara/data/experimental_obs"

The above caused gdasocnanalprep from C48mx500_3DVarAOWCDA CI to fail on Cactus

OSError: unable to copy /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca/rossrad.nc to /lfs/h2/emc/stmp/russ.treadon/RUNDIRS/praowcda/gdasocnanal_18/rossrad.nc
+ JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP[47]: status=1

The hardwired paths will also break this job on other non-Hera platforms (e.g., Orion, Hercules, Jet).
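
A minimal sketch of how hardwired paths like the ones quoted above could be flagged automatically. It scans YAML text line by line for values that begin with a machine-specific prefix. The prefix list and the function name `find_hardwired_paths` are illustrative assumptions, not part of global-workflow; only the two Hera prefixes come from this issue.

```python
import re

# Path prefixes treated as machine-specific. The two Hera prefixes come from
# this issue; treating them as a lint list is an assumption for illustration.
MACHINE_PREFIXES = ("/scratch1/", "/scratch2/")

# Matches "KEY: value" or 'KEY: "value"' and captures the key and the value.
KEY_VALUE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*):\s*"?([^"#\s]+)')


def find_hardwired_paths(yaml_text, prefixes=MACHINE_PREFIXES):
    """Return (line_number, key, path) for each value starting with a prefix."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        match = KEY_VALUE.match(line)
        if match and match.group(2).startswith(prefixes):
            hits.append((number, match.group(1), match.group(2)))
    return hits


sample = '''\
ocnanal:
  SOCA_INPUT_FIX_DIR: "/scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca"
prepoceanobs:
  SOCA_INPUT_FIX_DIR: "/scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca"
  DMPDIR: "/scratch1/NCEPDEV/global/glopara/data/experimental_obs"
'''

for number, key, path in find_hardwired_paths(sample):
    print(f"line {number}: {key} -> {path}")
```

A line-based scan like this is deliberately simple; it would miss multi-line values and paths built from templates.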

What should have happened?

gdasocnanalprep should run to completion on WCOSS2

What machines are impacted?

WCOSS2, Orion, Hercules, Jet, Cloud

Steps to reproduce

  1. Install g-w develop on WCOSS2
  2. Enable the C48mx500_3DVarAOWCDA CI test on WCOSS2
  3. Run the C48mx500_3DVarAOWCDA CI test

gdasprepoceanobs will fail with a wxflow not found error. Fix this by adding wxflow to PYTHONPATH. The same change needs to be added to gdasocnanalprep. After defining wxflow in gdasocnanalprep, the job will run up to the error

OSError: unable to copy /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca/rossrad.nc to /lfs/h2/emc/stmp/russ.treadon/RUNDIRS/praowcda/gdasocnanal_18/rossrad.nc
+ JGDAS_GLOBAL_OCEAN_ANALYSIS_PREP[47]: status=1
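
A small pre-flight check can confirm whether a PYTHONPATH change actually made wxflow visible before the job is rerun. This is only a sketch: the helper name `module_available` is hypothetical, and it checks importability in the current interpreter rather than inside the job environment.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the top-level module `name` can be found on sys.path."""
    return importlib.util.find_spec(name) is not None


if not module_available("wxflow"):
    # Print the search path so a missing PYTHONPATH entry is easy to spot.
    print("wxflow is not importable; sys.path is:", file=sys.stderr)
    for entry in sys.path:
        print("  ", entry, file=sys.stderr)
```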

Additional information

Comments in parm/config/gfs/yaml/defaults.yaml note that we need to copy files in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca to g-w space on supported platforms.

More challenging is deciding how best to replicate DMPDIR=/scratch1/NCEPDEV/global/glopara/data/experimental_obs on supported platforms.

Do you have a proposed solution?

No response

RussTreadon-NOAA added the bug and triage labels Jun 12, 2024

WalterKolczynski-NOAA removed the triage label Jun 17, 2024
aerorahul removed the bug label Jul 2, 2024
@aerorahul
Contributor

@guillaumevernieres FYI

@aerorahul
Contributor

Work is in progress to copy obs and fix files to glopara space that should hopefully move this issue towards resolution.

@KateFriedman-NOAA
Member

@guillaumevernieres Can you confirm that /scratch1/NCEPDEV/global/glopara/fix/gdas/soca/20240802 contains what you have in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static. I did a diff and see a few things different but am not sure if they need to be in the glopara fix copy or not:

[Kate.Friedman@hfe01 glopara]$ diff -r /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/ fix/gdas/soca/20240802/
Only in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/1440x1080x75/soca: MOM_input~
Only in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/1440x1080x75/soca: bkgerr
Only in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca: bkgerr
Only in fix/gdas/soca/20240802/common: #fields_metadata.yaml#

Thanks!
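
For sites without a convenient `diff -r`, the "Only in" part of a comparison like the one above can be approximated with Python's standard `filecmp` module. This is a sketch under assumptions: the function name `only_in_report` is hypothetical, and it reports only entries missing from one side, not files whose contents differ.

```python
import filecmp


def only_in_report(left, right):
    """Yield 'Only in <dir>: <name>' lines for entries present on one side only."""

    def walk(cmp):
        for name in sorted(cmp.left_only):
            yield f"Only in {cmp.left}: {name}"
        for name in sorted(cmp.right_only):
            yield f"Only in {cmp.right}: {name}"
        # Recurse into directories that exist on both sides.
        for sub in sorted(cmp.subdirs):
            yield from walk(cmp.subdirs[sub])

    # ignore=[] turns off filecmp's default ignore list so nothing is skipped.
    yield from walk(filecmp.dircmp(left, right, ignore=[]))
```

Calling `list(only_in_report(src, dst))` on the two directory trees returns the report lines in a stable, sorted order.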

@guillaumevernieres
Contributor

Thanks for checking @KateFriedman-NOAA

fix/gdas/soca/20240802/common: #fields_metadata.yaml#

can be removed and the rest of the diff ignored.

However, it looks like the links to the common files were not preserved in your copy.

KateFriedman-NOAA added a commit to KateFriedman-NOAA/global-workflow that referenced this issue Sep 11, 2024
Use it to define location of experimental_obs.

Refs NOAA-EMC#2683
@KateFriedman-NOAA
Member

can be removed and the rest of the diff ignored

@guillaumevernieres Thanks for the feedback! I have removed the #fields_metadata.yaml# file from soca/20240802/common.

However, it looks like the links to the common files were not preserved in your copy.

The original gdas/soca fix copy (20240624) that was provided to us only had symlinks for the rossrad.nc file so that's what we have:

[role.glopara@hfe06 20240802]$ pwd
/scratch1/NCEPDEV/global/glopara/fix/gdas/soca/20240802
[role.glopara@hfe06 20240802]$ ll */soca | grep rossrad.nc
lrwxrwxrwx 1 role.glopara global         23 Jun 27 14:15 rossrad.nc -> ../../common/rossrad.nc
lrwxrwxrwx 1 role.glopara global      23 Jun 27 14:15 rossrad.nc -> ../../common/rossrad.nc
lrwxrwxrwx 1 role.glopara global      23 Jun 27 14:15 rossrad.nc -> ../../common/rossrad.nc
lrwxrwxrwx 1 role.glopara global      23 Jun 27 14:16 rossrad.nc -> ../../common/rossrad.nc

In comparing the files to check if they are identical and can be made back into symlinks (like in your set) I find this difference:

[role.glopara@hfe06 20240802]$ /apps/nccmp/1.9.1/gcc-13.2.0/bin/nccmp -dgB 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc common/RECCAP2_region_masks_all_v20221025.nc
DIFFER : VARIABLE : lon : ATTRIBUTE : _FillValue : VALUES : nan <> nan

I can either leave the 20240802 as is or create a new timestamp to set up those symlinks (I don't want to touch the used files in the 20240802 set since it is currently "live" in global-workflow develop). Let me know your thoughts, thanks!

@KateFriedman-NOAA
Member

@guillaumevernieres Gentle poke about my comment/question above. If we end up needing another soca timestamp I'd like to include it in the PR that I'm ready to open to resolve this issue. Let me know, thanks!

@guillaumevernieres
Contributor

Sorry for the late reply, @KateFriedman-NOAA.
A new timestamp with the symlinks sounds good to me. I'll check which version of RECCAP2_region_masks_all_v20221025.nc we should use.

@KateFriedman-NOAA
Member

@guillaumevernieres Okie dokie, I'll make a new timestamp once you confirm the file source. Please also check the other files; I only compared a few, but each one I compared reported a similar small difference. Thanks!

@guillaumevernieres
Contributor

@KateFriedman-NOAA, let's use 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc as the common file.

@KateFriedman-NOAA
Member

@RussTreadon-NOAA I am trying to run the C48mx500_3DVarAOWCDA CI test on WCOSS2-Cactus to confirm that my updates to resolve the hardcoded paths are good. The gdasmarinebmat job is failing; I'm not sure if this is expected/known. The gdasprepoceanobs job ran and succeeded. The gdasocnanalprep job has not yet run. Would you mind taking a look?

HOMEgfs: /lfs/h2/emc/global/noscrub/kate.friedman/git/feature-experimental_obs_path
EXPDIR: /lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/EXPDIR/testcyc_C48_S2S
COMROT: /lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/COMROOT/testcyc_C48_S2S
log: /lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/COMROOT/testcyc_C48_S2S/logs/2021032418/gdasmarinebmat.log.0

Note, the first attempt at the gdasmarinebmat job hit the walltime. I increased the walltime and let it try again. It's currently hung and will hit the new walltime.

I can at least confirm that DMPDIR is being set correctly via the defaults.yaml now but haven't been able to run the job that uses it yet:

kate.friedman@clogin03:/lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/COMROOT/testcyc_C48_S2S> grep DMPDIR= logs/2021032418/gdasprepoceanobs.log 
+++ config.base[49]: export DMPDIR=/lfs/h2/emc/dump/noscrub/dump
+++ config.base[49]: DMPDIR=/lfs/h2/emc/dump/noscrub/dump
+++ config.prepoceanobs[17]: export DMPDIR=/lfs/h2/emc/global/noscrub/emc.global/data/experimental_obs
+++ config.prepoceanobs[17]: DMPDIR=/lfs/h2/emc/global/noscrub/emc.global/data/experimental_obs
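
The idea behind the log output above, resolving the experimental_obs location per machine instead of hardwiring one platform's path, can be sketched as a lookup table. The two paths are the ones quoted in this thread; the dictionary, the machine keys, and the function name `experimental_obs_dir` are hypothetical and do not reflect how global-workflow actually stores these settings.

```python
# Paths are quoted from this thread; the mapping itself and the machine
# names used as keys are assumptions made for illustration.
EXPERIMENTAL_OBS_DIRS = {
    "HERA": "/scratch1/NCEPDEV/global/glopara/data/experimental_obs",
    "WCOSS2": "/lfs/h2/emc/global/noscrub/emc.global/data/experimental_obs",
}


def experimental_obs_dir(machine):
    """Return the experimental_obs directory for `machine`, or raise ValueError."""
    try:
        return EXPERIMENTAL_OBS_DIRS[machine.upper()]
    except KeyError:
        raise ValueError(
            f"no experimental_obs location configured for {machine!r}"
        ) from None


print(experimental_obs_dir("wcoss2"))
```

Failing loudly on an unknown machine surfaces a missing entry at configuration time rather than as a copy error inside a job.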

@RussTreadon-NOAA
Contributor Author

@KateFriedman-NOAA: I am no longer able to run C48mx500_3DVarAOWCDA from g-w PR #2875 after updating the gdas.cd hash. This is a known issue. g-w PR #2920 needs to be merged into g-w develop. After this I can bring the updated g-w develop into g-w PR #2785 and C48mx500_3DVarAOWCDA should work again.

I don't know if this impacts your test. Tagging @guillaumevernieres

@guillaumevernieres
Contributor

Correct, @RussTreadon-NOAA, the WCDA test won't work with the new gdas.cd hash.

@KateFriedman-NOAA
Member

Okie dokie, thanks @RussTreadon-NOAA @guillaumevernieres for confirming the failure is expected. Let me know if it would help to merge the changes I've prepped for this issue into that open PR. My changes are here: develop...KateFriedman-NOAA:global-workflow:feature/experimental_obs_path . Otherwise I'll wait to retest and submit this via PR after that other PR goes in.

@RussTreadon-NOAA
Contributor Author

I think the WCDA test works with g-w PR #2920. If true, we should merge #2920 first. After this we can update other g-w PRs.

@KateFriedman-NOAA
Member

let's use 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc as the common file.

@guillaumevernieres Done:

[role.glopara@hfe02 20240919]$ pwd
/scratch1/NCEPDEV/global/glopara/fix/gdas/soca/20240919
[role.glopara@hfe02 20240919]$ rsync -azv 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc common/RECCAP2_region_masks_all_v20221025.nc 
sending incremental file list
RECCAP2_region_masks_all_v20221025.nc

sent 22,039 bytes  received 35 bytes  44,148.00 bytes/sec
total size is 58,414  speedup is 2.65

...and then I removed the other copies and made symlinks to the common one:

[role.glopara@hfe02 20240919]$ ll common/RECCAP2_region_masks_all_v20221025.nc 
-rw-r--r-- 1 role.glopara global 58414 May 14 19:37 common/RECCAP2_region_masks_all_v20221025.nc
[role.glopara@hfe02 20240919]$ ll */soca/RECCAP2*
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:08 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:08 360x320x75/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:08 4500x3297x75/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:09 72x35x25/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc

Are there any other files to adjust in the new gdas/soca/20240919 set?
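
The symlink step described in this comment, replacing each per-grid copy with a relative link to the file under `common`, can be sketched as follows. It assumes the `<grid>/soca/<file>` and `common/<file>` layout shown in the listings above and that `common` already holds the chosen version of the file; the function name `link_to_common` is hypothetical.

```python
import os


def link_to_common(root, filename, common_dir="common"):
    """Replace each existing <root>/<grid>/soca/<filename> with a relative
    symlink to <root>/<common_dir>/<filename>; return the paths that were linked."""
    target = os.path.join(root, common_dir, filename)
    if not os.path.isfile(target):
        raise FileNotFoundError(target)

    linked = []
    for grid in sorted(os.listdir(root)):
        soca_dir = os.path.join(root, grid, "soca")
        link_path = os.path.join(soca_dir, filename)
        # Only touch grids that already carry a copy (or a stale link).
        if grid == common_dir or not os.path.lexists(link_path):
            continue
        os.remove(link_path)
        # Relative target, e.g. ../../common/<filename>, so the tree can be moved.
        os.symlink(os.path.relpath(target, start=soca_dir), link_path)
        linked.append(link_path)
    return linked
```

Because the per-grid copies are deleted, a sketch like this belongs on a fresh timestamped copy of the fix set, not on a set that is in live use.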

@KateFriedman-NOAA
Member

Merging work prepped to resolve this issue into PR #2920.
