Where is the source code corresponding to the pre-industrial branch? #13

Open
penguian opened this issue Feb 16, 2024 · 14 comments

@penguian

penguian commented Feb 16, 2024

Rather than using the pre-compiled executables, I changed config.yaml for the pre-industrial branch to use the executables built via https://github.com/coecms/access-esm-build-gadi/tree/master :

diff --git a/config.yaml b/config.yaml
index 6d3935f..4ea0bbe 100644
--- a/config.yaml
+++ b/config.yaml
@@ -10,14 +10,14 @@ submodels:
     - name: atmosphere
       model: um
       ncpus: 192
-      exe: /g/data/access/payu/access-esm/bin/coe/um7.3x
+      exe: /g/data/tm70/pcl851/src/coecms/access-esm-build-gadi/bin/um_hg3.exe
       input:
         - /g/data/access/payu/access-esm/input/pre-industrial/atmosphere
 
     - name: ocean
       model: mom
       ncpus: 180
-      exe: /g/data/access/payu/access-esm/bin/coe/mom5xx
+      exe: /g/data/tm70/pcl851/src/coecms/access-esm-build-gadi/bin/mom5xx
       input:
         - /g/data/access/payu/access-esm/input/pre-industrial/ocean/common
         - /g/data/access/payu/access-esm/input/pre-industrial/ocean/pre-industrial
@@ -25,7 +25,7 @@ submodels:
     - name: ice
       model: cice
       ncpus: 12
-      exe: /g/data/access/payu/access-esm/bin/coe/cicexx
+      exe: /g/data/tm70/pcl851/src/coecms/access-esm-build-gadi/bin/cice4.1_access-mct-12p-20240115
       input:
         - /g/data/access/payu/access-esm/input/pre-industrial/ice

The resulting archive/access-esm/restart000/atmosphere/fixed.restart_dump.astart differs from the one produced with the pre-compiled executables in 3528 out of 5358 fields, according to mule-cumf:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* (CUMF-II) Module Information *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

mule       : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/mule/__init__.py (version 2022.07.1)
um_utils   : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/um_utils/__init__.py (version 2022.07.1)
um_packing : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.07/lib/python3.10/site-packages/um_packing/__init__.py (version 2022.07.1) (packing lib from SHUMlib: 2023061)


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* CUMF-II Comparison Report *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

File 1: archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
File 2: archive.coecms.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
Files DO NOT compare
  * 0 differences in fixed_length_header (with 7 ignored indices)
  * 3 differences in real_constants (with 0 ignored indices)
  * 3528 field differences, of which 3528 are in data

Compared 5358/5358 fields, with 1830 matches

Maximum RMS diff as % of data in file 1: 1728.247530970832  (field 1939)
Maximum RMS diff as % of data in file 2: 1540966.1195676462 (field 2146)

%%%%%%%%%%%%%%%%%%
* real_constants *
%%%%%%%%%%%%%%%%%%
Components DO NOT compare (compared 38/38 values)
Component differences:
  Index 18 (mean_diabatic_flux) differs - file_1: 1.1837522069452046e+16  file_2: 1.1564718318975124e+16
  Index 20 (energy)             differs - file_1:  1.296698556808685e+24  file_2: 1.2972116950953004e+24
  Index 21 (energy_drift)       differs - file_1:  3.235504453499748e-07  file_2: 3.6300878600888835e-07

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Field 1/5358 - U COMPNT OF WIND AFTER TIMESTEP *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Lookup compares, data DOES NOT compare
Compared 64/64 lookup values.
File_1 lookup info:
  t1(0102/01/01 00:00:01)  lblev(1)/blev(9.9982061118072)  lbproc(0)
Data differences:
  Number of point differences  : 27840/27840
  Maximum absolute difference  : 40.503974864469633
  RMS difference               : 5.9704684001465225
  RMS diff as % of file_1 data : 113.21319232395794
  RMS diff as % of file_2 data : 95.945432469935682
[...]
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* Field 5358/5358 - Height at Tropopause Level *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Lookup compares, data DOES NOT compare
Compared 64/64 lookup values.
File_1 lookup info:
  t1(0101/12/01 00:00:335)  lblev(0)/blev(-1.0)  lbproc(128)
Data differences:
  Number of point differences  : 27730/27840
  Maximum absolute difference  : 3178.9881505981466
  RMS difference               : 521.75065971373635
  RMS diff as % of file_1 data : 4.0601309452938192
  RMS diff as % of file_2 data : 4.0515202394169556
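For readers unfamiliar with the cumf summary statistics, here is a minimal sketch of how per-field numbers like these can be computed. It assumes (not checked against the um_utils source) that "RMS diff as % of file_N data" means the RMS of the point differences divided by the RMS of file N's field, times 100. Under that assumption, a field whose values are close to zero in one file yields very large percentages, which may explain figures such as the 1540966 % reported for field 2146. The helper names `rms` and `compare_field` are hypothetical, and plain Python lists stand in for the field arrays that mule would return.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def compare_field(data1, data2):
    """Summarise point differences between two equally sized fields.

    Hypothetical helper: the percentage definitions are an assumption
    about what mule-cumf reports, not a copy of its implementation.
    """
    if len(data1) != len(data2):
        raise ValueError("fields must have the same number of points")
    diffs = [a - b for a, b in zip(data1, data2)]
    rms_diff = rms(diffs)

    def pct_of(data):
        # A relative measure is undefined for an all-zero field:
        # report 0 if nothing differs, otherwise infinity.
        denom = rms(data)
        if denom == 0.0:
            return 0.0 if rms_diff == 0.0 else math.inf
        return 100.0 * rms_diff / denom

    return {
        "point_diffs": sum(1 for d in diffs if d != 0.0),
        "n_points": len(diffs),
        "max_abs_diff": max(abs(d) for d in diffs),
        "rms_diff": rms_diff,
        "rms_pct_file1": pct_of(data1),
        "rms_pct_file2": pct_of(data2),
    }


if __name__ == "__main__":
    report = compare_field([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
    for key, value in report.items():
        print(f"{key:14s}: {value}")
```

Because each percentage is normalised by a different file's RMS, the two "% of file" numbers for the same field generally differ, as they do for field 1 above.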
@penguian
Author

Using the strings command, I found the following differences in source code directories between the um7.3x executable used by the pre-industrial configuration and the um_hg3 executable built by https://github.com/coecms/access-esm-build-gadi/tree/master :

um7.3x                                                                        | um_hg3
                                                                              > /apps/intel-ct/2019.3.199/mkl/lib/intel64
                                                                              > /lib64/ld-linux-x86-64.so.2
/projects/access/apps/fcm/2019.09.0/lib                                       <
/g/data/p66/pbd562/build/fcm-2019.09.0/lib                                    | /g/data/access/projects/access/apps/fcm/2019.09.0/lib
/g/data/p66/pbd562/projects/access/apps/dummygrib/lib                         | /g/data/tm70/pcl851/src/access-esm-build-gadi/lib/dummygrib
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/gcom/preprocess/src/gcom          | /home/599/mrd599/cylc-run/vn7.0_nci_gadi/share/nci_gadi_ifort_mpp/preprocess/src/gcom
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/oasis3-mct/lib                    | /home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib
/g/data/p66/pbd562/test/t47-hxw/jan20/4.0.2/oasis3-mct/Linux/lib              <
/scratch/p66/txz599/UM/UM_ACCESS-ESM1p5_r343/submodels/UM/ummodel_hg3/ppsrc   | /g/data/tm70/pcl851/src/access-esm-build-gadi/src/UM/ummodel_hg3/ppsrc
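A comparison like the one above can be approximated in Python. This is a hedged sketch, not the procedure actually used in this thread: the hypothetical `path_strings` mimics `strings` by taking runs of at least four printable ASCII bytes, keeps those that look like absolute paths (a `/` followed by a lowercase letter, the same filter as the `grep '^/[a-z]'` command quoted later in this thread), and trims anything from the first `(`. The hypothetical `compare_paths` then reports entries found in only one binary, like the `<` and `>` rows above.

```python
import re
import sys

# Runs of 4+ printable ASCII characters, roughly what `strings` prints by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# Absolute-path-looking strings, mirroring `grep '^/[a-z]'`.
PATH_LIKE = re.compile(r"^/[a-z]")


def path_strings(blob):
    """Return the set of path-like strings embedded in a binary blob."""
    paths = set()
    for match in PRINTABLE_RUN.finditer(blob):
        text = match.group().decode("ascii")
        if PATH_LIKE.match(text):
            # Drop any "(...)" suffix, like `cut -d'(' -f1`.
            paths.add(text.split("(", 1)[0])
    return paths


def compare_paths(paths_a, paths_b):
    """Return (only_in_a, only_in_b) as sorted lists."""
    return sorted(paths_a - paths_b), sorted(paths_b - paths_a)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        only_a, only_b = compare_paths(path_strings(fa.read()),
                                       path_strings(fb.read()))
    for path in only_a:
        print(f"< {path}")
    for path in only_b:
        print(f"> {path}")
```

Run as, for example, `python compare_paths.py um7.3x um_hg3.exe` (script name hypothetical). Unlike the side-by-side table above, a path that changed between builds shows up as one `<` line and one `>` line rather than a single `|` row.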

@penguian
Author

Questions:

  1. In which repositories and branches can I find the source code, Makefiles, etc. that were used to build the following executables?
     exe: /g/data/access/payu/access-esm/bin/coe/um7.3x
     exe: /g/data/access/payu/access-esm/bin/coe/mom5xx
     exe: /g/data/access/payu/access-esm/bin/coe/cicexx
  2. Why do the executables built by https://github.com/coecms/access-esm-build-gadi/tree/master differ from these?

HoWol76 self-assigned this Feb 19, 2024
@HoWol76
Contributor

HoWol76 commented Feb 21, 2024

I'm investigating this now. Can you tell me what the differences are?

@penguian
Author

@HoWol76 Do you mean the differences in code or the differences in output?

@HoWol76
Contributor

HoWol76 commented Feb 21, 2024

How do you define differences? What would need to happen for the code to be identical?

@penguian
Author

I also ran the configuration with a UM executable built from Martin Dix's private https://github.com/MartinDix/ESM1.5 repository, using my branch https://github.com/penguian/access-esm-build-gadi/tree/build-um-from-MartinDix
See https://github.com/penguian/access-esm/tree/pre-industrial-MartinDix

The difference in source code is in qxreconf:

diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/ereport_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/ereport_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/ereport_mod.F90	2024-02-20 11:16:08.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/ereport_mod.F90	2024-02-20 11:13:23.000000000 +1100
@@ -50,7 +50,7 @@
   Integer, Intent( InOut )          :: ErrorStatus
 
 ! Local scalars
-  Integer, Parameter            :: unset = -99
+  Integer           :: flush_code
   Character (Len=*), Parameter  :: astline = '************************&
   &*****************************************************'
   Character (Len=*), Parameter  :: msg ='Job Aborted from Ereport'
@@ -76,8 +76,8 @@
     Write (6,*) astline
 
 ! DEPENDS ON: um_fort_flush
-    Call Um_Fort_Flush(6, unset)
-    Call Um_Fort_Flush(0, unset)
+    Call Um_Fort_Flush(6, flush_code)
+    Call Um_Fort_Flush(0, flush_code)
 
     ! On T3E use Cray abort
 #if defined (T3E)
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90	2024-02-20 11:16:08.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_calc_len_ancil_mod.F90	2024-02-20 11:13:23.000000000 +1100
@@ -119,7 +119,8 @@
       N_Pseudo_Levs = Recondat_Node % Recondat_Info % RPLevs
     Else
       ErrorStatus=1
-      Cmessage='StashCode is not a valid prognostic variable'
+      write(Cmessage, '(a,i3,i4)')                                    &
+        'StashCode is not a valid prognostic variable', SectionCode, StashCode
       Call Ereport( RoutineName, ErrorStatus, Cmessage )
     End If
 
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90	2024-02-20 11:16:09.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_set_data_source_mod.F90	2024-02-20 11:13:24.000000000 +1100
@@ -341,9 +341,9 @@
 
     ! Check that Source is now set correctly otherwise, fail
     If ( data_source( i ) % source == Input_Dump ) Then
-      Write ( Cmessage, *) 'Section ',                              &
+      Write ( Cmessage, '(a,i2,a,i4,a)') 'Section ',                &
                            fields_out( i ) % stashmaster % section, &
-                           'Item ',                                 &
+                           ' Item ',                                &
                            fields_out( i ) % stashmaster % item ,   &
                            ' : Required field is not in input dump!'
       ErrorStatus = 30
diff -rub ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90 MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90
--- ACCESS-NRI/UM_v7/UM/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90	2024-02-20 11:16:09.000000000 +1100
+++ MartinDix/ESM1.5/umbase_hg3/src/utility/qxreconf/rcf_vertical_mod.F90	2024-02-20 11:13:24.000000000 +1100
@@ -81,7 +81,7 @@
 
 ! Local Data
 Character (Len=*), Parameter      :: RoutineName='Rcf_vertical'
-Character (Len=80)                :: Cmessage
+Character (Len=100)                :: Cmessage
 Integer                           :: ErrorStatus
 Integer                           :: i
 Integer                           :: j
@@ -102,8 +102,10 @@
     ! sizes should be the same, but will check
     If ( field_in % level_size /= field_out % level_size .OR. &
          field_in % levels /= field_out % levels ) Then
-      Cmessage = 'No interpolation, but data field sizes/levels are &
-                 &different!'
+      write(cmessage,'(a,2i10,2i4)')                                    &
+        'No interpolation, but data field sizes/levels are different!', &
+        field_in % level_size, field_out % level_size,                  &
+        field_in % levels, field_out % levels
       ErrorStatus = 10
       Call Ereport( RoutineName, ErrorStatus, Cmessage )
     End If
Only in ACCESS-NRI/UM_v7/UM/ummodel_hg3: bin
Only in MartinDix/ESM1.5/: umrecon

According to mule-cumf the restart000/atmosphere/fixed.restart_dump.astart output is bitwise identical between the
https://github.com/ACCESS-NRI/UM_v7 and https://github.com/MartinDix/ESM1.5 versions of access-esm-build-gadi/bin/um_hg3.exe:

[pcl851@gadi-login-06 access-esm.3.old]$ cat logs/cumf.build-gadi.1-build-gadi.MartinDix.1.log 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* (CUMF-II) Module Information *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

mule       : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/mule/__init__.py (version 2022.07.1)
um_utils   : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/um_utils/__init__.py (version 2022.07.1)
um_packing : /g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/um_packing/__init__.py (version 2022.07.1) (packing lib from SHUMlib: 2023061)


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
* CUMF-II Comparison Report *
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

File 1: archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
File 2: archive.build-gadi.MartinDix.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart
Files compare
  * 0 differences in fixed_length_header (with 7 ignored indices)
  * 0 field differences, of which 0 are in data

Compared 5358/5358 fields, with 5358 matches

@penguian
Author

penguian commented Feb 21, 2024

Note that Makefile in https://github.com/penguian/access-esm-build-gadi/tree/build-um-from-MartinDix contains the line

cp patch/UM_exe_generator-ACCESS1.5 $@/compile/

so the UM_exe_generator-ACCESS1.5 shell script that builds um_hg3.exe comes from https://github.com/penguian/access-esm-build-gadi and not from either UM source code repository.

@penguian
Author

In contrast, when I run the pre-industrial configuration with the original 'coe' executables /g/data/access/payu/access-esm/bin/coe/um7.3x, etc., the resulting archive.coecms.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart file differs from archive.build-gadi.1/access-esm/restart000/atmosphere/fixed.restart_dump.astart in 3528 out of 5358 fields, as shown above.

Without the original source code and scripts that created the /g/data/access/payu/access-esm/bin/coe/* executables, it is difficult to tell what is causing the difference in UM restart000 output. It may be caused by a source code or compilation difference in the UM itself, or in CICE, MOM, Oasis3-MCT, GCOM, etc.

@penguian
Copy link
Author

penguian commented Mar 7, 2024

I have investigated why, when running the pre-industrial branch configuration, the executables built from https://github.com/coecms/access-esm-build-gadi/tree/master do not produce bitwise identical output when compared to the executables at /g/data/access/payu/access-esm/bin/coe/

Briefly, the build using the default Makefile settings creates an environment.sh file that includes the line

OASIS_MANUAL=False

which causes

module load oasis3-mct-local/ompi.4.0.2

so that the executables are built using the module version of Oasis3-MCT.

I have created the branches

and am running the pre-industrial configuration again, to make sure that the output reproduces the output from the executables at /g/data/access/payu/access-esm/bin/coe/

@HoWol76
Contributor

HoWol76 commented Mar 7, 2024

I was just about to update this myself.

The old executables were probably built from code revision 338, as opposed to the most recent revision, 343. The difference is small: a few variables (wresp, thinning) get initialised to 0.0.

@penguian
Author

penguian commented Mar 7, 2024

Thanks. As far as I can tell, the main difference is in Oasis3-MCT. I will need to contact @MartinDix to chase down the source code to compare with https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562

#%Module

set help            "Oasis3 coupler"
set install-contact "Martin Dix"
set install-date    "2020-01-17"
set url             "https://verc.enes.org/oasis"
set prefix          ~access/apps/oasis3-mct/ompi.4.0.2

conflict            oasis3 oasis3-mct
prereq  openmpi/4.0.2

source              ~access/modules/common

@penguian
Copy link
Author

penguian commented Mar 7, 2024

I think I found the source.

$ strings ~access/apps/oasis3-mct/ompi.4.0.2/lib/*.a |grep '^/[a-z]'|cut -d'(' -f1|sort -u|head -n 5
/apps/openmpi/4.0.2/include/Intel
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_advance.F90
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_auxiliary_routines.F90
/home/599/mrd599/cylc-run/u-bp124/share/oasis3-mct_local/lib/psmile/src/mod_oasis_coupler.F90

In u-bp124 I see:

./suite.rc:svn checkout https://access-svn.nci.org.au/svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling oasis3-mct_local

@penguian
Copy link
Author

penguian commented Mar 7, 2024

There are many source code changes between https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562
and file:///g/data/access/access-svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling,
so these changes are likely to be the cause of the differences in behaviour between the executables.

@penguian
Copy link
Author

If you run

svn co file:///g/data/access/access-svn/oasis/branches/dev/mrd599/oasis3-mct-errorhandling oasis3-mct-local
cd oasis3-mct-local
svn log --diff

you will see

[...]

------------------------------------------------------------------------
r42 | hxy599 | 2014-06-26 11:33:11 +1000 (Thu, 26 Jun 2014) | 1 line

update to Oasis2-MCT2.0 branch@r1024
[...]
Index: lib/scrip/src/remap_bicubic.f
===================================================================
--- lib/scrip/src/remap_bicubic.f	(revision 41)
+++ lib/scrip/src/remap_bicubic.f	(revision 42)
@@ -80,7 +80,7 @@
      &    max_iter = 100   ! max iteration count for i,j iteration
 
       real (kind=dbl_kind), parameter ::
-     &     converge = epsilon(1.0_dbl_kind) ! convergence criterion
+     &     converge = 1.e-10_dbl_kind ! convergence criterion
 
 !***********************************************************************
[...]

I think that the change to converge would be enough to cause the drift in output values from the pre-industrial configuration that is seen when using the ~access/apps/oasis3-mct/ompi.4.0.2 module as opposed to compiling from
https://github.com/penguian/oasis3-mct/tree/new_modules_pbd562
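To see why replacing `epsilon(1.0_dbl_kind)` (2**-52, about 2.2e-16, for IEEE double precision) with `1.e-10_dbl_kind` can change results at the bit level without being an obvious bug, consider a toy one-dimensional Newton iteration. This is only an illustration, not the SCRIP remap_bicubic code: `newton_sqrt` is a hypothetical function, and it assumes that, as SCRIP's remapping iteration loops appear to do, the loop exits as soon as the increment falls below `converge`, before that final increment is applied. With the looser criterion the iteration stops one step earlier, and the answer differs from the fully converged one by roughly 1e-12.

```python
import math
import sys

MAX_ITER = 100                         # same cap as max_iter in the remap_bicubic excerpt
EPS_CONVERGE = sys.float_info.epsilon  # equivalent of epsilon(1.0_dbl_kind)
LOOSE_CONVERGE = 1.0e-10               # the value introduced in r42


def newton_sqrt(target, converge, x=1.0):
    """Solve x*x = target by Newton's method (hypothetical toy example).

    Assumption about SCRIP's loop structure: the convergence test is applied
    to the increment *before* it is added, so the last small increment
    is discarded.
    """
    for _ in range(MAX_ITER):
        delta = -(x * x - target) / (2.0 * x)
        if abs(delta) < converge:
            break
        x += delta
    return x


tight = newton_sqrt(2.0, EPS_CONVERGE)
loose = newton_sqrt(2.0, LOOSE_CONVERGE)
print(f"tight = {tight!r}")
print(f"loose = {loose!r}")
print(f"bitwise identical: {tight == loose}, difference: {abs(tight - loose):.3e}")
print(f"sqrt(2) = {math.sqrt(2.0)!r}")
```

Whether differences of this size actually reach the model output depends on whether the remapping weights are computed with this routine during the run, rather than read from precomputed weight files.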

Labels: None yet
Projects: None yet
Development: No branches or pull requests
2 participants