Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: enable testing on Windows with GCC #291

Conversation

gnikit
Copy link

@gnikit gnikit commented Jun 2, 2024

  • Compilers are installed via the GA fortran-setup (choco in the backend)
  • Version of NetCDF via APT has been replaced with conda, where possible
    and packman on Windows
  • Consequently, include and link directories need to be setup at runtime
    for variable expansion to occur
  • In the Makefiles changes have been made to properly handle variable expansions
    and bash-like regular expressions

Additional Notes

  • GCC 12+ on the Windows runners causes some rather weird behaviour that I was unable to recreate locally with the same compiler version on a Windows 10 VM. I tried a fair bit of time trying to get to the bottom of it but I am not sure what is going on.
  • environment-noplots.yml is not used anymore so if you have no use for it I could delete it.
  • On Windows I install NetCDF via msys because the conda-forge version has been compiled GFortran 5 and hence its ABI is incompatible and the the conda-forge package is not available whilst use-only-tar-bz2 is true.
  • I believe the Windows cache for Anaconda is not setup correctly.
  • It would be great if we could cache the compilers, especially for Windows.
  • You could probably test the Intel compilers in the same Workflow.
  • I would suggest enabling backtraces and increasing compile/runtime checks with GCC specifically -fcheck=all -finit-real=snan -finit-integer=-999999 -finit-character=0 -ffpe-trap=invalid,zero,overflow -fbacktrace. The -ffpe-trap=... options cause for a segfault which I am guessing are indicative of a bug.

@RobertPincus
Copy link
Member

@gnikit Thanks so much this is wonderful. Do you mind to set the target of the PR to the new branch feature-add-windows-CI that I just made?

@RobertPincus
Copy link
Member

@skosukhin Alerting you to this which will help tons in building conda feedstocks for the libraries

@gnikit
Copy link
Author

gnikit commented Jun 3, 2024

@gnikit Thanks so much this is wonderful. Do you mind to set the target of the PR to the new branch feature-add-windows-CI that I just made?

Yes, no problem. I also just noticed that a few debug commits got pushed in this remote. I will drop them.

As for the conda-forge feedstock, let's make an Issue to track progression and have any necessary discussions.

@gnikit gnikit changed the base branch from main to feature-add-windows-ci June 3, 2024 09:41
@gnikit
Copy link
Author

gnikit commented Jun 3, 2024

@RobertPincus there are a series of conflicts with the new base branch, is it behind main by any chance?

@RobertPincus
Copy link
Member

@gnikit Seems like you branched from main before commit 41c5fcd? With that commit develop and main should have been synced, though there are a few mall changes at first.

@gnikit
Copy link
Author

gnikit commented Jun 3, 2024

That's all right. I can rebase/recreate to earth-system-radiation:feature-add-windows-ci just to be safe

- Compilers are installed via the GA fortran-setup (choco in the backend)
- Version of NetCDF via APT has been replaced with conda, where possible
  and packman on Windows
- Consequently, include and link directories need to be setup at runtime
  for variable expansion to occur
- In the Makefiles changes have been made to properly handle variable expansions
  and bash-like regular expressions
@RobertPincus
Copy link
Member

@skosukhin This PR puts Windows and Linux CI in the same file - I'm wondering if that's wisest, or if we should create Windows-specific files?

@gnikit
Copy link
Author

gnikit commented Jun 11, 2024

@skosukhin This PR puts Windows and Linux CI in the same file - I'm wondering if that's wisest, or if we should create Windows-specific files?

The majority of the code between the Linux and Windows GA workflows is the same, so you would essentially double your maintenance cost were you to split them in two files.

@RobertPincus
Copy link
Member

Yes, I saw that @gnikit but it also seems like you have to do some shenanigans to adapt the Windows workflow to Linux (e.g. renaming files) and vice-versa (e.g. using chocoltey? to manage compilers). We find that people use the Linux workflow as a guide to set up local installations so I wonder if something that was more Windows-like would be a better example in the long run? (The big caveat is that Gnu make seems to be very not-like-Windows, but maybe we'll move to CMake or similar over time).

@RobertPincus
Copy link
Member

No strong opinions on my side by @skosukhin has lots of experience with the CI...

@RobertPincus
Copy link
Member

@skosukhin ?

Copy link
Collaborator

@skosukhin skosukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, having the tests for Linux and Windows in the same file looked like a good idea to me but the difference in the steps that are run on Linux and Windows is quite big to consider moving the Windows tests to a separate file. The CI scripts are often the best and the most consistent sources of information on how to build and run things. I am afraid, that we might lose that. I don't insist though.

I am also not sure if we need to test on Windows at all. Do we really?

- nodefaults
dependencies:
- python=3.11
- numpy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- numpy

Why is the name environment-ci.yml better than environment-noplots.yml? What is the benefit of the channels entry? Is it relevant to CI only or can we simply add it to the existing environment files?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the name environment-ci.yml better than environment-noplots.yml

Either names are fine. I didn't want to delete your conda config files, which is why I made another environment.

What is the benefit of the channels entry?

It is a good idea to instruct conda from which channels to fetch your packages, e.g. netcdf is not available in default it's only available in conda-forge. This is true for pretty much all your packages. Channels help make your env reproducible, so IMO you should be specifying it everywhere. Docs link to channels.


Unrelated to the previous bit, numpy has been added because it is needed in compare-to-reference.py.

@@ -57,10 +57,10 @@ tests: rrtmgp_allsky
$(RUN_CMD) bash all_tests.sh

check:
$${PYTHON-python} ${RRTMGP_ROOT}/examples/compare-to-reference.py --ref_dir ${RRTMGP_DATA}/examples/all-sky/reference --tst_dir ${RRTMGP_ROOT}/examples/all-sky \
$${PYTHON-python} "$(RRTMGP_ROOT)/examples/compare-to-reference.py" --ref_dir "$(RRTMGP_DATA)/examples/all-sky/reference" --tst_dir "$(RRTMGP_ROOT)/examples/all-sky" \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember, ${var} and $(var) mean the same thing for GNU Make. Are these changes decorative or does mingw-w64-x86_64-make choke with the current syntax?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$var and ${var} are equivalent with the latter being more robust in defining variable boundaries e.g. echo ${var}foo.
$(var) is used for command execution in your shell. In this case, the usage of $(RRTMGP_ROOT) is incorrect and leads to an erroneous expansion of the variable under msys2 + mingw-make.

There are some action runs in my fork where that is demonstrated, (the link to the Action run currently escapes me).

Comment on lines +124 to +126
echo FCINCLUDE=-I ${CONDA_PREFIX}/include >> $GITHUB_ENV
echo LDFLAGS=-L ${CONDA_PREFIX}/lib >> $GITHUB_ENV
echo LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH} >> $GITHUB_ENV
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo FCINCLUDE=-I ${CONDA_PREFIX}/include >> $GITHUB_ENV
echo LDFLAGS=-L ${CONDA_PREFIX}/lib >> $GITHUB_ENV
echo LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH} >> $GITHUB_ENV
echo "FCINCLUDE=-I${CONDA_PREFIX}/include" >> $GITHUB_ENV
echo "LDFLAGS=-L${CONDA_PREFIX}/lib" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are equivalent I believe, just different styling, so happy to change those.

Comment on lines +133 to +135
echo FCINCLUDE="-I/mingw64/include" >> $GITHUB_ENV
echo LDFLAGS="-L/mingw64/lib" >> $GITHUB_ENV
echo LD_LIBRARY_PATH="/mingw64/lib:/mingw64/bin:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo FCINCLUDE="-I/mingw64/include" >> $GITHUB_ENV
echo LDFLAGS="-L/mingw64/lib" >> $GITHUB_ENV
echo LD_LIBRARY_PATH="/mingw64/lib:/mingw64/bin:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
echo "FCINCLUDE=-I/mingw64/include" >> $GITHUB_ENV
echo "LDFLAGS=-L/mingw64/lib" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=/mingw64/lib:/mingw64/bin:${LD_LIBRARY_PATH}" >> $GITHUB_ENV

Is the LD_LIBRARY_PATH variable needed here? If so, do we need the /mingw64/bin entry?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before for the styling, happy to change it to match.

Is the LD_LIBRARY_PATH variable needed here?

Yes, MSYS is not in the runner's PATH, see docs and we are not setting the -Wl,-rpath,/bath/to/libs. We have to set explicitly LD_LIBRARY_PATH.

If so, do we need the /mingw64/bin entry?

This is where the compiler's runtime libraries are. I have no idea what compelled the MSYS packagers to place them there, but I am sure there is a reason which I don't know about.

see output from my fork's Action
https://github.com/gnikit/rte-rrtmgp/actions/runs/9326569389/job/25675276582#step:4:312

/mingw64/bin/libgfortran-5.dll

if [[ $(uname) == *"NT"* ]]; then
# Rename files ending with .exe to the same name without the extension
for file in *.exe; do
mv "$file" "${file%.exe}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should not move the files but set the EXEEXT variable to .exe when on Windows and honour it in the makefiles and scripts:

%$(EXEEXT): %.o
	$(FC) $(FCFLAGS) -o $@ $^ $(LDFLAGS) $(LIBS)
./rrtmgp_allsky${EXEEXT} ...

and so on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, I like this more and it was also my first implementation, but then I realised I didn't know if something else in your pipelines was consuming these binaries, so I played it safe. Happy to make that change.

#
- name: Install NetCDF-Fortran via Conda
if: runner.os == 'Linux'
run: conda install -y -c conda-forge netcdf-fortran
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the advantage of installing netcdf-fortran with Conda (given that it seems to be non-portable and we do that on Linux only)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually quite portable, see their conda-forge feedstock. All their installers across OSs and architectures are at v4.6.1. Since conda and conda-forge are great at managing such dependencies I would be inclined to use them as much as possible in this case.
The problem with Windows is that the gfortran version is rather old. You don't want to be testing with an old compiler but for packaging purposes it's fine.

Bigger picture: When you are packaging your feedstock one of the steps is to make sure that all your dependencies i.e. netcdf-fortran are available in conda-forge. There are ways around this (e.g. building from source), but the APT package manager is not one of them.

@RobertPincus
Copy link
Member

@skosukhin The idea of Windows testing is in part to prepare for a Conda feedstock for the libraries. @gnikit was responding to a request I made for help on a Discourse since I have no experience with and no access to Windows development environments.

@gnikit had asked about moving to CMake (as you had in the past); it's starting to seem like that's a really good idea and worth the hassle of having to learn.

@gnikit
Copy link
Author

gnikit commented Jun 20, 2024

I will try and answer your comments @skosukhin in the coming days but my availability is very limited so I would suggest you push directly onto this branch. To setup the win CI.

@RobertPincus RobertPincus merged commit b735833 into earth-system-radiation:feature-add-windows-ci Jun 20, 2024
35 checks passed
@RobertPincus
Copy link
Member

@gnikit Thanks for your suggestions. I've merged your changes into our branch where we might continue to make some small edits.

@gnikit
Copy link
Author

gnikit commented Jun 20, 2024

At first, having the tests for Linux and Windows in the same file looked like a good idea to me but the difference in the steps that are run on Linux and Windows is quite big to consider moving the Windows tests to a separate file.

Completely up to you guys. IMO this way is simpler/easier to maintain, since there is a less duplication, but this is very subjective. The right answer here is whatever is easier for you to maintain.

The CI scripts are often the best and the most consistent sources of information on how to build and run things. I am afraid, that we might lose that. I don't insist though.

Personally, I prefer a well written section in the README or the docs page, since CI workflows can get rather messy, but again completely up to you. As a side note, I found building the lib extremely easy which was great since it made the task a lot easier.

I am also not sure if we need to test on Windows at all. Do we really?

For me it makes sense to test it if you expect to have/already have users on Windows. It builds up a sense of security and trust to your users.

@gnikit
Copy link
Author

gnikit commented Jun 20, 2024

@RobertPincus @skosukhin I think I finished with my replies (also saw that this got merged in). Always happy to chat more on this, so feel free to ping me if you need something (I might be slow to reply but I'll do my best).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants