Minutes and agenda of spack team meetings
- Only Dominic showed up.
- uenv with spack v0.22 on Balfrin
- OpenBLAS fails to build
- CrayMpich fails to build
- Demo https://github.com/C2SM/spack-c2sm/pull/1003
- Icon package in spack
- It was introduced in spack v0.22
- Matthieu provides a uenv with spack v0.22 on Balfrin (in the week of the 14th Oct.)
- Dominic will do the initial adaptation so we can make plans based on it.
- Tentative week to update, early Nov.
- Spack on Alps
- Icon Variants
- We hold them until the Flag injection thing is solved.
- Flag injection
- Doesn't always work.
- Matthieu and Dominic will look into it.
- Strategy of spack-c2sm for the remainder of the year
- Drop old machines + remove the upstream things used for Daint
- Simplify sysconfigs to achieve a machine independent config
- Upgrade to spack v0.22
- Make use of Sergey's package description of ICON
- Add Santis to CI
- Simplify CI
- Discontinue support of Tsa, Daint, Dom in main.
- MCH thinks Tsa is no longer used. But we wait for the thumbs up.
- Jonas will think about how to test COSMO, infero, int2lm and oasis.
- Flaky tests
- They are slowing CI down.
- Jenkins triggers
- Reevaluate once Daint is powered off.
- MCH's reasoning why to do it: https://meteoswiss.atlassian.net/wiki/spaces/APN/pages/235736706/Deprecation+of+GitHub+keywords+triggers+for+Jenkins+plans
- Future mch-environment/v7 on Balfrin
- Currently we think the problem is in eccodes.
- Icon Alps building
- hdf5%gcc caused an issue with emvorado and rttov (because there's a fortran interface)
- openblas with gcc doesn't match the references, which it won't anyway.
- Icon Balfrin building
- mch-environment/v7 with spack v0.21 has a hdf5 problem.
- Uenv CI
- Tests work
- py-tabulate
- The package was extended with v0.8.10, but that extension was never upstreamed. Dominic took care of it: /scratch/mch/jenkins/workspace/stackyspack_PR-4. Please always upstream to help reduce the long term maintenance cost of spack-c2sm.
- mch-environment/v6 update
- Waiting on CSCS
- Matthieu will reach out to Ben
- uenvs in CI
- We need Balfrin's uenv in the uenv registry
- StackySpack
- This is Dominic's experiment to strip away as much complexity as possible while losing as little functionality as possible.
- Spack on Euler
- Euler will come with software installed by spack
- There's no nice curated compilers.yaml etc.
- mch-environment/v6 update
- Waiting on CSCS
- Alps infrastructure
- Replaced python's unittest with pytest
- Tests can be called for different uenvs
- A version of icon was compiled on Säntis
- Icon and Spack version
- Icon contains a lot of spack-c2sm versions. We only need the versions in the spack_tag, everything else can be removed.
- Finally upgrade spack to v0.21.1!
- There's a problem with MPI and HDF5 in mch-environment/v6. Dominic will look into it.
- We decided to merge dev_v0.21.1 even though the blas/lapack problem on Daint remains.
- pytest fixtures
- We agreed that the proposed solution shall be implemented.
- spack v0.21.1
- Icon4py gt4py is fixed.
- icon exclaim still has a problem with lapack + blas on Daint
- machine locking
- we'll try without locking spack-c2sm and fall back to locking if required.
- The uenvs group libraries together. Thus deprecation of old versions can happen bunch by bunch, rather than lib by lib. That scales well.
- spack v0.21.1
- Gt4Py issues. Bumping up the typing extension versions doesn't help. The package concretizes to 2 versions of pip.
- Using externally installed packages might give more insight into where the problem is coming from.
- We need a team of experts (spack, icon-dsl, gt4py) to tackle the issues together.
- Open issues
- spack v0.21.1
- There are still red tests.
- We can drop support of Daint, for the failing Icon tests.
- Issue has been raised Feb 14th: https://github.com/C2SM/spack-c2sm/issues/911 . Icon Exclaim with Gt4py doesn't build.
- Xavier will forward the issue to the Exclaim board.
- Icon build system
- We don't have any intel.
- Open issues
- spack v0.21.1
- There are still red tests. We're in no hurry, so we wait for Jonas.
- Icon package with Sergey
- John Biscomb is trying to use CMake to build ICON, thus it's not the right time to create a spack package.
- Dominic is not going to proceed on the Icon package for spack, for now.
- Stackinator
- C2SM will use the Stackinator to create user environments.
- There will be project based software stacks.
- MCH encountered a problem with the mounted squashfs in scripts that call other scripts and some of them are auto-generated. Some nested environments didn't see the squashfs. We think there is a way to make it work, we just decided it's too much work for us, since we mount uenvs permanently.
- spack update is blocked by some gt4py dependencies. There is a plan to go forward. This will be the last version supported for Daint.
- replace blas/lapack on Daint
- numpy conflicts with nvhpc's built-in blas/lapack
- How do we progress? It's maintaining two versions of spack vs. updating spack on Daint and solving the blas/lapack problem. Abishek will bring this to the EXCLAIM team.
- upgrade to spack 0.21.1
- Daint runs but Tsa and Balfrin pose a problem. Dominic will look into it.
- spack-c2sm with a squashfs
- reusing the installed libs failed at first for Matthieu.
- Fork is superseded by upgrade to spack v0.21.1. We expect this to be merged before MCH goes into production.
- Dominic will invite Matthieu to the spack-c2sm meeting.
- Introduce Sergey's package of Icontools
- Current package not working with MPI (EXCLAIM might need it)
- Licence not clarified with Sergey
- Xavier organises meeting with Sergey about ICON spack packages
- Work on Spack to be ready for Alps
- Automatic detection of upstream instances
- Dominic knows more about this workflow, scripts are part of spack-c2sm
- May be difficult with BuildBot
- Sam and Abishek work on Pre-Alps
- Plan is to use spack-c2sm eventually
- Fork of Dominic
- Better use fork at C2SM, ask Dominic
- Jonas makes fork into C2SM
Happy New Year
- `--enable-loop-exchange` and `--disable-loop-exchange` both set for GPU-build of ICON
- Issue is open: https://github.com/C2SM/spack-c2sm/issues/897
- Extend Icon package to support DSL's fused options. -> next DSL planning
- Extend Icon package to support DSL's serialized option. -> next DSL planning
- dev-build
- dev_path problem with Jenkin's "@" in the file path is fixed, by reconfiguring Jenkins.
- flexpart and fdb use dev-build
- Icon has "extra-config-args" now
- This can be used as `extra-config-args=--disable-something`
- This can be MISused as `extra-config-args=--enable-fused`
- Separate machine config from spack-c2sm
- John will give input
- With spack v0.20 apparently `spack dev-build <package> @<my_string>` doesn't work any more and no longer shows up in their Developer Workflows Tutorial. But if the version specifier matches a known version, spack will use the local repository and treat it as if it was the specified version.
- The tutorial shifted from dev-build (back in 2019) https://spack-tutorial.readthedocs.io/en/lanl19/tutorial_developer_workflows.html to using spack environments (latest tutorial) https://spack-tutorial.readthedocs.io/en/latest/tutorial_developer_workflows.html
- Problem if dev_path=workspace/my_plan@2/. Jonas opens a ticket at CSCS to fix Jenkins
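A minimal sketch of a pre-check for the dev_path problem, assuming the trouble comes from an "@" inside a path component (as in the Jenkins workspace name `my_plan@2`). The function name `dev_path_problems` is hypothetical and not part of spack or spack-c2sm.

```python
from pathlib import PurePosixPath


def dev_path_problems(dev_path):
    """Return the path components that contain an '@'.

    Assumption: any '@' in dev_path is problematic, as reported for
    Jenkins workspaces such as 'workspace/my_plan@2/'.
    """
    return [part for part in PurePosixPath(dev_path).parts if "@" in part]


if __name__ == "__main__":
    for candidate in ("workspace/my_plan@2/", "workspace/my_plan/"):
        bad = dev_path_problems(candidate)
        print(candidate, "->", bad if bad else "ok")
```

Such a check could run at the start of a Jenkins plan and fail early with a readable message instead of a confusing spec error.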
- How to have more flexibility with the ICON package
- spack provides "fflags", "cflags" and "cppflags", which can be defined in `spack.yaml`
- make variant to pass configure-options, Jonas opens issue about that
- ICON package: Different set of debugging flags
- Open an issue to keep for early next year, Jonas opens issue
- CMake build option for icon4py stencils in icon-exclaim, Jonas checks how fortran-support implements this
- Status of discussion regarding spack refactoring, will be discussed on Wednesday
- ICON package
- Sergey has been informed in greater detail. We're going to meet with him.
- Package maintainers
- Dominic will create an issue, so we can tackle the problem async.
- Testing of package's branches
- Assigned!
- Daint's upstream
- Abishek is working on populating the upstream more.
- New gt4py version is causing issues
- Abishek is having some issues with a PR.
- Balfrin upgrade
- We'll keep you updated about the upgrade in Slack.
- Testing of branches in spack-c2sm
- Jonas creates a tag for icon-c2sm and updates the tests.
- Dominic identifies all packages that have tests based on branches, contacts the owners.
- Beginning of December we will remove tests of branches if they fail.
- ICON-C will pick up the task to extend the BB scripts such that they can be launched by BB, users, spack.
- Balfrin upgrade
- 22.Nov an update is scheduled for Balfrin
- Dominic adds concretizer reuse=True in sysconfigs.
- Vial has PR and worked pretty smoothly.
- icon4py
- New dependencies were added, some of them are now optional, but this hasn't made it to spack-c2sm yet.
- There's a problem, maybe with nested venvs, surely with a "contaminated" PYTHONPATH. Resetting PYTHONPATH is a solution, https://github.com/C2SM/spack-c2sm/compare/main...nested_venv might be another one.
- Daint's upstream is less populated than before, thus the builds of the CI of icon4py, icon-dsl, gt4py take longer.
- ICON in spack
- Sergey hesitated to collaborate, thus Dominic closed the PR https://github.com/spack/spack/pull/40043
- LD_LIBRARY_PATH in spack v0.20
- we need to test it to see if it affects anything
- spack-c2sm testing
- ICON22 will look into using C2SM's testing script in icon-nwp
- spack for COSMO
- Ulrich Schättler is considering using spack to fetch dependencies of ICON.
- Dominic is trying to put Sergey's ICON package and deps into spack/spack.
- Michael's problem with flexpart-cosmo
- dev-build works, but install doesn't
- Tsa v0.20.1
- no timeline yet
- MCH is experimenting with a binary cache
- MCH will let the other spack devs know about the ongoing work.
- ICON-DSL PRs coming in and new packages
- fix needed for fc-group variant for v0.20.1
- needed some adjusting, since ";" is not allowed in a spec in v0.20.1
- We need to take a decision if v0.20.1 should run on Tsa.
- Alps, Spack and EXCLAIM
- Preliminary access to an Alps vCluster (H100 ?) is given to EXCLAIM
- Coordination with a bigger audience of EXCLAIM needs to happen
- PyBind issue
- Enrique is back as of today and will look into it.
- Icon-DSL dependencies on Balfrin
- MCH will discuss internally if EXCLAIM should directly ask for upstream change requests.
- Spack long path issue
- Paths can't be longer than 127 characters.
- On BuildBot the cache-folder is moved to the same level as spack-c2sm.
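A minimal sketch of how one could look for paths that hit the 127-character limit before a build starts. It assumes the limit applies to the absolute path string; the function name `too_long_paths` is hypothetical and not part of spack.

```python
import os

MAX_PATH_LEN = 127  # limit quoted in the minutes


def too_long_paths(root, limit=MAX_PATH_LEN):
    """Return all paths below root whose absolute form is longer than limit."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            if len(path) > limit:
                offenders.append(path)
    return sorted(offenders)


if __name__ == "__main__":
    for path in too_long_paths("."):
        print(len(path), path)
```

Running this on the cache folder before and after moving it would show whether the move actually shortens the paths enough.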
- srun related arguments in cosmo-dycore
- The quoted variants don't work in spack v0.20
- We move the machine specific slurm instructions to the cosmo repository
- py-pybind11 version lock
- gt4py uses the system python instead of the one provided by spack
- isolate CI runs with `export SPACK_DISABLE_LOCAL_CONFIG=true` and `export SPACK_USER_CACHE_PATH="$parent_dir"/user-cache`
- We could put this into spack-c2sm/setup-env.sh
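For CI drivers written in Python, the same isolation can be sketched as below. The two variable names and the `user-cache` folder come from the minutes; the helper name `isolated_spack_env` and the example path are hypothetical.

```python
import os


def isolated_spack_env(parent_dir, base_env=None):
    """Return a copy of the environment with the two isolation variables set.

    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SPACK_DISABLE_LOCAL_CONFIG"] = "true"
    env["SPACK_USER_CACHE_PATH"] = os.path.join(parent_dir, "user-cache")
    return env


if __name__ == "__main__":
    env = isolated_spack_env("/tmp/ci-run")  # hypothetical parent directory
    print(env["SPACK_DISABLE_LOCAL_CONFIG"], env["SPACK_USER_CACHE_PATH"])
```

The returned dictionary can be passed as `env=` to `subprocess.run`, so each CI run gets its own user cache without touching the caller's shell.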
- clean up after spack update
- Dominic will look into this once Balfrin is back
- We need documentation on what to delete if variants are changed in the spack env and if the icon configure is edited
- spack clean up daint : Xavier will send a mail Friday , removal 16.08.2023 - contact [email protected]
- spack icon testing: cannot see the log file. Should have the log in case of failure.
- currently issue with Jenkins cloning dkrz repo
- possibly review the list of tests
- check if srun works for compiling on Balfrin
- In packages one should use the variant not using the spec
- Review the icon package : after october 2 in view for Alps
- spack env activate side effect: inform user for the time being, discuss with CSCS. take further action later.
- Sam Kellerhals is back. Dominic sends new doodle.
- End support for spack-c2sm v0.17.0.1
- Deletion of upstream on Daint is okay
- End support for spack-c2sm v0.18.0.0 - v0.18.1.3
- Deletion of upstream on Daint is okay
- Upstream with DSL deps on Balfrin
- Jonas sends list of dependencies.
- Dominic creates design document at MCH.
- Icon package
- Sam Kellerhals will be back on July 10th. So the new meeting slot has to wait.
- "Makes icon a CudaPackage" PR
# sysconfigs/<machine>/packages.yaml
packages:
  all:
    variants: cuda_arch=80
# spack.yaml
spack:
  specs:
  - icon @develop %nvhpc gpu=80
https://github.com/C2SM/spack-c2sm/blob/main/test/system_test.py#L254-L267:
def test_install_c2sm_test_cpu_gcc(self):
    spack_env_dev_install_and_test(
        'config/cscs/spack/v0.18.1.1/daint_cpu_gcc', 'icon-2.6.6')
uses the old API. How do we proceed? Solution: Create a tag icon-2.6.6.1
- Dominic creates doodle for other time slot of spack meeting
- upstream PR
- https://github.com/C2SM/spack-c2sm/issues/738
- Rttov in upstream on Daint
- Discuss BB implementation (after discussion with Ralf from DKRZ)
- Upstream strategy on Daint/Balfrin
- Tests of Python packages don't find key-modules shipped with spack for dev-build
- https://github.com/C2SM/spack-c2sm/issues/721
- As Jenkins-user only!
- cuda-gcc variant in ICON
- unused and untested
- a lot of code in `package.py` of ICON
- Remove? Yes!
- Balfrin update
- There was a caching problem on BuildBot on Balfrin. It was due to the old version of spack-c2sm in use. Newer versions solve this. Moving forward with icon will solve the problem eventually.
- Dominic will ask CSCS to provide OpenBLAS in the upstream.
- Compilation on Daint
- Should we provide a non-spack compilation for ICON?
- MCH and C2SM are not interested to support one. Will and Sergey are though.
- Exclaim
- latest icon4py and gt4py are building
- some design work around variants is going on
- Abishek will organize a meeting to review the design of the build system
- dace fix
- The icon package needs a variant for it #697
- dace plus serialbox fix
- Creates a decoding error.
- Find a solution to do something like this:
spack install icon fcgroup='(DACE,externals/dace_icon,-O1),(JSBACH,externals/jsbach_icon,-O0)'
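A minimal sketch of how a value in the format shown above could be split into groups. It assumes each group is written as `(name,path,flags)` with no nested parentheses; the function name `parse_fcgroup` is hypothetical and this is not how the icon package currently handles the variant.

```python
import re


def parse_fcgroup(value):
    """Split '(NAME,path,flags),(NAME,path,flags)' into (name, path, flags) tuples.

    Only the first two commas of a group separate fields, so the flags
    field may itself contain commas.
    """
    groups = []
    for inner in re.findall(r"\(([^()]*)\)", value):
        parts = [part.strip() for part in inner.split(",", 2)]
        if len(parts) != 3:
            raise ValueError(f"expected (name,path,flags), got ({inner})")
        groups.append(tuple(parts))
    return groups


if __name__ == "__main__":
    spec_value = "(DACE,externals/dace_icon,-O1),(JSBACH,externals/jsbach_icon,-O0)"
    for name, path, flags in parse_fcgroup(spec_value):
        print(name, path, flags)
```

Parsing inside the package keeps the whole value as one quoted string on the command line, which sidesteps characters that the spec parser treats specially.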
- Publish spack-c2sm v0.18.1.1 today?
- Dominic will create v0.18.1.0 on dev_v0.18.1
- We merge everything into main and create v0.18.1.1
- Dominic organizes a dedicated clean-up day in the week of the 6th March
- test_cosmo.py for c2sm-features like version broken -> removed and fix in later release
- ICON DSL
- Jonas is visiting Exclaim tomorrow, and they start working on ICON DSL
- gt4py has a tag
- Icon on Balfrin
- Dominic will create a PR for that test
- ICON-PR
- We still need to handle cosmo-eccodes-definitions
- A stable branch of spack-c2sm for buildbot should work
- Timeline for v0.18.1.1
- Daint upgrade was postponed ~end of February
- Make decisions on Thursday 16.Feb 14:00-15:00
- Upstream for Daint is ready to be tested. MCH decided to not have an upstream on Tsa.
- Features for v0.18.1.2
- ICON as a CudaPackage
- Docs
- There are now preview actions that preview the docs and post it to every PR commit.
- Jonas started overhauling the docs for v0.18.1.1
- icon4py and gt4py versions
- A first tag maybe after the functional merge
- Spack v0.18.1 is unable to run ICON with Serialbox. Ticket at CSCS pending; possible discussion points below:
- Stick to v0.17.1 on Daint
- Get rid of Serialbox on Daint
- Only support v0.17.1 for GPU-devs, users use v0.18.1
- Introduce v0.18.1 on Alps not on Daint
- Transfer to pure Spack package to avoid divergence between EXCLAIM and MCH/C2SM
- problem solved, Serialbox needs to be linked statically!
- Spack locking problem
- A deadlock when two processes try to upgrade a lock from read to write
- A conceptual bug when a process tries to "upgrade" a "non-lock" to a write-lock
- Upstreams
- Tsa: Claw fails because Ant fails: Unable to find a javac compiler; com.sun.tools.javac.Main is not on the classpath. Perhaps JAVA_HOME does not point to the JDK. It is currently set to "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre"
- Daint: Claw fails because Ant fails because OpenJDK fails: Permission denied: '/project/g110/spack/upstream/daint_v1/openjdk/11.0.15_10/gcc/vva3wvrqfcoaktjumjsyi3kd75hep6vr/legal/jdk.localedata/thaidict.md'
- Test overview v0.18.1 https://github.com/C2SM/spack-c2sm/issues/624
- Decisions are added in issue
- Daint upgrade
- Go-No-Go Meeting 17.Jan
- Upgrade 17.Feb
- Icon and Cosmo seem to work. Not validating but running.
- Status of spack-c2sm v0.18.1
- Sam and Abishek will ask for Versions in Py4Icon and Gt4Py
- problems with mo_util_texthash.f90 with Serialbox and nvidia for Sergey's package of icon
- not with the old package of icon
- Jonas will ask for help. Maybe Sergey, Xavier or Pirmin can help.
- Docs (MD or io-page)
- Michael takes a look at how to make io pages work for multiple versions and maybe with a preview function of the rendering
- Agree on upstream content for Tsa and Daint
- We start an experiment in a new branch
- Dominic installs an upstream for Tsa and Daint on project/g110 with a Jenkins plan and pings Michael
- Michael makes use of the upstream and tests it
- Piz Daint upgrade 15.2.22
- Upgrade to nvhpc 22.3
- We're looking for help from the Exclaim devs
- srun testing
- also for dev_v0.18.1 now
- Next meeting
- Xavier deletes the invite. Dominic sends out new ones.
- gt4py and icon4py as Spack packages
- Update from v0.18.1
- cosmo,cosmo-dycore and int2lm run on Daint
- icon runs on GPU as well
- Upstream instances, release management v0.18.1
- Dominic presented an idea of versioned, immutable upstreams.
- Sergey's icon package was adapted to work on daint.
- Current plan to introduce it after Piz Daint's update 15.2.22
- Icon-Exclaim works now with dev-build, spack environments and buildbot.
Participants: Xavier, Jonas, Michael, Abishek, Dominic
- Linking error on Piz Daint with nvhpc
- There's a problem with nvhpc that Theo and Harmen couldn't solve. There's a workaround right now. We are still looking for a better solution.
- Coordinate future of COSMO install commands
- Jonas suggests to keep `installcosmo` and `devbuildcosmo`. We agree.
- C2SM will take care of COSMO on Daint, MCH on Tsa.
- `cosmo ~eccodes` doesn't work anymore. Cosmo `depends_on('cosmo-grib-api-definitions', when='~eccodes')`, which uses github.com/elsagermann/libgrib-api-cosmo-resources, which doesn't exist anymore. What do we do?
- Remove grib from cosmo and fix +eccodes, and remove cosmo-grib-api-definitions?
- It's in https://github.com/C2SM-RCM/libgrib-api-cosmo-resources. Dominic creates an issue to clean it up with Xavier
- icon4py
- missing setup.py, pyproject.toml etc.
- It's set up as namespace packages, thus setup.py in each folder. This gives you the option to not install them all, but choose.
- Samuel and Jonas will have a look at it, maybe Enrique or Hannes as well.
- blas/lapack libs for ICON package of Sergey
- When building blas/lapack the tolerance tests fail.
- We need to discuss this with the experts (Annika).
- Dominic invites Abishek and Samuel to MeteoSwiss Slack.
Participants: Carlos, Michael, Dominic
- Exclaim
- Abishek will join the spack team next meeting.
- Will tries to merge Sergey's package of ICON. We directed him to dev_v0.18.1
- Cosmo with spack env
- There's progress and it looks promising so far.
- In the end one env for CPU and GPU should be enough and it should go into the package's repo.
- dev_0.18.1
- We need to work out an upstream model for Piz Daint, only accessible for BuildBot.
- Dusk and dawn can be removed (edit: This was based on a wrong assumption)
- omni-xmod-pool, omnicompiler and xcodeml-tools need to be discussed with Xavier
Participants: Jonas, Michael, Dominic, Xavier
- Upcoming Daint upgrade
- We don't expect an upgrade, but in any case we would be ready. Communication from CSCS is on short notice and conflicting.
- nvhpc fix
- A Macro is missing. A Module from CSCS fixes it.
- PR feedback
- PR is open. Dominic will review.
- spack v0.18.1
- 1.Nov 13:30-14:30 introduction of new work flow. pros and cons. Future work. Outlook and possibilities.
- Jenkins files
- All Jenkins plans should have an owner. Xavier and Jonas will inform Users about it.
- From Nov 18 we will start to delete Plans without owners.
Participants: Carlos, Xavier, Jonas, Dominic
- Coordinate upcoming Daint upgrade
- 19.Oct
- Dom is ready now
- Cosmo, Int2lm, Icontools work fine
- cuda toolkit module name changed. The ICON config file needs to be adapted.
- Jonas will introduce a patch in spack-c2sm's icon package description and a task in GH to remove the patch when we go to config-less ICON package description.
- Compiler update on Daint
- Icontools has a problem with HDF5 with CCE 14
- We let Will test the branch. Then decide on further actions.
- Write lock timeouts
- Doesn't happen so often now.
- We ask CSCS for a solution. Xavier is raising this in exclaim.
Participants: Jonas, Dominic
- Spack v0.18
- finalize serialbox package. With spack v0.18.1 we're going to use the official serialbox package if possible. Dominic is trying out if adding and removing a c2sm-specific overload works.
- No news on the write lock problem from CSCS. We discussed several options to mitigate the problem. We think any of them is worth it.
Participants: Carlos, Jonas, Dominic
- PR testing
- It's slow. `launch jenkins all` takes over 4h
- Write lock timeouts are annoying! #512
- It's got room for parallelization #533
- Ideas:
- Put Jenkins instance and installs into user/local folder, not scratch (Dominic looks up how it's working atm)
- Ask CSCS for help (Buildah uses this https://user.cscs.ch/tools/containers/buildah/. Jonas opens ticket)
- Increase timeout threshold (Dominic creates PR experiment. Should this work, it's only a temporary fix!)
- Quick overview about icon testing
- There's a test script in icon-exclaim now. Jonas will pass that info to icon-nwp, so they may reuse it.
- Spack monthly will be run on 7th September.
Participants: Xavier, Dominic
- Switches address scheme to 'ssh://'. #526
- Progress update on v0.18.1
- Update from discussion with CSCS about Balfrin/Tasna
- CSCS will provide software stacks via spack, either with a spack upstream or a config file
- We're designing ways to work with binary mirrors and environments
- Automated merge #494
- Weekly plan collects configs on Dom and pushes to a branch. Then manual merge happens.
- spack jenkins plan runs on Dom, concats the yamls, runs tests for icon, cosmo, int2lm, and upon success pushes to branch.
- #494 would make merging to master automatic.
- We decided to merge by hand, to keep the safety of a human in place. Everything else can be automatized, like creating the PR.
- Serialbox' new package was merged #486
- The new version is now very close to the official package
- Documentation for icon
- Xavier updating icon-cscs to icon-nwp, and also the tests in test_spack.py
- Spack environments
- To keep users from getting updates of underlying packages they should start using environments.
- We're going to introduce one into ICON and collaborate with Jonas to test it.
- The new workflow could be introduced with a Daint upgrade. Possibly the one in September.
- Parallel PR tests
- When testing spack PRs in parallel (mostly on Daint), we get "Error: Timed out waiting for a write lock.". We suspect this has something to do with `db_lock_timeout: 20` in config.yaml. We try to set it to default, 120. If this doesn't help, Jonas implemented an approach to randomly delay tests. It surely helps.
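The random-delay idea can be sketched as below. This is only an illustration under assumptions, not Jonas's implementation: the function name `random_delay` and the 30-second upper bound are hypothetical.

```python
import random
import time


def random_delay(max_seconds, rng=random, sleep=time.sleep):
    """Sleep for a random time in [0, max_seconds] and return the delay.

    Starting parallel tests at slightly different times lowers the chance
    that they all request the database write lock at the same moment.
    """
    delay = rng.uniform(0, max_seconds)
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Use a no-op sleep here so the demo returns immediately.
    waited = random_delay(30, sleep=lambda seconds: None)
    print(f"would have waited {waited:.1f}s before starting the test")
```

The `rng` and `sleep` parameters exist only so the helper can be tested without waiting. A delay spreads out the lock requests but does not remove the contention, so it complements a larger timeout rather than replacing it.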
- Spack tests
- There's `spack test run <spec>` which would run tests on installed packages. We could do this for cosmo by caching the files needed for testing, namely "repo/test" and "repo/ACC/test". The caching mechanism is from spack itself. Dominic is doing a short test with this, it should be rather simple to implement. It would solve #238 and maybe help with the parallel tests.
- Binary mirror
- No noteworthy progress yet.
- Binary mirror
- Dominic is setting up a prototype
- Serialbox
- According to Hannes Serialbox can be compiled with gcc and with nvhpc
- For now we decided to compile everything with one compiler to mitigate the problem
- Making the Fortran unit tests run is a different problem
- We should loop Serialbox through Hannes
- Issues
- Done
Minutes:
- Icon + Spack
- if --ignore-dependencies fails, send mail and recompile without --ignore-dependencies
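The fallback rule above can be sketched as follows. The command runner and the mail notification are passed in as functions, so nothing here assumes how the build is launched or how mail is sent. The function name `build_with_fallback` and the example spec are hypothetical.

```python
def build_with_fallback(run, notify, prefix, spec):
    """Try the build with --ignore-dependencies first, then without it.

    run(cmd) must return True on success; notify(message) reports the
    first failure (for example by mail). Returns the command that worked.
    """
    fast_cmd = prefix + ["--ignore-dependencies"] + spec
    full_cmd = prefix + spec
    if run(fast_cmd):
        return fast_cmd
    notify("build with --ignore-dependencies failed, retrying without it: "
           + " ".join(fast_cmd))
    if run(full_cmd):
        return full_cmd
    raise RuntimeError("build failed with and without --ignore-dependencies")


if __name__ == "__main__":
    # Fake runner for illustration: the fast build fails, the full one works.
    def fake_run(cmd):
        return "--ignore-dependencies" not in cmd

    used = build_with_fallback(fake_run, print, ["spack", "dev-build"], ["icon@develop"])
    print("succeeded with:", " ".join(used))
```

In a real plan, `run` could wrap `subprocess.run(cmd).returncode == 0`.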
- Piz Daint upgrade
- Well done, Jonas!
Minutes:
- Serialbox
- Packages have hard-coded compilers. This was needed for a shortcoming of spack. Environments should solve this.
- This is incompatible with automatically detecting stuff on Dom.
- We have a PR that makes it work with nvhpc, not on Tsa where there is pgi.
- Icon, Cosmo and Dycore depend on Serialbox.
- Jonas will prepare something that can be tested on Tsa. Pgi compilation of Serialbox on Tsa.
- Icon status on spack
- +dace and +rttov are already a PR
- Srun on Daint
- postponed
- Upcoming Daint upgrade
- We need a dedicated meeting for the problem with Serialbox
- We need a meeting shortly before and after the upgrade
Minutes:
- Update Dom testing
- key things remain manual (config yaml file containing cuda compiler and cmake%GCC)
- update in spack is coming in soon
- Srun on Daint
- Dominic has it on his to-do list to ask CSCS how we should do CI/CD on compute nodes if there's no git on them.
- Muttler and Manali are the vClusters on Alps, we need to manage how close they are to one another.
- Bugfix: cosmo tests didn't run with cppdycore
- cause: Python arg/list problem
- Add and Revert "Removes 'CSCS_APPS_PATH'"
- cause: Icon PR-test deactivated on Tsa
- Srun one level down
- level in the call stack
- launch jenkins [--upstream] [--exclusive] [--tsa] [--daint] ...
- archiveArtifacts artifacts: '*.log', allowEmptyArchive: true
- Adds cosmo dycore tests
- self.Srun('spack install cosmo-dycore ...')
- self.Run('spack install --test=root cosmo-dycore ...')
- Adds more icon tests
- 'spack install icon@nwp ...' cpu+gpu Tsa+Daint
- 'spack -i dev-build ...' cpu+gpu Tsa+Daint
- Icon dev-build -i in the Quick Start
- prevents us from wasting compute resources, but offloads testing the cache to the users
- Dominic proposes to have a machine check for the cache and then remove the -i from the Quick Start.
- If the cache is missing we are informed asap and the users can continue to work, slower though.
- Postponed to next meeting with Xavier and Elsa.
- Add 'srun' for Daint
- prepend 'spack fetch' to every test
- Is it worth it?
- Dominic discusses this with CSCS. (Don't use login nodes to compile and don't use git on compute nodes, doesn't work)
- jasper linking for int2lm and cosmo
- why isn't jasper a dep, but we link against ljasper
- linking problems in container
- Jonas is going to play around with static linking and linker options.
- How does MCH plan to test icon exclaim?
- MCH plans to have a GitHub mirror of icon-NWP
- Once this is done, MCH is going to add a Jenkins plan for it
- The prob-test is the test suite
- This needs a meeting with Jonas, Michael, David, Will, Carlos, Xavier, Mauro, CSCS. (Carlos will start that)
- Failing dev-build of cosmo
- Dominic will review, Xavier will estimate.
Minutes:
- Daint upgrade
- The root problem was in the modules. Prg-env nvidia is incomplete. A library runpath points to a nonexistent location. Fixing it will involve Cray.
- We need a ticket to make information public, also for Will to read up.
- runpath has priority over rpath
- The ticket that requested configs from CSCS was answered in Feb.
- Status Quo
- pgi 21.3 is correct and used for icon
- pgi 21.5 is broken but works for cosmo
- Responsibilities:
- C2SM takes care of Daint.
- MCH takes care of Tsa, and later Alps and Cosmo on Alps.
- Who tests Icon on Dom?
- We need a meeting with Carlos, Xavier and Will for this. Carlos will start this and figure out who's responsible for what and negotiate enough time to test ahead for upgrades.
- Spack testing:
- Dominic will figure out a way how to test the workflow with "spack load"
Minutes:
- PR #377 FIX: v0.17.0 to v0.17.1
- Release notes
- Let's postpone it to next meeting!
- Version detection #371 (Users need to adapt `spack installcosmo cosmo@apn_5.08.mch.1.0.p3%pgi cosmo_target=cpu ~cppdycore`)
- Add comments on that in the package descriptions! (Dominic)
- Documentation rendering
- Ask Carlos if he puts in a veto!
- Merge it!
- Update to Clingo
- We run into a known problem of Clingo. Spack devs are working on it. So we can just wait.
- #364 should also solve this problem.
- Should we use spack develop environment instead of spack dev-build? #364
- Maybe, but for us it's not urgent.
Minutes:
- Slack channels for users to find help
- C2SM people ask for help through email support at c2sm
- There's a spack channel in ICON-GPU workspace (Jonas, Elsa and Xavier monitor this)
- If local support can't fix it, or the problem is big, we shall put it in a GitHub issue. We shall propagate issues to other channels when we think it's worth it.
- sort out jenkins plans (to prevent race conditions)
- We observed weird behavior, probably related to race conditions. Plans turn red in the night and green after restart. In the middle of a compilation files are not found.
- There's an option for plans to wait on other plans.
- Elsa recalls there's a plug-in for a graphical overview. She'll search for it and report in the #slack. (#374)
- Dominic opens an issue in spack-c2sm to report all issues with jenkins plans we think happen because of race conditions, so we have an overview of the impact they cause. (#375)
- Resumé of last update
- C2SM and APNC users had a good experience in general
- Jonas set up an isolated spack instance so they were able to run through the night.
- There's a jenkins plan to install all the dependencies to the admin spack instance and the build-bot was installing without creating dependencies, so it failed.
- New nvhpc compilers + icon, status
- "introduce new, move users, (eventually remove old)" should work.
- Elsa is working on it.
- PR "ADD: terra-standalone package #155" is over a year old
- Xavier will ping Jean-Marie about it.
- PR "remove osm #308" is getting old
- Dominic will ping Carlos about it.
Minutes:
- Icon spack
- The new claw driver is causing problems. There are two wrappers and buildbot is using both. PGI is the currently used one. Cosmo doesn't use claw on daint, because claw is not maintained. (Xavier and Elsa will investigate offline)
- Failing cosmo plan on daint
- cray libc is not compatible with pgi.
- More Jenkins workers? (re-discuss in 2 weeks)
- CSCS asked us to lower to 6.
- Are we lacking a communication channel with all users?
- Spack v0.17.0 (clean all installs and caches) #317
- Consistent naming for packages with several git-repos (migration) #339
- There is a mailing list at C2SM that reaches technical persons. There's a slack at C2SM where everyone is in except EMPA.
- To inform users we will post on MCH slack and use the mailing list of C2SM.
- Should we add `--test=root` to all install tests? #331
- is maybe different in v0.17.0
- only put it where it fits not everywhere. cosmo + int2lm atm.
- Eccodes is only used implicitly. We should therefore only test it implicitly, right? #332
- serialbox and claw have a similar behavior, but it's our software.
- Review, PR, add Dominic to all review -
- PR testing almost ready : move ahead, merge and communicate
- dependency issues on daint / ICON / buildbot
- On daint buildbot always rebuilds claw, eccodes, serialbox, ...
- Elsa introduced architecture, but did not solve the issue
- force people not to rebuild anything
- add the -i in the icon build documentation
- in the long run, implement --ignore-dependencies in devbuildcosmo
- pep8 formatting : Dominic will review and merge if ok
- cosmo-eccodes-ressouce: we can make it public, Xavier will do it
- new spack version :
- version 0.17 is working, but only with the old concretizer. Issue with dependencies taking the wrong compiler, take the gcc for mpi.
- need to create a clean PR for this only
- run the new testing before merge, wait a week to integrate more tests. Need to test the release pipeline.
- do we need a devbuildicon ?
- Security update issue, Tsa update : change java path. Even small security updates need to be tested. Need to run spack monthly when there is a security update.
- Need to have an option for the spack PR and spack Test to not use the upstream
- Spack monthly : no more backup
- What about import in package : use explicit list
- Spack update: new release, not tested yet
- Spack for ICON : config wrapper ready for spack on daint. Further plan to make it more configurable.
- New time for meeting : Tuesday 13:30
- Design testing: Using jenkins file (see slides Dominic). We go for this design. Spack dev should review the test list.
- Spack workflow, to load env, etc : try to merge the docs from MCH and C2SM regarding spack usage. Suggestion: use spack load before submitting the slurm script
- Spack upgrade : still waiting for new release with bug fix
- gcc is now default
- spack monthly triggers now spack daily (to avoid having no instance on a system)
- Spack and ICON
- only support ICON on tsa with spack, plan is to use it also for daint
- still a hardcoded path in the config script. Solution : remove CSCS_APPS_PATH
- spack PR test
- currently not working (using cosmo).
- Short term, change the plan to provide the full spec (remove the daint plan)
- Long term : use jenkins file and define work
- online compression for netcdf using zlib. Faster solution: use zlib_ng : prepend LD path. Try as link (in cosmo) and run (in zlib_ng) dependency
- Hack module : keep until fix.
- dycore jenkins plan : will be activated
- spack migration to c2sm : all good so far. Documentation updated
- tsa os update : merge PR update when node is going offline , and launch daily spack on tsa at the time the nodes are re-booted
- cosmo serialization : it works, can generate data : need to add slurm argument. Go for it.
- building ICON on tsa with Spack : we can merge
- Admin rights: give admin to Jonas. Always change with a PR. If not affecting production, any review is enough. If affecting MCH production, Carlos or Xavier should review.
- Migration to C2SM organization : Monday 21.6, spack-c2sm. Xavier: check the implications of a rename or move; if possible, limit the impact on users in a first step.
- COSMO spack, dycore testing: Dycore is not tested on daint. Serial is not working on daint. All plans use either installcosmo or devbuildcosmo.
- Daint plan
- Migration to C2SM
- Move to C2SM can be started
- COSMO Jenkins not working
- Serialization is broken
- Discussion of bugfix for installcosmo/devbuildcosmo
- Run testsuite on Jenkins, with Spack branch
- Admin instance, normal instance
- Admin instance for Jenkins user, installing into project -> installpath is set to project
- Put documentation on IO-pages,
- project is not writable on project, therefore testsuite could not run in admin-instance
- spack load with cosmo issue and cosmo performance benchmarks: spack load is not very reliable. The workaround is to serialize the environment into a bash script from Python and load it when calling. Xavier will try using a hash directly (we don't know whether that will change the behaviour). For now we go with the workaround. Create a spack command that writes down the env.
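A minimal sketch of the workaround described above, assuming the environment before and after loading is available as plain dictionaries. The function names `changed_variables` and `render_bash` are hypothetical and not part of spack or spack-c2sm.

```python
import shlex


def changed_variables(before, after):
    """Return the variables that are new or modified in `after`."""
    return {name: value for name, value in after.items()
            if before.get(name) != value}


def render_bash(env):
    """Render `env` as the text of a bash script that can be sourced later."""
    lines = ["#!/bin/bash"]
    for name in sorted(env):
        # Skip names that are not plain identifiers (e.g. exported bash functions).
        if name.isidentifier():
            lines.append(f"export {name}={shlex.quote(env[name])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical snapshots; in practice they would be taken around the load step.
    before = {"PATH": "/usr/bin"}
    after = {"PATH": "/opt/pkg/bin:/usr/bin", "PKG_ROOT": "/opt/pkg"}
    print(render_bash(changed_variables(before, after)), end="")
```

The rendered text can be written to a file and sourced from the job script, so the job does not depend on spack load at run time.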
- update of spack version: two files (indices, and something else) should go in the cache (and not the instance). Spack moved them into the misc of the cache.
- cosmo and dependency problems: Giacomo got reviews from Carlos & Ben. Adding to the original idea the functionality to use spack.yaml format. Suggestion was also to change the order of precedence to put the user version before the yaml. For the rest of the comments, we can keep it as it is and improve it later.
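The suggested order of precedence can be sketched as follows, assuming the yaml defaults and the user's choices are both available as name-to-version mappings. `build_spec` and the version numbers are hypothetical; only the `^name@version` spec syntax is taken from spack.

```python
def build_spec(root, yaml_defaults, user_versions):
    """Combine dependency versions, letting the user's choice win over the yaml default."""
    merged = dict(yaml_defaults)
    merged.update(user_versions)  # user entries override yaml entries
    dependencies = [f"^{name}@{version}" for name, version in sorted(merged.items())]
    return " ".join([root] + dependencies)


if __name__ == "__main__":
    defaults = {"eccodes": "2.19.0", "claw": "2.0.2"}  # hypothetical yaml content
    user = {"eccodes": "2.25.0"}                       # hypothetical user request
    print(build_spec("cosmo", defaults, user))
```

Dependencies the user does not mention keep their yaml default, so the yaml only fills the gaps.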
- spack config of daint (maintain and remove unnecessary modules): what can I delete? For now, try to extend what Victor provides; if that does not work, Jonas will remove options.
- spack on dom: we have a config script that works. We need to add a new host in jenkins for the spack instance. Jonas will add it in jenkins to spack_daily and spack_monthly. Jonas will maintain dom & daint.
- proposal for new documentation: Jonas is asking for feedback on the doc structure. We will have a look at it.
- cosmo dependencies: use the internal spack yaml format from Giacomo; for the rest, use Carlos' PR proposal. The spack yaml configuration will be in the package. Unified devbuildcosmo and installcosmo to use the yaml spec for the default dependencies (eccodes, claw). Note: this will change the behaviour of spack install cosmo - need to inform users to use installcosmo - if possible protect it with an error. Implementation timeline : Giacomo will add spack yaml to Carlos' PR.
- new task : add autodiscovery to the packages we own
- doc page : merge after Jonas tries to fix the typo
- review other issues. Next meeting : 22.04.2021 13:30
- cray-libsci not compatible with pgi 20.1.1. We create an issue to investigate; we don't know yet where the problem is coming from.
- build dependencies with spack: we take one more week to investigate and decide.
- dom: work in progress. Jonas is working with CSCS to provide help and support for maintaining Dom
- Aim of the spack team, how do we work
- Share knowledge about spack and discuss issues
- Take design choices, and improve the implementation, architecture
- Make sure we don't grow spack organically
- Define priorities
- Regular meeting (every 2 weeks at the start), put points on the agenda ahead of time if possible - open to any user
- Need a coordinator : Xavier (3 months)
- Improve documentation
- Errors can be reported on spack slack channels
- Issues should be opened in the repo, priority will be managed with the project
- need a licence : MIT
- Collect documentation for spack
- Currently : C2SM wiki - MeteoSwiss APN devops - in spack MCH
- => Use github documentation page + Readme / without hand compilation : Carlos
- where do we put the repo
- move to MeteoSwiss : spack-XX or C2SM : spack-c2sm / C2SM org . Jonas will investigate.
- Cosmo releases
- short term solution : create cosmo-c2sm package
- create an issue, look for the best spack way : https://github.com/MeteoSwiss-APN/spack-mch/issues/180
- all : test and discuss the branch.
- note : the dycore problem will be the same with ICON - good solution required.
- we could contribute to spack
- spack on dom
- go for it, try to involve cscs in the maintenance of the system.
- Intro: Spack is the package management at APN-C; try to use it for all software with non-system dependencies (eccodes). We use a central instance - all users source it.
- no configuration errors from the user
- if something goes wrong in the central instance, all users are affected. Currently there is an issue with spack v0.16.
- Contact users in case of issues/re-installation of the shared instance: currently on the general channel in MeteoSwiss. Jonas is the contact person.
- How to work/use spack
- try to share packages (do separate version) whenever possible
- COSMO: need to find a solution for the release
- Spack admin
- Jonas.
- Testing: need a better testing strategy
- Regular meeting biweekly. Try Monday 13:30.