Community calls
Vasileios Karakasis edited this page Jul 13, 2022 · 45 revisions
This page holds (temporarily) the agenda and minutes of the bi-weekly community conference calls.
Call skipped due to low participation.
- Frequency of this meeting
- Suggestion to reduce to monthly.
- ReFrame repository has been moved to https://github.com/reframe-hpc/reframe
- The CSCS checks have been separated from the main repo, in https://github.com/eth-cscs/cscs-reframe-tests
- GPU Burn test is now a library test, as of #2503
- ReFrame 3.11.2 released (Release Notes)
- ReFrame 3.12.0 released.
- Latest features and bugfixes:
- Allow setting fixture variables from the command line #2515
- Add `--mode` option to GitLab CI pipeline command #2514
- Check that PBS output is written back to working directory before setting the job as completed #2519
- Working on making tests container-runtime agnostic #2396
- Performance improvements in test case generation #2544
- Upcoming plans (https://github.com/reframe-hpc/reframe/milestone/81):
- Move towards ReFrame 4.0 (Tentative backlog)
- Support more flexible ways of configuration #1725
- Convert more CSCS tests to library tests
- Vasileios Karakasis (CSCS)
- Kenneth Hoste (HPC-UGent)
- Victor Holanda (CSCS)
- Carlos Rosales-Fernandez (AWS)
- Theofilos Manitaras (CSCS)
Development updates
- ReFrame 3.11.0 released on April 13.
- Key new features:
  - New `--distribute` option that allows distributing single-node jobs over a set of nodes. It can also be combined with the `-J` option, for example to submit jobs to fill a reservation: `--distribute=all -J reservation=cool`. The current valid partition is always taken into account.
  - Extended syntax for `valid_systems` and `valid_prog_environs` that allows selecting systems and environments based on features and properties.
  - New `CustomBuild` build backend that delegates the building of test code entirely to users. If you use it, be aware of the side effects of your build scripts!
  - Explicitly mark variables and parameters as loggable.
  - New library tests merged in.
- Future directions:
  - Move the repo out of the `eth-cscs` domain and separate the CSCS tests.
  - Continue work on test libraries
  - Backlog for 3.12 (tentative): https://github.com/eth-cscs/reframe/projects/36
- Set up a separate meeting with EESSI community on defining common systems/environment properties and features (Victor will make a Doodle and post it in the #confcalls channel).
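
The feature- and property-based selection mentioned under the key new features can be illustrated with a small, self-contained sketch. This is plain Python and not ReFrame's implementation: the token forms (`+feature`, `-feature`, `%key=value`), the `matches()` and `select()` helpers, and the partition names are all assumptions made for illustration.

```python
def matches(spec, features, properties):
    """Return True if every whitespace-separated token in `spec` holds.

    Assumed token forms (illustrative only):
      +feat      the feature must be present
      -feat      the feature must be absent
      %key=val   the property `key` must exist and equal `val`
    """
    for token in spec.split():
        if token.startswith('+'):
            if token[1:] not in features:
                return False
        elif token.startswith('-'):
            if token[1:] in features:
                return False
        elif token.startswith('%'):
            key, sep, value = token[1:].partition('=')
            if not sep or key not in properties:
                return False
            if str(properties[key]) != value:
                return False
        else:
            raise ValueError(f'unsupported token: {token!r}')
    return True


# Hypothetical partitions: name -> (features, properties)
PARTITIONS = {
    'sys:gpu': ({'gpu', 'cuda'}, {'gpu_arch': 'sm_60'}),
    'sys:cpu': ({'cpu'}, {}),
}


def select(spec):
    """Return the sorted names of the partitions that satisfy `spec`."""
    return sorted(name for name, (feats, props) in PARTITIONS.items()
                  if matches(spec, feats, props))


print(select('+gpu %gpu_arch=sm_60'))  # ['sys:gpu']
print(select('-gpu'))                  # ['sys:cpu']
```

With this kind of selection, a test states what it needs instead of listing concrete system names.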
- Vasileios Karakasis (CSCS)
- Ake Sandgren (UMEA)
- Rafael Sarmiento (CSCS)
- Eirini Koutsaniti (CSCS)
- Theofilos Manitaras (CSCS)
- Carlos Rosales Fernandez (Amazon)
Development updates
- OSU microbenchmarks library tests are merged
- Almost done with the extended syntax of `valid_systems` and `valid_prog_environs` (https://github.com/eth-cscs/reframe/pull/2479)
  - We had to reimplement how valid systems/environments are selected in order to make it work with fixtures
  - The implementation also fixes the bug with the `--skip-{system|prgenv}-check` options when using fixtures.
- Still WIP: Distributing a set of tests over multiple nodes (https://github.com/eth-cscs/reframe/pull/2458)
- v3.11.0 is planned for Wed. 13/4, since we need to have the two major features above merged.
- April 19 call will be skipped.
- Ake: When do you plan to split the repo and the site-specific tests?
- We do plan to focus on it as soon as 3.11.0 is out.
- Vasileios Karakasis (CSCS)
- Victor Holanda (CSCS)
- Theofilos Manitaras (CSCS)
- Simon Branford (Univ. of Birmingham)
- We will delay 3.11.0 for two weeks (work got stuck due to limited availability of the team), but an rc release will be done today.
- Draft PRs
  - Syntax extensions for `valid_systems` and `valid_prog_environs`: https://github.com/eth-cscs/reframe/pull/2479
  - OSU microbenchmarks library test (https://github.com/eth-cscs/reframe/pull/2421)
    - Still requires a bit of fine tuning, but it will soon be ready to merge.
  - Generating node-pinned tests (https://github.com/eth-cscs/reframe/pull/2458)
    - We needed to address some limitations on how we can dynamically generate tests
      - https://github.com/eth-cscs/reframe/pull/2470
      - https://github.com/eth-cscs/reframe/pull/2474
- Vasileios Karakasis (CSCS)
- Theofilos Manitaras (CSCS)
- Eirini Koutsaniti (CSCS)
- Jg Piccinali (CSCS)
- Kenneth Hoste (HPC-UGent)
- Åke Sandgren (Umeå Univ)
- Rafael Sarmiento (CSCS)
- Carlos Rosales (Amazon)
- Richard Henwood (Arm)
- Simon Branford (Univ. of Birmingham)
- We will skip 3.10.2 and target 3.11.0 for March 22; two dev releases in-between.
- Bug fixes
- Fixed weird behaviour when overriding hooks within the same test (https://github.com/eth-cscs/reframe/pull/2436)
- Fixed sub-configuration selection when running tests (https://github.com/eth-cscs/reframe/pull/2438)
- Do not set up Spack shell support (https://github.com/eth-cscs/reframe/pull/2424)
- Enhancements
- Control which attributes, variables or parameters can be logged (https://github.com/eth-cscs/reframe/pull/2428); current behaviour can cause problems with Logstash and lose records.
- Remove pipeline timings from output.
- OSU library test and the associated CSCS tests PR (under review): https://github.com/eth-cscs/reframe/pull/2421
- Next sprint: https://github.com/eth-cscs/reframe/milestone/76
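
The idea behind controlling which variables or parameters get logged can be sketched as an opt-in flag per variable. The `Var` class and `log_record()` helper below are hypothetical stand-ins written for this illustration; they are not ReFrame's API.

```python
class Var:
    """Hypothetical stand-in for a test variable carrying a loggable flag."""

    def __init__(self, value, loggable=False):
        self.value = value
        self.loggable = loggable


def log_record(test_vars):
    """Build a flat record containing only the variables marked loggable."""
    return {name: var.value
            for name, var in test_vars.items() if var.loggable}


test_vars = {
    'num_tasks': Var(16, loggable=True),
    # An arbitrary nested object is the kind of value a log store
    # may fail to index, so it is left out unless explicitly opted in.
    'build_cache': Var({'nested': object()}),
}

record = log_record(test_vars)
print(record)  # {'num_tasks': 16}
```

Making logging opt-in keeps records small and avoids sending values that a downstream consumer cannot handle.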
- Community feedback
- Extension of the `valid_systems` and `valid_prog_environs` syntax is still work in progress. What if we supported basic compiler abstractions as in Spack here?
  - Vasileios: There are no plans for compiler auto-detection and auto-generation of the `environments` configuration section.
  - Kenneth: this could quickly become a time-consuming task, since also compiler versions, etc. are relevant
  - Kenneth: this seems like an opportunity for a common Python library that could be leveraged by ReFrame, Spack, EasyBuild, ...
    - kind of similar to `archspec` (cfr. `-mtune` & co options that `archspec` knows about, but compiler flags for OpenMP are out-of-scope there...)
  - Richard: Delegate the compilation task fully onto Spack and use the compiler info to generate the ReFrame config on-the-fly. Then ReFrame tests are monkey-patched to parametrise them over the various specs.
- Use cases of running a test session continuously until a time limit is reached: https://github.com/eth-cscs/reframe/issues/619
- could be used for burn-in testing, simulate user workload, ...
- also related to exploring range of combinations for multi-node tests, since often not enough tests are generated to actually fill a system
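
A minimal sketch of the "run until a time limit is reached" use case, assuming the session is exposed as a plain callable. The `run_until()` helper and its `clock` parameter are hypothetical names for this illustration, not an existing ReFrame feature.

```python
import time


def run_until(limit_s, run_session, clock=time.monotonic):
    """Call run_session() repeatedly until limit_s seconds have elapsed.

    Returns the number of completed sessions. A session that starts
    before the limit is allowed to finish, so the total runtime can
    exceed limit_s by up to one session.
    """
    start = clock()
    completed = 0
    while clock() - start < limit_s:
        run_session()
        completed += 1
    return completed


class FakeClock:
    """Deterministic clock so the example does not actually sleep."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clk = FakeClock()


def session():
    clk.now += 4.0  # pretend each session takes 4 seconds


print(run_until(10, session, clock=clk))  # 3
```

For burn-in testing, `run_session` would launch the selected tests; the injectable clock only exists to make the loop easy to test.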
- Meeting frequency
- AOB
- Vasileios Karakasis (CSCS)
- Victor Holanda (CSCS)
- Theofilos Manitaras (CSCS)
- Jg Piccinali (CSCS)
- Stefan Wolfsheimer (SURF)
- Kenneth Hoste (HPC-UGent)
- Åke Sandgren (Umeå Univ.)
- Ben Fulton (Indiana Univ.)
- Caspar van Leeuwen (SURF)
- Rafael Sarmiento (CSCS)
- Carlos Rosales (Amazon)
- Development updates
- ReFrame 3.10.0 is out: https://github.com/eth-cscs/reframe/releases/tag/v3.10.0
- ReFrame 3.10.1 planned for today: https://github.com/eth-cscs/reframe/milestone/74?closed=1
- Next sprint: https://github.com/eth-cscs/reframe/milestone/75
- Added new labels to tag each issue with the framework part it refers to
- We plan to migrate the repo under `github.com/reframe-hpc`.
- Community feedback on use cases
- Do you use or plan to use ReFrame to test and deploy software stack, e.g., using Spack/EasyBuild?
- Feedback: This is an interesting feature for both Spack and EasyBuild for exploring different build configurations, but it's not likely to be used for deploying the software stack.
- Towards relaxing `valid_systems` and `valid_prog_environs`: https://github.com/eth-cscs/reframe/issues/1987
  - Key challenge here is to integrate also the `resources` that can be defined in the configuration, which are accessed now through `extra_resources` inside the test.
  - There are three types of system-related attributes: features, key/value properties and scheduler resources.
- Submit single node job automatically on every node of a reframe partition: https://github.com/eth-cscs/reframe/issues/2334
- would be very useful to find "bad nodes" in a given reservation
- automatically submit a separate copy of a test to each node
- for now, nothing combinatorial (explodes quickly after 2 nodes...)
- combinatorial selections could pick N out of M possibilities at random, or be strided throughout a set of 100 nodes (1-10, 11-20, etc.)
- selection mechanism is really needed when running 16-node tests out of 100 available nodes
- Caspar: could tests somehow indicate that they want to use flexible allocation?
- example: gpuburn to check thermal throttling of GPUs ("hardware test")
- tests that aim to validate working software are probably less interesting to run with flexible allocation
- idea: `--flex-alloc-singlenode=idle:testXYZ,testABC` => only run these 2 specific single node tests across all nodes
- Theo: Should the tests in such scenario share a single-stage directory so as to avoid redundant builds?
- Åke: This case should be addressed by fixtures, where the build part of the test is a fixture and you only dynamically parametrise the run test.
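
The node-selection strategies discussed in this item (one copy per node, strided groups, random N-out-of-M picks) can be sketched as follows. The helper names and the node naming scheme are hypothetical; this only illustrates the selection logic, not how ReFrame generates or submits the jobs.

```python
import math
import random


def one_per_node(nodes):
    """One single-node job per node, e.g. to find bad nodes."""
    return [[n] for n in nodes]


def strided_groups(nodes, size):
    """Consecutive, non-overlapping groups (1-10, 11-20, ...).

    Trailing nodes that cannot fill a whole group are dropped.
    """
    return [nodes[i:i + size]
            for i in range(0, len(nodes) - size + 1, size)]


def random_groups(nodes, size, count, seed=None):
    """Pick `count` random groups of `size` nodes; groups may overlap."""
    rng = random.Random(seed)
    return [sorted(rng.sample(nodes, size)) for _ in range(count)]


nodes = [f'nid{i:03d}' for i in range(1, 101)]  # hypothetical node names

print(len(one_per_node(nodes)))        # 100
print(len(strided_groups(nodes, 16)))  # 6 (the last 4 nodes are unused)
print(math.comb(100, 2))               # 4950 pairs already for 2-node tests
print(len(random_groups(nodes, 16, 5, seed=0)))  # 5
```

The `math.comb` line shows why exhaustive combinations were ruled out for now: the count grows far too quickly, so a sampling or striding mechanism is needed for multi-node tests.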
- Maintenance of scheduler backends
- AOB
- Welcome and introductions
- Briefly introduce yourself and where you are using (or planning to use) ReFrame.
- Development status
- Team & contributions
- Core team (@ekouts, @rsarm, @teojgo, @vkarak, @victorusu)
- Contributions are more than welcome!
- Development model
- Release train model: A new release every two weeks; releases are not delayed; whatever is ready and merged gets released
- Semantic versioning: `<major>.<minor>.<patch>`
- Patch-level bumps (every two weeks): bug fixes and new features (no deprecations)
- Minor version bumps (every 6–8 weeks): introduction of major features (deprecations are allowed, but backward compatibility is ensured)
- Major version bumps: backward compatibility may be broken.
- Upcoming major features scheduled for 3.10.
- Asynchronous builds (https://github.com/eth-cscs/reframe/pull/2194)
- New test naming scheme (https://github.com/eth-cscs/reframe/pull/2355)
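
The `<major>.<minor>.<patch>` scheme from the development model can be sketched as a small helper. The `bump()` function is a hypothetical illustration of the numbering rules only; it is not part of ReFrame's release tooling.

```python
def bump(version, level):
    """Return the next version string for the given bump level.

    level is one of:
      'patch'  bug fixes and new features, no deprecations
      'minor'  major features; deprecations allowed, still compatible
      'major'  backward compatibility may be broken
    """
    major, minor, patch = (int(part) for part in version.split('.'))
    if level == 'major':
        return f'{major + 1}.0.0'
    if level == 'minor':
        return f'{major}.{minor + 1}.0'
    if level == 'patch':
        return f'{major}.{minor}.{patch + 1}'
    raise ValueError(f'unknown bump level: {level!r}')


print(bump('3.10.0', 'patch'))  # 3.10.1
print(bump('3.10.1', 'minor'))  # 3.11.0
print(bump('3.12.0', 'major'))  # 4.0.0
```

Lower components reset to zero on a higher-level bump, which is why a minor release after 3.10.1 is 3.11.0.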
- Outlook for HPC Test library
- Proof-of-concept in `hpctestlib/` (documentation: https://reframe-hpc.readthedocs.io/en/stable/hpctestlib.html)
- Continue with creating library tests from our microbenchmarks
- Still unclear: community contributions, library location (different repo?), moving to stable
- Discuss issues that need resolution (feature requests, bugs)
- Discuss interesting use cases