Level Setting: What are we running now? #1
During the Open MPI face-to-face we discussed moving some of the CI checks to AWS, which can harness parallel instances to speed things up. Each organization can then focus on testing special configurations in its own environment.

To start this discussion I'd like to see what tests the various organizations are running in their CI. Once we have the list, we can work on removing duplicate effort. We can use this repo as needed to help facilitate this coordination.

What I'm looking for is:
- Scope
- Platform
- Configure options
- Build options
- Other build types
- Tests run
- Timing

Please reply with a comment listing what you are testing now.

Comments
IBM CI

IBM CI machines are located behind a firewall. As a result, our backend Jenkins service polls GitHub about every 5 minutes for changes to PRs. After a test finishes running, we have custom scripts that push the results back as a Gist for the community to see. Our Open MPI testing is not too specialized at this point. We do have a "virtual cluster" capability available to the community that can scale to 254 nodes on demand. We currently limit the community to 160 nodes, but that can be adjusted.

Platform

Configure options
We run three concurrent builds. Currently, PGI is disabled but will be re-enabled soon.

Build options

Other build types

Tests run
We run across 10 machines with the GNU build, and 2 machines with the other builds. The goal of this testing is to:
We run the following tests:
All run like:

Timing
A successful run through CI will take about 12-15 minutes. Most of that is building OMPI.
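For readers unfamiliar with the Gist-publishing step described above, results can be posted through the GitHub Gist REST API with a single curl call. The sketch below is only an illustration of that idea, not IBM's actual script; the token variable, file name, and use of jq are assumptions.

```sh
# Illustrative only -- not IBM's CI script. Publishes a results file as a
# public Gist via the GitHub REST API. GITHUB_TOKEN and results.txt are
# placeholders; jq is used here just to JSON-escape the file contents.
jq -n --arg body "$(cat results.txt)" \
   '{description: "Open MPI CI results", public: true,
     files: {"results.txt": {content: $body}}}' \
| curl -s -X POST https://api.github.com/gists \
       -H "Authorization: token ${GITHUB_TOKEN}" \
       -H "Accept: application/vnd.github+json" \
       -d @-
```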
Mellanox Open MPI CI

Scope
Mellanox Open MPI CI is intended to verify Open MPI with recent Mellanox SW components (Mellanox OFED, UCX and other HPC-X components) in the Mellanox lab environment. CI is managed by the Azure Pipelines service. Mellanox Open MPI CI includes:

Related materials:

Platform

CI Scenarios

Configure options
Specific configure options (combinations may be used):

Build options
Build scenario:

Tests run
Sanity tests (over UCX/HCOLL):

Timing
CI takes ~18-20 min. (mostly Open MPI building).
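As a rough illustration of what a sanity run over UCX/HCOLL can look like with stock Open MPI (this is not Mellanox's exact CI command; the test binary and process count are assumptions):

```sh
# Generic sketch of a UCX/HCOLL sanity run -- not Mellanox's actual CI command.
# Selects the UCX PML and enables HCOLL collectives for a simple test program
# built from Open MPI's examples/ directory.
mpirun -np 2 \
       --mca pml ucx \
       --mca coll_hcoll_enable 1 \
       ./examples/hello_c
```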
Thanks @artemry-mlnx for that information. Do you test with oshmem as well? Should we be running …?

I'm going to be out for a week, but don't let that stop progress.
Yes! But do it in parallel to other CI jobs, because …
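For context, an oshmem sanity check can be quite small. The sketch below is a hypothetical smoke test, not any organization's actual CI job from this thread; the file name, wrapper usage, and process count are assumptions:

```sh
# Hypothetical oshmem smoke test -- not an actual CI job from this thread.
# Confirms the oshmem layer was built, then compiles and launches a tiny
# OpenSHMEM program with Open MPI's oshcc/oshrun wrappers.
oshmem_info | grep -i spml

cat > shmem_smoke.c <<'EOF'
#include <shmem.h>
int main(void) {
    shmem_init();          /* start the OpenSHMEM runtime */
    shmem_finalize();      /* a clean shutdown counts as success */
    return 0;
}
EOF

oshcc -o shmem_smoke shmem_smoke.c
oshrun -np 2 ./shmem_smoke
```

Running it as a separate job keeps it from lengthening the existing MPI-only CI passes, in line with the suggestion above.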
Currently, our testing includes mtt with EFA and TCP. This tests v2.x, v3.0.x, v3.1.x, v4.0.x, and master. These are the configure options:

--oversubscribe --enable-picky --enable-debug --enable-mpirun-prefix-by-default --disable-dlopen --enable-io-romio --enable-mca-no-build=io-ompio,common-ompio,sharedfp,fbtl,fcoll CC=xlc_r CXX=xlC_r FC=xlf_r
--with-ofi=/opt/amazon/efa/ CFLAGS=-pipe --enable-picky --enable-debug
--with-ofi=/opt/amazon/efa/ --enable-static --disable-shared

In our nightly, canary, and CI tests for libfabric, we only use Open MPI 4.0.2 (soon to be switched to 4.0.3). We use the release versions rather than pulling from the GitHub branch directly. These tests mainly run on our network-optimized instance types, such as the c5n instance types: https://aws.amazon.com/ec2/instance-types/
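To make the second option set above concrete, a full build plus a quick smoke test against the EFA libfabric install might look like the sketch below; the install prefix, job count, and test program are assumptions, not AWS's actual MTT configuration.

```sh
# Illustrative build using one of the EFA configure lines above -- the prefix,
# parallelism, and smoke test are assumptions, not AWS's MTT setup.
./configure --with-ofi=/opt/amazon/efa/ CFLAGS=-pipe \
            --enable-picky --enable-debug \
            --prefix="$HOME/ompi-efa"
make -j "$(nproc)" && make install

# Quick check that OFI support was built, then run a trivial job over libfabric.
# ring_c is assumed to have been built from the examples/ directory of the tarball.
"$HOME/ompi-efa/bin/ompi_info" | grep -i ofi
"$HOME/ompi-efa/bin/mpirun" -np 2 --mca pml cm --mca mtl ofi ./examples/ring_c
```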