Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building with mpi+mpitrampoline still uses MPICH #245

Open
eschnett opened this issue May 20, 2022 · 14 comments
Open

Building with mpi+mpitrampoline still uses MPICH #245

eschnett opened this issue May 20, 2022 · 14 comments
Labels
bug 🐛 Something isn't working

Comments

@eschnett
Copy link
Contributor

I am trying to build openPMD_api with this command:

julia --color=yes build_tarballs.jl --debug --verbose --deploy=local x86_64-apple-darwin-julia_version+1.7.0-mpi+mpitrampoline

Although I am using mpi+mpitrampoline, I find that $prefix/include/mpi.h is from MPICH, not from MPItrampoline.

This is my build_tarballs.jl.

Is there a mistake in my script?

The same setup is working fine for the package ADIOS2. The main difference is that ADIOS2 does not depend on any other package that also depends on MPI. Maybe BinaryBuilder installs a wrong version of ADIOS2, one which depends on MPICH instead of MPItrampoline?

@giordano
Copy link
Member

@staticfloat @vchuravy ideas?

@giordano
Copy link
Member

giordano commented May 20, 2022

So, the environment used for the target platform is

[deps]
ADIOS2_jll = "44b1415a-b224-5c99-9168-ff4febb5f37f"
CompilerSupportLibraries_jll = "e66e0078-7015-5450-92f7-15fbd957f2ae"
LibUV_jll = "183b4373-6708-53ba-ad28-60e28bb38547"
LibUnwind_jll = "745a5e78-f969-53e9-954f-d19f2f74f4e3"
MPIPreferences = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
MPItrampoline_jll = "f1f71cc9-e9ae-5b93-9b94-4fe0e1ad3748"
Zlib_jll = "83775a58-1f1d-513f-b197-d71354ab007a"
libcxxwrap_julia_jll = "3eaa8342-bff7-56a5-9981-c04077f7cee7"
libjulia_jll = "5ad3ddd2-0711-543a-b040-befd59781bbf"

My understanding is that the file ADIOS2_jll/.pkg/platform_augmentation.jl is never actually loaded, so we don't know how to augment the platform and we basically install a random MPI implementation. Even more worrying, also OpenMPI is in the manifest of the environment and gets automatically installed (but conflicts with MPICH_jll, which accidentally was installed first).

I believe we need a mechanism to properly filter the dependencies. This is the first time we're venturing into this zone with the augmented platforms, it's a new territory for everyone.

@giordano giordano transferred this issue from JuliaPackaging/Yggdrasil May 22, 2022
@giordano
Copy link
Member

giordano commented May 22, 2022

I think the fundamental issue is that in

# Ony Julia v1.6, `Pkg.add()` doesn't mutate `dependencies`, so we can't use the `UUID`
# that was found during resolution there. Instead, we'll make use of `ctx.env` to figure
# out the UUIDs of all our packages.
dependency_uuids = Set([uuid for (uuid, pkg) in ctx.env.manifest if pkg.name dependencies_names])
# Some JLLs are also standard libraries that may be present in the manifest because
# they were pulled by other stdlibs (e.g. through dependence on `Pkg`), not beacuse
# they were actually required for this package. Filter them out if they're present
# in the manifest but aren't direct dependencies or dependencies of other JLLS.
installed_jll_uuids = collect_jll_uuids(ctx.env.manifest, dependency_uuids)
installed_jlls = [
Pkg.Types.PackageSpec(;
name=pkg.name,
uuid,
tree_hash=pkg.tree_hash,
path=pkg.path,
) for (uuid, pkg) in ctx.env.manifest if uuid installed_jll_uuids
]
we blindly collect all the dependencies in the manifest. We have no mechanism at the moment to take some of them out. With the platform keyword argument to the *Dependency structs we can remove dependencies from the project on a per-platform basis, which is great, but these per-platform relations aren't propagated through (this is also noted with a big warning box in the documentation), so if even if they don't appear at top level of the project, they may still be indirect dependencies of other dependencies.

To try and address this issue, I had a terrible idea: define a new ExcludeDependency(...; platforms=...) struct which has the sole purpose of recording that a dependency should be removed from installed_jlls from the snippet above, for the given platforms. So for example we could have something like ExcludeDependency("OpenMPI_jll"; platforms=filter(p -> p["mpi"] != "openmpi", platforms)) to say "do not even try to install OpenMPI_jll in the prefix if p["mpi"] != "openmpi"". This feels like a horrible hack for a deficiency of our packages resolution, but I believe this is the kind of thing we want to do to get this right. Another, simpler, approach could be to do a setdiff of all the dependencies and only those compatible with the current platform, with the assumption that if we didn't include them in the top-level project we probably don't want them at all, but I suspect there will be corner cases where this can break quite badly (for example if a dependency is listed multiple times, which is not impossible if it has different properties for different platforms). ExcludeDependency would basically do the same thing, but in an explicit (and overly verbose) way.

@giordano giordano added bug 🐛 Something isn't working and removed bug 🐛 labels May 22, 2022
@eschnett
Copy link
Contributor Author

That sounds like an awkward approach. In principle, I think there shouldn't be a single, overall "set of dependencies", but rather each to-be-installed package specification should have its own set of dependencies.

Another approach would be to mimic the way in which architectures (such as libgfortran5) are handled. They need to be consistent across all packages. A setting mpi+mpitrampoline would likewise need to be consistent across all installed packages.

To avoid the "installing MPICH and OpenMPI at the same time" problem, we can further require that the settings mpi+mpich and mpi+mpitrampoline cannot be installed at the same time. That could be done ad-hoc, by calling build_tarballs multiple times from the script, or by running the whole build_tarballs.jl script multiple times.

@eschnett
Copy link
Contributor Author

Another observation: Using mpi+mpitrampoline works for building ADIOS2_jll. It only fails for openPMD_api_jll, which has ADIOS2_jll as dependency. This indicates that the failure mode has to do with other Yggdrasil packages that depend on mpi+trampoline as well.

Is there a way to see how the ADIOS2 dependency is resolved? Does it correctly pick ADIOS2-mpi+mpitrampoline? Or might it accidentally pick ADIOS2-mpi+mpich?

@giordano
Copy link
Member

I think @staticfloat can answer that, the artifact resolution happens in Pkg and I don't follow very well the code there. Also, my feeling is that the .pkg/platform_augmentation.jl file isn't actually used, which would be useful to choose the right artifact.

@staticfloat
Copy link
Member

Alright, so I looked into this, and unfortunately what you're running up against is Pkg's inability for conditional dependencies. When you build ADIOS2_jll, you conditionally add dependencies on a per-platform basis, but unfortunately, when installing things via Pkg, we only have a single Project.toml and every JLL listed there gets installed.

Why isn't this a problem with other per-platform dependencies like the Xorg_* JLLs? Because in those cases, we have mutually exclusive platform mappings. E.g. it doesn't matter that Xorg_libxcb isn't available on macOS but is installed during builds of GTK3_jll; because there's no artifact mapping for Xorg_libxcb on any macOS platforms, so nothing gets installed.

MPICH has artifacts without an MPI tag though, which means that when building for x86_64-linux-gnu-mpi+trampoline, it will still install the MPICH files. I think the easiest way to fix this is to build MPICH to actually have the mpi+mpich tag, so that it only gets installed when explicitly requesting that platform. This is how we can "coordinate" installation across JLLs.

@vchuravy
Copy link
Member

vchuravy commented Jun 1, 2022

... @staticfloat we really need to get t-shirts with: "If at first it doesn't work, try again harder"

But yes I agree that's the solution here.

@eschnett
Copy link
Contributor Author

eschnett commented Jun 1, 2022

@staticfloat Thanks. Is this something we can do right now, modifying MPICH_jll? Or would this break things? Would MPI.jl have to choose the mpi+mpich version by default when declaring its dependencies?

@staticfloat
Copy link
Member

Yes, you can do it right now. I don’t THINK you need to do anything complicated; you don’t even need to add a platform augmentation hook I don’t think, as if the host platform lacks one, it should match the only valid platform in the artifacts toml file.

@giordano
Copy link
Member

giordano commented Jun 2, 2022

I'm a bit concerned though: does it mean that we need to add extra platform tags to all dependencies of such builds? That doesn't look very sustainable.

@staticfloat
Copy link
Member

I thought the whole point of mpitrampoline was that dependencies depend on that and not on a particular implementation?

@eschnett
Copy link
Contributor Author

eschnett commented Jun 2, 2022

Couldn't the platform tags be added automatically?

@eschnett
Copy link
Contributor Author

eschnett commented Jun 2, 2022

My hope is that these platform tags are necessary now to ensure a smooth transition from the current state to MPItrampoline. After some time, everything in Yggdrasil would be MPItrampoline. (So my hope.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants