RFC — Towards a new solver for libmamba #3387

Open
24 tasks
jjerphan opened this issue Aug 2, 2024 · 4 comments

Comments

@jjerphan
Member

jjerphan commented Aug 2, 2024

Context

libsolv has been used since the creation of mamba.

While libsolv is a fundamental dependency of libmamba, its maintenance comes with challenges for mamba but also for conda-forge generally (for instance, see conda-forge/libsolv-feedstock#92 and conda-forge/openmpi-feedstock#153).

Note

For more context, see those notes.

Proposed solution

  • Redesign part of the architecture of mamba so that an alternative solver can be used
  • Introduce mamba-org/resolvo which, from a socio-technical perspective, seems to be the most relevant candidate.

This is based on a proof of concept (POC), which shows that resolvo has roughly the same performance as libsolv.

Proposed plan

Note

This plan gives the general direction but is not exhaustive. It might evolve over time.

Architecture redesign: inheriting classes

Motivation:

  • Solver and DataBase are exposed to Python thanks to pybind11
  • we could have a common interface based on std::variant, but unfortunately pybind11 does not work properly with std::variant
  • pybind11 correctly handles inheritance
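
To make this motivation concrete, here is a minimal sketch of the inheritance-based layout, assuming a drastically simplified, hypothetical interface: `PackageInfo`, `add_package`, `package_count`, `backend_name` and `load_packages` are illustrative names, not libmamba's actual API. Code written once against `solver::DataBase&` works with either concrete class, and a hierarchy like this is the kind pybind11 can expose by declaring the base class alongside each derived class.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace solver
{
    // Hypothetical, simplified package record (not libmamba's PackageInfo).
    struct PackageInfo
    {
        std::string name;
        std::string version;
    };

    // Solver-agnostic base class: the rest of libmamba would only see this.
    class DataBase
    {
    public:
        virtual ~DataBase() = default;
        virtual void add_package(PackageInfo pkg) = 0;
        virtual std::size_t package_count() const = 0;
        virtual std::string backend_name() const = 0;
    };
}

namespace solver::libsolv
{
    // Placeholder: a real implementation would wrap a libsolv pool.
    class DataBase final : public solver::DataBase
    {
    public:
        void add_package(PackageInfo pkg) override { m_packages.push_back(std::move(pkg)); }
        std::size_t package_count() const override { return m_packages.size(); }
        std::string backend_name() const override { return "libsolv"; }

    private:
        std::vector<PackageInfo> m_packages;
    };
}

namespace solver::resolvo
{
    // Placeholder: a real implementation would presumably build on resolvo's DependencyProvider.
    class DataBase final : public solver::DataBase
    {
    public:
        void add_package(PackageInfo pkg) override { m_packages.push_back(std::move(pkg)); }
        std::size_t package_count() const override { return m_packages.size(); }
        std::string backend_name() const override { return "resolvo"; }

    private:
        std::vector<PackageInfo> m_packages;
    };
}

// Written once against the base class; works with either backend.
inline std::size_t load_packages(solver::DataBase& db, const std::vector<solver::PackageInfo>& pkgs)
{
    for (const auto& pkg : pkgs)
    {
        db.add_package(pkg);
    }
    return db.package_count();
}
```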

Schematically, we currently have:

[Diagram: current architecture (before)]

After the redesign, we would have:

[Diagram: proposed architecture (after)]

In this order:

  • move some abstractions from mamba::solver::libsolv to mamba::solver, namely:
  • create solver::{Solver,DataBase} base classes
  • adapt libmamba's internals so that they use solver::{Solver,DataBase}
  • adapt (micro)mamba's internals (especially the install codepath first)
  • introduce final (a priori named) solver::resolvo::{Solver,DataBase} classes
    • introduce the solver::resolvo::DataBase class (WIP in jjerphan/mamba:alt-solver branch)
      • take the currently installed packages into account
      • adapt resolvo's result into mamba::solver::Solution
    • introduce the solver::resolvo::Solver class
    • adapt python bindings, either:
      • introduce libmambapy.bindings.solver.resolvo.{Solver,DataBase}
      • introduce libmambapy.bindings.solver.{Solver,DataBase} and expose an option to select "libsolv" or "resolvo"
  • document all the changes for libmamba and libmambapy public APIs, especially changes in UX and deprecated import paths

Alternative solver

@Hind-M
Member

Hind-M commented Aug 2, 2024

Broadly speaking, the general plan seems sound.

  • Regarding the Database, we could use a base class as you suggested, or try to isolate the libsolv-specific parts in a dedicated class used inside Database (because resolvo's API could be very different). Either way, Database is already connected to the rest of mamba, so hopefully we won't need to change much.
  • Instead of moving everything in mamba::solver::libsolv to mamba::solver, we may want to leave it as it is and, in the same way, try to separate what is specific to libsolv from what would be common with resolvo.
    Unsolvable, for example, seems to be tightly linked to libsolv internals, so it would be a libsolv-specific class, and we should maybe wrap it into another class alongside a resolvo counterpart (we may need to define its API first to know what to do; I don't know whether that work has been done yet).
    So maybe a renaming would be enough to make it clear that it is libsolv-specific, or we could keep the same names and just have two namespaces, libsolv and resolvo.
  • Same for repo_info
  • parameters seem general enough, but maybe some classes/structs are specific to libsolv (no equivalent in resolvo, semantically speaking)
  • Why detemplate Database::add_repo_from_packages? It seems fine to me as it is.
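
Hind's first suggestion (isolating the libsolv-specific parts in a class used inside Database) amounts to composition rather than a polymorphic Database. A minimal sketch, assuming hypothetical names (`SolverBackend`, `LibsolvBackend`, `ResolvoBackend`, and a simplified `Database`, none of which are libmamba's actual classes): Database stays a single concrete type with a stable public API, and only the owned backend object varies.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical internal interface: only the backend-specific parts live here.
class SolverBackend
{
public:
    virtual ~SolverBackend() = default;
    virtual void add_package(const std::string& pkg) = 0;
    virtual std::size_t size() const = 0;
    virtual std::string name() const = 0;
};

// Placeholder backends; real ones would wrap libsolv or resolvo state.
class LibsolvBackend final : public SolverBackend
{
public:
    void add_package(const std::string& pkg) override { m_names.push_back(pkg); }
    std::size_t size() const override { return m_names.size(); }
    std::string name() const override { return "libsolv"; }

private:
    std::vector<std::string> m_names;
};

class ResolvoBackend final : public SolverBackend
{
public:
    void add_package(const std::string& pkg) override { m_names.push_back(pkg); }
    std::size_t size() const override { return m_names.size(); }
    std::string name() const override { return "resolvo"; }

private:
    std::vector<std::string> m_names;
};

// Simplified Database: one concrete type (a single class to bind in Python)
// that delegates the backend-specific work to the object it owns.
class Database
{
public:
    explicit Database(std::unique_ptr<SolverBackend> backend)
        : m_backend(std::move(backend))
    {
    }

    void add_package(const std::string& pkg) { m_backend->add_package(pkg); }
    std::size_t package_count() const { return m_backend->size(); }
    std::string backend_name() const { return m_backend->name(); }

private:
    std::unique_ptr<SolverBackend> m_backend;
};
```

The trade-off is that anything not expressible through the shared backend interface (Unsolvable, for instance) still needs its own backend-specific type.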

@jjerphan
Member Author

jjerphan commented Aug 2, 2024

Regarding the Database, we could use a base class as you suggested, or try and isolate the parts specific to libsolv in a specific class to use inside Database (because resolvo's API could be very different) but at least Database is already connected to the rest of mamba and hopefully we won't need to change much.

I should have mentioned that resolvo's public API exposes a DependencyProvider (similar to a Database) and a Solver. Based on the initial POC, I am quite confident that we could have similar abstractions for resolvo.

Most of the design points are based on the assumption that introducing base classes is the right pathway and that the abstractions other than Database and Solver can be decoupled from libsolv, but we need to validate that this is the right approach (I am pretty sure there are other designs which would avoid having a set of base classes and solver-specific sets of concrete classes).


Edit: For alternatives, I wonder whether template classes with a common template type parameter for the solver could help.
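
One reading of this template-based alternative is static polymorphism: `DataBase` and `Solver` share a `Backend` template parameter, so pairing a libsolv database with a resolvo solver becomes a compile-time error. The sketch below assumes hypothetical backend types (`LibsolvBackend`, `ResolvoBackend`) and a toy `can_solve` that only checks whether a package name is present; it is not how either solver actually works.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend types; each would hold the solver-specific state.
struct LibsolvBackend
{
    static constexpr const char* name = "libsolv";
    std::vector<std::string> packages;
};

struct ResolvoBackend
{
    static constexpr const char* name = "resolvo";
    std::vector<std::string> packages;
};

template <class Backend>
class DataBase
{
public:
    void add_package(std::string pkg) { m_backend.packages.push_back(std::move(pkg)); }

    bool contains(const std::string& pkg) const
    {
        const auto& pkgs = m_backend.packages;
        return std::find(pkgs.begin(), pkgs.end(), pkg) != pkgs.end();
    }

    static std::string backend_name() { return Backend::name; }

private:
    Backend m_backend;
};

template <class Backend>
class Solver
{
public:
    // Only a DataBase built on the same Backend is accepted, so a mismatched
    // pair is rejected at compile time. Toy stand-in for solving: succeeds
    // if the requested package name is present in the database.
    bool can_solve(const DataBase<Backend>& db, const std::string& spec) const
    {
        return db.contains(spec);
    }
};
```

A caveat for the Python side: pybind11 binds concrete types, so each instantiation (`DataBase<LibsolvBackend>`, `DataBase<ResolvoBackend>`, and so on) would still have to be bound as a separate class.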

@JohanMabille
Member

The overall plan looks good. I have some minor concerns, partially because I don't know this part of mamba very well (so some of my concerns might be irrelevant or redundant):

  • From what I remember of our discussions with Antoine, everything in the solver::libsolv namespace was very specific to libsolv. Does Resolvo have exactly the same representations for RepoInfo, parameters and Unsolvable? (More or less the same question as Hind's.)
  • Depending on the answer to the previous question, it might be easier / more efficient to identify a common API from the bottom, i.e. have both implementations ready (even if not plugged yet) and figure out what is common rather than from the top (i.e. decide a public API, and then figure out how to adapt the existing implementations). You might have already done that work, so do not hesitate to drop this if you're confident in the APIs.
  • The constraints on DataBase / RepoInfo / Solver must be encoded somehow. My understanding is that Resolvo accepts different implementations of Database, but are the DataBase and the RepoInfo totally decoupled? Same question for RepoInfo and Solver.
  • Why do you need to detemplate add_repo_from_packages? I guess it's because it will become a virtual function? Having it as a template brings flexibility (instead of imposing std::vector) and will allow working with ranges when we migrate to C++20.
  • It might be interesting to have a factory for building the different implementations. This way, you only need to bind the base classes, the factory function and an enum to Python.
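
The factory suggestion can be sketched as follows, assuming hypothetical names (`solver::Backend`, `solver::make_database`, and placeholder `DataBase` subclasses). Only the base class, the enum and the factory function would then need Python bindings (in pybind11 terms, presumably one `py::class_` for the base, one `py::enum_` and one `def` for the factory); the concrete classes can stay internal.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

namespace solver
{
    // Hypothetical enum selecting the solver implementation.
    enum class Backend
    {
        libsolv,
        resolvo,
    };

    // Base class exposed to callers (and, in this design, to Python).
    class DataBase
    {
    public:
        virtual ~DataBase() = default;
        virtual std::string backend_name() const = 0;
    };

    namespace libsolv
    {
        // Placeholder concrete class; not exposed directly.
        class DataBase final : public solver::DataBase
        {
        public:
            std::string backend_name() const override { return "libsolv"; }
        };
    }

    namespace resolvo
    {
        // Placeholder concrete class; not exposed directly.
        class DataBase final : public solver::DataBase
        {
        public:
            std::string backend_name() const override { return "resolvo"; }
        };
    }

    // Factory: the only place that knows about the concrete classes.
    inline std::unique_ptr<DataBase> make_database(Backend backend)
    {
        switch (backend)
        {
            case Backend::libsolv:
                return std::make_unique<libsolv::DataBase>();
            case Backend::resolvo:
                return std::make_unique<resolvo::DataBase>();
        }
        throw std::invalid_argument("unknown solver backend");
    }
}
```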

@jjerphan
Copy link
Member Author

jjerphan commented Sep 9, 2024

  1. IIRC, resolvo does not have the equivalent of a RepoInfo
  2. This was started on this branch: https://github.com/jjerphan/mamba/tree/alt-solver
  3. Resolvo exposes a DependencyProvider, which can be implemented to get something close to a DataBase, but the RepoInfo isn't actually totally decoupled. My intuition is that it can be decoupled.
  4. Yes, because of virtuality: a virtual member function cannot be a template. Migrating to C++20 is not trivial because parts of the standard library have been removed in that version of C++.
  5. Yes, we need to think about this together. I am wondering how to best adapt the Python API, then.
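
Since a virtual member function cannot be a template, one possible compromise between points 4 and the flexibility argument above is the non-virtual interface idiom: keep a public template front end and have it forward to a protected virtual hook taking one concrete type. The sketch below assumes a simplified, hypothetical signature (`add_repo_from_packages(range, repo_name)`, `add_repo_from_packages_impl`, `ToyDataBase`); libmamba's real function takes other parameters. The cost is one copy into a `std::vector` per call.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

namespace solver
{
    // Hypothetical, simplified package record.
    struct PackageInfo
    {
        std::string name;
    };

    class DataBase
    {
    public:
        virtual ~DataBase() = default;

        // Non-virtual template front end: accepts any iterable range of PackageInfo.
        template <class Range>
        std::size_t add_repo_from_packages(const Range& packages, std::string_view repo_name)
        {
            // Materialize into the single concrete type the virtual hook accepts.
            std::vector<PackageInfo> pkgs(std::begin(packages), std::end(packages));
            return add_repo_from_packages_impl(pkgs, repo_name);
        }

    protected:
        // Virtual hook implemented by each solver backend.
        virtual std::size_t
        add_repo_from_packages_impl(const std::vector<PackageInfo>& packages, std::string_view repo_name) = 0;
    };

    // Hypothetical toy backend that only records what it received.
    class ToyDataBase final : public DataBase
    {
    public:
        std::size_t total() const { return m_count; }
        std::string last_repo() const { return m_last_repo; }

    protected:
        std::size_t
        add_repo_from_packages_impl(const std::vector<PackageInfo>& packages, std::string_view repo_name) override
        {
            m_count += packages.size();
            m_last_repo = std::string(repo_name);
            return packages.size();
        }

    private:
        std::size_t m_count = 0;
        std::string m_last_repo;
    };
}
```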
