Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support building from sdist in a build action #2410

Open
5 tasks
aignas opened this issue Nov 14, 2024 · 10 comments
Open
5 tasks

Support building from sdist in a build action #2410

aignas opened this issue Nov 14, 2024 · 10 comments

Comments

@aignas
Copy link
Collaborator

aignas commented Nov 14, 2024

Support building from sdist as a build action instead of repository_rule.

Things needed:

Initial thoughts:

  • It would be good to create an sdist_archive rule, which would download a given sdist by URL and then would parse pyproject.toml. If the pyproject.toml is not found or it is dynamic and requires Python to correctly infer the dependencies, we may need to fail at least in the first iteration.
  • Using Python in the repository context is OK, but adds a lot of complication in the long run - we should attempt to limit its usage as much as possible.

At the moment I don't have time to work on this myself, but writing it down so that an issue for PEP621 based sdist building exists. Feel free to add thoughts here on possible design.

@aignas
Copy link
Collaborator Author

aignas commented Dec 2, 2024

Might be related: #1103

@groodt
Copy link
Collaborator

groodt commented Dec 3, 2024

I think they're quite different if I read this issue.

I think this issue title is misleading, but you can confirm what you mean. 😊

1103 is for package publishers. It's for teams who want to use bazel as a build backend to build a wheel or sdist. Similar to how meson, poetry, flit are build backends.

I read this is issue as an enabler for package consumers. Basically a mechanism that is an improvement to patching. I read this issue as a way to prepare a "build environment" for difficult packages and being able to run with "--no-build-isolation" so that sdist (packages without wheels) can be more easily imported into the Bazel build.

@aignas
Copy link
Collaborator Author

aignas commented Dec 3, 2024

Ah yes, you are correct here.

@groodt
Copy link
Collaborator

groodt commented Dec 7, 2024

I think we should then consider removing PEP 621 from the title. pyproject.toml is only a standard for storing metadata.

@aignas
Copy link
Collaborator Author

aignas commented Dec 7, 2024

I would like to restrict us looking only at sdists that comply with PEP621. We will need to somehow get the dependencies for the sdist in the loading phase and then building of the sdist in the build phase can happen correctly. Doing this for sdists that do not comply with PEP621 might be trickier.

WDYT?

@groodt
Copy link
Collaborator

groodt commented Dec 8, 2024

My 2c:

I love the intention of making "sdist" handling better. Even isolating them into the naughty corner, makes things better for users, because it highlights what is going on when there are failing builds.

If you are applying those constraints to the implementation, then it's still unrelated to PEP-621. PEP-621 is purely about storing metadata in pyproject.toml. If you want to limit the implementation to only supporting the dependencies in the [build-system] table, then it's still not related to PEP-621 to be honest. That's PEP-518.

But overall, I think that explicitly not supporting legacy setuptools based builds makes sense. Modern setuptools builds do provide a build-system in pyproject.toml

Features that I think I'd like to see from an "sdist" building facility:

  • I'd like to be able to supply my own list of locked build dependencies per sdist. The build dependencies of every sdist are typically floating
  • Similar to the first item, I'd like to provide a named "build environment" so that I can run builds with --no-isolation. This lets me have a few build environments I can use across sdists
  • I'd probably want to use build for building wheels
  • I'd like the build dependencies to be locked
  • I'd like an option for my bazel build to fail if there aren't declared settings for every sdist. It should list out the sdist names that I need to provide an environment for

@aignas
Copy link
Collaborator Author

aignas commented Dec 8, 2024

If you are applying those constraints to the implementation, then it's still unrelated to PEP-621. PEP-621 is purely about storing metadata in pyproject.toml. If you want to limit the implementation to only supporting the dependencies in the [build-system] table, then it's still not related to PEP-621 to be honest. That's PEP-518.

But overall, I think that explicitly not supporting legacy setuptools based builds makes sense. Modern setuptools builds do provide a build-system in pyproject.toml

OK, I think I see what you mean here, I'll remove the PEP621 from the title.

Features that I think I'd like to see from an "sdist" building facility:

* I'd like to be able to supply my own list of locked build dependencies per sdist. The build dependencies of every sdist are typically floating

* Similar to the first item, I'd like to provide a named "build environment" so that I can run builds with `--no-isolation`. This lets me have a few build environments I can use across sdists

* I'd probably want to use [build](https://build.pypa.io/en/stable/) for building wheels

* I'd like the build dependencies to be locked

* I'd like an option for my bazel build to fail if there aren't declared settings for every sdist. It should list out the sdist names that I need to provide an environment for

I was thinking about how we could support users injecting the env into the build process and I think we could implement this environment specification by creating a toolchain for building sdists.

  • Different environments for different sdists means different toolchain target settings.
  • The same toolchain type for building sdist means that there has to be a config setting somewhere where the sdist builder is using transitions to ensure that the right toolchain is resolved.
  • Initially we could use pip wheel to define the simplest toolchain that does similar things to what the current implementation does. The usage of build could be contributed by others later on.
  • Build dependencies would have to come through the toolchain definition and they can come from the same or a different pip hub repository - the user would have to specify and register the toolchains.
  • Bazel failing for the sdists would have to happen at the build phase (not the repo loading phase).

The toolchains that would be building sdists maybe could be defined/wired by the pip bzlmod extension, but I think the primitives for achieving the "build sdists in a build action" would have to be ironed out separately first. And before doing any coding we would probably have to have a tech proposal here outlining the design a bit better.

@aignas aignas changed the title PEP621 Support building from sdist in a build action Support building from sdist in a build action Dec 8, 2024
@rickeylev
Copy link
Collaborator

Using toolchains for the pieces that handle an sdist is a good idea, +1 to that.

The same toolchain type for building sdist means that there has to be a config setting somewhere where the sdist builder is using transitions to ensure that the right toolchain is resolved.

Alternatively each builder is its own rule with its own toolchain type. Perhaps there some common toolchain types to carry common backends can reuse. I think we can know which builder to use at repo-phase, right? Since we can look in the sdist's toml file to figure that out. Then codegen e.g. pip_sdist|build_sdist|etc

Using transition, it would have to be something like sdist(backend="pip|build|uv|..."). That seems akin to having a "mode" flag for a function (e.g. binop(mode="add") vs add()).

A variation would be to have a toolchain type per backend, and an attribute specifies with toolchain type to lookup. So there'd be pip_toolchain_type, build_toolchain_type, uv_toolchain_type, etc, all of which are optional. Not sure if I really like this idea.

Build dependencies would have to come through the toolchain definition and they can come from the same or a different pip hub repository - the user would have to specify and register the toolchains.

How would this be possible? I guess if the user is registering the toolchains, yeah, that could work. Ideally, it'd be nice if somehow the sdist metadata (pyproject.toml etc) could be processed during the repo phase to generate dependency labels. Once we have labels (of any type), we can then try to get them bound to things.

The toolchains that would be building sdists maybe could be defined/wired by the pip bzlmod extension

Yeah, this makes sense. It is those pip.parse() generated pieces that eventually cause those sdists to bulid. Are the sdist build dependencies a separate dependency closure from the pip.parse requirements input? If so, then we'd need someway for pip.parse to take a "sdist_build_requirements.txt" type of thing.

@aignas
Copy link
Collaborator Author

aignas commented Dec 9, 2024

I like your idea to have a variation per backend. The toolchain types then could be defined in rules_python. I was thinking to have a type per distribution name and then the toolchain types would have to be defined elsewhere - maybe there should be a pypi_sdist_build_toolchains hub repo with all of the toolchain type definitions.

And I think I like your idea of having a rule per builder type. There probably will be a way to easily reuse most of the plumbing across the sdist building rules for each separate backend.

Build dependencies would have to come through the toolchain definition and they can come from the same or a different pip hub repository - the user would have to specify and register the toolchains.

How would this be possible? I guess if the user is registering the toolchains, yeah, that could work. Ideally, it'd be nice if somehow the sdist metadata (pyproject.toml etc) could be processed during the repo phase to generate dependency labels. Once we have labels (of any type), we can then try to get them bound to things.

My thinking is that the toolchain definition itself could be something like:

def sdist_build_toolchain(name, build_deps, ...):
    py_library(name = "build_env", deps = build_deps)
    sdist_build_toolchain_internal(name = name, build_env = "build_env")

# and then the user uses it in their own BUILD.bazel (or we automate it in pip.parse) by
sdist_build_toolchain(
    name = "matplotlib_build_toolchain",
    deps = [
        "@pypi_build//qt5",
        "@pypi_build//numpy",
    ],
)

# MODULE.bazel
register_toolchains("//:matplotlib_build_toolchain")

@groodt
Copy link
Collaborator

groodt commented Dec 9, 2024

We can chat at next maintainers meeting, but I’m not actually sure you need a toolchain per build backend.

The way that a PEP517 client/frontend works is by installing the build dependencies declared in build-system.requires and invoking the module declared in build-system.backend. Then hooks are called to invoke the backend. See https://github.com/pypa/pyproject-hooks

The issue with that is that often it’s not enough. Often sdists require additional dependencies to build and config settings provided in READMEs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants