Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid merge conflicts in MODULE.bazel.lock #2434

Open
psalaberria002 opened this issue Nov 21, 2024 · 8 comments
Open

Avoid merge conflicts in MODULE.bazel.lock #2434

psalaberria002 opened this issue Nov 21, 2024 · 8 comments
Labels
need: upstream support An issue that needs changes in upstream code type: pip

Comments

@psalaberria002
Copy link

The digest of the requirements.txt file seems to be added to the lock file, which triggers merge conflicts.

    "@@rules_python+//python/extensions:pip.bzl%pip": {
      "general": {
        "bzlTransitiveDigest": "Iwo5aX1NCLf2xFr7Cq4NNxpYMRA7qIh3/uaa/Kk4myg=",
        "usagesDigest": "A5I0AW9kWKM1sMWDOt7FRtMoXFL0LONjpXqb43RHBrw=",
        "recordedFileInputs": {
          "@@grpc+//requirements.bazel.txt": "95a27c3f9a46b8114d464c70ba93cda18cfe8c02004db81028f9306b2691701e",
          "@@//requirements.txt": "29c6501451ce48549f1dac1b73e0cd42ee924a3b7649ed0080254de4bbcebb7f"
        },

Related thread on Slack https://bazelbuild.slack.com/archives/C014RARENH0/p1731817466761249

Is this necessary?

@aignas
Copy link
Collaborator

aignas commented Nov 22, 2024

Previous analysis/comment on why we need to have the extension as non reproducible and include all of the contents in the main lock file. Please see the comment below for just looking at the requirements.txt hash in the lock file.

  • If at least one module that the root module is pulling in as dependencies (transitive or direct) is using pip.parse with experimental_index_url flag then we cannot mark the extension as reproducible, because we are calling to the internet to get the URLs of the packages to download using the bazel-downloader. This is a limitation of bzlmod that I am not sure we can get around.
  • @fmeum's suggestion that any information in the lockfiles that already exist on the file system elsewhere is I think a good suggestion, but in rules_python case the information in the lock file is incomplete if we want to use bazel-downloader. We could call the PyPI index from each whl_library repository rule to avoid doing that in the module extension context, but from my understanding, bzlmod extensions were actually design for this usecase - look up the artifacts in some registries and construct the spoke repos.

In terms of what rules_python can do, we have the following options:

  • Create a separate extension that always writes things into the lock file - then users who don't depend on experimental_index_url can avoid any Python dependencies in the lockfiles. However, that does not really solve the problem, but limits the space where the problem manifests itself. See refactor!(bzlmod): introduce pypi.install extension #2278 for reasons for not going with that solution.
  • Move the fetching of the PyPI index data elsewhere and pass the results around as labels. This is only doable from bazel 7.4 onwards because only then we can construct valid labels of the spoke repos within the extension evaluation scope. However, this somewhat complicates the design of the PyPI extension - instead of having a simple hub-spoke pattern, we need to fetch the PyPI metadata lazily and we cannot write that into the MODULE.bazel.lock file meaning that each user (not only the one that does the updates to the requirements files) needs to fetch the metadata from PyPI. I've tried this design a long time ago and it is not a great solution to the problem.
  • Create a rules_python lock file. This means that users would need to be able to generate a requirements.txt file on demand from the rules_python lock file format or have 2 lock files in the version control. Having 2 lock file formats is what we have today. Having a rules_python specific lock file format might be a good idea, but then rules_uv usage could become more complicated because users would need to somehow convert the generated requirements.txt file into a rules_python lockfile format. So in the end we would end up with 2 lock file formats in the version control system. Again, not a great solution.
  • Do not support requirements.txt - only support uv.lock, pdm.lock and poetry.lock files for cross platform builds instead of going through the problem of calling PyPI and getting the metadata ourselves. We could have an attribute for each format in the pip extension.
  • Ask bazel to support a way to mark specific repos created via extensions as reproducible, so that they do not need to have their parameters recorded in the lock file, but they and their dependencies are still tracked in the lock file. Right now the bazel-skylib helper provides an all-or-nothing solution and that makes it problematic in our use case.

In general I don't see good alternatives here except for changes in bazel or dropping support for requirements.txt as the lockfile format that we support cross platform builds for.

@fmeum are there any plans to make the bzlmod extensions being able to control what goes into the lock file?

EDIT: Looking that this proposal has been already implemented, it does not seem that there is any other option. than to split the extension into 2

EDIT2: updated based on Richard's comment.

@aignas
Copy link
Collaborator

aignas commented Nov 22, 2024

Just looking at the requirments.txt lock file hash in the MODULE.bazel.lock file - right now it is triggered by us reading the lock file with mctx.path/read. If the requirements.txt file changes, then bzlmod knows that it needs to re-evaluate the extension.

I am not sure if there is anything better we can do - @fmeum, what would be the proposed behaviour here? If we pass things as labels in the rule, are we OK to pass watch = False when reading the files?

@rickeylev
Copy link
Collaborator

EDIT: Looking that this proposal has been already implemented, it does not seem that there is any other option than to split the extension into 2.

but earlier in your post you say that having a second extension just reduces the problem, not avoids it?

@aignas
Copy link
Collaborator

aignas commented Nov 22, 2024

FYI, just did a quick experiment to see if mctx.read(watch = 'no') would remove the requirements files hashes and it seems that no:

$ diff MODULE.bazel.lock{before,}
162c162
<         "bzlTransitiveDigest": "vHJJUty2FcJdIyZ/BT+BKmemuqJ4FOvi9k8DzEIbpQU=",
---
>         "bzlTransitiveDigest": "45+MBqepr0RThPGA9Ls9A3aC2xq8tQMQYvXNS4+yn7g=",
aignas@panda ~/src/github/aignas/rules_python exp/lock
$ gd
diff --git a/python/private/pypi/parse_requirements.bzl b/python/private/pypi/parse_requirements.bzl
index 133ed18d..bb4fe659 100644
--- a/python/private/pypi/parse_requirements.bzl
+++ b/python/private/pypi/parse_requirements.bzl
@@ -93,7 +93,7 @@ def parse_requirements(
     for file, plats in requirements_by_platform.items():
         if logger:
             logger.debug(lambda: "Using {} for {}".format(file, plats))
-        contents = ctx.read(file)
+        contents = ctx.read(file, watch = 'no')

         # Parse the requirements file directly in starlark to get the information
         # needed for the whl_library declarations later.

So it suggests me that bazel itself is adding hashes based on the input to the tag classes.

@fmeum
Copy link
Contributor

fmeum commented Dec 3, 2024

That ctx.read(label, watch = "no") still watches the file referenced by the label is a bug in Bazel that's fixed in 8.

Generally speaking, if requirements.txt is an input to the extension but not by itself sufficient to pin all downloads via hashes, I don't see a way around this issue - it's ultimately a domain problem, not a Bzlmod problem. Even if the hash was replaced by something human readable (say the contents of the file), you would still end up with merge conflicts that a human can't solve: if both the inputs and output to pip resolution change, any manual merge conflict resolution may lead to an invalid resolution.

New attributes that can point to real lockfiles (your fourth option) sounds best to me as it avoids introducing Bazelisms.

@aignas
Copy link
Collaborator

aignas commented Dec 4, 2024

@fmeum, retested with the latest RC and it is seems it is working:

$ diff MODULE.bazel.lock.{before,after}
158c158
<         "bzlTransitiveDigest": "Yn8iuI050bLARHZyLpQPR6r7Uw05FD3UAk3oLIBMnj8=",
---
>         "bzlTransitiveDigest": "Hb3GsvhnTfmwN+829Npkoch0NirLXCV6lN1RCt5gJY8=",
162,163d161
<           "@@//docs/requirements.txt": "c5973a20b0b6284d69f245d12f06be03d8e0f644465aaea81b99e523f900b938",
<           "@@//examples/wheel/requirements_server.txt": "981e09e454aa31d72f73f369436fce488e5a478717a8fac808412f4695e44823",
165,170c163
<           "@@//python/private/pypi/whl_installer/platform.py": "b944b908b25a2f97d6d9f491504ad5d2507402d7e37c802ee878783f87f2aa11",
<           "@@//tools/publish/requirements_darwin.txt": "2994136eab7e57b083c3de76faf46f70fad130bc8e7360a7fed2b288b69e79dc",
<           "@@//tools/publish/requirements_linux.txt": "8175b4c8df50ae2f22d1706961884beeb54e7da27bd2447018314a175981997d",
<           "@@//tools/publish/requirements_windows.txt": "7673adc71dc1a81d3661b90924d7a7c0fc998cd508b3cb4174337cef3f2de556",
<           "@@protobuf+//python/requirements.txt": "983be60d3cec4b319dcab6d48aeb3f5b2f7c3350f26b3a9e97486c37967c73c5",
<           "@@rules_fuzzing+//fuzzing/requirements.txt": "ab04664be026b632a0d2a2446c4f65982b7654f5b6851d2f9d399a19b7242a5b"
---
>           "@@//python/private/pypi/whl_installer/platform.py": "b944b908b25a2f97d6d9f491504ad5d2507402d7e37c802ee878783f87f2aa11"

Will the fix be backported to 7.x branch? If we have watch = "no" in there, how can we force the re-evaluation of the extension if the file referenced by the label changes? Could we have an env var that does this?

If we cannot get it to work with watch = "no" solution, then I guess that #2271 could be a way around the problem - requirements.txt is something that has incomplete information and if users were able to specify the dependencies in-line, then the hash of a file would not be in the lock file and the merge conflicts might be rarer. If we had support for converting a requirements.txt to a requirements.MODULE.bazel which could be included in the main MODULE.bazel, then the hash of the file would not be in the lock file and thus it could work.

However that requires:

  • Support for specifying the dependencies directly as attributes without going through requirements.txt. I think that feature in itself is extremely complex:
    • We cannot depend on files that have lists of deps (e.g. pyproject.toml), because we will have the same issue as here.
    • We need to ensure that we have a way to handle transitive dependencies or the user needs to specify them. If the transitive dependencies are not specified then we would need the bzlmod extension to do an equivalent of uv lock to lock down the dependencies and that could take time or might require some extra wiring.
    • Once we have that, we can call out to PyPI index to fetch the URL metadata.
    • ... many other things
  • Being able to generate the said file through some sort of new locking mechanism, which is what we were trying to avoid in the first place.

I think just supporting different lock files which have the URLs in them might be easier.

@aignas
Copy link
Collaborator

aignas commented Dec 5, 2024

Just checked the MODULE.bazel.lock behaviour when using watch = "no" and what happens if the file is changed. It seems that watch = "no" means that the extension is not re-evaluated when the requirements.txt file changes. It seems that it does not matter that pip.parse receives the file as a label - if the file is not read with watch = "yes" then it will be ignored by the extension when it comes to extension re-evaluation.

So just to reiterate all available options here in order to not have merge conflicts:

  • rules_python has its own lock file format and then pip.parse is using reproducible = True in the extension metadata. This means that MODULE.bazel.lock will not contain anything related to the pip dependencies and rules_python can store the url and sha256 information for each python package that is used. The merge conflicts within the rules_python lock file are possible, but at least there is no single line that constantly changes. For this we need to make changes to the requirement lockers to fetch additional information. This needs to be consistent with the requirements.txt or users will see strange behaviour.
  • rules_python accepts the dependencies only by supporting Allow specifying dependencies directly in pip.parse #2271.
  • rules_python does not support "Cross compilation" of py_binary/py_image/py_library targets #260 with requirements.txt and only supports that using uv.lock, poetry.lock and/or pdm.lock. This means that all of the extensions would have to use reproducible = True in the extension_metadata.

My current thinking:

  • I think I am tempted to go for the last option as the consistency issues are hard to debug, but I haven't thought about the transition plan.
  • I have checked the Add multi-platform torch example #2449 example and it seems that there is currently no way to easily generate requirements.txt files with extra pip arguments as mentioned in https://rules-python.readthedocs.io/en/latest/pypi-dependencies.html#multi-platform-support so that users could have multi-platform support out of the box. So we would have to provide some solution to users.
  • I think it would be easy to make the internal models of creating whl_library repositories compatible with uv.lock and friends, but haven't done any prototyping lately.

I got to go now.

@aignas
Copy link
Collaborator

aignas commented Dec 22, 2024

FYI, I did some experimentation and it seems that the best way to solve this would be to leverage what was previously presented in MODULE.bazel.lock design doc. I have created an issue bazelbuild/bazel#24777.

This might also help to store extra facts about which distributions are on which index servers to help with #2100.

@aignas aignas added type: pip need: upstream support An issue that needs changes in upstream code labels Dec 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need: upstream support An issue that needs changes in upstream code type: pip
Projects
None yet
Development

No branches or pull requests

4 participants