Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFE] Rewrite rpmautospec in a compiled language #217

Open
sgallagher opened this issue Dec 2, 2024 · 20 comments
Open

[RFE] Rewrite rpmautospec in a compiled language #217

sgallagher opened this issue Dec 2, 2024 · 20 comments
Labels
buildsystem Something affecting build systems dependencies Affecting dependencies enhancement New feature or request performance Something is slow

Comments

@sgallagher
Copy link
Contributor

rpmautospec was rewritten to run inside the mock environment for several good reasons, not least of which being that it's the only realistic way to ensure that the version of rpm that it's using for parsing matches the version on the target OS. However, as things have progressed, we've discovered several problems.

  1. rpmautospec needs to be re-bootstrapped for every new Python release
  2. Every new dependency makes the bootstrap more complicated
  3. The dependency on Python and numerous Python packages makes it difficult to create rpmautospec for new EPEL releases

This is largely solvable by rewriting rpmautospec in a compiled language, consuming libgit2 (possibly bundled, to avoid bootstrapping issues due to its erratic backwards-compatibility history) and librpm.

I propose that we make a plan for "rpmautospec 1.0" to be written in C, since that will require no additional bindings for libgit2 or librpm and aim to implement this no later than May 2025 (so as to avoid the Python 3.14 bootstrapping problem).

cc @hroncok @praiskup

@sgallagher sgallagher added enhancement New feature or request dependencies Affecting dependencies performance Something is slow buildsystem Something affecting build systems labels Dec 2, 2024
@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

Wouldn't a memeory-safe language be a better choice for a new project in 2025?

Even if it is in Rust and uses several dependencies, those are statically compiled and the tools would still have no (or very little) runtime dependencies. There is a precedence for using Rust for such things, https://github.com/keszybz/add-determinism.

There's https://src.fedoraproject.org/rpms/rust-git2 and possibly https://github.com/rpm-software-management/librpm.rs (which does not seem quite there yet)

cc @decathorpe @keszybz

@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

Alternatively, if this thing would be written as a single no-dependency Python script possibly using ctypes to interact with libgit2 and librpm, it would still solve the problem + it might be easier to hack on by casual contributors.

@decathorpe
Copy link

"rpmautospec 1.0" to be written in C

vade-retro-satanas

@sgallagher
Copy link
Contributor Author

Alternatively, if this thing would be written as a single no-dependency Python script possibly using ctypes to interact with libgit2 and librpm, it would still solve the problem + it might be easier to hack on by casual contributors.

Maybe easier to hack on, sure. I don't think it entirely solves the original problem, however. It still creates the potential for a bootstrapping issue with Python itself (or we disallow the python interpreter from using rpmautospec).

Regarding the use of rust: while I'm not necessarily opposed to it, there's a huge learning curve there; I'm currently volunteering to do this rewrite, but I am not versed in rust. I cannot commit to having the necessary cycles to both learn a new language and completely rewrite this project in it.

@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

It still creates the potential for a bootstrapping issue with Python itself.

It doesn't. When we build Python 3.N+1 as python3 for the first time, we still have Python 3.N as python3.

I'm currently volunteering to do this rewrite...

That was not obvious to me from I propose that we make a plan for "rpmautospec 1.0" to be written in C...

...but I am not versed in rust. I cannot commit to having the necessary cycles to both learn a new language and completely rewrite this project in it.

I understand this argument. However, it also means you will be the only one available to maintain this1 . While if we actually choose the Python route, perhaps more folks can contribute (I know I certainly can).

Footnotes

  1. Anecdotal evidence: rpminspect is written in C and is mostly only ever contributed to by the author.

@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

Either way, we should probably also figure out how to provide the importable Python module that is now available in rpmautospec-core and used by various Fedora Python projects (mock, rpkg, copr) trough the Python interface.

Would it continue to exist (and not be used by the new "standalone" rpmautospec at all)?

Similarly, would this "standalone" rpmautospec offer perks for packagers (such as the convert command or bash completion) or would that live in a separate Python project built on top of it?

@sgallagher
Copy link
Contributor Author

It still creates the potential for a bootstrapping issue with Python itself.

It doesn't. When we build Python 3.N+1 as python3 for the first time, we still have Python 3.N as python3.

Sorry, I meant there's still a bootstrapping loop with Python 3.N+1 and python-rpmautospec. While it's not the end of the world, it would be nice if we could use %autorelease and %autochangelog in rpmautospec itself.

I'm currently volunteering to do this rewrite...

That was not obvious to me from I propose that we make a plan for "rpmautospec 1.0" to be written in C...

Yeah, sorry. That information was lost in rewrites as I reworked the initial description.

@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

It still creates the potential for a bootstrapping issue with Python itself.

It doesn't. When we build Python 3.N+1 as python3 for the first time, we still have Python 3.N as python3.

Sorry, I meant there's still a bootstrapping loop with Python 3.N+1 and python-rpmautospec.

I am sorry, but I still don't see it.

While it's not the end of the world, it would be nice if we could use %autorelease and %autochangelog in rpmautospec itself.

We could.

@sgallagher
Copy link
Contributor Author

Either way, we should probably also figure out how to provide the importable Python module that is now available in rpmautospec-core and used by various Fedora Python projects (mock, rpkg, copr) trough the Python interface.

rpmautospec-core is already pure-python and should have no other dependencies. All it does is, basically, run a regex across the specfile to see if %autorelease or %autochangelog appears in the file anywhere. It doesn't even use RPM parsing to do this. I don't see any real need to rewrite it at all.

Would it continue to exist (and not be used by the new "standalone" rpmautospec at all)?

Its behavior is so trivial that I'd probably reimplement it in the compiled rpmautospec and leave the python module there for convenience.

Good questions!

Similarly, would this "standalone" rpmautospec offer perks for packagers (such as the convert command or bash completion) or would that live in a separate Python project built on top of it?

I'll confess that I haven't made a final determination on that yet; it may come down to the available time I have. I may just write librpmautospec in C (or, I guess Rust if I'm feeling ambitious...) and then split off the existing CLI functionality into a python-rpmautospec-cli tool to avoid rewriting it. That split was already being considered to trim dependencies, so it might be best to do that first in any case.

@sgallagher
Copy link
Contributor Author

It still creates the potential for a bootstrapping issue with Python itself.

It doesn't. When we build Python 3.N+1 as python3 for the first time, we still have Python 3.N as python3.

Sorry, I meant there's still a bootstrapping loop with Python 3.N+1 and python-rpmautospec.

I am sorry, but I still don't see it.

python-rpmautospec has a runtime requirement on python(abi) = 3.13. It also uses %autorelease in the Release: field, which means that rpmautospec-core will tell mock to pull in rpmautospec. Once we build python3.14 into a buildroot, we won't be able to satisfy the python(abi) = 3.13 dependency that would be required to rebuild python-rpmautospec against python3.14, unless I'm missing something. (Yes, we could do a bootstrap dance and build it once without %autorelease, but I'd prefer to avoid special handling where possible).

@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

python-rpmautospec has a runtime requirement on python(abi) = 3.13.

Yes, it currently has. If we rewrite it to standalone Python script to be installed in /usr/bin/ (as if it was written in C or Rust or whatever), it wouldn't have.

It also uses %autorelease in the Release: field, which means that rpmautospec-core will tell mock to pull in rpmautospec. Once we build python3.14 into a buildroot, we won't be able to satisfy the python(abi) = 3.13 dependency that would be required to rebuild python-rpmautospec against python3.14, unless I'm missing something. (Yes, we could do a bootstrap dance and build it once without %autorelease, but I'd prefer to avoid special handling where possible).

Yes, this is in essence what I reported as #169

We are trying to find a solution. You said "rewrite it in C", I said "rewrite it in Rust or a standalone Python script". All of the three mentioned solutions remove the python(abi) requirement, solving #169

@sgallagher
Copy link
Contributor Author

OK, I think I see the missing piece of the puzzle. I didn't understand the term "standalone Python script" to mean "it doesn't need the interpreter". I wasn't actually aware that was an option. Definitely worth considering.

@hroncok
Copy link
Contributor

hroncok commented Dec 2, 2024

I think we still don't understand each other. A standalone Python script still needs the interpreter. It just doesn't need python(abi). See for example usbutils package and /usr/bin/lsusb.py

@decathorpe
Copy link

FWIW I had previously semi-seriously offered to work on a Rust port, when a similar problem was discussed regarding EPEL 10. I don't know how much time I could devote to this, but it would be an option.

@sgallagher
Copy link
Contributor Author

@hroncok I think I need a more concrete explanation of what you are suggesting. Do you mean that we should disable automatic Requires generation and just specify Requires: /usr/bin/python3 explicitly?

I could certainly do that for python3-rpmautospec and python3-rpmautospec-core.

We still have to figure out what to do about the dependencies:

  • babel: Looks like this is being used for a single function, babel.dates.format_datetime() which I'm reasonably sure we could rewrite using the standard lib datetime.strftime() function, which I'll look into. There's no need for internationalization here.
  • click: This was added to make the CLI argument handling easier and to automate the creation of shell-completion scripts. I think we actually want to split out the CLI tool into an rpmautospec-cli package, which I think doesn't need to be part of the basic rpmautospec module.
  • pygit2 and libgit2: It turns out that pygit2 has no dependencies aside from libgit2, python3-cffi and the interpreter. I'm not sure if there's any value in trying to reimplement the pygit2 layer ourselves.

@hroncok
Copy link
Contributor

hroncok commented Dec 5, 2024

@hroncok I think I need a more concrete explanation of what you are suggesting. Do you mean that we should disable automatic Requires generation and just specify Requires: /usr/bin/python3 explicitly?

No disabling of any kind of automatic Requires generation is necessary.

  1. Drop all 3rd party dependencies outside of the Python standard library.
  2. Replace the Python bindings for RPM and libgit2 with ctypes calls.
  3. Stick everything into a single script living in /usr/bin. This script has a #!/usr/bin/python3 -sP shebang, the dependency on /usr/bin/python3 is generated. No dependency on python(abi) is generated.

I could certainly do that for python3-rpmautospec and python3-rpmautospec-core.

This would no longer be called python3-rpmautospec, it would be called just rpmautospec. Tha fact that it uses Python is an implementation detail.

(python3-rpmautospec-core is importable from Python. It is a different story. This single script thing would not be able to use it.)

We still have to figure out what to do about the dependencies:

  • babel: Looks like this is being used for a single function, babel.dates.format_datetime() which I'm reasonably sure we could rewrite using the standard lib datetime.strftime() function, which I'll look into. There's no need for internationalization here.

Agreed.

  • click: This was added to make the CLI argument handling easier and to automate the creation of shell-completion scripts. I think we actually want to split out the CLI tool into an rpmautospec-cli package, which I think doesn't need to be part of the basic rpmautospec module.

Or, it could be replaced by the standard library argparse module (this was in fact used here before) and the shell completion scripts could be in an optional subpackage (using for example argcomplete), or generated on build time somehow.

  • pygit2 and libgit2: It turns out that pygit2 has no dependencies aside from libgit2, python3-cffi and the interpreter. I'm not sure if there's any value in trying to reimplement the pygit2 layer ourselves.

There is value: No dependency on python(abi). No need to bootstrap pygit2.

@sgallagher
Copy link
Contributor Author

sgallagher commented Dec 5, 2024

babel: Looks like this is being used for a single function, babel.dates.format_datetime() which I'm reasonably sure we could rewrite using the standard lib datetime.strftime() function, which I'll look into. There's no need for internationalization here.

Hmm, I think I figured out why Babel is in use here; datetime.strftime() relies on the global locale setting. I could override it temporarily to force it to en_US, but that's not going to be thread-safe. (If rpmautospec is imported into a threaded Python script, there will be a brief window where the global locale differs from the end-user's locale). I could spawn a separate process (with its own locale) for the sole purpose of getting this string, but I feel like there has to be a better way.

@hroncok
Copy link
Contributor

hroncok commented Dec 5, 2024

If rpmautospec is imported into a threaded Python script...

That could not happen if it was just a standalone script in /usr/bin

@sgallagher
Copy link
Contributor Author

So, a lot of discussion has happened in various places in the last ten days and I want to capture as much of a summary as I can before disappearing on PTO at the end of the week (at least in part so I can refresh my own memory when I come back).

Important feedback, in no particular order:

  • Some strong arguments have been made against changing the language away from Python, notably for accessibility of contributions: Python has the largest pool of potential contributors. This is a really good point, and something that should not be discounted out of hand. Though few outside contributions to rpmautospec have occurred thus far, the potential is important.
  • Rewriting in a compiled language (C, C++, Rust, etc.) will be exactly that: a complete rewrite. It should not be the first choice if we can help it.
  • The most critical problems to solve right now are the bootstrapping issues around Python and EPEL (though EPEL has worked around it for this cycle).

With that all in mind, I pivoted to seeing what options we have for mitigating our issues with Python rather than completely rebuilding rpmautospec in a new language. So the next things to explore are solutions to the main issues with bootstrapping rpmautospec with new Python releases:

Problem 1) rpmautospec itself and several of its Python dependencies rely on a functional rpmautospec in the build environment.

Problem 2) rpmautospec's dependency chain is of moderate length; it's not enormous, but neither is it tightly contained (and dependencies have a tendency to grow more dependencies over time). Each of these dependencies that also use rpmautospec adds to the complexity of bootstrapping.
The rpmautospec Python chain today is (redundancies excluded):

python3-rpmautospec-core
|
- python3.X (interpreter)
|
- python3-rpmautospec
  |
  - python3-babel
  |
  - python3-click
  |
  - python3-click-plugins
  |
  - python3-pygit2
    |
    - python3-cffi
    |
    - python3-ply
    |
    - libgit2 (C library)

Of these dependencies, we are working to drop several of them and potentially rework others. We've already dropped python3-babel upstream as it was being used for a single feature. We are also planning to drop the python3-click and python3-click-plugins dependency for the main functionality of rpmautospec. Those core pieces needed for build operations will be provided with Python standard library argument handling only. We may implement a new rpmautospec-cli package for the more fully-featured CLI tool, potentially using Click, but this won't need to be part of the bootstrap set.

We are also exploring the possibility of replacing python3-pygit2 with a pure-Python implementation that we can embed in the rpmautospec script. This would allow us to extricate rpmautospec from the /usr/lib/python3.X filesystem hierarchy and remove the bootstrap-impeding dependency on a matching version of Python in the buildroot. It would be able to run against whatever /usr/bin/python3 binary calls it.
This work will be ongoing in January.

@hroncok
Copy link
Contributor

hroncok commented Dec 13, 2024

FWIW this is how we could replace the rpm module:

import ctypes
from ctypes.util import find_library


class RPMError(RuntimeError):
    pass


# all the RPM state is global and not thread safe
# if need to be, we could create a class that holds rpmMacroContext
_library_name = find_library("rpm")
if _library_name:
    LIBRPM = ctypes.cdll.LoadLibrary(_library_name)
else:
    raise RPMError("could not locate librpm")
LIBRPM.rpmExpandMacros.restype = ctypes.c_int
LIBRPM.rpmlogSetFile.argtypes = [ctypes.c_void_p]
LIBC = ctypes.cdll.LoadLibrary(find_library("c"))
LIBC.fdopen.argtypes = [ctypes.c_int, ctypes.c_char_p]
LIBC.fdopen.restype = ctypes.c_void_p


def rpm_reload_config():
    LIBRPM.rpmFreeMacros(None)
    LIBRPM.rpmFreeRpmrc()
    if LIBRPM.rpmReadConfigFiles(LIBRPM.rpmcliRcfile, None) != 0:
        raise RPMError("could not rpmReadConfigFiles")


# we call this immediately
rpm_reload_config()


def rpm_expand(source):
    source_b = source.encode("utf-8")
    result_c = ctypes.c_char_p()
    if (rc := LIBRPM.rpmExpandMacros(None, source_b, ctypes.byref(result_c), 0)) < 0:
        raise RPMError(f"could not expand: {source!r}, got {rc}")
    result = result_c.value.decode("utf-8")
    LIBC.free(result_c)
    return result


def rpm_define(macro, value):
    LIBRPM.rpmPushMacro(None, macro.encode("utf-8"), None, value.encode("utf-8"), -1)


def rpm_set_log_file(fileobj=None):
    if fileobj is None:
        LIBRPM.rpmlogSetFile(None)
    else:
        LIBRPM.rpmlogSetFile(LIBC.fdopen(fileobj.fileno(), b"a"))

Replacing rpm.spec(spec_candidate).sourceHeader.format(query) should be similarly possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buildsystem Something affecting build systems dependencies Affecting dependencies enhancement New feature or request performance Something is slow
Projects
None yet
Development

No branches or pull requests

3 participants