Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python typestubs #496

Open
ddn0 opened this issue May 31, 2024 · 11 comments
Open

Python typestubs #496

ddn0 opened this issue May 31, 2024 · 11 comments

Comments

@ddn0
Copy link

ddn0 commented May 31, 2024

I'm wondering if this project already maintains Python typestubs (pyi) for the google-re2 package.

If not, I've written typestubs for my own personal use, would this project be interested in accepting them? Otherwise, I can maintain a separate typestub library.

@junyer
Copy link
Contributor

junyer commented Jun 1, 2024

Thanks for asking! RE2 doesn't have a stub file, but it really should. Unfortunately, I have no experience with Python type checkers apart from https://github.com/google/pytype, so I will need some time to read about (and tinker with) https://github.com/python/mypy before it would be safe to say that I know what I'm doing. (stubgen and stubtest already sound like they will be very useful.) I also need to give some thought to plumbing everything into our Bazel CI workflow...

@ddn0
Copy link
Author

ddn0 commented Jun 3, 2024

I have a stub file written, which you can use as a starting point.

My plan this week was to package it up to be pip installable. This might be a useful basis for you too? I can share the repo when it is ready (and I'm happy to sign any CLA required for you to take the code).

@junyer
Copy link
Contributor

junyer commented Jun 3, 2024

Many thanks for offering! I did spend some time looking into this on Saturday and, as I had feared, the situation with Bazel with regard to Python type checkers is... truly dire. Distributing the stub file as part of google-re2 remains something that I want to do... eventually. In the short term, however, you will need to DIY. :(

@ddn0
Copy link
Author

ddn0 commented Jun 6, 2024

For others, here's the stub package I will be maintaining: https://github.com/ddn0/google-re2-stubs

@junyer
Copy link
Contributor

junyer commented Jun 7, 2024

Thanks, @ddn0!

@cosmicexplorer, I'm thinking to give up trying to work entirely within Bazel and, instead, to factor out generate_python_toolchains() in order to make the Bazel CI workflow use the local Python environment. That way, Bazel-built artifacts will be compatible with the local Python environment.

copybara-service bot pushed a commit that referenced this issue Jun 7, 2024
This is setting up to run mypy as per #496.

Change-Id: I48e67387313ac01d493086be8481b34f26bafe82
copybara-service bot pushed a commit that referenced this issue Jun 8, 2024
This is setting up to run mypy as per #496.

Change-Id: I48e67387313ac01d493086be8481b34f26bafe82
Reviewed-on: https://code-review.googlesource.com/c/re2/+/63230
Reviewed-by: Alex Chernyakhovsky <[email protected]>
Reviewed-by: Paul Wankadia <[email protected]>
@cosmicexplorer
Copy link

Have been working a pip-only solution for now, and realized after much trial and error that:

  1. Namespace packages (declared with packages=[...]) are the only allowed way to bundle type stubs with a wheel (see https://peps.python.org/pep-0561/ and https://typing.readthedocs.io/en/latest/spec/distributing.html#packaging-typed-libraries). Our current py_modules=['re2'] won't work, because package_data and data_files all stick things into a subdirectory (this seems intentional, and very frustrating). So we won't be able to stick with a single-file package. This alone would be fine, except...
  2. Namespace packages are not able to import setuptools.Extension modules for some reason I cannot understand. I have tried to place our _re2 python module in e.g. _re2._re2 instead, to mimic my other project https://github.com/cosmicexplorer/medusa-zip, but this fails for some reason. I noticed that unlike medusa-zip, setuptools.Extension will always place _re2 into top_level.txt in the generated wheel, whereas e.g. setuptools_rust() for medusa-zip does not do that. So for whatever reason, the c/c++ setuptools.Extension interface does not have feature parity with setuptools_rust(). It's possible this may be related to multi-phased initialization, and maybe pyo3 from medusa-zip does that but our pybind11 doesn't?
  3. However, I was able to get a terrible prototype to initialize our poor _re2 module by hand, using the changes from this branch (see 5235d0f or https://github.com/google/re2/compare/main...cosmicexplorer:internal-type-stubs?expand=1):
; cd python
; python setup.py bdist_wheel
; ls build/lib.linux-x86_64-cpython-310/re2/_re2.cpython-310-x86_64-linux-gnu.so 
-rwxr-xr-x 1 cosmicexplorer wheel 5.5M Jun  8 16:37 build/lib.linux-x86_64-cpython-310/re2/_re2.cpython-310-x86_64-linux-gnu.so*
(bbb) # cosmicexplorer@terrestrial-gamma-ray-flash: ~/tools/re2/python 16:43:51 
; python
Python 3.10.13 (main, Jan 10 2024, 22:22:23) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from importlib.machinery import ExtensionFileLoader as EFL
>>> EFL('_re2', 'build/lib.linux-x86_64-cpython-310/re2/_re2.cpython-310-x86_64-linux-gnu.so')
<_frozen_importlib_external.ExtensionFileLoader object at 0x70ae72e98250>
>>> zzz = EFL('_re2', 'build/lib.linux-x86_64-cpython-310/re2/_re2.cpython-310-x86_64-linux-gnu.so')
>>> zzz
<_frozen_importlib_external.ExtensionFileLoader object at 0x70ae72e98be0>
>>> dir(zzz)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'create_module', 'exec_module', 'get_code', 'get_data', 'get_filename', 'get_resource_reader', 'get_source', 'is_package', 'load_module', 'name', 'path']
>>> zzz.load_module()
<module '_re2' from 'build/lib.linux-x86_64-cpython-310/re2/_re2.cpython-310-x86_64-linux-gnu.so'>
>>> _re2 = zzz.load_module()
>>> dir(_re2)
['BytesToCharLen', 'CharLenToBytes', 'Error', 'Filter', 'RE2', 'Set', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__']

However, trying to use ExtensionFileLoader seems to fail when providing it any input that's not an unpacked physical file path, even though this syntax for indexing into a zip file works with other importers:

(bbb) # cosmicexplorer@terrestrial-gamma-ray-flash: ~/tools/re2/python 16:58:54 
; ls dist
total 1.7M
-rw-r--r-- 1 cosmicexplorer wheel 1.7M Jun  8 16:37 google_re2-1.1.20240601-cp310-cp310-linux_x86_64.whl
(bbb) # cosmicexplorer@terrestrial-gamma-ray-flash: ~/tools/re2/python 16:58:55 
; python 
Python 3.10.13 (main, Jan 10 2024, 22:22:23) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> dir(_re2)
KeyboardInterrupt
>>> from importlib.machinery import ExtensionFileLoader as EFL
>>> zzz = EFL('_re2', 'dist/google_re2-1.1.20240601-cp310-cp310-linux_x86_64.whl/re2/_re2.cpython-310-x86_64-linux-gnu.so')
>>> zzz.load_module()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap_external>", line 548, in _check_name_wrapper
  File "<frozen importlib._bootstrap_external>", line 1063, in load_module
  File "<frozen importlib._bootstrap_external>", line 888, in load_module
  File "<frozen importlib._bootstrap>", line 290, in _load_module_shim
  File "<frozen importlib._bootstrap>", line 719, in _load
  File "<frozen importlib._bootstrap>", line 674, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: dist/google_re2-1.1.20240601-cp310-cp310-linux_x86_64.whl/re2/_re2.cpython-310-x86_64-linux-gnu.so: cannot open shared object file: Not a directory
>>> 

I also tried to get this working by executing the module straight from its bytes, but it appears loading a native extension requires the special handling of the ExtensionFileLoader, which cannot receive anything other than a physical file path.

However, the branch I linked (https://github.com/google/re2/compare/main...cosmicexplorer:internal-type-stubs?expand=1) does at least demonstrate that we can keep using setuptools.Extension to build our _re2 module, and then keep type stubs in a new re2 "package", which shouldn't break any APIs and should also provide typing info. So if we wanted to take this approach (instead of completely reworking the build system, which sounds even less promising), we would need to:

  • Extract the _re2.cpython-...-.so into some known (checksummed?) location every time the re2 module is loaded.
    • I thought this was already done automatically at this point? but even marking zip_safe=False fails to ensure the wheel is unzipped before use, so it seems that we need to get that native module somewhere onto the physical filesystem.
      • I believe pip installs will automatically unzip, so in the typical case we probably won't have to unzip by hand. In that case, it might be reasonable to extract to a temp file each time. I would really prefer not to do this, or to unzip to a checksummed location somehow -- hopefully someone else knows a reliable way to do this.
    • I used __spec__.loader.archive to see whether the dist is currently being loaded from a zip file e.g. wheel -- I believe that we can avoid the unzipping if we get executed from an actual file path. We would need some extra logic in python/re2/native.py to check if we're being loaded from a zip file, and if so, unzip to some file path.
  • As above, use ExtensionFileLoader.load_module() to load the _re2 native module from the raw file path.

There were a whole host of difficulties I think we really shouldn't have had to deal with here, but if we can figure out a way to unzip the native _re2 module to a checksummed location so it only ever occurs once, I think we will be able to achieve this restructuring of the package contents to enable our users to get access to our type stubs (the generics I've used are actually quite useful to avoid str/bytes mixups). I am relatively confident that our current approach with single-file modules via py_modules=['re2'] will not allow us to distribute typing information, since type stubs seem to only be supported with the packages=[...] argument, but if someone else figures out how to avoid that drastic restructuring, I would love to hear about it.

copybara-service bot pushed a commit that referenced this issue Jun 9, 2024
We need `re2` to be a package, not a module, because it appears that
modules can't have `.pyi` files, so munge the module into a package.

Change-Id: I1a268875743390c32c0fb9cd58f6d83a670ce928
copybara-service bot pushed a commit that referenced this issue Jun 9, 2024
Uploaded patch set 1.

Patch-set: 1
Change-id: I1a268875743390c32c0fb9cd58f6d83a670ce928
Subject: More setting up to run mypy as per #496.
Branch: refs/heads/main
Status: new
Topic: 
Commit: e938270
Tag: autogenerated:gerrit:newPatchSet
Groups: e938270
Private: false
Work-in-progress: false
@junyer
Copy link
Contributor

junyer commented Jun 9, 2024

I spent a lot of tonight glaring at the setuptools documentation until I finally happened to notice ext_package, which does the needful. I now have a change out for review that will munge the module into a package.

copybara-service bot pushed a commit that referenced this issue Jun 9, 2024
We need `re2` to be a package, not a module, because it appears that
modules can't have `.pyi` files, so munge the module into a package.

Change-Id: I1a268875743390c32c0fb9cd58f6d83a670ce928
copybara-service bot pushed a commit that referenced this issue Jun 9, 2024
Uploaded patch set 2.

Patch-set: 2
Subject: More setting up to run mypy as per #496.
Commit: aadebe5
Tag: autogenerated:gerrit:newPatchSet
Groups: aadebe5
copybara-service bot pushed a commit that referenced this issue Jun 10, 2024
Change has been successfully cherry-picked as 10f876d

Patch-set: 3
Subject: More setting up to run mypy as per #496.
Status: merged
Commit: 10f876d
Tag: autogenerated:gerrit:merged
Groups: aadebe5
Label: Code-Review=+2, ad8c2965909d1475610e379f93c9659b7d559a7c
Label: Code-Review=+1, 92dba7285c5342283bec690b6e1168d01e7be543 Gerrit User 44612 <44612@c958e1eb-c711-3e17-a1d0-c94d35b2e5aa>
Label: SUBM=+1, 32d94f2fe9b9d15bd3b8ce1e35892fa4da113e25
Submission-id: 63270
Submitted-with: OK
Submitted-with: Rule-Name: gerrit~DefaultSubmitRule
Submitted-with: MAY: Code-Review: Gerrit User 44612 <44612@c958e1eb-c711-3e17-a1d0-c94d35b2e5aa>
copybara-service bot pushed a commit that referenced this issue Jun 10, 2024
We need `re2` to be a package, not a module, because it appears that
modules can't have `.pyi` files, so munge the module into a package.

Change-Id: I1a268875743390c32c0fb9cd58f6d83a670ce928
Reviewed-on: https://code-review.googlesource.com/c/re2/+/63270
Reviewed-by: Paul Wankadia <[email protected]>
Reviewed-by: Alex Chernyakhovsky <[email protected]>
@cosmicexplorer
Copy link

cosmicexplorer commented Jun 13, 2024

I have some type stubs available at https://github.com/google/re2/compare/main...cosmicexplorer:type-stubs?expand=1 along with a change to setup.py which inserts them into the wheel, but will need to figure out how to change the github workflow to run mypy EDIT: done!. @junyer how should I test github CI? Does that get tested on gerrit? I have just disabled github pro so actions no longer run on my own account. I was also going to try making bazel run mypy but would prefer to do that in a followup change @_@.

EDIT: on gerrit as https://code-review.googlesource.com/c/re2/+/63310!

@junyer
Copy link
Contributor

junyer commented Jun 13, 2024

@junyer how should I test github CI? Does that get tested on gerrit? I have just disabled github pro so actions no longer run on my own account.

All of the CI workflows are "post-submit" only: they trigger when Copybara mirrors https://code.googlesource.com/re2 to GitHub. No need to concern yourself with anything more than local testing. :)

I was also going to try making bazel run mypy but would prefer to do that in a followup change @_@.

If the Bazel community ever decides to make Python type checkers Just Work™, we can revisit this then. In the meantime, I have no desire to (co-)author and subsequently maintain whatever Starlark would be needed, so please don't spend any (more!) time worrying about it. ;)

copybara-service bot pushed a commit that referenced this issue Jun 21, 2024
Change-Id: Ib1250ec954c0fecdaed606c41bbbbe00ff7bc1de
Reviewed-on: https://code-review.googlesource.com/c/re2/+/63371
Reviewed-by: Paul Wankadia <[email protected]>
Reviewed-by: Alex Chernyakhovsky <[email protected]>
@ddn0
Copy link
Author

ddn0 commented Oct 4, 2024

Hello!

Wondering if this issue is still being worked on. I'm getting PRs on my stub package ddn0/google-re2-stubs#4 and I'm wondering if I should signpost that my repo should be sunsetted or if I should continue my maintenance.

@trim21
Copy link

trim21 commented Oct 6, 2024

also we should consider export _Match and _Regexp so they can be used as typing

for example:

pattern: re2.Regexp[str] = re2.compile(...)

def __repl(m: re2.Match[str]) -> str: ...

pattern.sub(__repl, ...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants