-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use re2 instead of re for match expressions #67
base: main
Are you sure you want to change the base?
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will require a comment to bypass mypy type hint checking.
src/celpy/evaluation.py
Outdated
@@ -39,6 +39,7 @@ | |||
import logging | |||
import operator | |||
import re | |||
import re2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing mypy errors. I think this needs a # type: ignore [import-untyped]
to ignore the lack of type hints.
This module may be a little too immature if it lacks type hints. Is there another choice?
py38: commands[6]> poetry run mypy --show-error-codes src
src/celpy/evaluation.py:42: error: Skipping analyzing "re2": module is installed, but missing library stubs or py.typed marker [import-untyped]
src/celpy/evaluation.py:42: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
Found 1 error in 1 file (checked 9 source files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on re2 libraries, but in the interest of showing my work, here's what I found about the top results when searching for "re2 python":
- The proposed re2 library in this PR: https://github.com/google/re2
pip install google-re2
(last update: 6 hours ago) https://snyk.io/advisor/python/google-re2 (Snyk health score: 96/100) - https://github.com/facebook/pyre2/
pip install pyre2
(last update: 5 years ago) https://snyk.io/advisor/python/pyre2 (Snyk health score: 57/100) - https://github.com/axiak/pyre2/
pip install re2
(last update: 8 years ago) https://snyk.io/advisor/python/re2 (Snyk health score: 45/100) - https://github.com/freepn/py-re2
pip install py-re2
(last update: 3 years ago) https://snyk.io/advisor/python/py-re2 (Snyk health score: 33/100)
I don't know how much to weigh the Snyk scores but they do give another data point in addition to the most recent commit date.
My thinking is that google-re2
is the best of the bunch.
Regarding mypy, the various re2 packages are intended to be drop-in replacements for re
so they should have compatible interfaces. I don't know how to tell mypy to use the same typestubs as re
for re2
. If you happen to know if there is an easy way to do that, please let me know.
Otherwise, I can add support in this repo for typestubs just for the portion of google-re2
that we use, using https://mypy.readthedocs.io/en/stable/stubgen.html.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid research. One that has type hints is the bare minimum for me. After that, you've ranked them nicely.
The standard library re
package relies on the typeshed stub definitions. (https://github.com/python/typeshed/blob/main/stdlib/re.pyi) With a name change, this file could be repurposed to attempt to cover an RE2 implementation without hints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've drafted some typestubs and now I'm checking to see if google-re2 will take them: google/re2#496
If not, I'm inclined to just publish them separately (e.g., google-re2-stubs
) and they can be pulled into this project.
I can also just vendor in the typestubs just into this project, but that doesn't seem as community-oriented as just publishing typestubs for every one.
Although it's obvious that the language definition specifies RE2 syntax for regular expressions, it's possibly less obvious that all of the aforementioned Python wrappers are precisely that: Python wrappers... around RE2, the C++ library. Given that this project aims to be pure Python and to have minimal dependencies, it's worth noting what might be a necessary compromise. |
I don't have much context about the north star of "pure Python and minimal dependencies", but if we want to maintain that, I can add the re2 capability as a package extra. |
@slott56 ready for re-review. Some questions: when I ran Anyway, I manually applied a style which I hope is compatible with the repo style, but let me know if you have any guidance on checking the formatting style automatically. |
Black has (almost) no configuration, which is why it's widely used. We don't include a format check on this repository right now, but we will in the future. I need to confirm Kapil's preference (Ruff Format vs. Black) before I enable it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow is this good. I have a few small requests.
@@ -44,6 +44,10 @@ python-dateutil = "^2.9.0.post0" | |||
pyyaml = "^6.0.1" | |||
types-pyyaml = "^6.0.12.20240311" | |||
types-python-dateutil = "^2.9.0.20240316" | |||
google-re2 = { version = "^1.0", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is Excellent. As is the presence of the stubs.
src/celpy/evaluation.py
Outdated
import re2 | ||
except ImportError: | ||
pass | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit convoluted. Can we put the _USE_RE2 = True
after the import
and lose this else:
block?
src/celpy/evaluation.py
Outdated
return celpy.celtypes.BoolType(m is not None) | ||
|
||
|
||
def function_matches(text: str, pattern: str) -> Result: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we hoist the if
outside this function?
if _USE_RE2:
def function_matches(...
... using _function_matches_re2()
else:
def function_matches(...
... using _function_matches_re()
The idea is to avoid evaluating the if
statement more than once at import time.
tests/test_evaluation.py
Outdated
@@ -158,6 +158,13 @@ def test_operator_in(): | |||
assert isinstance(operator_in(celtypes.IntType(-1), container_2), CELEvalError) | |||
|
|||
|
|||
def test_function_matches(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to assume that RE2 is installed as part of the testing. I'm not sure that's best. I may be wrong about this, but I can imagine someone declining the RE2 add-on and then being unable to test CEL.
Maybe there should be a @pytest.skipif()
option to test the two low-level functions depending on the _USE_RE2
global, as well as a separate generic test of function_matches()
?
(I love the test of an RE2-specific feature, that's fabulous!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone testing cel-python
will have to install google-re2-stubs
if they want to type-check the code and ...
google-re2-stubs
depends on google-re2
because that is the way [1] for it to indicate which versions of google-re2
it is compatible with and also that it is intended to be a stub for the google-re2
distribution of re2
and not one of the other distributions of re2
which all claim to be "drop-in" replacements for re
but in practice have slightly different APIs.
So, if we need to type-check we need the stubs and installing the stubs installs the implementation, and we cannot conditionally install the stubs because the code will not type-check unless we install the stubs.
I happen to be the maintainer of google-re2-stubs
so I can obviously drop the dependency on google-re2
so that cel-python
can install the stubs without installing google-re2
but given the weirdness in the re2
ecosystem noted above, I think that is not optimal for other reasons.
I can't think of a good way to conditionally type-check that isn't just the same as not type-checking a file at all. If you can think of a way, I'm happy to add it to the PR.
References
[1]
In addition, stub-only distributions MAY indicate which version(s) of the runtime package are targeted by indicating the runtime distribution’s version(s) through normal dependency data. For example, the stub package flyingcircus-stubs can indicate the versions of the runtime flyingcircus distribution it supports through dependencies field in pyproject.toml.
https://typing.readthedocs.io/en/latest/spec/distributing.html#stub-only-packages
Cool. If it helps, in my other projects, we transitioned from |
CEL specifies that match should use re2 pattern matching and not re semantics.
@slott56 ping. Any changes you want to request? |
@ddn0 thanks for the updates and making a re2 stub package. |
CEL specifies that match should use re2 pattern matching and not re semantics.
Closes #58