-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance Regex Support in Standard Library for JS Backend #592
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution! :)
I can't review it fully, but I have a few nitpicks I pointed out in the review: most of them are just stylistic.
What I would really like to see before merging are a few relevant tests in effekt/examples/stdlib/
, feel free to make a new folder.
Our tests are always regression tests, so there's an .effekt
file and a corresponding .check
file. You can either just print out the results (see here) or use the test suite written in Effekt (see here).
The tests are especially important if you're planning to depend on this library going forward -- otherwise we won't notice a regression.
The tests are failing since |
I refactored the lexer example to account for the changes. The other occurrences shouldn't be affected since I'm only changing the JS backend? Please correct me if I oversaw anything. |
Thank you for your thorough review. I've implemented your suggestions, making the code more idiomatic. The addition of typed flags should notably enhance usability. I plan to add the test cases tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through the PR again and LGTM
Thank you for the contribution :)
I ran into some problems designing the test. While the LSP server and I cannot find any error in the code, the compiler fails with the error @jiribenes If you see any obvious problems why it doesn't type check I really would appreciate your help :) |
I'll take a look. This error usually means that UFCS ( |
Is there a way to run certain tests only for the JS-Backend? I would like to add test for the new features, but it seems that the chez-backend is not happy with that. At least I have no trouble running the test succesful on my machine using only the JS-Backend. |
It looks like this test even fails on JS: https://github.com/effekt-lang/effekt/actions/runs/10919082132/job/30306075187?pr=592#step:11:402 |
You can add the file into So for example for Chez-$something backends: abstract class StdlibChezTests extends StdlibTests {
override def ignored: List[File] = List(
// Not implemented yet
examplesDir / "stdlib" / "bytes",
examplesDir / "stdlib" / "io",
// Not implemented: advanced regex features
examplesDir / "stdlib" / "string" / "regex.effekt",
)
} |
It seems to work for me locally so it might be a NodeJS version issue? EDIT: it seems that the last part of the test labelled "capture groups" throws some error somewhere. EDIT 2: I think that NodeJS 12.x does not support the |
Don't know how to resolve this if we want to keep the functionality. We could update NodeJS: that might be a wise move anyway since even the extended support for NodeJS 12.x ended 2 years and 4 months ago (30 Apr 2022). Current oldest somewhat supported LTS version is NodeJS 18.x (where security support ends in 7 months). |
I just tested it locally and can confirm that. I will try to add a condition that the Do you see any problems with that? |
We can just require a newer node version, that shouldn't be a problem. |
libraries/js/text/regex.effekt
Outdated
function process$version() { | ||
const v = process.version | ||
const parts = v.split('.') | ||
return parseInt(parts[0].replace('v', '')) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a tiny nitpick: according to StackOverflow, process.versions.node
should have the version without v
:)
https://stackoverflow.com/questions/6656324/check-for-current-node-version
(though I haven't tried it on NodeJS 12.x, it might be too new 🤔)
Thanks for the PR! I think it is great to have improved regex support in the language. I have one major concern, which is the compatibility with the LLVM backend. In general, I do not care so much anymore about the Chez and ML backends. However, we should aim for feature completeness of the LLVM backend. Adding JS-only features now will make this more and more difficult in the future. @phischu what is your opinion when viewing this from the LLVM perspective? |
Which of the extensions that are proposed in this PR are compatible with the regular expressions in POSIX? |
|
To clarify: I am not proposing that you should implement the LLVM part. I just want to know what the intersection between posix regex and your proposed features is. |
libraries/js/text/regex.effekt
Outdated
record Match(matched: String, index: Int) | ||
// matches and groups | ||
record Range(start: Int, end: Int) | ||
record Group(matched: String, index: Option[Range]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just always generate indices and turn this into index: Range
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely possible and should decrease complexity
libraries/js/text/regex.effekt
Outdated
namespace internal { | ||
extern io def unsafeExec(reg: Regex, str: String): RegexMatch = js "regex$exec(${reg}, ${str})" | ||
|
||
extern pure def unsafeRegex(str: String, flags: String): Regex = js "new RegExp(${str}, ${flags})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be io
(or global
, not sure which one this should be), because otherwise the same regex will potentially be inlined over and over again and not have the "stateful" behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the clarification. I sometimes still struggle with FFI boundaries in effekt
libraries/js/text/regex.effekt
Outdated
* The returned Match object contains: | ||
* - matched: The full matched string. | ||
* - start: The starting index of the match. | ||
* - end: The ending index of the match if `Global` flag is set, otherwise 0 | ||
* - groups: A list of Group objects for each capturing group. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this documentation to Match
and refer to it here?
Also, why is end 0
if global isn' set?
The POSIX standard for Regex does only include basic support for capture groups. Only the extraction of the content is possible here, but generating indices for those is not. Stateful matching is definitely not known in the POSIX world. Flags are a bit more difficult in compatibility. Usually flags are passed as additional command line arguments which makes it dependent on the used program if they are compatible or not. Does that answer the question? |
Thank you :) |
e981f37
to
5d34ec2
Compare
Update libraries/js/text/regex.effekt Update libraries/common/option.effekt Co-authored-by: Jonathan Immanuel Brachthäuser <[email protected]>
6459232
to
aab54e3
Compare
I have implemented the following changes as suggested previously:
Please feel free to share your thoughts on the current design. |
Currently, the tests fail for both LLVM and the Chez backend because they lack the flags interface. To proceed, I will disable the regex tests on those backends. As long as the regex engine of a backend supports flags, implementing those should be straightforward, as the only stateful Flag ( |
95eebc4
to
c6997f0
Compare
This pull request significantly improves regex support in Effekt's standard library for the JavaScript backend. It introduces capture groups, group indices, and regex flags, addressing current limitations and providing a more powerful regex interface.
Key Enhancements:
Current Limitations:
Effekt's regex support is currently limited, lacking support for capture groups and advanced regex features. This implementation bridges this gap for JavaScript environments.
New Features:
Backwards compatibility
This change is not backwards compatible.
Feedback on the API design and usability from experienced Effekt developers is highly appreciated.