Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement String.prototype.matchAll and related #1731

Merged
merged 2 commits into from
Nov 27, 2024

Conversation

andreabergia
Copy link
Contributor

@andreabergia andreabergia commented Nov 25, 2024

This PR implements:

All test262 cases pass; those that do not, fail for pre-existing reasons unrelated to these changes:

  • unicode flag 'u' not supported - looks similar to Support ES2018 Regex Unicode Property Escapes #959 but I'm not 100% sure
    • RegExp/prototype/Symbol.matchAll/species-constructor.js
    • String/prototype/matchAll/flags-undefined-throws.js
  • flag property on regexp is not modifiable - tracked by Cannot set property flags on regexp #1730
    • String/prototype/matchAll/flags-nonglobal-throws.js
    • RegExp/prototype/Symbol.matchAll/this-get-flags.js
    • RegExp/prototype/Symbol.matchAll/this-tostring-flags.js
    • RegExp/prototype/Symbol.matchAll/this-tostring-flags-throws.js
  • pre-existing problem with computed property on getter/setter and symbol keys - tracked by Cannot define an object literal with a getter for a symbol property #1729
    • Regexp/prototype/Symbol.matchAll/isregexp-this-throws.js
    • Regexp/prototype/Symbol.matchAll/isregexp-called-once.js
    • Regexp/prototype/Symbol.matchAll/species-constructor-get-species-throws.js
  • we autobox the this object, so we get a NativeString instead of the raw string literal and the last assertion fails
    • String/prototype/matchAll/regexp-matchAll-invocation.js
    • String/prototype/matchAll/regexp-prototype-matchAll-invocation.js
  • we autobox arguments to function calls, so f.call(true) receivedes a NativeBoolean which is an object
    • Regexp/prototype/Symbol.matchAll/this-not-object-throws.js

Please squash if accepted!

Closes #933

@rbri
Copy link
Collaborator

rbri commented Nov 25, 2024

@andreabergia Great ❤️

the architecture tests are failing with jdk 21 - will have a look later if you like

@andreabergia
Copy link
Contributor Author

@andreabergia Great ❤️

the architecture tests are failing with jdk 21 - will have a look later if you like

Noticed. There's two ways to make this pass:

  1. move the code that registers the NativeRegExpStringIterator in the context into the RegExpProxy, which I have done in 03303eb
  2. modify the ArchitectureTest to allow the usage from ScriptRuntime.

I have implemented 1), but honestly I think 2) would be a better choice. I couldn't figure out the correct incantation for archunit though.

@rbri
Copy link
Collaborator

rbri commented Nov 25, 2024

There's two ways to make this pass

Hi @andreabergia,

had a quick look and i like your commited solution more than changing the architecture test. For my understanding this RegExpProxy is a way to decouple the regexp impl from the rhino core and make it exchangeable. And therefore i think having this init method in the interface is correct.

And something like

...
.doNotHaveFullyQualifiedName("org.mozilla.javascript.RegExpProxy")
.or().doNotHaveFullyQualifiedName("org.mozilla.javascript.ScriptRuntime")

should work... (untested)

@andreabergia
Copy link
Contributor Author

had a quick look and i like your commited solution more than changing the architecture test. For my understanding this RegExpProxy is a way to decouple the regexp impl from the rhino core and make it exchangeable.

Sounds good. Thanks for checking!

@rbri
Copy link
Collaborator

rbri commented Nov 25, 2024

@andreabergia did a quick smoke test with HtmlUnit and it looks good!

Copy link
Collaborator

@gbrail gbrail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks -- this is great progress.

I have a few specific comments, the big one being whether this is going to break the (admittedly theoretical) ability for the code to function if "RegExpImpl" isn't in the classpath.

@@ -188,6 +188,7 @@ public static ScriptableObject initSafeStandardObjects(

NativeArrayIterator.init(scope, sealed);
NativeStringIterator.init(scope, sealed);
getRegExpProxy(cx).register(scope, sealed);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that by doing this, we're changing the implementation so that it'll bomb if "RegExpImpl" isn't in the classpath, because "getRegExpProxy" will return null in that case. I'm not sure that we have ever taken advantage of that fact, but I think that with the current lazy constructor, a less catastrophic failure will result, and like all things Rhino, we don't really know what people are doing with the general capability to have the reg exp implementation be optional. Is there a different way to do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed that in 314de9e

return ITERATOR_TAG;
}

private Scriptable regexp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the Rhino creators in the Old Days seemed to like to put variables at the end of the class, but for new stuff I think it's OK (and preferable) if we act like modern-day Java developers and put them at the beginning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -2761,7 +2768,7 @@ public Object execIdCall(
return realThis(thisObj, f).toString();

case Id_exec:
return realThis(thisObj, f).execSub(cx, scope, args, MATCH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we're also changing stuff in IdScriptableObject that's been there a while, so I'm trying to follow this. What is changing here that's requiring us to treat the name of the function object differently?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that, in IdScriptableObject, the method ensureType (used by realThis) took an IdFunctionObject. However, that parameter was only used to get the function name. So, I've refactored it and added an overload that only takes the function name.
This allows me to have a method NativeRegExp::js_exec that does not go through the IdFunctionObject at all, and so it can easily be called from NativeRegExpStringIterator to implement point 3 of the spec algorithm RegExpExec.

This includes:
- `Symbol.matchAll`
- `String.prototype.matchAll`
- `RegExp.prototype[Symbol.matchAll]`

All test262 cases pass; those that do not, fail for pre-existing reasons
unrelated to these changes. Added also some manual tests.
@andreabergia
Copy link
Contributor Author

Rebased onto the latest master branch and squashed a bit

@rbri
Copy link
Collaborator

rbri commented Nov 26, 2024

@gbrail - will be great if you can merge this soon - i plan a new HtmlUnit release during the next days and i like to have this in

@gbrail
Copy link
Collaborator

gbrail commented Nov 27, 2024

OK, works for me, glad you all like it ;-)

@gbrail gbrail merged commit c915727 into mozilla:master Nov 27, 2024
3 checks passed
@andreabergia andreabergia deleted the string-match-all branch November 27, 2024 12:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ES2020 String.prototype.matchAll
3 participants