Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re.allchar might be missing from the string formula API #281

Closed
Swalkyn opened this issue Sep 13, 2022 · 4 comments · Fixed by #282
Closed

re.allchar might be missing from the string formula API #281

Swalkyn opened this issue Sep 13, 2022 · 4 comments · Fixed by #282
Milestone

Comments

@Swalkyn
Copy link
Contributor

Swalkyn commented Sep 13, 2022

Hello there,

I'm working with the Unicode strings theory, and I want to create a RegexFormula for the regular expression accepting any single character (usually written as .). This is a function provided by the theory (search for re.allchar).

I did not find the corresponding function in StringFormulaManager's JavaDoc. Searching through the source code did not yield anything. I am quite sure that using makeRegex(".") matches a single dot character, and not any character.

Is this function omitted on purpose ? Was it simply confused with re.all ? Or am I missing something ?

If it turns out that re.allchar is not omitted on purpose, I would be glad to submit a PR implementing it (for Z3 at least).

@Swalkyn
Copy link
Contributor Author

Swalkyn commented Sep 13, 2022

Right after posting, I found the source of the problem. It seems that allchar was removed and a function to directly convert from Pattern object was instead planned. However, it was never implemented but allChar was not added back. See #250, in particular this comment and the following.

@Swalkyn
Copy link
Contributor Author

Swalkyn commented Sep 14, 2022

I am working on a PR to solve this issue. It should be ready by the end of the day.

Swalkyn added a commit to Swalkyn/java-smt that referenced this issue Sep 14, 2022
This function corresponds to `re.allchar` in
http://smtlib.cs.uiowa.edu/theories-UnicodeStrings.shtml.

It is implemented in the Z3 and CVC4 solvers.
@kfriedberger
Copy link
Member

kfriedberger commented Sep 14, 2022

Hi @Swalkyn
first of all, thanks for your effort to improve support for String and Regex theory in JavaSMT.

The reason for removing allChar was its replacement with range over a specified alphabet like a-z. This is described in this comment.
I am unsure how many chars are currently supported by different SMT solvers, it might be UTF8 or ASCII, and might depend on the solver. The SMTLIB standard specifies a concrete range of values in the alphabet: all the Unicode code points ... ranging from 0x00000 to 0x2FFFF.

At that point in time, we seem to have forgotten to check all other regex- and string-based methods for this issue,
We should first determine the range of allowed chars for String and Regex theory, and then take a look at whether merging your pull request #282 brings an improvement.

@Swalkyn
Copy link
Contributor Author

Swalkyn commented Sep 15, 2022

Ah yes my mistake, I should have read the discussion more carefully. I did not realize CVC4 was not adhering to the SMTLIB standard.

  • Z3 supports Unicode ranges, but returns unknown when solving with full range.
  • CVC4 seems to only accept Ascii, and crashes with full Unicode range.

I believe this has not changed, see http://cvc4.cs.stanford.edu/wiki/Strings and https://microsoft.github.io/z3guide/docs/theories/Strings. Also, CVC4 will probably always have this difference since it is being replaced by CVC5 (which works with Unicode as specified in the SMTLIB standard).

However, Z3 and CVC4 both provide a function corresponding to re.allChar in their API. Even though it does not have the exact same behavior, it is intended to be used for the same purpose: match any single character.

For that reason, I would argue that having allChar in java-smt is important. When the user wants to match any character, they should not have to specify a different range based on which solver they are using.

kfriedberger added a commit that referenced this issue Sep 16, 2022
@kfriedberger kfriedberger added this to the 3.14.0 milestone Sep 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants