regexfinder

Introduction

Regexfinder is a set of tools to evaluate and manipulate regular expressions. In particular, the package can:

optimize a size/complexity tradeoff for a regular expression
visualize a regular expression as a graph
compute the number of string paths allowed by a regular expression
generate random strings satisfying a regular expression

Size/complexity tradeoff

A regular expression defines a set of strings. Our general intuition is that we want the regex to include as few strings as possible (i.e. be as small as possible) but also not be too long or complex. Regexfinder formalizes this tradeoff by using two ideas from information theory. The size of the regular expression is the information theoretic entropy of a distribution on the regular expression (we start with the uniform distribution). And the complexity of the regular expression is the length of the regular expression which is motivated by Kolmogorov Complexity.

Using a weight parameter α, ent for entropy and K for complexity, regexfinder minimizes

over regular expressions that contain a provided set of strings.

For example, if the set of strings all have length 1 and contain the digits 0,1,2,4,5,7,8,9, then an admissible regular expression is RE1 = [01245789]. If α = 1, then

φ(RE1) = log2(8) + 10 = 13.

Another regular expression that contains all of these strings is RE2 = \d, and

φ(RE2) = log2(10) + 2 = 5.52.

So, in this case, \d would be chosen over [01245789].

Graph Representation

The tool creates a graph representation of a regular expression. For example, the regular expression

r'\d{2}([a-m]{2}[n-z]{2}|[A-Z]{4})\d'

is represented as:

Name		Name	Last commit message	Last commit date
Latest commit History 120 Commits
images		images
LICENSE		LICENSE
README.md		README.md
example.ipynb		example.ipynb
regexfinder.py		regexfinder.py
requirements.txt		requirements.txt
unitTestExample.py		unitTestExample.py
unitTests.py		unitTests.py
unit_test.py		unit_test.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

regexfinder

Introduction

Size/complexity tradeoff

Graph Representation

About

Releases

Packages

Contributors 2

Languages

License

clockworkmicro/regexfinder

Folders and files

Latest commit

History

Repository files navigation

regexfinder

Introduction

Size/complexity tradeoff

Graph Representation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages