Skip to content

This project provides a word hierarchy builder. It builds a tree out of a set of words which can then be navigated by a WordProcessor to generate e.g. Regexps that match any of the words in the given set.

License

Notifications You must be signed in to change notification settings

bwagner/wordhierarchy

Repository files navigation

wordhierarchy

This project provides a word hierarchy builder. It builds a tree out of a set of words which can then be navigated by a WordProcessor to generate e.g. Regexps that match any of the words in the given set.

Example:

java -jar dist/wordhierarchy.jar Euch Euer Eure Eurer
 Eu -
  er 
  ch 
  re 
   r 

(?:Eu(?:er|ch|re(?:r)?))

The output of the command line program is the input partitioned into common parts of words. If a part of a word does not complete a word, a - is appended (above: Eu -). If a part of a word does indeed complete a word, no - is appended (above: er, ch, re, r).

The last line of the output is a Java regexp that matches the set of words. It is built by the included RegexWordProcessor, which can easily be adapted to other regexp dialects (e.g. perl, etc.).

This example shows the command line use which is merely intended for demonstration purposes. It's mainly to be used is as a library.

Todo

  • works best for words with common substrings from the left. Could be improved to work with substrings anywhere.

  • look at Trie

  • simplify using ideas from this post

  • improve when there's a row of single characters: Instead of e.g. (?:3|8|1|6|4) it could generate [38164].

  • improve command line: offer options to generate different regexp dialects.

External Dependendencies

Authors

Bernhard Wagner

License

Copyright 2011 Bernhard Wagner.

Licensed under GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

About

This project provides a word hierarchy builder. It builds a tree out of a set of words which can then be navigated by a WordProcessor to generate e.g. Regexps that match any of the words in the given set.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages