RuiFlex is a light abstraction over Flex. It aims to make typical
Flex usage simple and quick. The user creates a RuiFlex control
file, which contains Ruby commands that create lexer token
descriptions. Some Flex configuration is possible as
well. Based on the control file, RuiFlex generates a Flex input file
(*.l) and a corresponding C header file (*.h), where tokens have
symbolic C names (i.e. defines).
The control file consists mainly of token descriptions. For example:
Token.new( %q{calc}, :keyword )
Token.new( %q{"+"}, :longop, "OP_PLUS" )
Token.new( %q{[ \t\v\f]}, :space ).action( "/* Skip space */" )
The first description creates a lexer entry which matches the regexp
calc and returns the C define value TOK_KEY_CALC, i.e. the token
id. Token id values are greater than 300, so they do not get mixed up with
ASCII values, e.g. if some characters are used literally.
The regexp is a string using Ruby single quoting rules, which makes special character handling as easy as possible. You can use whatever quoting suits the particular entry, since in the end it is just a Ruby String that is passed to Flex as is.
The second description creates a lexer entry which matches an operator
(+), and returns a C define value TOK_OP_PLUS.
The third is for eating white space (excluding newlines). The default
action, which is to return the token id, is skipped, and the lexer
continues with the characters that follow the :space match.
In the *.l file these look like:
"calc" { return TOK_KEY_CALC; }
"+" { return TOK_OP_PLUS; }
[ \t\v\f] { /* Skip space */ }
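As a rough illustration of how such rule lines can be assembled, here is a minimal sketch. The helper flex_rule is hypothetical and not part of RuiFlex; it assumes the pattern is already in its final Flex form (for example, the keyword already wrapped in double quotes).

```ruby
# Hypothetical helper (not RuiFlex API): builds one Flex rule line.
# If no custom action is given, the default action returns the token id.
def flex_rule(pattern, id_name: nil, action: nil)
  body = action || "return #{id_name};"
  "#{pattern} { #{body} }"
end

puts flex_rule('"calc"', id_name: "TOK_KEY_CALC")
puts flex_rule('"+"', id_name: "TOK_OP_PLUS")
puts flex_rule('[ \t\v\f]', action: "/* Skip space */")
```

The third call passes a custom action, so no return statement is emitted, which mirrors the white space entry above.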
The second argument to Token.new is the token class, which defines
the token's behavior. Keywords are unique, so the token id can be
constructed directly from the regexp.
For the operator we want to specify the token id explicitly, so it is given as the third argument.
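The naming seen in the examples (calc becomes TOK_KEY_CALC, and the explicit "OP_PLUS" becomes TOK_OP_PLUS) can be sketched as follows. The function token_id_name is hypothetical and only covers the two cases shown in this document; the real rules for other token classes are not assumed here.

```ruby
# Hypothetical helper (not RuiFlex API): derives the C define name of a token.
# Only the :keyword rule and the explicit-name rule are taken from the text.
def token_id_name(regexp, token_class, explicit = nil)
  return "TOK_#{explicit}" if explicit

  case token_class
  when :keyword
    "TOK_KEY_#{regexp.upcase}"
  else
    raise ArgumentError, "this sketch needs an explicit id for #{token_class}"
  end
end

puts token_id_name(%q{calc}, :keyword)
puts token_id_name(%q{"+"}, :longop, "OP_PLUS")
```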
For the complete list of token classes, run:
shell> ruiflex -d
Flex options are set with FlexOpt.set commands:
FlexOpt.set( :reentrant, true )
This sets %option reentrant for Flex. There is also :lineno, and the
literal :flexopt, which is a list of custom entries for %option.
You can add your own code to the *.l header (with :l_header) and to the
*.l footer (with :l_footer). Likewise you can add your own code to the
*.h header (with :h_header) and footer (with :h_footer).
For example:
FlexOpt.set( :l_header, "\n#include <my_defs.h>\n" )
See the example directory and README.md for two simple examples.
The generated Flex file (*.l) includes all the options that were set and
the token descriptions. Additionally it includes generated C code which can be
used to get information about the tokens at runtime.
Some convenient character classes are defined at the top of the file.
The user has access to the ruiflex_token_desc function, which returns a
string describing the token class.
There is also the ruiflex_token_id function, which returns the token id in
text format (as a string).
These token info functions are useful in parser error reporting.
The generated C header file (*.h) includes prototypes of the Flex
API functions that are used, and all the token id defines. There are also
prototypes for the token info functions.
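To illustrate what the token id defines might look like, here is a minimal sketch. The helper define_lines is hypothetical, and the starting value 301 is an assumption: the text only states that ids are greater than 300, not how they are numbered.

```ruby
# Hypothetical helper (not RuiFlex API): renders C defines for token ids.
# Assumption: ids are handed out sequentially, starting right above 300.
def define_lines(names, first_id = 301)
  names.each_with_index.map { |name, i| "#define #{name} #{first_id + i}" }
end

puts define_lines(%w[TOK_KEY_CALC TOK_OP_PLUS])
```

Keeping ids above the 8-bit character range lets a lexer return either a token id or a plain character code without ambiguity.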
RuiFlex assumes that it can use the basename of the control file for naming the other files.
For example, if the control file is my_tokens.rb, the files
my_tokens.l and my_tokens.h will be created. my_tokens.c is also
created if ruiflex is run with the -t command line option.
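The naming scheme can be sketched with Ruby's File.basename. The function output_files is hypothetical, and the with_c flag merely stands in for the -t option; whether the files land next to the control file or in the current directory is not assumed here.

```ruby
# Hypothetical helper (not RuiFlex API): lists the generated file names
# for a given control file, based on its basename without extension.
def output_files(control_file, with_c: false)
  base = File.basename(control_file, ".*")
  files = ["#{base}.l", "#{base}.h"]
  files << "#{base}.c" if with_c
  files
end

p output_files("my_tokens.rb")
p output_files("my_tokens.rb", with_c: true)
```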
RuiFlex generates a Flex file for C programmers. However, it is sometimes useful to provide other targets. For example, if the user wants to combine a C parser with an MRuby backend, the MRuby backend might need the same token info as the C code.
Custom generators are specified on the RuiFlex command line. A custom generator is a user-implemented Ruby class that inherits from a RuiFlex internal base class. The user provides custom implementations for the active methods and creates one "dummy" instance of the custom class in order to register the generator with RuiFlex.
RuiFlex calls the methods in this order:
- "open" - Used to open the output file(s).
- "tokens" - A list of all Token objects is passed here.
- "token_ids" - A hash of { idstr => [id, descstr] } for all tokens is passed here.
- "close" - Used to close the output file(s).
See example/ruigen for an example of a custom class and example/doit for an example RuiFlex execution.
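The call order can be sketched without RuiFlex itself. Everything named here is a stand-in: SketchGeneratorBase replaces the real internal base class (whose name is not given in this document), run_generator imitates the driver, and the id 301 and description "keyword" are made-up sample data. Output is collected in memory instead of a file to keep the sketch self-contained.

```ruby
# Stand-in for the RuiFlex internal base class (real name not shown here).
class SketchGeneratorBase
  def open; end
  def tokens(list); end
  def token_ids(ids); end
  def close; end
end

# Hypothetical custom generator: emits Ruby constants for each token id.
class ConstGenerator < SketchGeneratorBase
  attr_reader :lines

  def open
    @lines = [] # a real generator would open its output file(s) here
  end

  def token_ids(ids)
    ids.each { |idstr, (id, desc)| @lines << "#{idstr} = #{id} # #{desc}" }
  end
end

# Imitates the documented call order: open, tokens, token_ids, close.
def run_generator(gen, tokens, ids)
  gen.open
  gen.tokens(tokens)
  gen.token_ids(ids)
  gen.close
end

gen = ConstGenerator.new
run_generator(gen, [], { "TOK_KEY_CALC" => [301, "keyword"] })
puts gen.lines
```

Only the methods a generator cares about are overridden; the stand-in base class supplies empty defaults for the rest.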
RuiFlex intends to target very basic Flex usage. If you need
something fancy, you can suggest a new feature, or you can just tweak
the ruiflex command on your own.