
SystemVerilog Lexer and Parser

This directory contains the SystemVerilog lexer and parser implementations. The goal for the parser is to be able to accept all valid SystemVerilog (IEEE 1800-2017), as defined in the SV-LRM. As of 2019, it accepts the vast majority of SystemVerilog syntax, but there is work ahead to reach 100%. Progress towards this goal is measured against open-source language-compliance tests at https://symbiflow.github.io/sv-tests/.

Unlike the parsers in conventional toolchains, which expect preprocessed input, this parser accepts unpreprocessed code, with some limitations. To make this possible, preprocessing directives are accommodated in the implemented grammar.

Decoupled Design

The lexer and parser are decoupled, which means that the lexer can be used standalone to tokenize text, and the parser is adapted to accept tokens from sources other than the direct use of the lexer. This separation enables the insertion of different passes between the lexer and parser, such as integrated preprocessing, and context-based lexical disambiguation (with arbitrary lookahead) where required by the language.
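The decoupling can be illustrated with a minimal C++ sketch. Everything in it is hypothetical: the `TokenKind` values, the `TokenSource` interface, and the `VectorSource`, `SkipTrivia`, and `ConsumeAll` names are not this directory's API. A lexer stand-in and a parser stand-in agree only on a token-source interface, so a pass can be slotted between them without either side changing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds; the real enumerations come from the parser.
enum class TokenKind { kIdentifier, kKeyword, kSymbol, kWhitespace, kComment, kEndOfFile };

struct Token {
  TokenKind kind;
  std::string text;
};

// Anything that yields tokens one at a time: a lexer, or a pass wrapping one.
class TokenSource {
 public:
  virtual ~TokenSource() = default;
  virtual Token Next() = 0;  // returns a kEndOfFile token when exhausted
};

// Stands in for the lexer by replaying a prepared token list.
class VectorSource : public TokenSource {
 public:
  explicit VectorSource(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}
  Token Next() override {
    if (pos_ < tokens_.size()) return tokens_[pos_++];
    return {TokenKind::kEndOfFile, ""};
  }

 private:
  std::vector<Token> tokens_;
  std::size_t pos_ = 0;
};

// A pass inserted between lexer and parser: drops whitespace and comments.
class SkipTrivia : public TokenSource {
 public:
  explicit SkipTrivia(std::unique_ptr<TokenSource> upstream) : upstream_(std::move(upstream)) {}
  Token Next() override {
    Token t = upstream_->Next();
    while (t.kind == TokenKind::kWhitespace || t.kind == TokenKind::kComment) {
      t = upstream_->Next();
    }
    return t;
  }

 private:
  std::unique_ptr<TokenSource> upstream_;
};

// Stands in for the parser: it only depends on the TokenSource interface.
std::vector<std::string> ConsumeAll(TokenSource& source) {
  std::vector<std::string> seen;
  for (Token t = source.Next(); t.kind != TokenKind::kEndOfFile; t = source.Next()) {
    seen.push_back(t.text);
  }
  return seen;
}
```

Because `SkipTrivia` itself implements `TokenSource`, such passes could be stacked in any order, which is the property that makes integrated preprocessing or disambiguation passes possible in this kind of design.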

Lexer

The lexer is generated by Flex. Token enumerations come from the parser. The generated lexer implementation is wrapped in an adapter that makes it return tokens (instead of just an int). The stream of tokens returned by the lexer has the following properties:

  • Continuity: The text range end of one token is equal to the text range start of the next token.
  • Completeness: The text range spanned by the first and last tokens is equal to that of the original text that was scanned.

This also means that insignificant whitespace (spaces, newlines) is represented as tokens, and that non-syntax tokens such as comments are included. Such tokens are easy to filter out before passing the stream on to the parsing phase.
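The two properties can be stated as checks over byte offsets. The sketch below assumes a hypothetical token representation (a kind tag plus a half-open `[left, right)` range into the source text); it is not the lexer's actual token type. One consequence of both properties holding is that concatenating the token texts, whitespace and comments included, reproduces the original input exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token: a kind tag plus the byte range [left, right) it covers.
enum class Kind { kKeyword, kIdentifier, kSymbol, kSpace, kNewline, kComment };

struct Tok {
  Kind kind;
  std::size_t left;
  std::size_t right;
};

// Continuity: each token starts exactly where the previous one ended.
bool IsContinuous(const std::vector<Tok>& toks) {
  for (std::size_t i = 1; i < toks.size(); ++i) {
    if (toks[i].left != toks[i - 1].right) return false;
  }
  return true;
}

// Completeness: the first and last tokens span the whole scanned text.
bool IsComplete(const std::vector<Tok>& toks, std::string_view text) {
  if (toks.empty()) return text.empty();
  return toks.front().left == 0 && toks.back().right == text.size();
}

// With both properties, concatenating token texts rebuilds the source.
std::string Reconstruct(const std::vector<Tok>& toks, std::string_view text) {
  std::string out;
  for (const Tok& t : toks) out.append(text.substr(t.left, t.right - t.left));
  return out;
}
```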

This lexer follows SystemVerilog lexical definitions, including that of the preprocessing sub-language, because it is targeted at unpreprocessed code.

We provide a standalone tool for examining tokens for any valid SV source file.

Token Classifications

Token classification functions logically group together sets of token enumerations, so that client code does not have to repeat the same logic in different places.
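As an illustration, such classification helpers might look like the following C++ sketch. The enumerators and function names are invented for this example; the real token enumerations are generated from the parser grammar and are far more numerous.

```cpp
#include <cassert>

// Hypothetical token enumerations, standing in for the parser-generated ones.
enum TokenEnum {
  kSpace, kNewline, kLineComment, kBlockComment,
  kDefineDirective, kIfdefDirective, kEndifDirective, kIncludeDirective,
  kModuleKeyword, kEndmoduleKeyword, kIdentifier,
};

bool IsWhitespace(TokenEnum t) { return t == kSpace || t == kNewline; }

bool IsComment(TokenEnum t) { return t == kLineComment || t == kBlockComment; }

// Tokens that carry no syntax and can be dropped before parsing.
bool IsTrivia(TokenEnum t) { return IsWhitespace(t) || IsComment(t); }

// Preprocessor directives, listed once so callers need not repeat the set.
bool IsPreprocessorDirective(TokenEnum t) {
  switch (t) {
    case kDefineDirective:
    case kIfdefDirective:
    case kEndifDirective:
    case kIncludeDirective:
      return true;
    default:
      return false;
  }
}
```

Centralizing the sets this way means that adding a new directive token only requires updating one `switch`, rather than every filter that cares about directives.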

Parser

The parser is generated by Bison, an LALR(1) parser generator. The generated parser implementation is wrapped in an adapter that allows it to work on tokens from any source, not just the lexer. This gives developers the opportunity to insert filtering or transformation passes between the lexer and parser.

The parser outputs a concrete syntax tree (CST) whose generic nodes are "typed" using enumerations.
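A minimal sketch of what "generic nodes typed using enumerations" means: there is one node type for the whole tree, and a node's category is carried by a tag rather than by a distinct C++ class per grammar rule. `NodeEnum`, `Symbol`, `Leaf`, `Node`, and `CollectLeaves` are hypothetical names for this illustration, not the CST classes used in this codebase.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node tags; a real grammar defines many more.
enum class NodeEnum { kModuleDeclaration, kModuleHeader, kModuleItemList };

// A tree element is either a leaf (one token) or a generic node whose
// meaning comes only from its NodeEnum tag.
struct Symbol {
  bool is_leaf = false;
  NodeEnum tag = NodeEnum::kModuleDeclaration;  // meaningful for nodes only
  std::string token_text;                       // meaningful for leaves only
  std::vector<std::unique_ptr<Symbol>> children;
};

std::unique_ptr<Symbol> Leaf(std::string text) {
  auto s = std::make_unique<Symbol>();
  s->is_leaf = true;
  s->token_text = std::move(text);
  return s;
}

// Builds a tagged node that takes ownership of its children, in order.
template <typename... Children>
std::unique_ptr<Symbol> Node(NodeEnum tag, Children... children) {
  auto s = std::make_unique<Symbol>();
  s->tag = tag;
  (s->children.push_back(std::move(children)), ...);
  return s;
}

// Collects leaf texts left to right, i.e. the tokens under a subtree.
void CollectLeaves(const Symbol& s, std::vector<std::string>& out) {
  if (s.is_leaf) {
    out.push_back(s.token_text);
    return;
  }
  for (const auto& child : s.children) {
    if (child) CollectLeaves(*child, out);
  }
}
```

Keeping a single generic node type keeps the tree uniform to traverse, at the cost of having to check tags (rather than C++ types) when looking for specific constructs.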

We provide a standalone tool for examining the CST for any valid SV source file.

Lexical Context

Parsing SystemVerilog is fraught with challenges that defy conventional LR grammars. LexicalContext is a token transformation pass that aims to help the Bison-generated parser implementation by:

  1. Disambiguating tokens that are used in multiple syntactic contexts.
  2. Performing long-distance lookahead that would otherwise not be expressible in an LR grammar.

It operates like a composition of state machines that scan and mutate tokens.
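To make the idea concrete, here is a toy C++ state machine in the spirit of such a pass; it is not a description of LexicalContext's actual rules. It assumes a hypothetical token kind `kArrow` for `->`, which it rewrites in place: inside a `constraint` body it becomes an implication, and elsewhere an event trigger. The "elsewhere" rule is a deliberate simplification, since SystemVerilog also permits `->` as logical implication in ordinary expressions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token kinds. kArrow is what the lexer would emit for "->";
// the pass rewrites it into one of two context-specific kinds.
enum class Kind {
  kConstraint, kLBrace, kRBrace, kIdentifier, kSemicolon,
  kArrow, kImplication, kEventTrigger
};

// Tracks whether the scan is inside a "constraint ... { ... }" body and
// mutates each "->" token accordingly.
class ArrowDisambiguator {
 public:
  void Process(std::vector<Kind>& tokens) {
    for (Kind& t : tokens) {
      switch (t) {
        case Kind::kConstraint:
          saw_constraint_keyword_ = true;
          break;
        case Kind::kSemicolon:
          // e.g. "constraint c;" declares a prototype with no body.
          if (constraint_depth_ == 0) saw_constraint_keyword_ = false;
          break;
        case Kind::kLBrace:
          if (constraint_depth_ > 0) {
            ++constraint_depth_;  // nested brace inside a constraint body
          } else if (saw_constraint_keyword_) {
            constraint_depth_ = 1;  // entering a constraint body
            saw_constraint_keyword_ = false;
          }
          break;
        case Kind::kRBrace:
          if (constraint_depth_ > 0) --constraint_depth_;
          break;
        case Kind::kArrow:
          t = constraint_depth_ > 0 ? Kind::kImplication : Kind::kEventTrigger;
          break;
        default:
          break;
      }
    }
  }

 private:
  bool saw_constraint_keyword_ = false;
  int constraint_depth_ = 0;
};
```

This sketch only remembers what it has already seen. Cases that need lookahead would additionally require buffering tokens until the deciding token arrives, and only then releasing them to the parser.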