source/main.civet
: wrapper for both parser and compiler
source/compiler.civet
: Transforms Hera rules into an executable JS file.
source/old_main.coffee
: used for benchmarking and remembering simpler times
source/hera.hera
: Parser source file
source/machine.ts
: PEG machinery included in parser
source/rules.json
: JSON structure containing Hera rules, generated by parsing source/hera.hera.
./dist/hera --ast < source/hera.hera > source/rules.json
source/parser.js
: generated by compiling the source grammar into a CommonJS module
./dist/hera --libPath ./machine.js < source/hera.hera > source/parser.js
--libPath tells the generated parser where to find the PEG machinery. The default is @danielx/hera/lib, but that won't work when building internally.
Hera currently has automatic tokenization, enabled through a parser option. In this mode handlers can't return $skip, and no other handler code runs.
The transforms were previously skipped by checking the flag in the transform handler wrappers $T, $TR, $TV but instead I plan to have the parser generation handle it.
We have to skip the handlers because we are returning token nodes instead of what the handler expects. In theory we may be able to build a parallel token tree at the same time to completely match the behavior of the non-tokenize parse.
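A minimal sketch of why the handlers get skipped, written in TypeScript. This is not Hera's actual machinery: `ParseState`, `Token`, `literal`, `withHandler`, and the `tokenize` flag on the state are all hypothetical names made up for illustration. The wrapper checks the flag and, in tokenize mode, returns a token node instead of calling the handler.

```typescript
// Hypothetical shapes; Hera's real state and result types differ.
interface ParseState { input: string; pos: number; tokenize: boolean }
interface Token { type: string; children: unknown[]; loc: { pos: number; length: number } }
interface Result<T> { value: T; pos: number }
type Parser<T> = (state: ParseState) => Result<T> | undefined

// Matches a literal string at the current position.
function literal(s: string): Parser<string> {
  return (state) =>
    state.input.startsWith(s, state.pos)
      ? { value: s, pos: state.pos + s.length }
      : undefined
}

// Wraps a parser with a transform handler. In tokenize mode the handler is
// skipped, because the matched value is wrapped in a token node rather than
// handed to code that expects the plain value.
function withHandler<A, B>(
  name: string,
  parser: Parser<A>,
  handler: (value: A) => B
): Parser<B | Token> {
  return (state) => {
    const result = parser(state)
    if (!result) return undefined
    if (state.tokenize) {
      const token: Token = {
        type: name,
        children: [result.value],
        loc: { pos: state.pos, length: result.pos - state.pos },
      }
      return { value: token, pos: result.pos }
    }
    return { value: handler(result.value), pos: result.pos }
  }
}

// The handler here just returns the length of the match.
const arrow = withHandler("Arrow", literal("->"), (s) => s.length)
const plain = arrow({ input: "->", pos: 0, tokenize: false })
const tokens = arrow({ input: "->", pos: 0, tokenize: true })
```

With `tokenize: false` the handler runs on the matched string; with `tokenize: true` the result is a token node named after the rule and the handler never sees it.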
$EXPECT tracks failures to match literals or regexes. It gives each failure a friendlier name, groups failures that occur at the same position, and discards all but those at the farthest matched position.
We may be able to remove it and pass context down to the leaf handlers. This will be necessary to make rules importable across Hera grammars.
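The farthest-position bookkeeping can be sketched roughly like this in TypeScript. `createFailureTracker`, `fail`, and `report` are hypothetical names for illustration, not Hera's API; the sketch only shows the group-and-discard logic, assuming a failure is just a position plus a friendly name.

```typescript
// Hypothetical failure tracker; illustrative only.
interface FailureTracker {
  fail(pos: number, expectation: string): void
  report(): { pos: number; expected: string[] }
}

function createFailureTracker(): FailureTracker {
  let farthest = -1
  let expected = new Set<string>()
  return {
    fail(pos, expectation) {
      if (pos < farthest) return // discard: the parse already got farther than this
      if (pos > farthest) {
        // new farthest position: earlier expectations no longer matter
        farthest = pos
        expected = new Set<string>()
      }
      expected.add(expectation) // same position: group, without duplicates
    },
    report() {
      return { pos: farthest, expected: Array.from(expected) }
    },
  }
}

const tracker = createFailureTracker()
tracker.fail(0, '"->"')
tracker.fail(3, "Identifier")
tracker.fail(1, "Whitespace") // behind the farthest position, dropped
tracker.fail(3, '"("')
tracker.fail(3, "Identifier") // duplicate, grouped
const failure = tracker.report()
```

Here `failure` ends up holding position 3 with the two distinct expectations recorded there.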
Adding types to the parser would be cool, so that at each node you could have access to great IntelliSense and autocompletion.
Some challenges:
Indirect circular reference causes Arrow$0 to be typed as any.
Even though fail doesn't contribute any type information, it is still returned from the call to parserState, which is passed Arrow, which has Arrow$0(state) as a return value.
const { parse, fail } = parserState({
// ... <snip> ...
Arrow: Arrow,
})
const Arrow$0 = $S($EXPECT($L6, fail, "->", "Arrow"), $Q(_))
function Arrow(state: ParseState) {
return Arrow$0(state);
}
Pulling fail out of the parser constructor fixes it.
https://kyleshevlin.com/discriminated-unions-and-destructuring-in-typescript
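A rough TypeScript sketch of the shape after pulling fail out. Everything here is a simplified stand-in: `expectLiteral` replaces the `$S($EXPECT($L6, ...))` chain, and this `parserState` is a hypothetical constructor that only returns `parse`. Whether the real generated code ends up looking like this is an assumption. The point is that fail gets its type on its own, so nothing that Arrow$0 depends on comes back out of the parserState call.

```typescript
interface ParseState { input: string; pos: number }
type Fail = (pos: number, expectation: string) => void
type Rule<T> = (state: ParseState) => T | undefined

// `fail` is created independently with an explicit type, so it no longer
// depends on the return type of the parser constructor.
const failures: string[] = []
const fail: Fail = (pos, expectation) => {
  failures.push(`${pos}: ${expectation}`)
}

// Hypothetical stand-in for the $EXPECT + literal combinators.
function expectLiteral(text: string, fail: Fail, name: string): Rule<string> {
  return (state) => {
    if (state.input.startsWith(text, state.pos)) return text
    fail(state.pos, name)
    return undefined
  }
}

// Hypothetical constructor: takes the rule table and returns only `parse`.
function parserState<R extends Record<string, Rule<unknown>>>(rules: R) {
  return {
    parse<K extends keyof R>(rule: K, input: string): ReturnType<R[K]> {
      return rules[rule]({ input, pos: 0 }) as ReturnType<R[K]>
    },
  }
}

const Arrow$0 = expectLiteral("->", fail, "Arrow")
function Arrow(state: ParseState) {
  return Arrow$0(state)
}

const { parse } = parserState({
  Arrow: Arrow,
})

const matched = parse("Arrow", "->")
const missed = parse("Arrow", "=>")
```

In this sketch `Arrow` gets an inferred return type of `string | undefined` rather than falling back to any, because the inference chain no longer loops back through the constructor.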
Generate types for small RegExp character classes (~100 ish)?
const $R0 = $R(new RegExp("[$&!]", 'suy'));
const $R1 = $R(new RegExp("[+?*]", 'suy'));
Could have types like:
type RegExpCharacterClass<T> = [T, T] // $0 and $1 are the same
type $R0_T = Parser<RegExpCharacterClass<"$" | "&" | "!">>
type $R1_T = Parser<RegExpCharacterClass<"+" | "?" | "*">>
It would be nice to have really good code coverage regardless of JS, TS, or CoffeeScript. CoffeeScript has good coverage reports when using @danielx/coffeecoverage. CoffeeScript coverage is not good when using sourcemaps, but TS coverage seems ok with them.
Istanbul/nyc works ok with either CoffeeScript or TS, but combining both is a challenge because it is difficult to split out which files should be instrumented by nyc and which should be instrumented by coffeecoverage.
It seems promising to use a custom Istanbul instrumenter to handle instrumenting TS/JS and use coffeecoverage for CoffeeScript. This should give reliable reports with minor changes to configuration and setup.
An alternative approach would be to use sourcemaps and Istanbul's built-in instrumentation for everything. This has the benefit of testing the actual dist/main.js bundle, but the source maps aren't great and become much worse after minification, even for TS code. For high quality coverage, especially at thresholds around 100%, this seems like too much work to get all sourcemaps to be viable.
Thought: since nyc + istanbul uses babel underneath anyway, why not use a babel config to handle this for testing? There are probably many more documented cases of people mixing and matching languages and plugins using babel and since we're already using babel via nyc it's not clogging up our dependencies any extra.
What is the difference between presets and plugins?
Presets are collections of plugins. Plugins run before presets in the order they are listed. Presets run after in the reverse order they are listed.
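To make the ordering rule concrete, here is a tiny TypeScript sketch. `executionOrder` is a made-up helper, not part of Babel; it just models the rule stated above on lists of names.

```typescript
// Models the ordering rule: plugins first, in listed order,
// then presets, in reverse listed order. Not a Babel API.
function executionOrder(plugins: string[], presets: string[]): string[] {
  return plugins.concat(presets.slice().reverse())
}

const order = executionOrder(
  ["plugin-a", "plugin-b"],
  ["preset-a", "preset-b"]
)
// order: plugin-a, plugin-b, preset-b, preset-a
```

So a preset listed last is applied before the presets listed ahead of it, which matters when one preset's output needs further processing by another.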
What is @babel/preset-env?
@babel/preset-env transpiles JavaScript language features for different runtime environments. It does this automatically: you don't need to specify individual Babel plugins, only the target runtime (browsers, node, es*, etc.).
What is @babel/plugin-transform-runtime?
Babel adds small runtime helpers like _extend to each file that uses those features. Using transform-runtime makes all those references point to @babel/runtime, which cuts down on duplication across files.
@babel/register installs source-map-support, which hooks into Error.prepareStackTrace the first time a .ts or .js file is compiled with Babel. This overrides the CoffeeScript hook on Error.prepareStackTrace.
Not sure how to get these to play nice yet so just going with the Coffee stack traces for now.
esbuild resolves and transpiles .civet with @danielx/civet/esbuild-plugin but can't build types.
ts-node resolves and transpiles .civet with @danielx/civet/esm loader and works fine for tests with --transpileOnly
VSCode resolves and has editor integration when Civet Language Server is installed.
tsc can't resolve .civet (microsoft/TypeScript#16607)
To work around tsc limitations, create an additional folder types and add it to rootDirs in build/tsconfig.json
Create .d.ts files for modules that tsc needs to resolve. Name the ones for .civet files .civet.d.ts
Now tsc will be able to use that additional root to resolve. This also works for parser.js.
These can be updated manually or transpiled one at a time as needed using tsc.
Packrat Parsing
2022-09-22 | 1.50 | Exploring rewriting in Civet
2022-09-23 | 1.25 | manually converting compiler.civet
2022-09-24 | 3.00 | tests working with compiler.civet
2022-10-21 | 1.50 | figuring out .civet code coverage
2022-10-22 | 1.00 | util.coffee -> .civet
2022-11-11 | 3.00 | rebuilding parser; events
2022-11-22 | 1.00 | cacheable events; __dirname
2022-11-24 | 1.00 | publish v0.8.0 with events
2023-01-08 | 2.00 | update civet; pass data between enter/exit events; annotated errors
2023-01-15 | 2.50 | EBNF -> railroad diagram
TODO: