Hera

Building

Source files

source/main.civet: wrapper for both the parser and the compiler.

source/compiler.civet: transforms Hera rules into an executable JS file.

source/old_main.coffee: used for benchmarking and remembering simpler times.

source/hera.hera: parser source file.

source/machine.ts: PEG machinery included in the parser.

Generated files

source/rules.json: json structure containing Hera rules, generated by parsing source/hera.hera.

./dist/hera --ast < source/hera.hera > source/rules.json

source/parser.js: generated by compiling the source grammar into a CommonJS module

./dist/hera --libPath ./machine.js < source/hera.hera > source/parser.js

--libPath tells the generated parser where to find the PEG machinery. The default is @danielx/hera/lib, but that won't work when building Hera itself.

Tokenize Mode

Hera currently supports automatic tokenization, enabled through a parser option. In this mode handlers can't return $skip, and no other handler code runs.

The transforms used to be skipped by checking the tokenize flag inside the transform handler wrappers ($T, $TR, $TV). Instead, I plan to have parser generation handle it.

We have to skip the handlers because we are returning token nodes instead of what the handler expects. In theory we may be able to build a parallel token tree at the same time to completely match the behavior of the non-tokenize parse.

$EXPECT

$EXPECT tracks failures to match literals or regexes. It gives each expectation a friendlier name, groups failures that occur at the same position, and discards any that are not at the farthest matched position.

We may be able to remove it and pass context down to the leaf handlers. This will be necessary to make rules importable across Hera grammars.
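The farthest-failure bookkeeping described above can be sketched as follows. Every name here (FailureTracker, expectLiteral, ParseState, Parser) is hypothetical and is not Hera's actual $EXPECT or machine.ts API. A failure beyond the current farthest position replaces everything recorded so far. Failures at the same position are grouped, and failures at earlier positions are dropped.

```typescript
// Hypothetical names throughout; this is not Hera's machine.ts API.
interface ParseState {
  input: string;
  pos: number;
}

interface ParseResult<T> {
  value: T;
  pos: number; // position just after the match
}

type Parser<T> = (state: ParseState) => ParseResult<T> | undefined;

// Keeps only the failures recorded at the farthest position reached so far.
class FailureTracker {
  farthest = -1;
  expected = new Set<string>();

  fail(pos: number, name: string): void {
    if (pos > this.farthest) {
      // A failure further into the input supersedes all earlier ones.
      this.farthest = pos;
      this.expected.clear();
    }
    if (pos === this.farthest) {
      // Failures at the same position are grouped under friendly names.
      this.expected.add(name);
    }
    // Failures before the farthest position are discarded.
  }
}

// Leaf parser for a literal; the tracker is passed in explicitly as context.
function expectLiteral(tracker: FailureTracker, literal: string, name: string): Parser<string> {
  return (state) => {
    if (state.input.startsWith(literal, state.pos)) {
      return { value: literal, pos: state.pos + literal.length };
    }
    tracker.fail(state.pos, name);
    return undefined;
  };
}

// Usage: both alternatives fail at position 2 of "a =", so both are reported.
const tracker = new FailureTracker();
const state: ParseState = { input: "a =", pos: 2 };
expectLiteral(tracker, "->", "Arrow")(state);
expectLiteral(tracker, "=>", "FatArrow")(state);
console.log(tracker.farthest, Array.from(tracker.expected)); // 2 [ 'Arrow', 'FatArrow' ]
```

In this sketch the tracker is handed to each leaf parser instead of being captured from a parser-wide fail. That is one possible way to pass context down to the leaf handlers.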

TypeScript

Adding types to the parser would be cool, so that every node would have great IntelliSense and autocompletion.

Some challenges:

Indirect circular reference causes Arrow$0 to be typed as any.

Even though fail contributes no type information, it is still returned from the call to parserState. parserState is passed Arrow, whose return value is Arrow$0(state), and Arrow$0 itself uses fail. The inferred type of Arrow$0 therefore depends on itself.

const { parse, fail } = parserState({
  // ... <snip> ...
  Arrow: Arrow,
})

const Arrow$0 = $S($EXPECT($L6, fail, "->", "Arrow"), $Q(_))
function Arrow(state: ParseState) {
  return Arrow$0(state);
}

Pulling fail out of the parser constructor fixes it.
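Below is a minimal sketch of the fix. It uses simplified, hypothetical stand-ins (ParseState, Parser, expectLiteral, a failures array, and a simplified parserState), not Hera's real generated code or its $EXPECT signature. Because fail is declared at module level, Arrow$0's type no longer depends on parserState's return type, so TypeScript can infer Parser<string> instead of falling back to any.

```typescript
// Hypothetical, simplified stand-ins; not Hera's generated parser.
interface ParseState {
  input: string;
  pos: number;
}
interface ParseResult<T> {
  value: T;
  pos: number;
}
type Parser<T> = (state: ParseState) => ParseResult<T> | undefined;

// fail lives at module level instead of being returned by parserState.
const failures: string[] = [];
function fail(pos: number, expectation: string): void {
  failures.push(`${expectation} at ${pos}`);
}

function expectLiteral(literal: string, name: string): Parser<string> {
  return (state) => {
    if (state.input.startsWith(literal, state.pos)) {
      return { value: literal, pos: state.pos + literal.length };
    }
    fail(state.pos, name);
    return undefined;
  };
}

// No cycle: Arrow$0 no longer depends on parserState's return type.
const Arrow$0 = expectLiteral("->", "Arrow"); // inferred as Parser<string>
function Arrow(state: ParseState) {
  return Arrow$0(state);
}

// parserState now only returns parse; fail is no longer part of its result.
function parserState(grammar: Record<string, Parser<unknown>>) {
  return {
    parse(input: string, startRule: string): unknown {
      const rule: Parser<unknown> | undefined = grammar[startRule];
      if (rule === undefined) throw new Error(`Unknown rule: ${startRule}`);
      failures.length = 0;
      const result = rule({ input, pos: 0 });
      if (result === undefined || result.pos !== input.length) {
        throw new Error(`Parse failed; expected ${failures.join(", ")}`);
      }
      return result.value;
    },
  };
}

const { parse } = parserState({ Arrow });
console.log(parse("->", "Arrow")); // ->
```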

https://kyleshevlin.com/discriminated-unions-and-destructuring-in-typescript

Cool Ideas

Generate types for small RegExp character classes (~100-ish)?

const $R0 = $R(new RegExp("[$&!]", 'suy'));
const $R1 = $R(new RegExp("[+?*]", 'suy'));

Could have types like:

type RegExpCharacterClass<T> = [T, T] // $0 and $1 are the same

type $R0_T = Parser<RegExpCharacterClass<"$" | "&" | "!">>
type $R1_T = Parser<RegExpCharacterClass<"+" | "?" | "*">>
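Here is a rough sketch of how the generator could derive those unions. It assumes a hypothetical charClassToUnion helper and reads "~100" as a cap on how many characters a class may expand to. Only plain characters and ranges are handled. Negated classes, escapes, and oversized classes return undefined, so the generator could fall back to its current untyped output.

```typescript
// Hypothetical helper: converts a simple character-class source into a
// TypeScript union string, e.g. [$&!] becomes "$" | "&" | "!".
// maxSize is an assumed cap on how many characters a class may expand to.
function charClassToUnion(source: string, maxSize = 100): string | undefined {
  if (source.length < 3 || !source.startsWith("[") || !source.endsWith("]")) {
    return undefined;
  }
  const body = source.slice(1, -1);
  // Negated classes and escapes (\s, \d, \], ...) would need a real RegExp parser.
  if (body.startsWith("^") || body.includes("\\")) return undefined;

  const chars = new Set<string>();
  for (let i = 0; i < body.length; i++) {
    const current = body.charAt(i);
    // "a-z" is a range; a "-" in the last position is a literal dash.
    if (body.charAt(i + 1) === "-" && i + 2 < body.length) {
      const from = current.charCodeAt(0);
      const to = body.charCodeAt(i + 2);
      // Reject invalid ranges and ranges too large to expand.
      if (from > to || to - from + 1 > maxSize) return undefined;
      for (let code = from; code <= to; code++) chars.add(String.fromCharCode(code));
      i += 2;
    } else {
      chars.add(current);
    }
    if (chars.size > maxSize) return undefined;
  }
  // JSON.stringify yields a valid TypeScript string literal for each character.
  return Array.from(chars).map((c) => JSON.stringify(c)).join(" | ");
}

console.log(charClassToUnion("[$&!]")); // "$" | "&" | "!"
```

The result could then be spliced into a declaration shaped like the $R0_T example above.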

Istanbul TypeScript + CoffeeScript code coverage

It would be nice to have really good code coverage regardless of JS, TS, or CoffeeScript. CoffeeScript has good coverage reports when using @danielx/coffeecoverage. CoffeeScript coverage is not good when using sourcemaps, but TS coverage seems ok with them.

Istanbul/nyc works OK with either CoffeeScript or TS, but combining both is a challenge because it is difficult to separate the files nyc should instrument from the ones coffeecoverage should instrument.

It seems promising to use a custom Istanbul instrumenter for TS/JS and coffeecoverage for CoffeeScript. This should give reliable reports with only minor changes to configuration and setup.

An alternative approach would be to use sourcemaps and Istanbul's built-in instrumentation for everything. This has the benefit of testing the actual dist/main.js bundle, but the sourcemaps aren't great and get much worse after minification, even for TS code. For high-quality coverage, especially at thresholds around 100%, making every sourcemap accurate enough seems like too much work.

Thought: since nyc and Istanbul use Babel under the hood anyway, why not handle this with a Babel config for testing? There are probably many more documented cases of people mixing languages and plugins with Babel. Since nyc already pulls in Babel, this adds no extra dependencies.

Babel

What is the difference between presets and plugins?

Presets are collections of plugins. Plugins run before presets in the order they are listed. Presets run after in the reverse order they are listed.
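These ordering rules can be written as a toy model. babelRunOrder is hypothetical and is not a Babel API. Real Babel merges plugin visitors into a single traversal, so "runs first" here means which visitor sees a given node first.

```typescript
// Hypothetical model of Babel's documented ordering rules, not a Babel API.
function babelRunOrder(plugins: readonly string[], presets: readonly string[]): string[] {
  // Plugins first, in the order listed; then presets, last-listed first.
  // Copying before reverse() leaves the caller's presets array untouched.
  return [...plugins, ...[...presets].reverse()];
}

console.log(babelRunOrder(["my-plugin"], ["@babel/preset-env", "@babel/preset-typescript"]));
// [ 'my-plugin', '@babel/preset-typescript', '@babel/preset-env' ]
```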

What is @babel/preset-env?

@babel/preset-env transpiles modern JavaScript features for the runtime environments you target. It picks the needed transforms automatically. Instead of listing individual Babel plugins, you only specify the target runtimes (browsers, node, es*, etc.).

What is @babel/plugin-transform-runtime?

Babel injects small helper functions, such as _extends, into each file that uses the corresponding features. transform-runtime makes those files import the helpers from @babel/runtime instead, which cuts down on duplication across files.

Babel + CoffeeCoverage Line numbers

@babel/register installs source-map-support, which hooks into Error.prepareStackTrace the first time a .ts or .js file is compiled with Babel. This overrides the hook CoffeeScript installs on Error.prepareStackTrace.

Not sure how to get these to play nice yet so just going with the Coffee stack traces for now.
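Here is a minimal sketch (Node/V8 only) of why the CoffeeScript hook disappears. Error.prepareStackTrace is a single global slot, so whichever hook is assigned last formats every stack trace. The two hooks below are stand-ins, not the real CoffeeScript or source-map-support code.

```typescript
// Error.prepareStackTrace is V8's stack-trace formatting hook; the cast avoids
// depending on @types/node for its declaration.
type PrepareStackTrace = (error: Error, frames: unknown[]) => unknown;
const ErrorWithHook = Error as unknown as { prepareStackTrace?: PrepareStackTrace | undefined };

const previousHook = ErrorWithHook.prepareStackTrace;

// Stand-in for the hook CoffeeScript installs first.
ErrorWithHook.prepareStackTrace = () => "formatted by the CoffeeScript hook";
// Stand-in for the hook source-map-support installs later via @babel/register.
ErrorWithHook.prepareStackTrace = () => "formatted by source-map-support";

// The last assignment wins: the CoffeeScript stand-in is never called.
const stack = new Error("boom").stack;
console.log(stack); // formatted by source-map-support

ErrorWithHook.prepareStackTrace = previousHook; // restore whatever was installed before
```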

How to build types with .civet files when using esbuild?

esbuild resolves and transpiles .civet with @danielx/civet/esbuild-plugin but can't build types.

ts-node resolves and transpiles .civet with @danielx/civet/esm loader and works fine for tests with --transpileOnly

VSCode resolves and has editor integration when Civet Language Server is installed.

tsc can't resolve imports of .civet files (see microsoft/TypeScript#16607).

To work around this tsc limitation, create an additional folder, types, and add it to rootDirs in build/tsconfig.json.

Create .d.ts files for the modules tsc needs to resolve, naming the ones for .civet files *.civet.d.ts. tsc can then resolve those modules through the additional root. This also works for parser.js.

These can be updated manually or transpiled one at a time as needed using tsc.

References

Packrat Parsing

Timesheet

2022-09-22 | 1.50 | Exploring rewriting in Civet
2022-09-23 | 1.25 | manually converting compiler.civet
2022-09-24 | 3.00 | tests working with compiler.civet

2022-10-21 | 1.50 | figuring out .civet code coverage
2022-10-22 | 1.00 | util.coffee -> .civet

2022-11-11 | 3.00 | rebuilding parser; events
2022-11-22 | 1.00 | cacheable events; __dirname
2022-11-24 | 1.00 | publish v0.8.0 with events

2023-01-08 | 2.00 | update civet; pass data between enter/exit events; annotated errors
2023-01-15 | 2.50 | EBNF -> railroad diagram

TODO: