Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider alternatives to tree-sitter #41

Open
eyelidlessness opened this issue Feb 20, 2024 · 5 comments
Open

Consider alternatives to tree-sitter #41

eyelidlessness opened this issue Feb 20, 2024 · 5 comments

Comments

@eyelidlessness
Copy link
Member

I’m filing quickly on mobile ahead of our check in, but briefly:

  • tree-sitter adds a fair bit of tooling complexity
  • And page weight
  • It’s unclear that we benefit from its most compelling distinct features like incremental parsing
  • While the grammar is stable, it’s a constant build time drag in CI
  • Other tools might be better suited for eg TypeScript integration (as in type gen for aspects of the syntax we’re concerned with)

I briefly explored Ohm over the weekend and it’s a pretty compelling alternative option (albeit with some reservations around maturity). Beyond its TS support, I was encouraged by how easy it was to build a cromulent XPath grammar, and how much more closely it matches the grammar from the spec.

@sadiqkhoja
Copy link
Contributor

ANTLR4 seems to be quite popular, it has typescript support as well and XPath grammar already written.

@eyelidlessness
Copy link
Member Author

eyelidlessness commented May 23, 2024

ANTLR4 seems to be quite popular, it has typescript support as well and XPath grammar already written.

Specifically I’m interested in support for generating (or otherwise producing) TypeScript types corresponding to the parsed syntax tree nodes. We use dts-tree-sitter for this currently. It’s not amazing but we rely on it and supplemental types built on that to keep the parser and evaluator in sync. That’s something that Ohm does OOTB, and my impression from the spike does it better than dts-tree-sitter.

I’m not sure if that’s something ANTLR4 supports, but I didn’t find anything about that when I looked last. It’s not a deal breaker by any means, as the grammar will likely be stable. But it’s something I’m particularly nervous about trying to augment for an existing parser without really being familiar with its output.

@eyelidlessness
Copy link
Member Author

It's worth mentioning now that #166 is a really good frame of reference for why a typed grammar is valuable for us. Quite a lot of that change would be much more complicated without the types we have to express which syntax nodes can occur where in the syntax tree.

It's also worth mentioning that I recently did a spike into ANTLR4's XPath grammar. It seems like it partially addresses these concerns: ANTLR4 has a TypeScript target which produces some grammar-specific types. Unfortunately, what I found was that the generated parse API would require significant rework for us to use... basically across the board, anywhere we use XPath syntax as anything other than an opaque string. The API is heavily influenced by its Java underpinnings, and heavily generalized around OOP-ish concepts that don't really fit our current model of parsed syntax (as a tree of nodes which are basically plain objects).

@eyelidlessness
Copy link
Member Author

eyelidlessness commented Sep 5, 2024

Another dependency update PR #209, another round of tree-sitter pain. I chose to work through the issues that came up, despite a strong inclination to put off the update. Mainly because it’s another data point we can consider when we think about the cost/benefit of investigating alternatives and potential rework to adopt one.

My instinct today (largely unchanged): tree-sitter was a great choice for prototyping and early iterations, but its stability challenges are an outsized maintenance burden; and it continues to have an outsized impact in terms of bytes on the wire. I’m not in a big rush to look at replacements today, but I do recommend we investigate other options before we release 1.0.

@eyelidlessness
Copy link
Member Author

Build woes. @brontolosone has had them. I think we all have at this point.

tree-sitter requires either of these to be available on $PATH:

  • Docker
  • emscripten

The docs/CLI are pretty vague about anything more detailed than that, but it is actually more detailed than that. This makes it hard for us to document our own requirements, and makes for a big barrier to contribution. This is especially problematic because that barrier applies to something that's otherwise very stable (I'm pretty sure we've made one grammar change since this project was very much a prototype!).

I think these build woes should be weighed as we consider this quesiton.

In the meantime, some shorter term mitigations worth considering:

  1. Commit the tree-sitter build artifacts. This is actually the idiomatic setup for tree-sitter grammar projects. We're doing the different thing, which is part of the hassle here. If we do this, though, I would like to add a check to CI to make sure that build artifacts are up to date.

  2. Introduce a Dockerfile to handle the tree-sitter build stuff, and make that an explicit part of our build process. I'm not fond of this option, but I think most other contributors will likely find it less offputting than I do. We can also consider adding a companion Nix config for the handful of us more of the Nix persuasion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants