Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create an LSP for yuck #1241

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Fejiberglibstein
Copy link

@Fejiberglibstein Fejiberglibstein commented Dec 9, 2024

Description

Closes #1232.

Features of the LSP (will) include

  • autocomplete for widgets and widget's variables
  • hover documentation for builtin widgets
  • diagnostic messages for any errors
  • goto definition for user-made widgets
  • auto formatting
  • Document aware syntax highlighting
    • Treesitter grammar exists, this would allow for more syntax highlighting possibilities, perhaps even allowing for syntax highlighting inside (literal) expressions.

Additional Notes

I am far from finished for now.

The goals of this draft PR is

  1. To make sure my handling of yuck configuration is correct.
  2. To discuss future implementation of LSP features, as I believe some things will need to be altered to allow the LSP to have sufficient data to operate.

Check the top comment for further discussion

Checklist

Please make sure you can check all the boxes that apply to this PR.

  • All widgets I've added are correctly documented.
  • I added my changes to CHANGELOG.md, if appropriate.
  • The documentation in the docs/content/main directory has been adjusted to reflect my changes.
  • I used cargo fmt to automatically format all code before committing

@Fejiberglibstein
Copy link
Author

Fejiberglibstein commented Dec 9, 2024

Current implementation details

For my current implementation, I am creating a config using yuck::config::Config::generate() for the LSP, but perhaps a different method would be better.

I needed to do really hacky stuff to avoid mutating self.documents in Backend::change_document, and ideally this should not be done this way, since this function will be getting called any time a change is made to the yuck file.

Please let me know what direction I should go in for handling yuck files so I may begin working on the actual LSP.


Future implementation details

For implementing things like syntax highlighting and hover docs, I may need more information than just the start and end of individual widgets.I will also need to know the tokens of each ast node: (defvar foo "h") -> [(, defvar, foo, "h", )] which I do not believe is possible when using Config. I will need the AST nodes of each widget in addition to the semantic meaning that Config provides, which I am unsure how to accomplish at the moment.

@Fejiberglibstein
Copy link
Author

Upon further inspection of the codebase, I have noticed that there exists some global state to hold the file database.

A few questions:

  • Should this global state be used instead of creating my own EwwConfig and FileDatabase?
  • Overall, how should configs be handled for the LSP?
  • Does it make sense that the LSP wouldn't work unless you have a daemon started for the workspace you're working in?
    • If you don't have a daemon started for the workspace you're in (ex. you're working in ~/custom_eww_workspace/ and you have to start the daemon with eww daemon -c ~/custom_eww_workspace) should the LSP initialize the server daemon?

imo it makes sense to have the daemon and LSP share the same files & config, since this reduces memory usage & improves performance. I am just wondering how I should go about sharing this state.

Another thing of note is that currently I have the LSP as a separate executable. Should the LSP instead be started with something like eww lsp -c path/to/config/, and then LSP clients such as neovim or vscode can simply run this command in the current working directory?

@DOD-101
Copy link

DOD-101 commented Dec 15, 2024

Should this global state be used instead of creating my own EwwConfig and FileDatabase?

This is something I'm also curious about for the formatter.

I have written up a basic implementation (for which I will also open a draft PR soon), but this is one part I have yet to touch.

It seems like almost everything I need regarding file handling is already implemented, just not exposed publicly.

@DOD-101
Copy link

DOD-101 commented Dec 15, 2024

Future implementation details

This is something that it would make sense for us to coordinate closer on, since the formatter will also require adjustments to the AST.

@elkowar
Copy link
Owner

elkowar commented Dec 16, 2024

I'll need to look into this a bit further, but just as a general heads up already:

In most cases, a good LSP will still work when there are parsing errors (otherwise, autocomplete etc might be quite hard, and highlighting more than one error is very difficult as well). The eww parser is currently somewhat unfit for this, I'd fear. A lot of information is lost, and it definitely isn't super error-tolerant.

Thus, and especially given the relative simplicity of yuck, I'd recommend trying to work a bit further disjunct from the main eww codebase for this. I could see some sort of connection to a running eww daemon be quite valuable for use cases such as showing errors from the daemon, but even that would require a good bit of work to get the required info.

Writing a simple-ish CST-parser (maybe using something like tree-sitter, or basing on Rowan, which many LSPs use) would probably be a lot more flexible in the long run, even if it's of course a lot of initial effort.
The same thing applies for an autoformatter as well – the current eww AST is very unlikely to be sufficient there.

@DOD-101
Copy link

DOD-101 commented Dec 22, 2024

Writing a simple-ish CST-parser (maybe using something like tree-sitter, or basing on Rowan, which many LSPs use) would probably be a lot more flexible in the long run, even if it's of course a lot of initial effort.
The same thing applies for an autoformatter as well – the current eww AST is very unlikely to be sufficient there.

I don't have much experience in this area, so take everything I am saying with a grain of salt.

But it seems to me that, if I am understanding you correctly, you think it would be best if both the LSP and the formatter implement their own parsers.

Why?

It seems to me that it would be a lot of additional work to try and maintain 2 or 3 different parsers. Would it not make more sense to collaborate on improving the parsers used in eww itself, so that eww, the LSP and the formatter all use the same one? This is what most languages seem to do, at least for the most part.

I can't speak to the LSP side of things, but for the formatter, the eww AST seems to mostly fulfill my requirements. Perhaps later down the line there would need to be some more significant changes made to facilitate some more complicated changes, but as it stands now I have a formatter that can handle about 80% of yuck code properly, with no changes to the AST.

It seems to be that if we were all to go out and implement our own parser solutions we would end up with 3 parsers, that all need to be maintained, which I foresee causing headaches down the line.

@Fejiberglibstein
Copy link
Author

I can understand where @elkowar is coming from. His implementation lacks error recovery, a huge part of what's needed for LSPs to work. Right now if there is any error at all, I believe the parser will just quit.

I ran into a similar issue when I was working on another LSP for a language my friends and I made. We had to add a bit of extra code to the parser to make it so it could try its best to get all the tokens out if there was an error.

That being said, I initially was writing some comment that was against using the current yuck parser. However, I realized that for a language like this, it would be very difficult to have correct autocomplete and syntax highlighting working when there are syntax errors. Take for example this code

(foo
    (bar
)

(something-else
    (foo))

it would be impossible to tell if that something else is its own thing or if its inside foo, making whatever autocomplete the LSP has useless.

Therefore, I don't think we need to worry very much about errors. Perhaps I am missing something obvious here, and I haven't dug too deep into the parser's code, but could it be as simple as returning everything that was parsed up until the syntax error? That way we could have syntax highlighting for everything up until the syntax error. This is essentially what was done for the language my friends and I made, and it worked fine.

@bbb651
Copy link

bbb651 commented Dec 22, 2024

I think using tree-sitter as a parser can be a good approach - It already exists and needs to exist for good editor support (e.g. zed and neovim), it handles error recovery and support incremental updates. I found this discussion showing how this was done in asm-lsp

@Eugenenoble2005
Copy link

(foo
    (bar
)

(something-else
    (foo))

it would be impossible to tell if that something else is its own thing or if its inside foo, making whatever autocomplete the LSP has useless.

Not quite, like elkowar said yuck is relatively simple. In this case (something-else is simple a child of the first parent (foo while (bar is a sibling. For working completions in yuck, i think we really only need simple context up to the point of the cursor. I achieved this in my implementation with simple parenthesis matching to find the immediate parent regardless of syntax errors(errors probably might be dealt with later)
Screenshot_24-Dec_08-27-53_1594
The screenshot above is kind of like the example you posted, Tree sitter knows the syntax is not correct, but we're ignoring that. The LSP tries to suggest widgets because the cursor is currently a child of box which inturn is a child of defwindow.

@Eugenenoble2005
Copy link

But it seems to me that, if I am understanding you correctly, you think it would be best if both the LSP and the formatter implement their own parsers.

Why?

It seems to me that it would be a lot of additional work to try and maintain 2 or 3 different parsers. Would it not make more sense to collaborate on improving the parsers used in eww itself, so that eww, the LSP and the formatter all use the same one? This is what most languages seem to do, at least for the most part.

My guess is that it would require a lot of rewrites on eww's side on an already stable product for a feature which is not sacrosanct to it's functionality at least for now.

@elkowar
Copy link
Owner

elkowar commented Dec 27, 2024

Yea, making the eww AST verbose enough to be convenient for a proper LSP would require quite a bit of internal changes, for reasons of error recovery as well as the fact that currently, we don't preserve all information (i.e., if theres two different ways to write something, we don't necessarily retain all information needed to reconstruct the original code, we don't retain all comments, etc)

The eww grammar has not changed in over two years, so there's not that much "maintenance" that'd be necessary here -- just write the grammar once, and that's probably it. The goals of an LSP/formatter parser and the main eww parser are just very different. I do think doing something with treesitter is probably a good first idea here.

Having your fully own AST would allow you to retain any additional metadata you might want to store, as well.

From all I gather, there isn't really a one-size-fits-all strategy for this across other languages. many languages have multi-step parsers and grammars, where things go through a tokenizer, then a CST, then an AST, etc etc. some languages might re-use part of the original parser for their LSPs. Rust-analyzer, from what I gather, has a completely separate CST and AST structure based on rowan for its parser, which also, again, just has very different requirements and restrictions from the parsing logic used in the compiler.

@elkowar
Copy link
Owner

elkowar commented Dec 27, 2024

However, if you're able to make the eww-parser and AST usable for these usecases, without having to re-structure the entire codebase, I'd definitely be willing to consider this as an option as well. I do, however, think that error recovery is very important for an LSP, and even a formatter. There's more "hints" for them than you might think:
i.e., it's very clear when a top-level element starts, because they all start with a clearly defined "defvar", "defwidget", "defwindow", etc. So you can at least use these top-level nodes as error recovery boundaries. Then, in many other cases, the parentheses should suffice to group errors by other means -- not trivial, but definitely not conceptually super complex either, and the added value there could be huge.

@DOD-101
Copy link

DOD-101 commented Dec 27, 2024

Thanks, @elkowar, for the detailed reply.

As for the formatter:

I will finish experimenting around with using the eww AST and then go ahead and write another implementation using tree sitter, as suggested.

At that point, I will go forward with whatever feels more well suited / capable. Hopefully I will then have something presentable as a 0.0.1 in the next few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Make an LSP for yuck
5 participants