Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix go to definition and hover for files containing multibyte characters #2021

Conversation

NotFounds
Copy link

Motivation

Closes #1251

When a Ruby file contains multibyte characters (like Japanese, Chinese, emoji, etc), the go to definition and hover features do not work correctly. The definition location or hover documentation will be incorrect.
ref:

This is because the current implementation assumes single-byte characters when calculating offsets during index building and document referencing. We need to properly handle multibyte characters to ensure these features work reliably for all users.

Implementation

  1. Modified RubyLsp::Document and RubyLsp::Requests::Request to calculate locations considering multibyte characters. This change utilizes the API implemented in Prism by Add code unit APIs to location ruby/prism#2406.
  2. Added encoding to Entry and updated the logic to handle multibyte characters when storing locations. Additionally, since the location object has been changed from Prism::Location to RubyIndexer::Location by Refactor global usage of Prism::Location to minimize memory usage #1917, a new field has been added to RubyIndexer::Location.
  3. By passing the encoding when creating the Index, the index will be built taking multibyte characters into account.

Automated Tests

I added a simple expectation test for the definition.

Manual Tests

When a Ruby file contains multibyte characters, the go to definition and
hover features do not work correctly. This is because the index building
and document referencing logic does not properly handle multibyte
characters when calculating offsets.
This commit fixes the issue by:
•Updating index building to use character offsets instead of byte
offsets
•Modifying document referencing to properly handle multibyte characters
when mapping between positions and offsets
•Adding test cases to verify go to definition work with multibyte
characters
@NotFounds NotFounds requested a review from a team as a code owner May 8, 2024 11:34
@NotFounds NotFounds requested review from andyw8 and st0012 May 8, 2024 11:34
@NotFounds
Copy link
Author

I have signed the CLA!

@andyw8
Copy link
Contributor

andyw8 commented May 13, 2024

Thank for you the contribution. From a quick read, it seems the PR is doing two main things:

  • fixing handling for multibyte characters
  • introducing support for encodings other than UTF-8

Since UTF-8 is almost always used, is it possible that the first can be addressed without the second?

@andyw8 andyw8 added server This pull request should be included in the server gem's release notes bugfix This PR will fix an existing bug labels May 13, 2024
@NotFounds
Copy link
Author

@andyw8
Thank you for your review!
As you pointed out, the changes can be broken down into multiple tasks.
I'll close this Pull Request for now and focus on fixing the handling of multibyte characters first.

When a Ruby file contains multibyte characters (like Japanese, Chinese,
emoji, etc), the go to definition and hover features do not work
correctly. Because the document referencing logic does not properly handle
multibyte characters when calculating offsets.

This commit fixes the issue by:
*Modifying document referencing to properly handle multibyte characters
when mapping between positions and offsets
*Adding test cases to verify go to definition work with multibyte
characters
@NotFounds
Copy link
Author

I opened a new PullRequest.
#2051

@NotFounds NotFounds closed this May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This PR will fix an existing bug server This pull request should be included in the server gem's release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better handle multibyte character locations
2 participants