Fix go to definition and hover for files containing multibyte characters #2021

NotFounds · 2024-05-08T11:34:08Z

Motivation

When a Ruby file contains multibyte characters (such as Japanese or Chinese text, or emoji), the go to definition and hover features do not work correctly: the definition location or the hover documentation is wrong.
ref:

Definition jumps are not possible with files containing Japanese characters. #1347

This is because the current implementation assumes single-byte characters when calculating offsets during index building and document referencing. We need to properly handle multibyte characters to ensure these features work reliably for all users.
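To illustrate the mismatch described above, here is a minimal sketch that uses only core String methods. The sample line and the variable names are made up for illustration and are not taken from the Ruby LSP codebase. It assumes the source is UTF-8 and that the editor counts columns in UTF-16 code units, which is the LSP default.

```ruby
# Sample source line (illustrative only): a multibyte string literal, then `foo`.
line = "x = \"日本語\u{1F600}\"; foo"

# Position of `foo` counted in characters (code points).
char_index = line.index("foo")

# Everything that precedes `foo` on the line.
prefix = line[0, char_index]

# Position of `foo` counted in UTF-8 bytes, which is what a byte-oriented parser reports.
byte_offset = prefix.bytesize

# Position of `foo` counted in UTF-16 code units, which is what an LSP client sends by default.
utf16_units = prefix.encode(Encoding::UTF_16LE).bytesize / 2

puts "characters: #{char_index}, bytes: #{byte_offset}, UTF-16 code units: #{utf16_units}"
```

The three counts only agree for ASCII-only lines. As soon as a line contains multibyte characters they diverge, so an offset computed in one unit cannot be used as-is in another.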

Implementation

Modified RubyLsp::Document and RubyLsp::Requests::Request to calculate locations with multibyte characters taken into account. This change uses the code unit API added to Prism in "Add code unit APIs to location" (ruby/prism#2406).
Added an encoding to Entry and updated the logic that stores locations so that it handles multibyte characters. Additionally, because "Refactor global usage of Prism::Location to minimize memory usage" (#1917) changed the location object from Prism::Location to RubyIndexer::Location, a new field has been added to RubyIndexer::Location.
The encoding is passed when the Index is created, so the index is built with multibyte characters taken into account.
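For readers unfamiliar with the Prism code unit API mentioned above, the following is a rough sketch of how it can be queried. It assumes the prism gem is installed, and the sample source is invented for illustration. It shows the API on its own and does not reproduce how this PR wires it into RubyLsp::Document or the indexer.

```ruby
require "prism"

# Illustrative source: a multibyte string literal followed by a call to `foo`.
source = "x = \"日本語\u{1F600}\"; foo\n"

result = Prism.parse(source)

# The last top-level statement is the `foo` call.
foo = result.value.statements.body.last
loc = foo.location

# Column measured in bytes of the UTF-8 source.
byte_column = loc.start_column

# Column measured in code units of the given encoding (UTF-16 here,
# matching the default position encoding of LSP clients).
utf16_column = loc.start_code_units_column(Encoding::UTF_16LE)

puts "byte column: #{byte_column}, UTF-16 column: #{utf16_column}"
```

Passing the encoding explicitly keeps the call independent of whatever default a given Prism version uses.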

Automated Tests

I added a simple expectation test for the definition.

Manual Tests

When a Ruby file contains multibyte characters, the go to definition and hover features do not work correctly. This is because the index building and document referencing logic does not properly handle multibyte characters when calculating offsets. This commit fixes the issue by:
• Updating index building to use character offsets instead of byte offsets
• Modifying document referencing to properly handle multibyte characters when mapping between positions and offsets
• Adding test cases to verify that go to definition works with multibyte characters
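The mapping between positions and offsets mentioned above can be sketched as follows. `position_to_char_index` is a hypothetical helper written for this illustration, not a method of RubyLsp::Document, and it assumes that the client sends zero-based lines and columns counted in UTF-16 code units.

```ruby
# Hypothetical helper: convert an LSP-style position (zero-based line,
# column in UTF-16 code units) into a character index into `source`.
def position_to_char_index(source, line:, character:)
  lines = source.lines

  # Characters in all lines that precede the target line.
  index = lines.first(line).sum(&:length)

  units = 0
  lines.fetch(line).each_char do |ch|
    break if units >= character

    # Characters outside the Basic Multilingual Plane take two UTF-16
    # code units (a surrogate pair); everything else takes one.
    units += ch.ord > 0xFFFF ? 2 : 1
    index += 1
  end

  index
end

source = "a = \"\u{1F600}\"\nb = \"日本語\u{1F600}\"; foo\n"

# An editor would report `foo` on line 1 at UTF-16 column 13.
index = position_to_char_index(source, line: 1, character: 13)
puts source[index, 3]
```

Using the column directly as a character index would land one character too far on the second line, because the emoji occupies two UTF-16 code units but only one character.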

NotFounds · 2024-05-09T09:58:27Z

I have signed the CLA!

andyw8 · 2024-05-13T20:57:48Z

Thank you for the contribution. From a quick read, it seems the PR is doing two main things:

fixing handling for multibyte characters
introducing support for encodings other than UTF-8

Since UTF-8 is almost always used, is it possible that the first can be addressed without the second?

NotFounds · 2024-05-14T14:27:49Z

@andyw8
Thank you for your review!
As you pointed out, the changes can be broken down into multiple tasks.
I'll close this Pull Request for now and focus on fixing the handling of multibyte characters first.

When a Ruby file contains multibyte characters (like Japanese, Chinese, emoji, etc.), the go to definition and hover features do not work correctly, because the document referencing logic does not properly handle multibyte characters when calculating offsets. This commit fixes the issue by:
* Modifying document referencing to properly handle multibyte characters when mapping between positions and offsets
* Adding test cases to verify that go to definition works with multibyte characters

NotFounds · 2024-05-15T16:44:20Z

I opened a new pull request:
#2051

NotFounds added 4 commits May 8, 2024 19:37

Merge branch 'main' into fix-definition-and-hover-for-multibyte-chars

2474c17

Remove unnecessary comment

bd5895c

Format

1121fa7

NotFounds requested a review from a team as a code owner May 8, 2024 11:34

NotFounds requested review from andyw8 and st0012 May 8, 2024 11:34

github-actions bot added the cla-needed label May 8, 2024

github-actions bot removed the cla-needed label May 9, 2024

NotFounds added 2 commits May 10, 2024 14:15

Fix bug

e38a505

Merge branch 'main' into fix-definition-and-hover-for-multibyte-chars

97498bf

andyw8 added the server (This pull request should be included in the server gem's release notes) and bugfix (This PR will fix an existing bug) labels May 13, 2024

NotFounds mentioned this pull request May 15, 2024

Fix the handling of multibyte characters #2051

Open

NotFounds closed this May 15, 2024
