Improve parsing of HTML anchor content #54
Merged
This attempts to address #53.
I'm not sure whether this has a significant performance impact that should be evaluated. I pasted some large content and didn't notice it being any slower than before. Calling a node filter on every element in the tree could be expensive, but maybe not by much?
This does not address the `<br>` case. I guess the only way to deal with it would be to traverse the children of the anchor and collect the text, but even that would probably not cover all the cases. And I suspect it might be too expensive if the tree under the anchor element happens to be very complex.

I also realized that using `innerText` instead of `textContent` doesn't work here (while normally it makes a difference), because when the element is not being rendered the DOM doesn't care about the semantics of `<br>`.

Please just take this as a suggestion. I mostly don't understand what I'm doing but wanted to give this a try anyway.
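For reference, the traversal idea for the `<br>` case could look roughly like the sketch below. It is not part of this PR and not tested against the real parser. `collectAnchorText` and its `maxNodes` budget are hypothetical names: the function walks the anchor's subtree, treats each `<br>` as a space, and gives up (returns `null`) once the budget is exceeded, so a very complex subtree can fall back to `textContent`. It only relies on the standard `nodeType`, `nodeName`, `nodeValue` and `childNodes` properties, so the example uses plain objects as stand-ins for DOM nodes.

```javascript
// Node type constants as defined by the DOM standard
// (Node.ELEMENT_NODE and Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: collect the text under `root`, treating <br> as a space.
// Returns null when more than `maxNodes` nodes would be visited, so the caller
// can fall back to something cheaper such as `textContent`.
function collectAnchorText(root, maxNodes = 1000) {
  const parts = [];
  const stack = [root];
  let visited = 0;

  while (stack.length > 0) {
    const node = stack.pop();
    if (++visited > maxNodes) return null;

    if (node.nodeType === TEXT_NODE) {
      parts.push(node.nodeValue);
    } else if (node.nodeType === ELEMENT_NODE) {
      if (node.nodeName.toUpperCase() === "BR") {
        parts.push(" ");
      } else {
        // Push children in reverse so they are popped in document order.
        const children = node.childNodes;
        for (let i = children.length - 1; i >= 0; i--) {
          stack.push(children[i]);
        }
      }
    }
  }

  // Collapse runs of whitespace, since the pasted markup may contain newlines.
  return parts.join("").replace(/\s+/g, " ").trim();
}

// Plain-object stand-ins for DOM nodes, for illustration only.
const text = (value) => ({ nodeType: TEXT_NODE, nodeValue: value });
const el = (name, ...children) => ({
  nodeType: ELEMENT_NODE,
  nodeName: name,
  childNodes: children,
});

// Roughly <a>first line<br><b>second</b> line</a>
const anchor = el("A", text("first line"), el("BR"), el("B", text("second")), text(" line"));
console.log(collectAnchorText(anchor)); // "first line second line"
```

As noted above, this still would not cover every case (block-level children, for example, get no separator), and the right value for the budget is a guess.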