Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General cleanup / bug prevention #535

Open
1 of 6 tasks
jonasfj opened this issue Apr 25, 2023 · 5 comments
Open
1 of 6 tasks

General cleanup / bug prevention #535

jonasfj opened this issue Apr 25, 2023 · 5 comments

Comments

@jonasfj
Copy link
Member

jonasfj commented Apr 25, 2023

Following #531 (infinite loop) which caused some hard to debug issues. We've been wondering if we could reduce the risk of infinite loops, since these are hard to recover from gracefully (and doing so eats a lot of CPU that affects a server negatively).

But in general, reducing the risk of bugs would be nice. Perhaps, we can use this topic to discuss possible cleanups / tweaks to prevent/reduce bugs.

Some ideas include:

  • Throw, if BlockSyntax.parseLines loops indefinitely #533
  • Remove BlockParser.canParse() in favor of BlockParser.parse() returning null
  • Return null from BlockParser.parse() whenever it doesn't advance _pos
  • Review if other while (!isDone) { loops could benefit from a limit on iterations?
  • Determine if we can avoid BlockParser.retreat and BlockParser.retreatBy
  • Other ideas...?

I'm not really sure how everything in this package works, maybe there is a need to retreat, and we can't just peek.

@chenzhiguang
Copy link
Contributor

I think we can avoid retread. We can parse each syntax in a isolated context and never run the same syntax at the same position if the parser returns null.

@jonasfj
Copy link
Member Author

jonasfj commented Apr 25, 2023

We can parse each syntax in a isolated context and never run the same syntax at the same position if the parser returns null.

Wouldn't it be a lot safer to do this. I mean, we probably don't even have to create an isolate context for each run, we can probably just remove the calls to retreat and use some sort of peek mechanism instead (or simply do advance() later).

For example: TableSyntax uses parser.retreat() after calling TableSyntax._parseRow which calls parser.advance(). But we could probably just choose to not call advance() inside _parseRow, and just call advance() when we've decided not to return null.
Then there is never a need to retreat.


To get rid of canParse we'd probably also want to get rid of BlockSyntax.interruptedBy, but from what I can see, it's only used by ParagraphSyntax. So maybe it's possible to just do a hack like if (parser.blockSyntaxes.any((s) => s is SetextHeaderSyntax)) { ... check...

And maybe we'd have to do something about BlockSyntax.isAtBlockEnd which also uses canParse. I don't know if something smarter can be done, or if calling canParse for every syntax is a good idea.


@chenzhiguang do you have other ideas for things that might be worth while doing?

@chenzhiguang
Copy link
Contributor

@jonasfj Yes, like you said, we can also do not advance the position in the TableSyntax._parseRow, it means we isolate a segment of the source text and parse it independently of the main parser.

I have created a TextParser to simplify the task, you can see how it is implemented in the ListSyntax : https://github.com/dart-lang/markdown/blob/master/lib/src/block_syntaxes/list_syntax.dart#L156.

A TextParser instance takes a portion of the source text and runs it separately without changing the main reading position.

@jonasfj
Copy link
Member Author

jonasfj commented Apr 26, 2023

Ah, so TextParser is a thing for reading a single line. Makes sense.

We could also look at reusing package string_scanner (or extending it), it already has support for source_span..

note. both source_span and string_scanner are in the Dart SDK.

@chenzhiguang
Copy link
Contributor

Ah, so TextParser is a thing for reading a single line. Makes sense.

The source to be parsed does not necessarily have to be a single line; it can be a string that contains multiple lines.

We could also look at reusing package string_scanner (or extending it), it already has support for source_span..

I have taken a quick look at this package, but I'm not sure about its performance. Normally, the Markdown parser simply walks through the text character by character, avoiding the use of patterns intentionally. Additionally, if we have implemented source_span, we will convert the source into SourceSpan from the very beginning. However, in each syntax, there are cases where the source to be parsed might not be a single SourceSpan instance; it could be a list of SourceSpans which locations are disjoint. For example, for this string # _a\_b_, for the inline syntaxes it hits the EscapeSyntax first. Then, the input for the EmphasisSyntax will be two disjoint SourceSpans containing _a and _b_, the string_scanner might not meet our requirements in this case in a simple way.

The TextParser is a simplified version of the SourceParser, the SourceParser works for a list of SourceSpans as the source input. You can find the SourceParser implementation here: https://github.com/tagnote-app/dart_markdown/blob/master/lib/src/parsers/source_parser.dart.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants