Make it possible to ignore unbalanced tokens (**, *) #599
Comments
Can you cite where this is standard? I'd like to play with it.
Generally, no. We adhere to the CommonMark spec, and match our implementation to that standard. There isn't really a notion of "unbalanced tokens", so I think specifying all of that behavior would be a sizeable undertaking. CommonMark has a notion of "delimiter runs" that are used in some syntaxes like emphasis and strong emphasis. Someone could maybe implement some notion of "unbalanced tokens" in that code. Otherwise, I think you'd want to take an un-rendered Markdown concrete syntax tree, where you might have enough information to determine that the delimiter runs are unbalanced. But this package doesn't currently export a Markdown syntax tree.
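To make the "delimiter runs" idea concrete, here is a minimal sketch in Python (not this package's Dart code). It assumes a simplified definition: a run is any maximal sequence of `*` or `_`. It ignores backslash escapes and CommonMark's left-/right-flanking rules, and the names `DelimiterRun`, `delimiter_runs`, and `runs_look_balanced` are hypothetical.

```python
import re
from typing import List, NamedTuple


class DelimiterRun(NamedTuple):
    char: str    # "*" or "_"
    length: int  # number of repeated characters in the run
    start: int   # offset of the run in the source text


# Simplification: a run is any maximal sequence of "*" or "_".
# Escapes and flanking rules from CommonMark are NOT handled here.
_RUN = re.compile(r"\*+|_+")


def delimiter_runs(text: str) -> List[DelimiterRun]:
    """Collect every delimiter run in the text, in source order."""
    return [
        DelimiterRun(m.group()[0], len(m.group()), m.start())
        for m in _RUN.finditer(text)
    ]


def runs_look_balanced(text: str) -> bool:
    """Return True if every run is closed by a later run of the same
    character and the same length (nesting allowed)."""
    stack: List[DelimiterRun] = []
    for run in delimiter_runs(text):
        top = stack[-1] if stack else None
        if top is not None and (top.char, top.length) == (run.char, run.length):
            stack.pop()        # this run closes the most recent opener
        else:
            stack.append(run)  # treat it as a new opener
    return not stack
```

Under these assumptions, `**something*` is reported as unbalanced because the opening run has length 2 and the closing run has length 1, while `**something**` is balanced. A caller could use such a check to decide whether to hand the text to the real parser at all.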
It's not a standard, but you can go try inline Markdown styles with Notion, Linear, Slack, etc. There's a variety of behavior, but you'll see a number of behaviors that are difficult/impossible to implement with this package.
Is there another package you would suggest for this purpose? This package seems to have enshrined itself as the go-to place for parsing Markdown in the Dart/Flutter ecosystem. I don't think I'm aware of a meaningful alternative. Moreover, is the described goal truly outside the mission of this package? I understand that historically this package has been used as a batch parser for a blob of Markdown. But in so doing, it enshrines all sorts of syntax and protocol details. It sure seems like a waste to go build a new package and re-invent all of that just to be able to apply inline Markdown the way numerous products do today. Can this package introduce a second top-level parser that uses the existing internals but is designed for use on text as the user types it? That wouldn't need to mess with the batch parser.
Hmm, I don't seem to be able to try any of these without signing up.
No, afaik this is the best-supported markdown parser/renderer package in Dart.
No, I don't think so. I think that using the package more programmatically, like getting a Markdown syntax tree, is well within the scope of this package. We just don't have that feature yet. There is the flutter_markdown package, which I think treats this package's output, a tree of HTML nodes, as if it were a tree of Markdown nodes. I haven't looked at the code, but I have to imagine this would be error-prone, or a real pain to implement. If you had access to the Markdown tree, you could maybe take
And say, "Ah, there is an Emphasis node that follows Text with a "*"; that should disappear." Or, I guess your request from the top is that these should be treated as two Text nodes, with the delimiters put back where they were:
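A minimal sketch of that "put the delimiters back" idea, in Python and using hypothetical node classes (`Text`, `Emphasis`) rather than anything this package exports. It assumes each `Emphasis` node remembers the delimiter it was parsed from, and for simplicity it merges the restored source into the preceding `Text` node instead of keeping two separate nodes.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    value: str


@dataclass
class Emphasis:
    delimiter: str            # e.g. "*" (assumed to be kept on the node)
    children: List["Node"]


Node = Union[Text, Emphasis]


def source_of(node: Node) -> str:
    """Rebuild the Markdown source that a node was parsed from."""
    if isinstance(node, Text):
        return node.value
    inner = "".join(source_of(child) for child in node.children)
    return f"{node.delimiter}{inner}{node.delimiter}"


def restore_unbalanced(nodes: List[Node]) -> List[Node]:
    """If an Emphasis node directly follows Text ending in the same
    delimiter character, treat the whole thing as literal text."""
    out: List[Node] = []
    for node in nodes:
        prev = out[-1] if out else None
        if (
            isinstance(node, Emphasis)
            and isinstance(prev, Text)
            and prev.value.endswith(node.delimiter[0])
        ):
            # Leftover delimiter upstream: undo the emphasis.
            out[-1] = Text(prev.value + source_of(node))
        else:
            out.append(node)
    return out
```

With these assumed types, a tree of `Text("*")` followed by `Emphasis("*", [Text("something")])` collapses back into a single `Text("**something*")`, while emphasis that follows ordinary text is left alone.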
I'm not sure. I don't have a sense of what output you would want. I think it would need a lot of specification in order to see how you'd implement it. The CommonMark spec says that But maybe the CommonMark examples can give us examples of what you're going for, in a "user is typing" mode. I can look at adjacent examples of "This text would render as this syntax, but this text would not." Like for ATX headings, examples 71, 72, and 73 show that you can include a closing sequence of delimiters, like Or example 79 shows that an empty ATX heading, like
I do want to clarify that there are definitely a variety of legal Markdown tokens that are ignored under certain circumstances by these various apps. So the parsing goals that I'm implementing aren't really about legal vs erroneous syntax. Instead, it's about the UX of typing Markdown as you go.

UX considerations

The fully isolated style cases are handled as expected in other apps, e.g.,
Then, taking the above

From a holistic parsing perspective, these examples likely seem strange. Sometimes a legal syntax is applied and other times it's not. But when you're the person typing the syntax, the desired rules are a bit different. For example, as I mentioned in the original post, when you're typing out the characters

Performance

Performance may also be noteworthy here. Given that this parsing takes place as the user types, it's probably not possible to re-parse the whole document on every edit. For this reason, for example, I'm only implementing recognition of Markdown syntax within a single paragraph/node. I'm not considering something like bold "**" spanning across paragraphs (I do apply bold across paragraphs when fully deserializing a document, but I don't look for it as the user types).

Second, even a single paragraph might be quite long, and might include a number of other styles and perhaps inline widget content. It may not be acceptable to re-parse even a full paragraph on every keystroke. But if a parser begins at a reported caret position and then only considers a closing style token immediately upstream from the caret, such as "bold**|", "italics*|", "strikethrough~|", then the parser can quickly bail in most cases, and even in the nominal Markdown syntax case, the parser won't consume more than a dozen characters in the typical case.

Multiple Parsers

Given that these rules are not about legal vs erroneous Markdown syntax, it's very possible that apps will want different rules. One way to handle that in this package would be to build a few different parsers with different policies. Or, something like an

As the User Types

To be clear about what I mean by "as the user types", I just mean a policy that understands a caret position within the text. It essentially means "hey parser, look upstream from offset X". So, to be clear, there's no suggestion in this proposal that this package have any knowledge of editing systems, such as the IME. It's just about who/when/where the parser does its work.
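The caret-anchored scan described under Performance can be sketched roughly as follows. This is Python for illustration only, not this package's API: the function name `match_upstream`, the `STYLES` table, and the `max_scan` window are assumptions, and the sketch only handles the three closing tokens mentioned above within a single paragraph.

```python
from typing import Optional, Tuple

# Closing tokens taken from the examples above; the mapping itself is
# an assumption made for this sketch.
STYLES = {"**": "bold", "*": "italics", "~": "strikethrough"}


def match_upstream(
    text: str, caret: int, max_scan: int = 200
) -> Optional[Tuple[str, int, int]]:
    """Look upstream from the caret for `token content token`.

    Returns (style, content_start, content_end), or None when the text
    just before the caret is not a balanced style span.
    """
    # Bail immediately unless a style character sits right before the caret.
    if caret == 0 or text[caret - 1] not in "*~":
        return None
    ch = text[caret - 1]

    # Measure the closing run that ends at the caret.
    run_start = caret
    while run_start > 0 and text[run_start - 1] == ch:
        run_start -= 1
    token = text[run_start:caret]
    if token not in STYLES:
        return None

    # Walk upstream (bounded by max_scan) to the nearest run of `ch`.
    content_end = run_start
    lower = max(0, content_end - max_scan)
    i = content_end
    while i > lower and text[i - 1] != ch:
        i -= 1
    if i == lower:
        return None  # no candidate opener inside the scan window

    # Measure the opening run; it must equal the closing token exactly.
    open_end = i
    open_start = open_end
    while open_start > 0 and text[open_start - 1] == ch:
        open_start -= 1
    if text[open_start:open_end] != token:
        return None  # unbalanced, e.g. "**something*"

    content = text[open_end:content_end]
    if not content or content != content.strip():
        return None  # empty or whitespace-padded content is left alone
    return STYLES[token], open_end, content_end
```

In this sketch, a caret placed after `**bold**` yields a bold span, while a caret after `**something*` yields `None`, so the text stays as typed. Most keystrokes exit at the first check, which is the "quickly bail" property; different apps could swap in different policies by changing the rejection rules.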
Consider a text segment like
**something*
- currently, parsing that text with this package yields *something (a literal asterisk followed by emphasized text). I'm using this package to implement Markdown serialization as the user types. In the case of the user typing, the typical industry practice is to ignore non-matching Markdown tokens. Therefore,
**something*
would remain as-is - the Markdown wouldn't be applied. Is it possible to tell this package not to apply unbalanced tokens? If that's not possible, can that be added to the syntax options? Perhaps this option should be made available in the constructor for
TagSyntax
or something like that.