Message parsing: Replace Markdown with HTML #281

Open
kiliankoe opened this issue Mar 24, 2021 · 7 comments

@kiliankoe
Member

kiliankoe commented Mar 24, 2021

As we recently learned in the chat, the body of a Matrix message cannot be interpreted directly as Markdown. Any Markdown-looking formatting in it is purely coincidental, just as it would be if users typed BBCode directly.

See this message content for example.

"content": {
  "msgtype": "m.text",
  "body": "*test*",
  "format": "org.matrix.custom.html",
  "formatted_body": "<em>test</em>"
}

The body contains Markdown formatting, but this cannot be relied upon. Instead we have to check the format field, which will likely be absent for plaintext messages or set to org.matrix.custom.html as it is here. In the latter case we can interpret the formatted_body as HTML and render that.
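
The check described above could look roughly like the following. This is only a sketch: the type and function names (TextMessageContent, RenderableText, renderableText(for:)) are made up for illustration and are not taken from Nio's code base.

```swift
import Foundation

/// Hypothetical model of the `content` object shown above.
struct TextMessageContent: Decodable {
    let msgtype: String
    let body: String
    let format: String?
    let formattedBody: String?

    enum CodingKeys: String, CodingKey {
        case msgtype, body, format
        case formattedBody = "formatted_body"
    }
}

/// What the renderer should treat the message as.
enum RenderableText: Equatable {
    case plain(String)
    case html(String)
}

func renderableText(for content: TextMessageContent) -> RenderableText {
    // Only use formatted_body when the format field says it is HTML.
    if content.format == "org.matrix.custom.html",
       let html = content.formattedBody {
        return .html(html)
    }
    // Missing or unknown format: show the body verbatim, with no Markdown parsing.
    return .plain(content.body)
}

let json = """
{
  "msgtype": "m.text",
  "body": "*test*",
  "format": "org.matrix.custom.html",
  "formatted_body": "<em>test</em>"
}
"""
let content = try! JSONDecoder().decode(TextMessageContent.self, from: Data(json.utf8))
print(renderableText(for: content))
```

With no format field present, the same function falls back to the plain body, so the asterisks in "*test*" would be shown literally.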

It's entirely up to clients to decide how users can mark up their messages. For outgoing messages it would likely make sense for Nio to just assume Markdown (and show a live formatting preview in the message composer), turn that into HTML, and format the message content as above.
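
For the outgoing direction, the content could be assembled along these lines. Again just a sketch under assumptions: OutgoingTextContent and makeContent are hypothetical names, and the Markdown renderer is passed in as a closure because no particular Markdown library is implied here. The toy renderer below only handles a message wrapped entirely in asterisks and does no HTML escaping, which a real one would have to do.

```swift
import Foundation

/// Hypothetical outgoing payload, mirroring the content object shown above.
struct OutgoingTextContent: Encodable {
    let msgtype = "m.text"
    let body: String
    let format = "org.matrix.custom.html"
    let formattedBody: String

    enum CodingKeys: String, CodingKey {
        case msgtype, body, format
        case formattedBody = "formatted_body"
    }
}

/// Keeps the user's Markdown as the plaintext body and stores the rendered HTML next to it.
func makeContent(markdown: String, renderHTML: (String) -> String) -> OutgoingTextContent {
    OutgoingTextContent(body: markdown, formattedBody: renderHTML(markdown))
}

// Stand-in for a real Markdown renderer (assumption, for illustration only).
let toyRenderer: (String) -> String = { text in
    guard text.count > 2, text.hasPrefix("*"), text.hasSuffix("*") else { return text }
    return "<em>\(text.dropFirst().dropLast())</em>"
}

let outgoing = makeContent(markdown: "*test*", renderHTML: toyRenderer)
let encoded = try! JSONEncoder().encode(outgoing)
print(String(decoding: encoded, as: UTF8.self))
```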

@helje5
Contributor

helje5 commented Mar 24, 2021

You might want to split this issue into two, as it addresses two distinct topics:
a) message parsing (change from Markdown to HTML)
b) message composition (still parse Markdown, but emit HTML on send)

@helje5
Contributor

helje5 commented Mar 24, 2021

If we keep this one for a), there are multiple options. The attributed string parser Nio is currently using can build quite complex stuff, e.g. paragraph formats for quotes. We could parse the HTML and build a similar one.

Another option is to parse the HTML into our own AST, which we render directly as SwiftUI. This can be useful for block-level elements, e.g. having them as separate Views (such as a source highlighting View for code blocks). Like:

struct Message {
  enum Block {
    case paragraphs([Runs])
    case quote([Runs])
    case code(String, language: String?)
  }
  let blocks : [ Block ]
}
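
To make the idea a bit more concrete, here is a slightly fuller sketch that compiles on its own. The Run type and the plainText helper are assumptions added for illustration (the snippet above only names Runs without defining it), and the plain-text output merely stands in for a SwiftUI renderer, which would switch over the same cases and return a View per block.

```swift
/// Hypothetical inline run: a piece of text plus simple styling flags.
struct Run: Equatable {
    var text: String
    var isEmphasized = false
    var isStrong = false
}

struct Message {
    enum Block: Equatable {
        case paragraph([Run])
        case quote([Run])
        case code(String, language: String?)
    }
    let blocks: [Block]
}

extension Message {
    /// Plain-text fallback; a SwiftUI renderer would handle the same three cases.
    var plainText: String {
        blocks.map { block -> String in
            switch block {
            case .paragraph(let runs):
                return runs.map(\.text).joined()
            case .quote(let runs):
                return "> " + runs.map(\.text).joined()
            case .code(let source, _):
                return source
            }
        }.joined(separator: "\n")
    }
}

let message = Message(blocks: [
    .paragraph([Run(text: "Hello "), Run(text: "world", isEmphasized: true)]),
    .quote([Run(text: "quoted")]),
    .code("print(1)", language: "swift"),
])
print(message.plainText)
```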

But both options have their pros and cons. E.g. a disadvantage of SwiftUI Text is that it isn't selectable.

For the HTML it would be interesting to know whether the "custom.html" is well-formed, i.e. whether we could use NSXMLParser, or whether we'd have to use libxml2 directly.
Originally I thought we could use the HTML parser, but that only seems to be exposed through NSXMLDocument, which is not available on iOS.
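
One quick way to probe the well-formedness question is to feed a formatted_body to Foundation's XMLParser (the Swift name of NSXMLParser) and see whether it parses. A minimal sketch, assuming the fragment is wrapped in an artificial root element; isWellFormedXML is a hypothetical helper name. Note that this is stricter than HTML: void elements written as `<br>` and named entities such as `&nbsp;` would be rejected even though they are ordinary HTML.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives in a separate module on non-Apple platforms
#endif

/// Returns true if the HTML fragment parses as well-formed XML.
func isWellFormedXML(_ htmlFragment: String) -> Bool {
    // A fragment may have several top-level nodes, so wrap it in a single root.
    let wrapped = "<root>\(htmlFragment)</root>"
    let parser = XMLParser(data: Data(wrapped.utf8))
    // parse() returns false as soon as the parser hits a fatal error.
    return parser.parse()
}

print(isWellFormedXML("<em>test</em>"))
print(isWellFormedXML("<p>first line<br>second line</p>"))
```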

@kiliankoe
Member Author

Splitting this up definitely sounds sensible 👍 I'll open a new issue for message composition.

> For the HTML it would be interesting to know whether the "custom.html" is well formed

I would very much hope it to be, but can we be sure? It might very well be for Element, but other clients could be sending malformed HTML (the format will be the same), so I don't think we'll get around covering that.

@kiliankoe changed the title from "Replace Markdown rendering" to "Message parsing: Replace Markdown with HTML" on Mar 24, 2021
@helje5
Contributor

helje5 commented Mar 24, 2021

It is documented here: https://matrix.org/docs/spec/client_server/r0.6.1#id335

So that seems to allow unclosed tags; at least it doesn't say otherwise.

The strongly suggested set of HTML tags to permit, denying the use and rendering of anything else, is: font, del, h1, h2, h3, h4, h5, h6, blockquote, p, a, ul, ol, sup, sub, li, b, i, u, strong, em, strike, code, hr, br, div, table, thead, tbody, tr, th, td, caption, pre, span, img.
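
The permitted set quoted above maps naturally onto a Set lookup. The following is a naive sketch, not a sanitizer: disallowedTags(in:) is a hypothetical helper that just scans for "<" followed by a tag name, so it ignores attributes and can be fooled by a stray "<" in text. A real implementation would run the check while walking the parsed tree.

```swift
/// Tag names taken from the list quoted above.
let permittedTags: Set<String> = [
    "font", "del", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "p",
    "a", "ul", "ol", "sup", "sub", "li", "b", "i", "u", "strong", "em",
    "strike", "code", "hr", "br", "div", "table", "thead", "tbody", "tr",
    "th", "td", "caption", "pre", "span", "img",
]

/// Collects tag names that appear in the fragment but are not in the permitted set.
func disallowedTags(in html: String) -> Set<String> {
    var found = Set<String>()
    var rest = Substring(html)
    while let open = rest.firstIndex(of: "<") {
        // Continue scanning just past the "<".
        rest = rest[rest.index(after: open)...]
        // Skip the slash of a closing tag such as </p>.
        if rest.first == "/" { rest = rest.dropFirst() }
        let name = rest.prefix(while: { $0.isLetter || $0.isNumber }).lowercased()
        if !name.isEmpty && !permittedTags.contains(name) {
            found.insert(name)
        }
    }
    return found
}

print(disallowedTags(in: "<p>hi <script>alert(1)</script></p>"))
```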

@helje5
Contributor

helje5 commented Mar 24, 2021

BTW: This also poses special challenges when editing messages. One might want to warn the user when the client can't deal w/ special content (e.g. if it contains a table).

@kiliankoe
Member Author

Oh god, tables are possible? If I try and edit a message with a table in Element it just breaks down and lists all cells as a list, nice 😅

@helje5
Contributor

helje5 commented Mar 24, 2021

Major feature of Mattermost over Slack ;-)
