Update Emojis to include Unicode 15.0+ #621

anaclumos · 2024-06-03T06:49:54Z

Bug report

Description / Observed Behavior

What kind of issues did you encounter with Satori?

It doesn't render Unicode 15.0 emojis, such as 🪈

Vizards · 2024-07-05T14:08:23Z

I investigated this issue and found that it appears to be a bug in linebreak, not only a matter of the default emoji providers lacking support for all Unicode 15 emojis.

I created a simple playground to demonstrate this more clearly:

Playground Preview

🪈: A simple emoji, code point U+1FA88, correctly identified as { languageCode: 'emoji' } and correctly rendered as <image />. It likely does not display in the playground because the playground's emoji providers have not yet been updated to support it.
🫸🏽: An emoji modifier sequence (base emoji plus skin-tone modifier, with no ZWJ), code points U+1FAF8 U+1F3FD, correctly identified as { languageCode: 'emoji' } but incorrectly rendered as <path /> instead of <image />.
🫸🏽 with style wordBreak: 'break-all': Correctly identified as { languageCode: 'emoji' } and correctly rendered as <image />.

I found that the default wordBreak logic in src/utils.ts#L285 causes multi-code-point emoji sequences like this one to be segmented incorrectly:

  if (wordBreak === 'break-all') {
    return { words: segment(content, 'grapheme'), requiredBreaks: [] }
  }

  if (wordBreak === 'keep-all') {
    return { words: segment(content, 'word'), requiredBreaks: [] }
  }

  const breaker = new LineBreaker(content)

Intl.Segmenter handles text segmentation only when wordBreak is 'break-all' or 'keep-all'. When wordBreak is not specified, linebreak handles it instead. linebreak currently supports Unicode version 13, so it splits 🫸🏽 into ['🫸', '🏽'], and Satori then cannot render the emoji correctly.
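To illustrate the difference, here is a minimal sketch that compares splitting 🫸🏽 by code point with splitting it by grapheme cluster via Intl.Segmenter. It does not call linebreak or Satori's internal segment helper; it only assumes a runtime that ships Intl.Segmenter (for example, a recent Node.js), and the variable names are illustrative.

```typescript
// U+1FAF8 (rightwards pushing hand) followed by U+1F3FD (medium skin tone modifier).
const emoji = "\u{1FAF8}\u{1F3FD}";

// Splitting by code point yields the two pieces reported above: ['🫸', '🏽'].
const codePoints: string[] = Array.from(emoji);

// Grapheme segmentation keeps the base emoji and its modifier in one cluster.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes: string[] = Array.from(
  segmenter.segment(emoji),
  (s) => s.segment
);

console.log(codePoints.length); // 2
console.log(graphemes.length); // 1
```

If the renderer receives the single grapheme, it can look up one emoji image for the whole sequence instead of trying to draw two separate glyphs.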

Possible workarounds, in case they help anyone experiencing similar issues:

Set wordBreak: 'break-all' or wordBreak: 'keep-all' on the text container that needs to display emoji sequences added after Unicode 13.
Customize loadAdditionalAsset or graphemeImages, since the emoji providers in the playground do not support 🪈 or 🫸🏽.
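The two workarounds above could be combined roughly as follows. This is a sketch under stated assumptions, not a tested Satori setup: the element uses Satori's plain-object form, the image source is a hypothetical empty placeholder SVG, fonts are omitted, and the actual satori call is left as a comment so the snippet stays self-contained.

```typescript
const hand = "\u{1FAF8}\u{1F3FD}"; // 🫸🏽
const flute = "\u{1FA88}"; // 🪈

// Hypothetical placeholder image; real usage would point at an actual emoji asset.
const placeholder =
  "data:image/svg+xml," +
  encodeURIComponent("<svg xmlns='http://www.w3.org/2000/svg'/>");

// Workaround 1: set wordBreak on the text container so the
// Intl.Segmenter-based path is used instead of linebreak.
const element = {
  type: "div",
  props: {
    style: { display: "flex", wordBreak: "break-all" },
    children: `${hand} ${flute}`,
  },
};

// Workaround 2: supply images for graphemes the default providers lack.
const graphemeImages: Record<string, string> = {
  [hand]: placeholder,
  [flute]: placeholder,
};

const options = {
  width: 600,
  height: 400,
  fonts: [], // assumption: real usage must supply at least one loaded font
  graphemeImages,
};

// const svg = await satori(element, options)
```

Whether graphemeImages or loadAdditionalAsset fits better depends on whether the set of emojis is known up front or has to be resolved on demand.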

However, when wordBreak is not specified, Satori cannot correctly segment the emoji (🫸🏽) in the example. Would you consider replacing the default linebreak-based segmentation with Intl.Segmenter? I'm willing to help with further investigation if needed. @shuding
