Can you post an example of the input PDF?
-
You can encrypt it with my public key as described here:
-
Figma constructs a Type 3 font inside the PDF. A Type 3 font is a format that exists only inside a PDF: it contains a library of character procedures that describe how to render each glyph. Here Figma used vector paths to draw them, but that may change if the input font differs. It also appears to be a subset font, meaning any glyph not used in the document is omitted. Internally in PDF, calls to render text are actually calls to render a specific glyph number in a specific font. Some producers use an encoding where glyph number = ASCII or Unicode code point, so the mapping is transparent. In this case, however, there is no correlation between the glyph numbers in the font and Unicode. There is supposed to be a lookup table that defines the mapping, and it seems to be present, but it is clearly not working correctly or some other piece of information is missing. Because of that, some PDF viewers, depending on their bugs, features, and heuristics, are able to read the text in the Figma PDF, while others are not. The Figma PDF is also generated with an invalid xref table, so there are problems in their PDF generation in general. I tested Foxit: the text is selectable, but it copy-pastes as mojibake. poppler and evince are able to read it (but that's not necessarily correct behavior).
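In standard PDF, the glyph-number-to-Unicode lookup table mentioned above is the font's /ToUnicode CMap. Below is a minimal, stdlib-only Python sketch of how a text extractor uses such a table. It is an illustration under stated assumptions, not how poppler, Foxit or OCRmyPDF implement it: the CMap is made up, only one-byte codes and `bfchar` entries are handled, and `bfrange` sections are ignored. The helper names `parse_bfchar` and `decode_glyph_codes` are hypothetical.

```python
import re

# Made-up ToUnicode CMap fragment (illustration only): glyph codes 0x01..0x05
# have no relation to ASCII, so this table is the only way to recover the text.
EXAMPLE_CMAP = """
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
1 begincodespacerange
<00> <FF>
endcodespacerange
5 beginbfchar
<01> <0046>
<02> <0069>
<03> <0067>
<04> <006D>
<05> <0061>
endbfchar
endcmap
"""

BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
HEX_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Collect single-code mappings from every bfchar section (bfrange is ignored)."""
    mapping: dict[int, str] = {}
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in HEX_PAIR.findall(block):
            # Destination strings in a ToUnicode CMap are UTF-16BE hex.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_glyph_codes(codes: bytes, mapping: dict[int, str]) -> str:
    """Map one-byte glyph codes to Unicode; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    table = parse_bfchar(EXAMPLE_CMAP)
    shown = b"\x01\x02\x03\x04\x05"  # glyph codes as they appear in the content stream
    print(decode_glyph_codes(shown, table))  # text recovered via the CMap
    print(repr(shown.decode("latin-1")))     # naive "code = character" reading
```

With a working table the glyph codes decode to "Figma"; reading the same bytes as if glyph number = character code yields only control characters. This is similar in spirit to the mojibake described above, and it shows why a broken or incomplete table leaves each viewer to its own fallback heuristics.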
Note that Ghostscript may cause more trouble, as found in #1439.
-
Since the PDF export of the design software Figma generates a lot of issues, which are mentioned often online, I thought it reasonable to ask whether there are known optimal settings for OCRmyPDF that work well (maybe perfectly) with Figma text.
For reasons I don't really understand, only a few PDF readers can actually search or copy the text generated by Figma. PDF-XChange can do it; Adobe cannot, even though the text looks like real text when zoomed in and can be highlighted. I just want to use the Figma default export and keep all visual aspects, including the text, which seems to be vector-based, without any loss of quality, while also making it searchable in Adobe and Foxit.
I already tried "ocrmypdf -l eng+deu --redo-ocr --optimize 0 --no-tesseract-downsample-large-images", which actually seems to do what I want regarding the text (I still need to test it a bit more), but the quality of the background image is severely reduced.
Edit: It seems like a big part of the text wasn't recognized anyway, which would unfortunately make this unusable.