OCR only the images? #1171
thibaultmol asked this question in Q&A
Hi, I have PDFs that are exports from PowerPoint. They contain the actual slides as images, and then the notes for each slide as real text in the PDF. I would like to process only the images and keep the existing text. I assume ocrmypdf converts the entire page to an image and then runs OCR on the entire page?
Answered by jbarlow83 on Oct 19, 2023
Answer selected by thibaultmol
`--redo-ocr` just might do what you need, despite the name. It attempts to hide existing text (if any) and then OCRs whatever is left over.
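For batch-processing a folder of exported slide decks, a small Python wrapper around the CLI can help. This is a minimal sketch that assumes the `ocrmypdf` executable is installed and on `PATH`. The helper names `build_redo_ocr_command` and `run_redo_ocr` and the file names are hypothetical. Only the `--redo-ocr` and `-l` (OCR language) options come from ocrmypdf itself.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_redo_ocr_command(input_pdf, output_pdf, language: Optional[str] = None) -> List[str]:
    """Build (but do not run) an ocrmypdf command line that uses --redo-ocr.

    Hypothetical helper: it only assembles the argument list so it can be
    inspected or logged before running.
    """
    cmd = ["ocrmypdf", "--redo-ocr"]
    if language:
        # -l selects the Tesseract language(s), e.g. "eng" or "eng+deu".
        cmd += ["-l", language]
    cmd += [str(input_pdf), str(output_pdf)]
    return cmd


def run_redo_ocr(input_pdf, output_pdf, language: Optional[str] = None) -> None:
    """Run ocrmypdf; raises CalledProcessError if it exits with a non-zero code."""
    subprocess.run(build_redo_ocr_command(input_pdf, output_pdf, language), check=True)


if __name__ == "__main__":
    # Hypothetical usage: OCR every PDF in ./slides into ./slides_ocr.
    out_dir = Path("slides_ocr")
    out_dir.mkdir(exist_ok=True)
    for pdf in sorted(Path("slides").glob("*.pdf")):
        run_redo_ocr(pdf, out_dir / pdf.name, language="eng")
```

The argument-building step is kept separate from the `subprocess` call so the exact command can be checked before any files are touched.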