Replies: 7 comments 2 replies
-
Thanks for reporting 👍 Would it be possible to share a PDF that reproduces the issue?
-
Here's the PDF link: https://github.com/user-attachments/files/17991370/po-r.pdf I ran it through both docTR and OnnxTR and ran into the same problem.
-
Hey @vikasr111 👋, I tested it with both docTR and OnnxTR but could not reproduce the bug. Could you try providing the absolute path to the PDF? I used the same args as in your snippet and only changed the PDF path to an absolute one.
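To rule out a path problem like the one suggested above, the PDF path can be resolved and checked before it reaches the loader. This is a minimal sketch using only the standard library. `resolve_pdf_path` is a hypothetical helper, not part of docTR or OnnxTR, and the commented usage assumes the `DocumentFile.from_pdf` loader from `doctr.io`.

```python
from pathlib import Path


def resolve_pdf_path(path: str) -> Path:
    """Return the absolute path to a PDF, failing early if it does not exist.

    Hypothetical helper for illustration; not part of docTR or OnnxTR.
    """
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"PDF not found: {resolved}")
    return resolved


# Assumed usage with docTR (requires python-doctr to be installed):
# from doctr.io import DocumentFile
# pages = DocumentFile.from_pdf(str(resolve_pdf_path("po-r.pdf")))
```

Inside a container, a relative path is resolved against the container's working directory, so failing early with the resolved path in the message makes a missing mount or a wrong `WORKDIR` easier to spot.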
-
@felixdittrich92 That's odd. When I run this directly on my system it works fine, but when I run the same code in a Docker container I get this error. I'll investigate further, since the error comes from Pillow, which is receiving a blank image for the page.
-
Maybe an issue with pypdfium2?
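Since the code works on the host but fails in the container, comparing installed package versions in both environments is a cheap first check. The sketch below uses only `importlib.metadata` from the standard library. `collect_versions` is a hypothetical helper, and the distribution names listed are the PyPI names assumed to be relevant here.

```python
from importlib import metadata


def collect_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names as published on PyPI (assumed relevant to this report).
    names = ["pypdfium2", "pillow", "python-doctr", "onnxtr"]
    for name, version in collect_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running the same script on the host and inside the container and diffing the output shows whether pypdfium2 or Pillow differ between the two.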
-
@felixdittrich92 I use python:3.11-slim-bullseye. Here's my full Dockerfile:
I found a lead. When I set
-
Bug description
I am trying to perform OCR using docTR on a PDF document. I noticed that OCR for the whole document fails because of one page in the document. When I ran that page on its own as a one-page PDF, I got the full error log. Here's the PDF:
po-r.pdf
Here's the error log:
I printed the docs of docTR to debug further, and here's the result:
Code snippet to reproduce the bug
Error traceback
Here's the error log:
Environment
Python - 3.11.7
Deep Learning backend
Torch
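To find which page of a multi-page document breaks the run, as described in the bug description, each page can be processed on its own and the failures collected. This is a rough sketch, not docTR's own mechanism. `find_failing_pages` is a hypothetical helper, and it assumes the exception is raised during prediction rather than while loading the PDF. If loading itself fails, this will not catch it.

```python
def find_failing_pages(pages, run_ocr):
    """Run `run_ocr` on each page separately and collect the failures.

    `pages` is any sequence of page images; `run_ocr` is a callable that
    accepts a one-element list of pages. Both are assumptions for illustration.
    Returns (results, failures): results maps page index to the OCR output,
    failures maps page index to the exception that was raised.
    """
    results, failures = {}, {}
    for index, page in enumerate(pages):
        try:
            results[index] = run_ocr([page])
        except Exception as exc:  # broad on purpose: we only want to locate the page
            failures[index] = exc
    return results, failures


# Assumed usage with docTR (requires python-doctr to be installed):
# from doctr.io import DocumentFile
# from doctr.models import ocr_predictor
# predictor = ocr_predictor(pretrained=True)
# pages = DocumentFile.from_pdf("/absolute/path/to/document.pdf")
# results, failures = find_failing_pages(pages, predictor)
```

Processing one page at a time is slower than a single batched call, but the pages that succeed still produce output, and `failures` records the index and exception for each page that does not.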