[pdf] Switch PDF page rendering to an iterator format #1001

Closed
frgfm opened this issue Jul 29, 2022 · 2 comments
Labels
module: io Related to doctr.io type: new feature New feature
Milestone

Comments


frgfm commented Jul 29, 2022

As suggested by @mara004 in #1000, the PDF rendering currently uses a list comprehension, so no page can be processed until every page has been rendered, and all rendered pages are held in memory at once:

return [np.asarray(img) for img in pdf.render_topil(scale=scale, **kwargs)]

This could be fixed by turning the function into a generator that yields one page at a time, which limits RAM usage:

with pdfium.PdfDocument(file, password=password) as pdf:
    for img in pdf.render_topil(scale=scale, **kwargs):
        yield np.asarray(img)
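
To show the difference in isolation, here is a minimal sketch that does not depend on pypdfium2 or numpy. `fake_render`, `read_pdf_eager`, `read_pdf_lazy` and `consume` are hypothetical names made up for this illustration; `fake_render` only stands in for a lazy renderer such as `pdf.render_topil`, and the strings stand in for page images. It records the order in which pages are rendered and processed for both variants.

```python
from typing import Iterable, Iterator, List

# Shared log of what happened, in order (illustration only)
events: List[str] = []


def fake_render(num_pages: int) -> Iterator[str]:
    """Hypothetical stand-in for a lazy page renderer: yields one 'page' at a time."""
    for idx in range(num_pages):
        events.append(f"render {idx}")
        yield f"page-{idx}"


def read_pdf_eager(num_pages: int) -> List[str]:
    # Mirrors the list comprehension: every page is rendered before returning
    return [page for page in fake_render(num_pages)]


def read_pdf_lazy(num_pages: int) -> Iterator[str]:
    # Mirrors the proposed generator: a page is rendered only when requested
    for page in fake_render(num_pages):
        yield page


def consume(pages: Iterable[str]) -> None:
    # Stand-in for downstream page processing
    for page in pages:
        events.append(f"process {page}")


events.clear()
consume(read_pdf_eager(2))
eager_order = list(events)

events.clear()
consume(read_pdf_lazy(2))
lazy_order = list(events)

print(eager_order)  # all renders first, then all processing
print(lazy_order)   # render and processing interleaved page by page
```

One consequence to keep in mind with the generator form: the `with` block around the document stays open until the generator is exhausted or closed, so callers that need all pages at once would have to wrap the call in `list(...)`.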
@frgfm frgfm added module: io Related to doctr.io type: new feature New feature labels Jul 29, 2022
@frgfm frgfm added this to the 0.6.0 milestone Jul 29, 2022
@felixdittrich92 felixdittrich92 linked a pull request Sep 1, 2022 that will close this issue

felixdittrich92 commented Sep 1, 2022

Waiting until #1032 is available

@felixdittrich92 felixdittrich92 modified the milestones: 0.6.0, 0.7.0 Sep 26, 2022
felixdittrich92 commented

Outdated by #1240
