High memory usage when decompressing tarballs #31

Open
dom96 opened this issue Oct 2, 2021 · 3 comments

Comments

@dom96
Contributor

dom96 commented Oct 2, 2021

I have seen memory usage as high as 4 GB when decompressing certain tarballs, for example https://github.com/nim-lang/csources/archive/64e34778fa7e114b4afc753c7845dee250584167.tar.gz.

@guzba
Owner

guzba commented Oct 2, 2021

Zippy currently works in memory only, and the tarball implementation keeps the entire contents of the tarball in memory once it is opened (since it has just been fully unzipped in memory). A streaming implementation of Zippy would remove that memory requirement. This is something I want and intend to work on, but I am working on other things for now.
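
To illustrate the difference between in-memory and streaming decompression, here is a minimal sketch in Python using its standard `gzip` and `zlib` modules. It is not Zippy code and says nothing about how a streaming Zippy would be built. The function names `inflate_all` and `inflate_streaming` and the 64 KiB chunk size are hypothetical, chosen only for illustration.

```python
import gzip
import io
import zlib


def inflate_all(data: bytes) -> bytes:
    """In-memory approach: the whole decompressed payload exists at once."""
    return gzip.decompress(data)


def inflate_streaming(src, dst, chunk_size=64 * 1024):
    """Streaming approach: read, inflate and write one bounded piece at a time.

    `src` and `dst` are binary file-like objects. Returns the number of
    decompressed bytes written to `dst`.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        buf = chunk
        while buf:
            # Cap each output piece at chunk_size bytes; input that could not
            # be processed yet is kept in decomp.unconsumed_tail.
            out = decomp.decompress(buf, chunk_size)
            dst.write(out)
            total += len(out)
            buf = decomp.unconsumed_tail
    # Emit whatever output is still pending inside the decompressor.
    tail = decomp.flush()
    dst.write(tail)
    total += len(tail)
    return total


if __name__ == "__main__":
    payload = b"zippy" * 200_000
    blob = gzip.compress(payload)
    sink = io.BytesIO()
    written = inflate_streaming(io.BytesIO(blob), sink)
    print(written == len(payload))
```

With the streaming variant, the working set inside the loop is bounded by the chunk size rather than by the size of the uncompressed archive, provided `dst` is a file on disk rather than an in-memory buffer like the one used in the demo.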

@guzba
Owner

guzba commented Jan 29, 2022

Update here (released in zippy >= 0.9.3)

I have reworked a lot of Zippy's internals lately and rewritten how tarball extractAll is done. This has enabled some significant improvement here:

(From echo GC_getStatistics() right after uncompressing)

previous impl + arc + release:
[GC] total memory: 3909815487
[GC] occupied memory: 3582487439

previous impl + default gc + release:
[GC] total memory: 4169059543
[GC] occupied memory: 3595771911

current impl + arc + release:
[GC] total memory: 1195044921
[GC] occupied memory: 1194516745

current impl + default gc + release:
[GC] total memory: 1195044945
[GC] occupied memory: 1194578289

The above csources archive is 187 MB compressed and uncompresses to 1.2 GB. Since I do still inflate everything into memory, that sets the floor for memory usage at this point. I still want to get to a fully streamed version someday but progress is progress.
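
As a quick check of the figures above, the following Python sketch converts the reported total memory values to decimal gigabytes and computes the relative reduction. The numbers are copied from the statistics quoted above; the dictionary layout and the helper names `to_gb` and `reduction` are only for illustration.

```python
# [GC] total memory values in bytes, copied from the statistics above.
TOTAL_MEMORY = {
    ("previous", "arc"): 3909815487,
    ("previous", "default gc"): 4169059543,
    ("current", "arc"): 1195044921,
    ("current", "default gc"): 1195044945,
}


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 1e9


def reduction(gc: str) -> float:
    """Fraction by which total memory dropped for the given GC setting."""
    before = TOTAL_MEMORY[("previous", gc)]
    after = TOTAL_MEMORY[("current", gc)]
    return 1 - after / before


for gc in ("arc", "default gc"):
    before = to_gb(TOTAL_MEMORY[("previous", gc)])
    after = to_gb(TOTAL_MEMORY[("current", gc)])
    print(f"{gc}: {before:.2f} GB -> {after:.2f} GB ({reduction(gc):.0%} less)")
```

Both configurations drop by roughly 70%, and the current total rounds to the 1.2 GB uncompressed size of the archive.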

@Clonkk

Clonkk commented Oct 4, 2023

Following up on this issue: I see very high memory usage when creating a tarball with the ".tar.gz" extension, compared to running "tar -czf file.tar.gz" in bash.

"tar -czf" also creates a smaller tarball. I'm not sure what the difference is :)

Have you made any progress on the streaming version?
