Fix corruption of large files when zip64 is used #113

Merged: 1 commit, Apr 20, 2024

@mrkkrp commented Apr 19, 2024

Close #111.

Previously, the code assumed that the initial stub local header (written with uncompressed and compressed sizes set to 0) had the same size as the final local header. That assumption does not hold: the local header size depends on the uncompressed and compressed sizes of the corresponding data, which are only known after the data has been streamed, because these sizes determine whether a zip64 extra field entry is included in the header. The final local header is written by seeking back to the beginning of the stub once the data has been streamed, so whenever the final header was longer than the stub it overwrote the beginning of the data and corrupted the entry.

This is fixed by

  • always writing a zip64 entry in local headers, which does not violate the spec and is safely ignored for smaller entries, and
  • following the spec more precisely: whenever a local header contains a zip64 extra field entry, both the uncompressed and compressed sizes must be written in it.

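This is deemed safe because the only source of size variation for local headers is the uncompressed and compressed sizes of the corresponding data. With the zip64 entry always present and both sizes always written, the stub and the final local header have the same length.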
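
The size arithmetic behind the bug and the fix can be sketched as follows. This is a simplified model and not the library's actual code: the names `Strategy`, `localHeaderSize`, and `clobberedBytes` are hypothetical, other extra fields are ignored, and the pre-fix behaviour is approximated as "write a full 20-byte zip64 field only when a size does not fit in 32 bits" (the real pre-fix code may have written a shorter field). The constants come from the ZIP format: the fixed part of a local file header is 30 bytes, and a zip64 extra field carrying both sizes is 2 + 2 + 8 + 8 bytes.

```haskell
import Data.Word (Word64)

-- | Fixed part of a local file header, from the signature up to and
-- including the extra field length (file name and extra field excluded).
fixedLocalHeaderSize :: Word64
fixedLocalHeaderSize = 30

-- | Zip64 extra field in a local header: 2-byte tag, 2-byte data length,
-- 8-byte uncompressed size, 8-byte compressed size.
zip64ExtraFieldSize :: Word64
zip64ExtraFieldSize = 2 + 2 + 8 + 8

-- | A size needs zip64 when it does not fit in the 32-bit header field.
needsZip64 :: Word64 -> Word64 -> Bool
needsZip64 uncompressed compressed =
  uncompressed >= 0xffffffff || compressed >= 0xffffffff

-- | Hypothetical model of the two behaviours discussed in this PR.
data Strategy
  = Zip64WhenNeeded -- ^ simplified model of the behaviour before the fix
  | Zip64Always     -- ^ behaviour after the fix

-- | Size of a local header for a file name of the given length and
-- the given uncompressed and compressed data sizes.
localHeaderSize :: Strategy -> Word64 -> Word64 -> Word64 -> Word64
localHeaderSize strategy nameLen uncompressed compressed =
  fixedLocalHeaderSize + nameLen + extra
  where
    extra = case strategy of
      Zip64Always -> zip64ExtraFieldSize
      Zip64WhenNeeded
        | needsZip64 uncompressed compressed -> zip64ExtraFieldSize
        | otherwise -> 0

-- | Number of data bytes overwritten when the final header is written
-- over a stub header that was sized with both sizes set to 0.
clobberedBytes :: Strategy -> Word64 -> Word64 -> Word64 -> Word64
clobberedBytes strategy nameLen uncompressed compressed =
  localHeaderSize strategy nameLen uncompressed compressed
    - localHeaderSize strategy nameLen 0 0
```

Under this model, `clobberedBytes Zip64WhenNeeded 8 (5 * 1024 ^ 3) (5 * 1024 ^ 3)` evaluates to 20, meaning the first 20 bytes of a 5 GiB entry's data would be overwritten, while `clobberedBytes Zip64Always` is 0 for any input because the header size no longer depends on the data sizes.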

@mrkkrp merged commit da5df36 into master on Apr 20, 2024
4 checks passed
@mrkkrp deleted the issue-111 branch on April 20, 2024, 07:34
Successfully merging this pull request may close these issues.

Unable to extract 4GB+ files from archive.