Re-enable zip files, but first decide how to handle multi-file zips... #7

Open
mccalluc opened this issue Aug 6, 2018 · 1 comment

Comments

@mccalluc
Member

mccalluc commented Aug 6, 2018

If we support zip files, we need to decide what should happen when multiple files are zipped together. This is on the back burner until we have a use case that clarifies it.

dataframer.py:

    compression = {
        b'\x1f\x8b': 'gzip',
        # TODO:
        # b'\x50\x4b': 'zip'
    }.get(file.read(2))
    ...
        # elif compression == 'zip':
        #     zf = zipfile.ZipFile(file)
        #     files = zf.namelist()
        #     first_bytes = zf.open(files[0]).peek(peek_window)
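
For reference, a minimal sketch of the magic-byte check with the zip entry enabled. This is not the dataframer code: `sniff_compression` and `MAGIC` are hypothetical names, and it assumes the input is a seekable binary stream so the two bytes can be read and then rewound.

```python
import gzip
import io
import zipfile

# Leading bytes that identify each container format.
MAGIC = {
    b'\x1f\x8b': 'gzip',
    b'\x50\x4b': 'zip',  # ASCII "PK"
}


def sniff_compression(file):
    """Return 'gzip', 'zip', or None, leaving the stream position unchanged.

    Assumes `file` is a seekable binary stream (hypothetical helper).
    """
    start = file.tell()
    magic = file.read(2)
    file.seek(start)  # rewind so the caller can still read from the start
    return MAGIC.get(magic)


def make_zip(contents):
    """Build an in-memory zip from a {name: bytes} dict (used for the demo)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        for name, data in contents.items():
            zf.writestr(name, data)
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    print(sniff_compression(io.BytesIO(gzip.compress(b'a,b,c\n1,2,3'))))
    print(sniff_compression(make_zip({'fake.csv': b'a,b,c\n1,2,3'})))
    print(sniff_compression(io.BytesIO(b'a,b,c\n1,2,3')))
```

Rewinding after the read means `zipfile.ZipFile(file)` can be handed the same stream afterwards; a non-seekable input would need to be buffered first.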

test:

    # No use-case for zip files right now, but it could be brought back.
    # def test_read_zip(self):
    #     self.assert_file_read(
    #         b'PK\x03\x04\n\x00\x00\x00\x00\x00\x8dZML\xfb\x9a\xc9\xa6\n\x00\x00\x00\n\x00\x00\x00\x08\x00\x1c\x00fake.csvUT\t\x00\x03J\x10\x83Zk\x11\x83Zux\x0b\x00\x01\x04\xf6\x01\x00\x00\x04\x14\x00\x00\x00,b,c\n1,2,3PK\x01\x02\x1e\x03\n\x00\x00\x00\x00\x00\x8dZML\xfb\x9a\xc9\xa6\n\x00\x00\x00\n\x00\x00\x00\x08\x00\x18\x00\x00\x00\x00\x00\x01\x00\x00\x00\xa4\x81\x00\x00\x00\x00fake.csvUT\x05\x00\x03J\x10\x83Zux\x0b\x00\x01\x04\xf6\x01\x00\x00\x04\x14\x00\x00\x00PK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x00N\x00\x00\x00L\x00\x00\x00\x00\x00', self.target  # noqa: E501
    #     )
@mccalluc mccalluc changed the title Re-enable zip files, but decide how to handle multi-file zips Re-enable zip files, but first decide how to handle multi-file zips... Aug 6, 2018
@mccalluc
Member Author

@gmnelson: Do we want to support zip files? Unlike gzip, a zip archive can bundle multiple files together. What should we do in that case? Possible ways forward:

  • Close this and support only gzip
  • Error out if zip contains multiple files
  • Concatenate them all, in some arbitrary order
  • Sniff for tabular data files, and ignore others
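
To make two of these options concrete, here is a rough sketch of "error out if the zip contains multiple files" and "sniff for tabular data files, and ignore others". The function names, the extension list, and the choice to sniff by filename rather than by content are all assumptions for illustration, not anything decided in this issue.

```python
import io
import zipfile


def open_single_member(file):
    """Option: error out unless the archive holds exactly one file.

    Returns a binary file object for that single member.
    """
    zf = zipfile.ZipFile(file)
    # Directory entries are not data files, so leave them out of the count.
    members = [info for info in zf.infolist() if not info.is_dir()]
    if len(members) != 1:
        raise ValueError(
            'Expected exactly one file in zip, found {}: {}'.format(
                len(members), [m.filename for m in members]))
    return zf.open(members[0])


def tabular_members(file, extensions=('.csv', '.tsv')):
    """Option: keep only members that look tabular, judged by extension.

    The extension list is an assumption; sniffing content is another route.
    """
    zf = zipfile.ZipFile(file)
    return [info.filename for info in zf.infolist()
            if not info.is_dir()
            and info.filename.lower().endswith(extensions)]


def make_zip(contents):
    """Build an in-memory zip from a {name: bytes} dict (used for the demo)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w') as zf:
        for name, data in contents.items():
            zf.writestr(name, data)
    buffer.seek(0)
    return buffer


if __name__ == '__main__':
    single = make_zip({'fake.csv': b'a,b,c\n1,2,3'})
    print(open_single_member(single).read())

    mixed = make_zip({'data.csv': b'a,b\n1,2', 'README.txt': b'notes'})
    print(tabular_members(mixed))
    try:
        open_single_member(make_zip({'x.csv': b'a', 'y.csv': b'b'}))
    except ValueError as error:
        print(error)
```

Erroring out is the most conservative choice, since it never guesses which file the user meant. The filtering variant still leaves open what to do when more than one tabular file survives.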
