This repository has been archived by the owner on Aug 23, 2021. It is now read-only.

support storing using zip archives, split into files, with random-access and forward error correction #1

Open
jimpick opened this issue May 6, 2019 · 0 comments


jimpick commented May 6, 2019

Use case:

I'd like to be able to import directories with large numbers of files and subdirectories (for example, source code trees such as the Linux source code). Filecoin would prefer to work with larger files rather than lots of small files, so it makes sense to pack them into a file archive format (e.g. tar or zip).

By splitting the archive file into multiple chunks, it can be stored in multiple places, and uploads/downloads can happen in parallel. The chunk size can be chosen to be whatever is optimal for the Filecoin network.
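To make the chunking idea concrete, here is a minimal sketch in Python. The function names and the 1000-byte chunk size are made up for illustration; a real chunk size would be picked to suit the Filecoin network.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: list[bytes]) -> bytes:
    """Reassemble the original bytes from chunks kept in their original order."""
    return b"".join(chunks)


# Stand-in for the bytes of an archive file (2560 bytes).
archive = bytes(range(256)) * 10
chunks = split_into_chunks(archive, chunk_size=1000)
restored = join_chunks(chunks)
```

Each chunk is independent of the others, so they could be uploaded or downloaded in parallel and concatenated in order afterwards.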

If we re-encode the chunks using Forward Error Correction before storing them, we can introduce some extra resiliency against chunks that are lost and can't be retrieved (e.g. only 3 of 5 chunks are needed to restore the original). The zfec tool from tahoe-lafs is an easy-to-use tool for encoding and decoding, written in Python and C.
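As a rough illustration of the erasure-coding idea, the sketch below uses plain XOR parity rather than zfec: it adds one parity share to k data chunks, so any k of the k+1 shares are enough to rebuild the data. This is a much weaker scheme than the 3-of-5 example (it survives only a single loss), and the function names are hypothetical. It also assumes all chunks have the same length, so the last chunk would need padding in practice.

```python
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: list[bytes]) -> list[bytes]:
    """Return the data chunks followed by one XOR parity share."""
    if not chunks:
        raise ValueError("need at least one chunk")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("all chunks must have the same length (pad the last one)")
    parity = bytes(size)  # all zeros
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity]


def recover(shares: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild the data chunks from shares where at most one entry is None."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can rebuild at most one lost share")
    restored = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        rebuilt = bytes(len(present[0]))
        # XOR of all surviving shares equals the missing one.
        for share in present:
            rebuilt = xor_bytes(rebuilt, share)
        restored[missing[0]] = rebuilt
    return restored[:-1]  # drop the parity share


shares = add_parity([b"aaaa", b"bbbb", b"cccc"])
shares[1] = None  # simulate a chunk that could not be retrieved
data = recover(shares)
```

A Reed-Solomon style code such as the one zfec implements generalizes this to tolerate several lost shares at once.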

It would be nice to be able to retrieve a file from the archive via random-access so only the chunk containing it would need to be retrieved from the Filecoin network instead of having to download all chunks for the archive.

Tar archives do not have an index, so finding a file means scanning through the archive. Zip files have an index (the central directory) appended at the end of the archive.
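Here is a sketch of how the zip index could tell us which chunks a given file lives in, using Python's standard `zipfile` module. The helper names and the chunk size are assumptions for illustration. The byte range covers the entry's local header plus its compressed data; any trailing data descriptor is not counted, since it isn't needed to extract the file.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIZE = 30  # fixed-size part of a zip local file header


def make_demo_zip() -> bytes:
    """Build a small in-memory zip archive to experiment with."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README", "hello " * 100)
        zf.writestr("src/main.c", "int main(void) { return 0; }\n" * 50)
    return buf.getvalue()


def chunks_per_file(archive: bytes, chunk_size: int) -> dict[str, range]:
    """Map each archived file name to the range of chunk indexes holding it."""
    needed = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            start = info.header_offset
            # Name and extra-field lengths sit at bytes 26..29 of the local header.
            name_len, extra_len = struct.unpack_from("<HH", archive, start + 26)
            end = start + LOCAL_HEADER_SIZE + name_len + extra_len + info.compress_size
            needed[info.filename] = range(start // chunk_size, (end - 1) // chunk_size + 1)
    return needed


demo = make_demo_zip()
layout = chunks_per_file(demo, chunk_size=64)
```

With a mapping like `layout`, retrieving `README` only requires fetching the chunks in `layout["README"]` instead of every chunk of the archive.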

Datpedia uses JavaScript and the zip index to provide random access to Wikipedia entries stored in a single zip file.

It would be nice to store the zip index separately for extra redundancy, optionally using zfec. I'd like to have the ability to still retrieve files from chunks, even if other chunks have been lost.
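One possible shape for a separately stored index is sketched below: the offsets from the zip index are exported to JSON, and a file is later read back using only byte-range fetches. The JSON layout, the function names, and the `fetch(offset, length)` callback are all hypothetical; in practice `fetch` would pull bytes from whichever chunk holds that range. Only stored and deflate-compressed entries are handled.

```python
import io
import json
import struct
import zipfile
import zlib
from typing import Callable


def build_index(archive: bytes) -> str:
    """Export the offsets from the zip index as a standalone JSON document."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        entries = {
            info.filename: {
                "header_offset": info.header_offset,
                "compress_size": info.compress_size,
                "compress_type": info.compress_type,
            }
            for info in zf.infolist()
        }
    return json.dumps(entries)


def read_member(index_json: str, name: str, fetch: Callable[[int, int], bytes]) -> bytes:
    """Read one file using the separate index and two byte-range fetches."""
    entry = json.loads(index_json)[name]
    offset = entry["header_offset"]
    header = fetch(offset, 30)  # fixed part of the local file header
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    raw = fetch(offset + 30 + name_len + extra_len, entry["compress_size"])
    if entry["compress_type"] == zipfile.ZIP_STORED:
        return raw
    if entry["compress_type"] == zipfile.ZIP_DEFLATED:
        return zlib.decompress(raw, -15)  # raw deflate stream, no zlib header
    raise NotImplementedError("unsupported compression method")


# Demo: build an archive, keep only its bytes and the separate JSON index.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("docs/a.txt", "hello " * 100)
    zf.writestr("b.bin", b"\x00\x01\x02", compress_type=zipfile.ZIP_STORED)
archive_bytes = buf.getvalue()
index_json = build_index(archive_bytes)


def fetch_from_memory(offset: int, length: int) -> bytes:
    """Stand-in for a ranged read against stored chunks."""
    return archive_bytes[offset:offset + length]


text = read_member(index_json, "docs/a.txt", fetch_from_memory)
blob = read_member(index_json, "b.bin", fetch_from_memory)
```

Because the reader never touches the end of the archive, a file stays retrievable even when the chunk holding the original zip index is lost, as long as the chunks covering that file's own byte range survive.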

Some compression formats work better for random access. I've added some links to the wiki:
