support storing using zip archives, split into files, with random-access and forward error correction #1

jimpick · 2019-05-06T17:30:31Z

Use case:

I'd like to be able to import directories with large numbers of files and subdirectories (for example, source code trees such as the Linux source code). Filecoin would prefer to work with larger files rather than lots of small files, so it makes sense to use a file archive format (e.g. tar or zip).

By splitting the archive file into multiple chunks, it can be stored in multiple places, and uploads/downloads can happen in parallel. The chunk size can be selected to be a size that is optimal for the Filecoin network.
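A minimal sketch of the splitting step, assuming the archive fits in memory. The function names and the chunk size are hypothetical; nothing here reflects an actual Filecoin limit.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original bytes from chunks kept in order."""
    return b"".join(chunks)
```

Only the last chunk can be shorter than `chunk_size`, so a byte offset in the archive maps to chunk number `offset // chunk_size`.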

If we re-encode the chunks using forward error correction before storing them, we can add some extra resiliency against chunks that are lost and can't be retrieved (e.g. only 3 of 5 chunks are needed to restore the original). The zfec tool from tahoe-lafs is an easy-to-use tool for encoding and decoding, written in Python and C.
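To illustrate the idea without depending on zfec, here is a sketch of the simplest possible erasure code: one XOR parity share, which tolerates the loss of any single share (k of k+1). This is a stand-in and not zfec's algorithm; zfec handles general k-of-m cases such as 3 of 5. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: List[bytes]) -> List[bytes]:
    """Return the chunks, zero-padded to equal length, plus one parity share.

    Assumption: a real scheme would also record the original lengths so the
    padding can be stripped after recovery.
    """
    size = max(len(c) for c in chunks)
    padded = [c.ljust(size, b"\0") for c in chunks]
    return padded + [reduce(xor_bytes, padded)]


def recover(shares: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data chunks when at most one share is None (lost)."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity share can repair only one loss")
    repaired = list(shares)
    if missing:
        present = [s for s in shares if s is not None]
        # XOR of all surviving shares equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity share
```

The cost is one extra share of storage. Tolerating more than one lost share requires a Reed-Solomon style code, which is what zfec provides.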

It would be nice to be able to retrieve a file from the archive via random-access so only the chunk containing it would need to be retrieved from the Filecoin network instead of having to download all chunks for the archive.

Tar archives do not have an index. Zip files have an index (the central directory) appended at the end of the archive.

Datpedia uses JavaScript and the zip index to provide random access to Wikipedia entries stored in a single zip file.

https://github.com/dcposch/datpedia/blob/master/lib/unzip.js#L109
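As a rough sketch of how the zip index could drive chunk selection, the following uses Python's standard `zipfile` module to read each member's offset and work out which fixed-size chunks it spans. It assumes a plain (non-ZIP64) archive with no archive comment; `chunk_map` and `central_directory_offset` are hypothetical names.

```python
import io
import struct
import zipfile


def central_directory_offset(archive: bytes) -> int:
    """Read the index's start offset from the end-of-central-directory record."""
    eocd = archive.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end-of-central-directory record not found")
    (offset,) = struct.unpack_from("<I", archive, eocd + 16)
    return offset


def chunk_map(archive: bytes, chunk_size: int) -> dict:
    """Map each member name to the range of chunk numbers that hold it."""
    cd_start = central_directory_offset(archive)
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        infos = sorted(zf.infolist(), key=lambda info: info.header_offset)
    result = {}
    for info, nxt in zip(infos, infos[1:] + [None]):
        # A member ends where the next one starts, or where the index starts.
        end = nxt.header_offset if nxt is not None else cd_start
        first = info.header_offset // chunk_size
        last = (end - 1) // chunk_size
        result[info.filename] = range(first, last + 1)
    return result
```

Given this map, a client would fetch only the chunks listed for the file it wants, plus whichever chunks hold the index itself.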

It would be nice to store the zip index separately for extra redundancy, optionally using zfec. I'd like to have the ability to still retrieve files from chunks, even if other chunks have been lost.
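One way this could look, as a sketch only: export the index fields to JSON so they can be stored apart from the archive, then read a single member using nothing but that JSON and the chunks covering the member. The JSON layout and the function names are assumptions for illustration. Only stored and deflate-compressed members are handled.

```python
import io
import json
import struct
import zipfile
import zlib


def export_index(archive: bytes) -> str:
    """Serialize the fields needed to locate and verify each member."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        entries = {
            info.filename: {
                "offset": info.header_offset,
                "compressed_size": info.compress_size,
                "method": info.compress_type,
                "crc": info.CRC,
            }
            for info in zf.infolist()
        }
    return json.dumps(entries)


def make_fetcher(chunks: dict, chunk_size: int):
    """Return fetch(start, length) over a {chunk_number: bytes} mapping.

    Lost chunks are simply absent from the mapping; touching one raises KeyError.
    """
    def fetch(start: int, length: int) -> bytes:
        first = start // chunk_size
        last = (start + length - 1) // chunk_size
        blob = b"".join(chunks[i] for i in range(first, last + 1))
        skip = start - first * chunk_size
        return blob[skip:skip + length]
    return fetch


def read_member(fetch, entry: dict) -> bytes:
    """Read one member via its local file header, without the central directory."""
    header = fetch(entry["offset"], 30)  # fixed part of the local header
    if header[:4] != b"PK\x03\x04":
        raise ValueError("local file header signature not found")
    name_len, extra_len = struct.unpack_from("<HH", header, 26)
    data_start = entry["offset"] + 30 + name_len + extra_len
    raw = fetch(data_start, entry["compressed_size"])
    if entry["method"] == zipfile.ZIP_DEFLATED:
        data = zlib.decompress(raw, wbits=-15)  # raw deflate stream
    elif entry["method"] == zipfile.ZIP_STORED:
        data = raw
    else:
        raise ValueError("unsupported compression method")
    if zlib.crc32(data) != entry["crc"]:
        raise ValueError("CRC mismatch")
    return data
```

Because the offsets and sizes come from the separately stored index, losing the chunk that holds the archive's own central directory does not prevent reading members whose chunks survive.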

Some compression formats work better for random access. I've added some links to the wiki:

https://github.com/jimpick/filecoin-pickaxe/wiki/Dev-Links
