Use case:
I'd like to be able to import directories containing large numbers of files and subdirectories (for example, source code trees such as the Linux kernel source). Filecoin works better with a few large files than with many small ones, so it makes sense to pack the directory into a file archive format (e.g. tar or zip), as sketched below.
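For the tar case, Python's standard tarfile module can pack a whole tree into one file; a minimal sketch (the paths are illustrative):

```python
import tarfile

# Pack a directory tree (e.g. a kernel source checkout) into a single
# archive file before importing it. Paths here are placeholders.
with tarfile.open("linux-src.tar", "w") as tar:
    tar.add("linux/", arcname="linux")
```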
By splitting the archive into multiple chunks, it can be stored in multiple places, and uploads and downloads can happen in parallel. The chunk size can be chosen to suit the Filecoin network.
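The chunker itself could be as simple as the following sketch (the 1 MiB chunk size is an arbitrary placeholder, not a Filecoin-recommended value):

```python
CHUNK_SIZE = 1 << 20  # 1 MiB placeholder; tune for the Filecoin network

def split_into_chunks(path, chunk_size=CHUNK_SIZE):
    """Yield fixed-size chunks of a file; the last chunk may be shorter."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```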
If we re-encode the chunks with Forward Error Correction before storing them, we gain some resiliency against chunks that are lost and can't be retrieved (e.g. only 3 of 5 chunks are needed to restore the original). The zfec tool from Tahoe-LAFS is an easy-to-use encoder/decoder written in Python and C.
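A rough sketch of the 3-of-5 case with zfec's Python bindings might look like this (I'm going from the zfec README, so treat the exact Encoder/Decoder signatures as assumptions):

```python
import zfec

K, M = 3, 5  # any 3 of the 5 blocks suffice to restore the original

def fec_encode(data: bytes):
    """Split data into K primary blocks plus M-K check blocks."""
    size = -(-len(data) // K)  # ceil(len/K); pad so blocks are equal-sized
    padded = data.ljust(size * K, b"\0")
    primary = [padded[i * size:(i + 1) * size] for i in range(K)]
    # Assumption: Encoder.encode returns the requested block numbers, so
    # asking for range(M) yields the K primary plus M-K check blocks.
    return zfec.Encoder(K, M).encode(primary, range(M))

def fec_decode(blocks, block_nums, orig_len):
    """Recover the original bytes from any K of the M blocks."""
    primary = zfec.Decoder(K, M).decode(blocks, block_nums)
    return b"".join(primary)[:orig_len]
```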
It would be nice to be able to retrieve a file from the archive via random access, so that only the chunks containing it need to be fetched from the Filecoin network instead of downloading every chunk of the archive.
Tar archives do not have an index; zip files have an index (the central directory) appended at the end of the archive.
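With the zip index in hand, mapping a member file to the chunks that hold it is straightforward. Here is a sketch using Python's zipfile module (ZipInfo.header_offset is the member's byte offset within the archive; the chunk size and the slack allowed for the local file header are assumptions):

```python
import zipfile

CHUNK_SIZE = 1 << 20  # must match the chunk size used when splitting

def chunks_for_member(archive_path, member_name, chunk_size=CHUNK_SIZE):
    """Return the range of chunk numbers that cover one zip member."""
    with zipfile.ZipFile(archive_path) as zf:
        info = zf.getinfo(member_name)
        start = info.header_offset
        # compress_size covers the member's data; the local file header
        # in front of it is small and variable-length, so add some slack.
        end = start + info.compress_size + 512
        return range(start // chunk_size, end // chunk_size + 1)
```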
Datpedia uses JavaScript and the zip index to provide random access to Wikipedia entries stored in a single zip file.
It would be nice to store the zip index separately for extra redundancy, optionally zfec-encoded. I'd like to still be able to retrieve files from the surviving chunks even if other chunks have been lost.
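Pulling the index out for separate storage only requires the End Of Central Directory record at the tail of the zip; a minimal sketch (zip64 archives use a different record and are not handled here):

```python
import struct

def extract_zip_index(data: bytes) -> bytes:
    """Slice the central directory (the "index") out of raw zip bytes."""
    # The EOCD record (signature PK\x05\x06) sits near the end of the
    # file; search backwards to allow for a trailing archive comment.
    eocd = data.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("no End Of Central Directory record found")
    # EOCD fields: central directory size at byte 12, offset at byte 16
    # (both little-endian uint32).
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd + 12)
    return data[cd_offset:cd_offset + cd_size]
```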
Some compression formats work better for random access than others. I've added some links to the wiki.