Optimizations for Chunk's hash validation #204
Labels: Any Idea, BDT, CYFS Stack, feature, Performance
Chunk storage design
Currently, chunks managed by cyfs-stack are divided into two types, external and internal, based on where the managed data is stored.
For example, see the problems described in the following issues:
#158
#201
Solutions currently in use
At present, chunk data errors are handled with a "validation during reading" mode: each time a chunk is requested, its data is validated as it is read from the target disk file. This approach is simple but has some problems:
Partial reads cannot be handled correctly, since validating the hash requires the complete chunk data.
Every read of a chunk triggers a validation, which is unnecessary for repeated reads of the same chunk. In most cases the corresponding chunk file has not changed and contains no errors, so frequently requesting the same chunk adds a lot of extra overhead.
Possible improvements
Considering the above, the following optimizations and improvements are possible:
1. Add a regular local chunk validation mechanism
Based on this, when a chunk is requested and its last validation result was a failure, the "data mismatch" error can be returned to the caller directly, without validating again.
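A minimal sketch of how a cached validation result could short-circuit the read path. Everything here is hypothetical and not taken from the cyfs-stack code: the names `ValidationCache`, `ChunkState`, `record_scan` and `ReadError` are invented for illustration, the chunk id is modeled as a plain `u64`, and `digest` is a non-cryptographic stand-in built on the standard library's `DefaultHasher` where a real implementation would use the chunk's actual hash algorithm. Skipping re-validation for chunks last seen as valid is also an assumption on my part; the proposal above only spells out the mismatch case.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Stand-in digest for illustration only; NOT a cryptographic hash.
fn digest(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Result of the most recent validation of a chunk.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChunkState {
    Unknown,
    Valid,
    Mismatch,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    DataMismatch,
}

/// Hypothetical cache of validation results, keyed by the expected digest.
#[derive(Default)]
pub struct ValidationCache {
    states: HashMap<u64, ChunkState>,
}

impl ValidationCache {
    /// Called by the periodic background pass: hash the data and remember the outcome.
    pub fn record_scan(&mut self, expected: u64, data: &[u8]) -> ChunkState {
        let state = if digest(data) == expected {
            ChunkState::Valid
        } else {
            ChunkState::Mismatch
        };
        self.states.insert(expected, state);
        state
    }

    /// Last known state, or `Unknown` if the chunk has never been scanned.
    pub fn state(&self, expected: u64) -> ChunkState {
        self.states.get(&expected).copied().unwrap_or(ChunkState::Unknown)
    }

    /// Request path: consult the cached result before doing any hashing.
    pub fn read<'a>(&mut self, expected: u64, data: &'a [u8]) -> Result<&'a [u8], ReadError> {
        match self.state(expected) {
            // Known-bad chunk: fail immediately without re-validating.
            ChunkState::Mismatch => Err(ReadError::DataMismatch),
            // Known-good chunk: skip hashing (assumption, see the lead-in).
            ChunkState::Valid => Ok(data),
            // Never scanned: validate once and cache the result.
            ChunkState::Unknown => match self.record_scan(expected, data) {
                ChunkState::Valid => Ok(data),
                _ => Err(ReadError::DataMismatch),
            },
        }
    }
}
```

In this sketch a later scan that finds the file repaired overwrites the `Mismatch` entry, so the chunk becomes readable again. A real design would also need to decide how long a `Valid` result may be trusted between scans.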
2. Add validation at the BDT layer on the requester side
Currently, BDT has no step that validates the chunk hash while files and chunks are being transferred. According to the design principle, the cyfs-stack layer should ensure that chunk data requested from elsewhere is correct (similar to a download operation in Web 2 browsers). Therefore, it seems necessary for the BDT layer to provide this validation mechanism, at least as an option. @photosssa
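As a rough illustration of what an optional requester-side check might look like, the sketch below collects received pieces and compares the digest of the assembled chunk against the expected value once the transfer completes. None of this is BDT's actual API: `ChunkReceiver`, `on_piece`, `finish`, `DownloadError` and the `verify` flag are hypothetical names, the chunk id is modeled as a `u64`, and `toy_digest` is a non-cryptographic stand-in for the real chunk hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in digest for illustration only; NOT a cryptographic hash.
fn toy_digest(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

#[derive(Debug, PartialEq, Eq)]
pub enum DownloadError {
    HashMismatch { expected: u64, actual: u64 },
}

/// Hypothetical requester-side receiver for a single chunk.
pub struct ChunkReceiver {
    expected: u64,
    verify: bool, // validation is opt-in, as suggested in the proposal
    buf: Vec<u8>,
}

impl ChunkReceiver {
    pub fn new(expected: u64, verify: bool) -> Self {
        Self { expected, verify, buf: Vec::new() }
    }

    /// Append one received piece; pieces are assumed to arrive in order.
    pub fn on_piece(&mut self, piece: &[u8]) {
        self.buf.extend_from_slice(piece);
    }

    /// Finish the transfer, validating the whole chunk if requested.
    pub fn finish(self) -> Result<Vec<u8>, DownloadError> {
        if self.verify {
            let actual = toy_digest(&self.buf);
            if actual != self.expected {
                return Err(DownloadError::HashMismatch {
                    expected: self.expected,
                    actual,
                });
            }
        }
        Ok(self.buf)
    }
}
```

The sketch hashes the assembled buffer at the end for simplicity. With a streaming cryptographic hasher, each piece could instead be fed into the hash as it arrives, which avoids a second pass over the data.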