Chunk storage design
Chunks managed by cyfs-stack currently fall into two types, depending on where the data is stored:
External: stored as files in arbitrary directories, with the cyfs-stack tracker recording the association between each chunk and its file. The data is more likely to become invalid or corrupted.
Internal: stored in the data/chunk-cache directory and managed by the chunk-manager. The data is less likely to become invalid or corrupted.
For examples of such problems, see #158 and #201.
Solutions currently in use
Chunk data errors are currently handled by "validation during reading": each time a chunk is requested, its data is validated as it is read from the target disk file. This approach is simple, but it has some problems:
Special cases during reading
A partial read of a chunk cannot be handled correctly, since validation needs the chunk's full data.
Performance overhead
The chunk is validated on every read, which is unnecessary for repeated reads of the same chunk. In most cases the chunk file has not changed and contains no errors, so frequent requests for the same chunk add a lot of extra overhead.
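To make both problems concrete, here is a minimal sketch of what "validation during reading" amounts to. It is not the cyfs-stack implementation: the function name `read_chunk_validated` is hypothetical, and it assumes the chunk's expected hash is a hex SHA-256 digest of the full chunk data.

```python
import hashlib
from typing import Optional


def read_chunk_validated(
    path: str,
    expected_sha256: str,      # assumption: hex SHA-256 of the full chunk
    offset: int = 0,
    length: Optional[int] = None,
) -> bytes:
    """Return bytes [offset, offset + length) of a chunk file after validating it."""
    # Even for a partial read, the whole file must be read and hashed,
    # because the hash only covers the complete chunk.
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("data mismatch")
    end = len(data) if length is None else offset + length
    return data[offset:end]
```

Each call pays the full read and hash cost, no matter how small the requested range is or how recently the same chunk was checked.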
Possible improvements
Given these problems, the following optimizations and improvements are possible:
1. Add a regular local chunk validation mechanism
Similar to the GC mechanism, periodically scan the chunks recorded in the NDC and the tracker, attempt to validate each one, and update its status afterwards, for example the last validation time and the validation result.
Also validate at trigger points: for example, when a chunk is requested and has not been validated for a long time, validation can be triggered immediately.
With this in place, if a chunk is requested and its last validation result was a failure, a "data mismatch" error can be returned to the caller directly, without validating again.
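The three points above can be sketched as follows. This is an illustration under stated assumptions, not the cyfs-stack design: `ChunkValidator`, `ValidationState`, and `REVALIDATE_AFTER_SECS` are hypothetical names, the chunk id is assumed to be the hex SHA-256 of the chunk data, and `load` stands in for whatever reads a chunk from the NDC or tracker location.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

REVALIDATE_AFTER_SECS = 24 * 3600  # assumed staleness threshold


@dataclass
class ValidationState:
    last_validated: float = 0.0   # Unix timestamp of the last validation
    ok: Optional[bool] = None     # None until the chunk is first validated


class ChunkValidator:
    def __init__(self, load: Callable[[str], bytes],
                 now: Callable[[], float] = time.time):
        self.load = load          # hypothetical: reads a chunk's bytes by id
        self.now = now
        self.states: Dict[str, ValidationState] = {}

    def validate(self, chunk_id: str) -> bool:
        """Hash the chunk and record the validation time and result."""
        try:
            ok = hashlib.sha256(self.load(chunk_id)).hexdigest() == chunk_id
        except OSError:
            ok = False            # an unreadable file counts as invalid
        self.states[chunk_id] = ValidationState(self.now(), ok)
        return ok

    def scan(self, chunk_ids: Iterable[str]) -> None:
        """Periodic pass over all recorded chunks, driven by a timer like GC."""
        for chunk_id in chunk_ids:
            self.validate(chunk_id)

    def read(self, chunk_id: str) -> bytes:
        state = self.states.get(chunk_id, ValidationState())
        stale = self.now() - state.last_validated > REVALIDATE_AFTER_SECS
        if state.ok is None or stale:
            # Trigger point: never validated, or not validated for a long time.
            self.validate(chunk_id)
            state = self.states[chunk_id]
        if not state.ok:
            # Known-bad chunk: fail directly without hashing again.
            raise ValueError("data mismatch")
        return self.load(chunk_id)
```

The trade-off is that a chunk corrupted after its last successful validation is served unchecked until the next scan or until the threshold expires, so the threshold bounds how long bad data can go unnoticed.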
2. Add validation at the BDT layer on the requester side
Currently, BDT has no step that validates the chunk hash while transferring files and chunks. By design, the cyfs-stack layer should ensure that chunk data requested from elsewhere is correct (similar to a download in a Web 2 browser). It therefore seems necessary for the BDT layer to provide this validation, at least as an option. @photosssa
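As a rough sketch of what requester-side validation could look like, the function below hashes pieces incrementally as they arrive and compares the digest once the transfer completes. It does not use or reflect BDT's real API: `receive_chunk` and its parameters are hypothetical, and the expected hash is assumed to be a hex SHA-256 digest of the chunk.

```python
import hashlib
from typing import Iterable


def receive_chunk(expected_sha256: str, pieces: Iterable[bytes],
                  validate: bool = True) -> bytes:
    """Assemble a chunk from received pieces, optionally checking its hash."""
    hasher = hashlib.sha256()
    buf = bytearray()
    for piece in pieces:
        if validate:
            hasher.update(piece)   # incremental, so no second pass over the data
        buf += piece
    if validate and hasher.hexdigest() != expected_sha256:
        raise ValueError("data mismatch")
    return bytes(buf)
```

Hashing while receiving keeps the extra cost to one digest update per piece, and the `validate` flag mirrors the idea of making the check optional.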