-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raw vectors data layer in HNSW + move to base class #523
base: main
Are you sure you want to change the base?
Conversation
…estruction to base
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #523 +/- ##
==========================================
+ Coverage 96.93% 97.01% +0.07%
==========================================
Files 100 100
Lines 5287 5291 +4
==========================================
+ Hits 5125 5133 +8
+ Misses 162 158 -4 ☔ View full report in Codecov by Sentry. |
@@ -1673,8 +1660,7 @@ void HNSWIndex<DataType, DistType>::removeAndSwap(idType internalId) { | |||
|
|||
// Get the last element's metadata and data. | |||
// If we are deleting the last element, we already destroyed it's metadata. | |||
DataBlock &last_vector_block = vectorBlocks.back(); | |||
auto last_element_data = last_vector_block.removeAndFetchLastElement(); | |||
auto *last_element_data = this->vectors->getElement(curElementCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using the API here
auto *last_element_data = this->vectors->getElement(curElementCount); | |
auto *last_element_data = getDataByInternalId(curElementCount); |
void DataBlocksContainer::saveBlocks(std::ostream &output) const { | ||
// Save number of blocks | ||
unsigned int num_blocks = this->numBlocks(); | ||
Serializer::writeBinaryPOD(output, num_blocks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider only saving the vectors without the metadata about the number of blocks and their sizes, so we can load them into other containers (or to different block sizes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also means we don't need to add serialization to the container class, keeping it on the algorithm level
assert(VecSimType_sizeof(vecType)); | ||
this->vectors = new (this->allocator) DataBlocksContainer( | ||
this->blockSize, this->dataSize, this->allocator, this->preprocessors->getAlignment()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this->blockSize, this->dataSize, this->allocator, this->preprocessors->getAlignment()); | |
this->blockSize, this->dataSize, this->allocator, this->getAlignment()); |
Describe the changes in the pull request
Use the new
RawDataContainer
interface in HNSW, currently with an explicitDataBlocksContainer
implementation, and move the abstractvectors
member to the base class.This includes:
DataBlocksContainer
responsibility, as we should not access the blocks directly anymore (should be applied for the graph data blocks later on as well).Mark if applicable