Raw vectors data layer in HNSW + move to base class #523

Open
wants to merge 8 commits into
base: main


alonre24
Collaborator

@alonre24 alonre24 commented Aug 12, 2024

Describe the changes in the pull request

Use the new RawDataContainer interface in HNSW, currently with an explicit DataBlocksContainer implementation, and move the abstract vectors member to the base class.

This includes:

  • Moving the serialization (save/restore) of the vectors out of HNSW and into DataBlocksContainer, since the blocks should no longer be accessed directly (the same should be applied to the graph data blocks later on).

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes


codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.01%. Comparing base (1381f64) to head (a2c9145).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #523      +/-   ##
==========================================
+ Coverage   96.93%   97.01%   +0.07%     
==========================================
  Files         100      100              
  Lines        5287     5291       +4     
==========================================
+ Hits         5125     5133       +8     
+ Misses        162      158       -4     


@@ -1673,8 +1660,7 @@ void HNSWIndex<DataType, DistType>::removeAndSwap(idType internalId) {

// Get the last element's metadata and data.
// If we are deleting the last element, we already destroyed its metadata.
- DataBlock &last_vector_block = vectorBlocks.back();
- auto last_element_data = last_vector_block.removeAndFetchLastElement();
+ auto *last_element_data = this->vectors->getElement(curElementCount);
Collaborator

Consider using the API here

Suggested change
- auto *last_element_data = this->vectors->getElement(curElementCount);
+ auto *last_element_data = getDataByInternalId(curElementCount);
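The suggestion above routes the read through a single accessor rather than reaching into the container member. A minimal sketch of that pattern follows, assuming a simplified container interface. `ToyRawDataContainer`, `ToyFlatContainer`, `ToyIndexBase`, `ToyHNSW`, `addVector` and `lastElementData` are hypothetical names used only for illustration, and the real `RawDataContainer` and index classes may look different.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using idType = unsigned int;

// Hypothetical minimal container interface; the real RawDataContainer may differ.
struct ToyRawDataContainer {
    virtual ~ToyRawDataContainer() = default;
    virtual void addElement(const char *data) = 0;
    virtual const char *getElement(size_t id) const = 0;
};

// Flat storage, used only to keep the sketch self-contained.
class ToyFlatContainer : public ToyRawDataContainer {
public:
    explicit ToyFlatContainer(size_t elementBytes) : elementBytes_(elementBytes) {}

    void addElement(const char *data) override {
        bytes_.insert(bytes_.end(), data, data + elementBytes_);
    }

    const char *getElement(size_t id) const override {
        assert((id + 1) * elementBytes_ <= bytes_.size());
        return bytes_.data() + id * elementBytes_;
    }

private:
    size_t elementBytes_;
    std::vector<char> bytes_;
};

// The base class owns the container and exposes one accessor for subclasses.
class ToyIndexBase {
public:
    explicit ToyIndexBase(std::unique_ptr<ToyRawDataContainer> v) : vectors(std::move(v)) {}
    virtual ~ToyIndexBase() = default;

    const char *getDataByInternalId(idType id) const { return vectors->getElement(id); }
    void addVector(const char *data) { vectors->addElement(data); }

protected:
    std::unique_ptr<ToyRawDataContainer> vectors;
};

class ToyHNSW : public ToyIndexBase {
public:
    using ToyIndexBase::ToyIndexBase;

    // Goes through the accessor instead of touching `vectors` directly.
    const char *lastElementData(idType lastId) const { return getDataByInternalId(lastId); }
};
```

With every read going through `getDataByInternalId`, a later change to how the base class stores vectors would only need to touch that one function.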

void DataBlocksContainer::saveBlocks(std::ostream &output) const {
// Save number of blocks
unsigned int num_blocks = this->numBlocks();
Serializer::writeBinaryPOD(output, num_blocks);
Collaborator

We should consider saving only the vectors, without the metadata about the number of blocks and their sizes, so that they can be loaded into other containers (or into containers with a different block size).

Collaborator

This also means we don't need to add serialization to the container class; it can stay at the algorithm level.
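To make the idea in this thread concrete, here is a rough sketch of layout-independent serialization: the stream holds only an element count followed by the raw vectors, so the data can be restored into a container with a different block size. `ToyBlocksContainer`, `saveVectors` and `loadVectors` are hypothetical names invented for this sketch and are not the library's API. The on-disk format shown is also an assumption, not the format the PR uses.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for a blocked raw-data container; not the library's class.
class ToyBlocksContainer {
public:
    ToyBlocksContainer(size_t blockSize, size_t elementBytes)
        : blockSize_(blockSize), elementBytes_(elementBytes) {
        assert(blockSize > 0 && elementBytes > 0);
    }

    size_t size() const { return count_; }
    size_t numBlocks() const { return blocks_.size(); }

    // Append one element, opening a new block when the last one is full.
    void addElement(const char *data) {
        if (count_ % blockSize_ == 0) {
            blocks_.emplace_back();
            blocks_.back().reserve(blockSize_ * elementBytes_);
        }
        blocks_.back().insert(blocks_.back().end(), data, data + elementBytes_);
        ++count_;
    }

    const char *getElement(size_t id) const {
        assert(id < count_);
        return blocks_[id / blockSize_].data() + (id % blockSize_) * elementBytes_;
    }

private:
    size_t blockSize_;
    size_t elementBytes_;
    size_t count_ = 0;
    std::vector<std::vector<char>> blocks_;
};

// Algorithm-level save: element count, then the raw vectors. No block layout is written.
template <typename Container>
void saveVectors(std::ostream &out, const Container &c, size_t elementBytes) {
    const size_t n = c.size();
    out.write(reinterpret_cast<const char *>(&n), sizeof(n));
    for (size_t id = 0; id < n; ++id)
        out.write(c.getElement(id), static_cast<std::streamsize>(elementBytes));
}

// Algorithm-level load: the target container chooses its own block layout.
template <typename Container>
void loadVectors(std::istream &in, Container &c, size_t elementBytes) {
    size_t n = 0;
    in.read(reinterpret_cast<char *>(&n), sizeof(n));
    std::vector<char> buf(elementBytes);
    for (size_t id = 0; id < n && in; ++id) {
        in.read(buf.data(), static_cast<std::streamsize>(elementBytes));
        if (in)
            c.addElement(buf.data());
    }
}
```

Because the stream carries no block count or block sizes, the save and load helpers depend only on `size()`, `getElement()` and `addElement()`, which is what would let serialization stay outside the container class.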

assert(VecSimType_sizeof(vecType));
this->vectors = new (this->allocator) DataBlocksContainer(
this->blockSize, this->dataSize, this->allocator, this->preprocessors->getAlignment());
Collaborator

Suggested change
- this->blockSize, this->dataSize, this->allocator, this->preprocessors->getAlignment());
+ this->blockSize, this->dataSize, this->allocator, this->getAlignment());
