High memory usage despite memory mapping #36412
-
I have read other related discussions such as #33721, but it is not clear what needs to be done to bring the memory usage down.
-
Basically, Milvus is an in-memory database. An in-memory index gives the best search performance.
-
Hey @yhmo,
This link says that either or both of data and index files can be memory-mapped.
Our application is extremely sensitive to precision, so we cannot go below
So, does that mean that, in order to see the benefits, both the data and the index need to be memory-mapped? Thanks!
-
Thanks, I am able to make memory mapping work on larger collections. I added the following section under `extraConfigFiles`:

```yaml
user.yaml: |+
  queryNode:
    enableDisk: true
    cache:
      memoryLimit: 536870912
    mmap:
      mmapEnabled: true
      vectorField: true
      vectorIndex: true
      scalarField: true
      scalarIndex: true
      growingMmapEnabled: true
```

Secondly, while loading the collection, I make sure that both the collection and the index are loaded in a memory-mapped fashion:

```python
from pymilvus import Collection  # assumes an existing connection and a MilvusClient instance named `client`

collection = Collection(name=collection_name)
collection.set_properties({'mmap.enabled': True})
collection.alter_index(
    index_name="embedding",
    extra_params={"mmap.enabled": True}
)
client.load_collection(collection_name=collection_name)
```

By setting … Thanks!
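A quick sanity check (just a sketch; the exact keys in the returned dicts can differ across pymilvus/Milvus versions, and the endpoint and collection name below are placeholders) is to read the properties back before loading:

```python
from pymilvus import MilvusClient

# Sketch only: confirm the mmap properties were applied to the collection
# and to the "embedding" index. Endpoint and collection name are assumptions.
client = MilvusClient(uri="http://localhost:19530")
collection_name = "my_collection"  # placeholder

info = client.describe_collection(collection_name=collection_name)
print(info.get("properties"))  # expect 'mmap.enabled' to appear here

index_info = client.describe_index(collection_name=collection_name, index_name="embedding")
print(index_info)              # the index params should include 'mmap.enabled'
```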
-
Hi,
I have a sample dataset of 2M vectors with 512 dimensions, each vector being a `float32`. I also have an `id` field that is `int64`. So, the size of the raw vectors is 2M * 512 * 4 bytes = 4 GB.
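For reference, the same estimate in code (a trivial sketch using the numbers quoted above; index structures add memory on top of this raw footprint):

```python
# Raw float32 vector footprint only (no index overhead).
num_vectors = 2_000_000
dim = 512
bytes_per_float32 = 4

raw_bytes = num_vectors * dim * bytes_per_float32
print(raw_bytes / 10**9, "GB")   # 4.096 GB
print(raw_bytes / 2**30, "GiB")  # ~3.81 GiB
```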
The collection is memory-mapped.
I have 3 such collections, with `FLAT`, `IVF_FLAT`, and `HNSW` indexes on the vector field. Releasing the collection does clear the memory, so clearly the data resides on disk.
Please note `queryNode.mmap.mmapEnabled` is set to `true` in `values.yaml`.
This is how I'm creating the collections:
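(The exact snippet isn't reproduced here; the following is only a rough sketch of what the setup described in this post could look like with the pymilvus ORM. The `id`/`int64` field and 512-dim `float32` vector come from this post, and `embedding` is the index name used elsewhere in the thread; the collection name, metric type, and index parameters are assumptions.)

```python
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

# Sketch only: field names match the post, everything else is a placeholder.
connections.connect(host="localhost", port="19530")  # assumed endpoint

schema = CollectionSchema(fields=[
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=512),
])
collection = Collection(name="vectors_hnsw", schema=schema)

# Memory-map the raw data only (the index is deliberately not mmapped,
# per the PS at the end of this post).
collection.set_properties({"mmap.enabled": True})

# One index type per collection (FLAT / IVF_FLAT / HNSW); HNSW shown here.
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "L2",
        "params": {"M": 16, "efConstruction": 200},
    },
)
```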
Here is the peak CPU/memory usage while ingesting the 2M vectors:
Simply loading (and subsequently releasing) these collections has the following peak CPU/memory usage:
Here is the peak CPU/memory usage while running 100K queries (batch_size=100):
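For context, the load / query / release cycle is roughly the following (a sketch only; the collection name, metric, `ef`, and `limit` are assumptions, while the 100K queries and batch size of 100 are the numbers above):

```python
import numpy as np
from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")  # assumed endpoint
collection = Collection(name="vectors_hnsw")         # placeholder name

collection.load()  # loading honors the mmap settings on the collection

# 100K random query vectors, searched in batches of 100.
queries = np.random.random((100_000, 512)).astype(np.float32)
batch_size = 100
for start in range(0, len(queries), batch_size):
    batch = queries[start:start + batch_size].tolist()
    collection.search(
        data=batch,
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"ef": 64}},
        limit=10,
    )

collection.release()  # memory drops back once the collection is released
```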
This is a test setup, so we have just one replica each for the index/data/query nodes. But in production, we plan to have a distributed cluster with multiple replicas.
For the production use case, we have collections with 100M vectors, so we cannot have a scenario where all the data is loaded into memory despite memory mapping, as that would be cost-prohibitive.
PS: Please note, I'm only memory-mapping the data and not the index.
Thanks!