After the model is downloaded, at the log line `INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:121: Starting JinaBertModel model on Cpu`, memory spikes to >8 GB for about a second.
After startup, memory usage drops below 4 GB and stays there.
Expected behavior
The container should not produce large memory spikes during model load that can cause resource errors.
Otherwise, Kubernetes deployments may need to provision roughly double the memory actually needed for inference for each container, leaving a large amount of memory capacity unused.
I tried to deploy this to a RH OpenShift cluster with a hard pod memory limit of 4 GB and failed because of this, even though after startup the container never needs more than 4 GB for handling requests and inference; the spike occurs only during startup.
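As a workaround under such hard limits, one option is to request only the steady-state memory while setting the limit high enough to survive the transient spike. A hypothetical pod spec fragment (the sizes mirror the figures from this report; they are not measured values):

```yaml
# Sketch: request steady-state memory, but set the limit to cover the
# model-load spike. Without the higher limit, the pod is OOM-killed during
# startup even though it never needs more than ~4Gi afterwards.
resources:
  requests:
    memory: "4Gi"   # steady-state usage after startup
  limits:
    memory: "9Gi"   # must cover the transient >8 GB spike during model load
```

This of course still reserves headroom on the node for a spike that lasts only a second, which is exactly the waste described above.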
This is probably related to the implementation of JinaBert.
When trying a model with another architecture, like intfloat/multilingual-e5-large, I don't get this behavior.
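One plausible mechanism for such a spike (a sketch, not the actual candle/TEI code) is a "load then convert" path that keeps both the raw serialized weight bytes and the converted tensors alive at the same time, so peak memory during load is roughly twice the steady-state model size:

```rust
/// Hypothetical illustration of a load-then-convert pattern: the raw weight
/// bytes and the deserialized f32 weights briefly coexist in memory.
/// Returns (peak_bytes, steady_bytes).
fn load_then_convert(n_params: usize) -> (usize, usize) {
    let raw: Vec<u8> = vec![0u8; n_params * 4]; // raw f32 bytes read from disk
    let weights: Vec<f32> = raw
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect(); // second full-size copy while `raw` is still alive
    let peak = raw.len() + weights.len() * 4; // both buffers alive: the spike
    drop(raw); // raw bytes freed once conversion is done
    let steady = weights.len() * 4;
    (peak, steady)
}

fn main() {
    let (peak, steady) = load_then_convert(1_000);
    println!("peak = {peak} bytes, steady = {steady} bytes");
}
```

If something like this is happening, memory-mapping the weight file instead of reading it into an owned buffer would avoid holding both copies at once.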
System Info
Image: v1.2 CPU
Model used: jinaai/jina-embeddings-v2-base-de
Deployment: Docker / RH OpenShift
Reproduction
INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:121: Starting JinaBertModel model on Cpu

After this log line appears, memory spikes to >8 GB for about a second.
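A possible command-line reproduction of the limit-induced failure (the image tag follows the TEI naming scheme for the v1.2 CPU build and is my assumption; adjust as needed). With a hard 4 GB cap, mirroring the OpenShift pod limit, the container should be killed during model load; without the cap it starts and then settles below 4 GB:

```shell
# Assumed image tag for the v1.2 CPU build; --memory enforces a hard cap
# comparable to the OpenShift pod memory limit from this report.
docker run --rm --memory 4g -p 8080:80 \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.2 \
  --model-id jinaai/jina-embeddings-v2-base-de
```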