CPU Image: High memory usage on startup #303

Open
2 of 4 tasks
freinold opened this issue Jun 24, 2024 · 1 comment
Comments

@freinold

System Info

Image: v1.2 CPU
Model used: jinaai/jina-embeddings-v2-base-de
Deployment: Docker / RH OpenShift

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

  1. Run the CPU image with the following `compose.yaml`:

```yaml
version: '3.8'
name: test-tei
services:
  tei:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    command: ["--tokenization-workers", "1"]
    environment:
      MODEL_ID: "jinaai/jina-embeddings-v2-base-de"
      REVISION: "5078d9924a7b3bdd9556928fcfc08b8de041bfc1"
      MAX_CLIENT_BATCH_SIZE: 64
    volumes:
      - ./tei-docker-data:/data
    ports:
      - "8081:80"
```

  2. Monitor memory usage (e.g. via Docker Desktop).
  3. After the model is downloaded and the log line `INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:121: Starting JinaBertModel model on Cpu` appears, memory usage spikes to >8 GB for about a second.
  4. After startup, memory usage drops back below 4 GB and stays there.
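A minimal sketch of how to reproduce the resulting OOM failure locally, assuming a Compose override that caps container memory (the `mem_limit` value and this override pattern are illustrative additions, not part of the original report):

```yaml
# Hypothetical override: cap the container at 4 GB to mimic a hard pod
# memory limit. With this in place, startup is expected to be OOM-killed
# during the >8 GB model-load spike, even though steady-state usage
# fits comfortably within the limit.
services:
  tei:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.2
    environment:
      MODEL_ID: "jinaai/jina-embeddings-v2-base-de"
    mem_limit: 4g
```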

Expected behavior

The container should not produce large memory spikes during model load, as these can cause resource errors. Otherwise, Kubernetes deployments must provision roughly double the memory actually needed for inference per container, leaving a large amount of memory capacity unused.

I tried to deploy this to a RH OpenShift cluster with a hard pod memory limit of 4 GB and the deployment failed because of this spike, even though after startup the container never needs more than 4 GB of memory to handle requests and inference.
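For context, a sketch of the sizing trade-off in a Kubernetes pod spec (the values shown are illustrative assumptions based on the numbers reported above, not a recommended configuration):

```yaml
# Illustrative container resources: to survive the startup spike, the
# memory limit must cover the transient >8 GB peak, even though
# steady-state inference stays under 4 GB -- so roughly half of the
# reserved memory sits unused after startup.
resources:
  requests:
    memory: "8Gi"   # sized for the load-time spike, not steady state
  limits:
    memory: "8Gi"   # a 4Gi limit causes an OOMKill during model load
```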

@freinold
Author

This is probably related to the implementation of JinaBert. When I try a model with another architecture, like intfloat/multilingual-e5-large, I don't get this behavior.
