
Add support for preloading models #822

Merged 1 commit into main on Nov 22, 2024
Conversation

@alexnorell (Contributor) commented Nov 20, 2024

Description

This PR introduces support for preloading models at startup and includes Kubernetes health check and readiness endpoints.

Key changes:

  • Added PRELOAD_MODELS environment variable to enable asynchronous preloading of specified models at server startup.

  • Implemented a /readiness endpoint for Kubernetes readiness probes to indicate when the server is ready to handle requests.

  • Added a /healthz endpoint for Kubernetes liveness probes to ensure the server is alive.

  • Updated http_api.py to handle model initialization with asynchronous tasks and readiness state tracking.

  • Updated CPU and GPU builds:

    • Allow building and pushing to Docker Hub with custom tags
    • Move the GPU build over to Depot
    • Created an internal action to determine the list of tags to build for CPU and GPU builds
      • This can be rolled out to all other Docker builds in the future.
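The preload-at-startup behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual `http_api.py` code; names such as `PreloadState`, `preload_all`, and `parse_preload_models` are hypothetical.

```python
import asyncio
import os


class PreloadState:
    """Tracks whether a load has been attempted for every configured model."""

    def __init__(self, model_ids):
        self.model_ids = list(model_ids)
        # Nothing to preload means the server is ready immediately.
        self.ready = not self.model_ids

    async def preload_all(self, load_model):
        """Best-effort preloading: a failing model must not block startup."""
        for model_id in self.model_ids:
            try:
                await load_model(model_id)
            except Exception:
                # Invalid or temporarily unavailable models are skipped.
                pass
        self.ready = True


def parse_preload_models(env=None):
    """PRELOAD_MODELS holds a comma-separated list of model IDs."""
    env = os.environ if env is None else env
    raw = env.get("PRELOAD_MODELS", "")
    return [m.strip() for m in raw.split(",") if m.strip()]
```

At startup the server would schedule `preload_all` as a background task and flip the readiness state only after every model has at least been attempted, matching the "don't block startup on invalid model IDs" behavior tested below.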

Dependencies:

  • No new external dependencies added.

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

Locally, by setting the environment variables.

For example:

  • Tested the model preloading functionality with a mock PRELOAD_MODELS to simulate loading multiple models.
  • Verified the Kubernetes readiness and liveness endpoints using curl and simulated Kubernetes probes.
  • Also set invalid model IDs and verified that they do not block startup.

Any specific deployment considerations

  • Ensure PRELOAD_MODELS is configured with a comma-separated list of model IDs if preloading is required.
  • A valid API key must be stored in API_KEY.
  • Update Kubernetes deployment manifests to use the new /readiness and /healthz probe endpoints.
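The manifest update might look like the following sketch. Only the /healthz and /readiness paths come from this PR; the port and all timing values are illustrative assumptions to be adjusted per deployment.

```yaml
livenessProbe:
  httpGet:
    path: /healthz     # server is alive and answering requests
    port: 9001         # assumed container port; adjust to your deployment
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /readiness   # all preloads have at least been attempted
    port: 9001
  periodSeconds: 5
  # Give model preloading time to finish before marking the pod unready.
  failureThreshold: 12
```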

Docs

  • Docs updated? What were the changes:
    • Added information about the new environment variable PRELOAD_MODELS.
    • Documented the readiness and health check endpoints for Kubernetes.

@PawelPeczek-Roboflow (Collaborator) left a comment

This seems to be a great extension. I do have one comment: the error in health seems to be a non-recoverable one. Once any of the models cannot be loaded (either temporarily or due to an actual problem), the service will never get better:

  • I am not sure whether k8s would terminate it by default after some time, or whether it would fall into a loop of reboots.
  • I am also not 100% sure about the desired outcome in such a scenario - what is the context of this PR?

@alexnorell (Contributor, Author) commented

This seems to be a great extension. I do have one comment: the error in health seems to be a non-recoverable one. Once any of the models cannot be loaded (either temporarily or due to an actual problem), the service will never get better:

  • I am not sure whether k8s would terminate it by default after some time, or whether it would fall into a loop of reboots.
  • I am also not 100% sure about the desired outcome in such a scenario - what is the context of this PR?

I've addressed some of the edge cases with this change. The idea comes from the desire to preload commonly used models at startup rather than lazy-loading them on first request.

For the K8s side:

  • /healthz should return a positive status as soon as FastAPI is able to serve requests. We can extend this endpoint in the future with additional health states, but it is helpful to have at least something that lets us know the service is responding.
  • /readiness should return an error state until a load has at least been attempted for every configured model. Once that has happened, K8s will start routing traffic to the pod. The intention with this change is to do best-effort initialization, but not prevent the service from running if it cannot initialize.
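The split between the two probes can be illustrated with a stdlib stand-in (the real implementation uses FastAPI; the `READY` flag here plays the role of the readiness state tracked in `http_api.py`, and all names are hypothetical):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stands in for the readiness state tracked during model preloading:
# flipped to True once a load has been attempted for every model.
READY = {"value": False}


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            # Liveness: positive as soon as the server can answer at all.
            self._reply(200, {"status": "ok"})
        elif self.path == "/readiness":
            # Readiness: 503 until every preload has been attempted.
            if READY["value"]:
                self._reply(200, {"status": "ready"})
            else:
                self._reply(503, {"status": "loading"})
        else:
            self._reply(404, {"detail": "not found"})

    def _reply(self, code, body):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep probe polling quiet
```

The key design point is that /healthz never depends on the preload outcome, while /readiness reports 503 only until loads have been attempted, so a permanently broken model degrades readiness temporarily but never liveness.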

@PawelPeczek-Roboflow (Collaborator) commented

ok, looks good - lmk if that is ready to be shipped

- Update GitHub Actions for deploying CPU and GPU containers
@alexnorell alexnorell marked this pull request as ready for review November 22, 2024 12:00
@alexnorell (Contributor, Author) commented

ok, looks good - lmk if that is ready to be shipped

Should be ready for review now

@alexnorell alexnorell merged commit a9bd1bf into main Nov 22, 2024
71 checks passed
@alexnorell alexnorell deleted the feature/default_model_load branch November 22, 2024 12:13