Implement UpdateModel backend workflow #117
Merged
Creates the UpdateModel state machine implementation, with frontend handler validations that run as many checks as possible synchronously before the request reaches the async workflow. This gives a much more user-friendly experience and is easier to debug.
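To illustrate the "validate synchronously first" idea, here is a minimal sketch in Python. It is not the code from this PR: the `UpdateRequest` fields, the status strings, and the `validate_update` function are all hypothetical names chosen for the example. The point is only that every check that needs no waiting can fail fast in the handler, before the async workflow is started.

```python
from dataclasses import dataclass
from typing import Optional


class ValidationError(Exception):
    """Raised in the handler, before any async workflow is started."""


@dataclass
class UpdateRequest:
    # Hypothetical request fields, for illustration only.
    enabled: Optional[bool] = None
    desired_capacity: Optional[int] = None


def validate_update(
    current_status: str,
    min_capacity: int,
    max_capacity: int,
    req: UpdateRequest,
) -> None:
    """Run every check that does not require waiting on infrastructure."""
    if req.enabled is None and req.desired_capacity is None:
        raise ValidationError("Request does not change anything.")
    # Assumed rule: only models in a settled state may be updated.
    if current_status not in ("in-service", "stopped"):
        raise ValidationError(f"Model is busy (status: {current_status}).")
    if req.enabled is True and current_status == "in-service":
        raise ValidationError("Model is already running.")
    if req.enabled is False and current_status == "stopped":
        raise ValidationError("Model is already stopped.")
    if req.desired_capacity is not None and not (
        min_capacity <= req.desired_capacity <= max_capacity
    ):
        raise ValidationError("Desired capacity is outside the allowed range.")
```

A handler built this way would call `validate_update(...)` first and only kick off the state machine if no `ValidationError` is raised, so the caller gets an immediate, specific error instead of a failed workflow execution to dig through.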
The model only polls for capacity when moving between the stopped and in-service states, since the model is still functional in other scenarios. When starting a model, the state machine first waits for the desired number of instances to spin up, THEN waits for the user-defined warmup time before adding the model to LiteLLM, so that customers can't make inference requests before the models have actually spun up. The model instances report healthy before the models are fully initialized, so this wait is necessary to ensure that the models are working before we open them back up to requests.
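The start-up ordering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: the real workflow is a state machine, whereas this collapses the steps into one Python function, and `count_healthy_instances`, `register_with_router`, and the polling parameters are hypothetical stand-ins for the capacity check and the LiteLLM registration step.

```python
import time
from typing import Callable


def start_model(
    desired_instances: int,
    warmup_seconds: float,
    count_healthy_instances: Callable[[], int],  # hypothetical capacity check
    register_with_router: Callable[[], None],    # hypothetical LiteLLM registration
    poll_interval: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # Step 1: poll until the desired number of instances report healthy.
    for _ in range(max_polls):
        if count_healthy_instances() >= desired_instances:
            break
        sleep(poll_interval)
    else:
        raise TimeoutError("Instances did not reach the desired capacity.")

    # Step 2: instances can report healthy before the model has finished
    # initializing, so wait the user-defined warmup time as well.
    sleep(warmup_seconds)

    # Step 3: only now expose the model to inference requests.
    register_with_router()
```

Passing `sleep` as a parameter keeps the ordering easy to test without real delays. In a state machine, steps 1 to 3 would more likely be separate states (a poll loop, a wait state, and a registration task) than a single blocking function.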
known issues:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.