DockerImageBuild state machine implementation #65

Merged
merged 1 commit into from
Sep 6, 2024

Conversation

stephensmith-aws
Contributor

Description of changes:

Implement the Docker image build piece of the state machine.



 def handle_set_model_to_creating(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
     """Set DDB entry to CREATING status."""
     output_dict = deepcopy(event)
-    output_dict["create_infra"] = True
+    output_dict["create_infra"] = 'modelConfig' in event
Contributor

Ah, I see what you mean now. For these, we can use the names as they're defined in the CreateModelRequest, which we can pass as direct input into the state machine. I would suggest a few things here:

  1. Let's define any AWS clients at the top of the file, similar to how they're done here:
    cloudformation = boto3.client("cloudformation", region_name=os.environ["AWS_REGION"], config=retry_config)
    dynamodb = boto3.resource("dynamodb", region_name=os.environ["AWS_REGION"], config=retry_config)
    ddb_table = dynamodb.Table(os.environ["MODEL_TABLE_NAME"])
    (These environment variables are already set from the CDK definition; we can add more as needed.)
  2. For this create_infra variable, let's set it to True if all of AutoScalingConfig, ContainerConfig, InferenceContainer, InstanceType, and LoadBalancerConfig are not None, and to False if all of them are None. Let's throw an exception if they're partially configured (some None, some not). Basically we have two use cases:
    1. We define the ECS infrastructure, which requires all of those fields, and ModelUrl must be None.
    2. We define a LiteLLM-only entry, which requires all of the others to be None, and we'd set the ModelUrl as specified in the request. For the initial revision here, let's just worry about the first case.
  3. In this step specifically, let's make sure that we set the model state to Creating in DDB. We can also store other details, like the last modified date, the entire request blob, and anything else that seems useful to have around. We can add all of that to the output_dict as well, so that future states have that info without calling out to DDB.
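
A minimal sketch of suggestions 2 and 3, under stated assumptions: the field names are taken from the comment above, but their casing and their position at the top level of the event are guesses, and `should_create_infra`, `build_creating_output`, `model_status`, and `last_modified_date` are hypothetical names, not identifiers from this PR. The DynamoDB write itself is left out so the example runs without AWS access.

```python
from copy import deepcopy
from datetime import datetime, timezone
from typing import Any, Dict

# Field names come from the review comment; their exact casing and their
# location inside the request payload are assumptions.
ECS_FIELDS = (
    "AutoScalingConfig",
    "ContainerConfig",
    "InferenceContainer",
    "InstanceType",
    "LoadBalancerConfig",
)


def should_create_infra(request: Dict[str, Any]) -> bool:
    """Return True if all ECS fields are set, False if none are, and raise otherwise."""
    present = [name for name in ECS_FIELDS if request.get(name) is not None]
    if len(present) == len(ECS_FIELDS):
        # Use case 1: full ECS definition, so ModelUrl must not be supplied.
        if request.get("ModelUrl") is not None:
            raise ValueError("ModelUrl must be None when ECS infrastructure is defined")
        return True
    if not present:
        # Use case 2: LiteLLM-only entry, no infrastructure to create.
        return False
    missing = sorted(set(ECS_FIELDS) - set(present))
    raise ValueError(f"Partially configured ECS fields; missing: {missing}")


def build_creating_output(event: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the event and attach details that later states may need (hypothetical keys)."""
    output_dict = deepcopy(event)  # the copy already carries the full request blob
    output_dict["create_infra"] = should_create_infra(event)
    output_dict["model_status"] = "Creating"
    output_dict["last_modified_date"] = datetime.now(timezone.utc).isoformat()
    return output_dict
```

In a real handler, the same values would also be written to the model table before `output_dict` is returned; how that item is keyed depends on the table schema, which this thread does not show.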

lib/models/ecs-model-deployer.ts (outdated, resolved)
lambda/dockerimagebuilder/__init__.py (resolved)
        return {
            "instance_id": instances[0].instance_id,
            "image_tag": image_tag
        }
    except ClientError as e:
Contributor

I'm not sure which errors we're expecting, but I think we can just let them bubble up to the state machine. The state machine definition, once deployed, automatically retries some SDK errors, but I'm not 100% sure whether those cover only Lambda errors or any SDK call:

      "Retry": [
        {
          "ErrorEquals": [
            "Lambda.ClientExecutionTimeoutException",
            "Lambda.ServiceException",
            "Lambda.AWSLambdaException",
            "Lambda.SdkClientException"
          ],
          "IntervalSeconds": 2,
          "MaxAttempts": 6,
          "BackoffRate": 2
        }
      ],
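
To make the retry block above concrete, here is a small sketch of the wait schedule it implies, assuming the usual Step Functions rule that each retry waits `IntervalSeconds` multiplied by `BackoffRate` raised to the number of retries already made. `retry_delays` is a hypothetical helper, and it ignores optional settings such as a maximum delay or jitter.

```python
from typing import List


def retry_delays(interval_seconds: float, max_attempts: int, backoff_rate: float) -> List[float]:
    """Return the wait before each retry: interval_seconds * backoff_rate ** retry_index."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]


# Values copied from the Retry block quoted above.
delays = retry_delays(interval_seconds=2, max_attempts=6, backoff_rate=2)
print(delays)       # [2, 4, 8, 16, 32, 64]
print(sum(delays))  # 126
```

So a Lambda error that matches one of the listed names would be retried for roughly two minutes before the state fails, which supports letting the exception propagate rather than catching it in the handler.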

lambda/models/state_machine/create_model.py (outdated, resolved)
    return output_dict


def handle_poll_docker_image_available(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Check that Docker image is available in account or not."""
    output_dict = deepcopy(event)

    ecrClient = boto3.client("ecr")
Contributor

Same note for clients: let's define them at the top of the file with the region and retry config.

lambda/models/state_machine/create_model.py (outdated, resolved)
lambda/models/state_machine/create_model.py (outdated, resolved)
stephensmith-aws force-pushed the feature/create-model-workflow-impl branch 9 times, most recently from 4acff3c to 092c531, on September 5, 2024 22:07
@@ -80,7 +80,7 @@ repos:
           - --docstring-convention=numpy
           - --max-line-length=120
           - --extend-immutable-calls=Query,fastapi.Depends,fastapi.params.Depends
-          - --ignore=B008 # Ignore error for function calls in argument defaults
+          - --ignore=B008,W503 # Ignore error for function calls in argument defaults
Contributor

I think we should probably keep W503 enabled to stay in line with the PEP standards: https://www.flake8rules.com/rules/W503.html

Where did this happen?

Contributor Author

It happened in the create model state machine implementation. One of our formatters wants to put each condition on its own line, and flake8 doesn't. I couldn't make them both happy at the same time.
lambda/models/state-machine/create-model.py:38

Contributor

Oh, for that we could try putting the and's at the end of the line. I don't think there's anything wrong with needing the backslash at the end to complete the line either. Otherwise we can do an inline ignore so that we ignore the one line but not the rest. Adding a comment like # noqa: W503 on the specific line could work.
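
The three options in the comment above can be sketched as follows. The function names and the three-argument condition are hypothetical stand-ins for the real check at create-model.py:38, which this thread does not show. Two caveats: whether the project's formatter leaves options 1 and 2 untouched is not confirmed here, and trailing operators can trigger flake8's W504 if that check is enabled.

```python
from typing import Any, Optional


def all_set_trailing(a: Optional[Any], b: Optional[Any], c: Optional[Any]) -> bool:
    # Option 1: put the operators at the end of each line, inside parentheses.
    return (
        a is not None and
        b is not None and
        c is not None
    )


def all_set_backslash(a: Optional[Any], b: Optional[Any], c: Optional[Any]) -> bool:
    # Option 2: trailing operators with explicit backslash continuations.
    return a is not None and \
        b is not None and \
        c is not None


def all_set_noqa(a: Optional[Any], b: Optional[Any], c: Optional[Any]) -> bool:
    # Option 3: keep the leading operators and silence W503 on those lines only.
    return (
        a is not None
        and b is not None  # noqa: W503
        and c is not None  # noqa: W503
    )
```

All three forms evaluate identically; the choice only affects which linter and formatter rules are satisfied. Option 3 keeps W503 active for the rest of the codebase, which matches the reviewer's preference for not ignoring it globally.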

stephensmith-aws force-pushed the feature/create-model-workflow-impl branch 3 times, most recently from 12567c3 to a15b7d1, on September 5, 2024 23:26
stephensmith-aws force-pushed the feature/create-model-workflow-impl branch from a15b7d1 to 5e86546 on September 5, 2024 23:29
stephensmith-aws changed the title from "[WIP] DockerImageBuild state machine implementation" to "DockerImageBuild state machine implementation" on Sep 5, 2024
stephensmith-aws marked this pull request as ready for review on September 5, 2024 23:29
stephensmith-aws force-pushed the feature/create-model-workflow-impl branch 2 times, most recently from ec479b6 to f0ba830, on September 6, 2024 03:11
stephensmith-aws force-pushed the feature/create-model-workflow-impl branch from f0ba830 to bdbac48 on September 6, 2024 03:48
stephensmith-aws force-pushed the feature/create-model-workflow-impl branch from bdbac48 to 8cbfa9b on September 6, 2024 03:50
petermuller merged commit 08600ea into develop on Sep 6, 2024
4 checks passed
petermuller deleted the feature/create-model-workflow-impl branch on September 6, 2024 04:01