Merge branch 'develop' into dependabot/pip/lib/serve/rest-api/src/cryptography-43.0.1
estohlmann authored Sep 6, 2024
2 parents 472ece0 + 84fb90c commit 42666b5
Showing 31 changed files with 215 additions and 493 deletions.
59 changes: 15 additions & 44 deletions README.md
@@ -155,7 +155,7 @@ permissions to the "REST-Role" that was created in the IAM stack:
```

After adding those permissions and access in the VPC, LiteLLM will now be able to route traffic to those entities, and
they will be accessible through the LISA ALB, using the OpenAI specification for programmatic access.
they will be accessible through the LISA API Gateway, using the OpenAI specification for programmatic access.

#### Recommended Configuration Options

@@ -206,26 +206,6 @@ dev:
model_type: embedding
```
### DEV ONLY: Create Self-Signed Certificates for ALB
**WARNING: THIS IS FOR DEV ONLY**
When deploying for dev and testing, you can use a self-signed certificate for the REST API ALB. You can create one using the `gen-certs.sh` script and upload it to IAM.

```
export REGION=<region>
./scripts/gen-certs.sh
aws iam upload-server-certificate --server-certificate-name <certificate-name> --certificate-body file://scripts/server.pem --private-key file://scripts/server.key
```
You will then need to update the ALB certificate path in the config.yaml file:
```yaml
restApiConfig:
loadBalancerConfig:
sslCertIamArn: arn:aws:iam::<account-number>:server-certificate/<certificate-name>
```

### Customize Configuration
The [config.yaml](./config.yaml) file has many parameters; most can be left at their defaults, but a few key ones are worth discussing.
@@ -347,11 +327,11 @@ pytest lisa-sdk/tests --url <rest-url-from-cdk-output> --verify <path-to-server.

## Programmatic API Tokens

The LISA Serve ALB can be used for programmatic access outside the example Chat application.
The LISA API Gateway can be used for programmatic access outside the example Chat application.
An example use case is allowing LISA to serve LLM requests that originate from the [Continue VSCode Plugin](https://www.continue.dev/).
To facilitate communication directly with the LISA Serve ALB, a user with sufficient DynamoDB PutItem permissions may add
To facilitate communication directly with the LISA API Gateway, a user with sufficient DynamoDB PutItem permissions may add
API keys to the APITokenTable, and once created, a user may make requests by including the `Authorization: Bearer ${token}`
header or the `Api-Key: ${token}` header with that token. If using any OpenAI-compatible library, the `api_key` fields
header with that token. If using any OpenAI-compatible library, the `api_key` fields
will use the `Authorization: Bearer ${token}` format automatically, so there is no need to include additional headers
when using those libraries.
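The DynamoDB step described above can be sketched in Python. This is a hedged example, not code from the repository: the item's attribute names (`token`, `tokenExpiration`) are assumptions, so verify them against your deployment's APITokenTable schema before use.

```python
# Hypothetical sketch of creating a LISA API token in the APITokenTable.
# Attribute names ("token", "tokenExpiration") are assumptions -- check
# your deployment's table schema before relying on this.
import time


def build_token_item(token: str, ttl_days: int = 30) -> dict:
    """Build the DynamoDB item for an API token with an expiration timestamp."""
    return {
        "token": {"S": token},
        "tokenExpiration": {"N": str(int(time.time()) + ttl_days * 86400)},
    }


def create_token(token: str, table_name: str = "APITokenTable") -> None:
    """Write the token item; the caller needs dynamodb:PutItem on the table."""
    import boto3  # imported here so the sketch stays importable without AWS credentials

    boto3.client("dynamodb").put_item(TableName=table_name, Item=build_token_item(token))
```

Once the item exists, requests carrying `Authorization: Bearer <token>` should authenticate as described above.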

@@ -492,9 +472,8 @@ window.env = {
ADMIN_GROUP: '<The admin group you would like LISA to check the JWT token for>',
CUSTOM_SCOPES:[<add your optional list of custom scopes to pull groups from your IdP here>],
// Alternatively you can set this to be your REST api elb endpoint
RESTAPI_URI: 'http://localhost:8080/',
API_BASE_URL: 'https://${deployment_id}.execute-api.${regional_domain}/${deployment_stage}',
RESTAPI_VERSION: 'v2',
SESSION_REST_API_URI: '<API GW session endpoint>',
"MODELS": [
{
"model": "streaming-textgen-model",
@@ -546,33 +525,33 @@ routes as long as your underlying models can also respond to them.
By supporting the OpenAI spec, we can more easily allow users to integrate their collection of models into their LLM applications and workflows. In LISA, users can authenticate
using their OpenID Connect Identity Provider, or with an API token created through the DynamoDB token workflow as described [here](#programmatic-api-tokens). Once the token
is retrieved, users can use that in direct requests to the LISA Serve REST API. If using the IdP, users must set the 'Authorization' header, otherwise if using the API token,
users can set either the 'Api-Key' header or the 'Authorization' header. After that, requests to `https://${lisa_serve_alb}/v2/serve` will handle the OpenAI API calls. As an example, the following call can list all
models that LISA is aware of, assuming usage of the API token. If you are using a self-signed cert, you must also provide the `--cacert $path` option to specify a CA bundle to trust for SSL verification.
users can set the 'Authorization' header. After that, requests to `https://${lisa_api_gateway}/llm/v2/serve` will handle the OpenAI API calls. As an example, the following call can list all
models that LISA is aware of, assuming usage of the API token.

```shell
curl -s -H 'Api-Key: your-token' -X GET https://${lisa_serve_alb}/v2/serve/models
curl -s -H 'Authorization: Bearer your-api-key' -X GET https://${lisa_api_gateway}/llm/v2/serve/models
```

If using the IdP, the request would look like the following:

```shell
curl -s -H 'Authorization: Bearer your-token' -X GET https://${lisa_serve_alb}/v2/serve/models
curl -s -H 'Authorization: Bearer your-token' -X GET https://${lisa_api_gateway}/llm/v2/serve/models
```

When using a library that requests an OpenAI-compatible base_url, you can provide `https://${lisa_serve_alb}/v2/serve` here. All of the OpenAI routes will
When using a library that requests an OpenAI-compatible base_url, you can provide `https://${lisa_api_gateway}/llm/v2/serve` here. All of the OpenAI routes will
automatically be added to the base URL, just as we appended `/models` to the `/v2/serve` route for listing all models tracked by LISA.
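The same listing call can be made from plain Python without an SDK; a minimal sketch, assuming the base URL and token placeholders from the curl examples above and an OpenAI-style response body with a `data` list:

```python
# Hedged sketch: list models from an OpenAI-compatible LISA endpoint.
# The base URL and token are placeholders; the "data" key assumes an
# OpenAI-style list response, as described in the text above.
import json
import urllib.request


def build_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET /models request with a Bearer token header."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_models(base_url: str, token: str) -> list:
    """Call the endpoint and return the model entries from the response."""
    with urllib.request.urlopen(build_request(base_url, token)) as resp:
        return json.loads(resp.read())["data"]
```

For example, `list_models("https://<lisa_api_gateway>/llm/v2/serve", "your-token")` mirrors the curl calls shown above.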

#### Continue JetBrains and VS Code Plugin

For developers who want an LLM assistant to help with programming tasks, we support adding LISA as an LLM provider for the [Continue plugin](https://www.continue.dev).
To add LISA as a provider, open up the Continue plugin's `config.json` file and locate the `models` list. In this list, add the following block, replacing the placeholder URL
with your own REST API domain or ALB. The `/v2/serve` is required at the end of the `apiBase`. This configuration requires an API token as created through the [DynamoDB workflow](#programmatic-api-tokens).
with your own REST API domain. The `/llm/v2/serve` is required at the end of the `apiBase`. This configuration requires an API token as created through the [DynamoDB workflow](#programmatic-api-tokens).

```json
{
"model": "AUTODETECT",
"title": "LISA",
"apiBase": "https://<lisa_serve_alb>/v2/serve",
"apiBase": "https://<lisa_api_gateway>/llm/v2/serve",
"provider": "openai",
"apiKey": "your-api-token" // pragma: allowlist-secret
}
@@ -600,27 +579,19 @@ client.models.list()

To use the models being served by LISA, the client needs only a few changes:

1. Specify the `base_url` as the LISA Serve ALB, using the /v2/serve route at the end, similar to the apiBase in the [Continue example](#continue-jetbrains-and-vs-code-plugin)
1. Specify the `base_url` as the LISA API Gateway, using the /llm/v2/serve route at the end, similar to the apiBase in the [Continue example](#continue-jetbrains-and-vs-code-plugin)
2. Add the API key that you generated from the [token generation steps](#programmatic-api-tokens) as your `api_key` field.
3. If using a self-signed cert, you must provide a certificate path for validating SSL. If you're using an ACM or public cert, then this may be omitted.
1. We provide a convenience function in the `lisa-sdk` for generating a cert path from an IAM certificate ARN if one is provided in the `RESTAPI_SSL_CERT_ARN` environment variable.

The code block will now look like this, and you can continue to use the library without any other modifications.

```python
# for self-signed certificates
import boto3
from lisapy.utils import get_cert_path
# main client library
from openai import DefaultHttpxClient, OpenAI

iam_client = boto3.client("iam")
cert_path = get_cert_path(iam_client)

client = OpenAI(
api_key="my_key", # pragma: allowlist-secret not a real key
base_url="https://<lisa_serve_alb>/v2/serve",
http_client=DefaultHttpxClient(verify=cert_path), # needed for self-signed certs on your ALB, can be omitted otherwise
base_url="https://<lisa_api_gw>/llm/v2/serve",
http_client=DefaultHttpxClient(),
)
client.models.list()
```
86 changes: 26 additions & 60 deletions ecs_model_deployer/src/index.ts
@@ -17,9 +17,6 @@
import { spawnSync, spawn, ChildProcess } from 'child_process';
import { readdirSync, symlinkSync, rmSync } from 'fs';

const ACTION_DEPLOY = 'deploy';
const ACTION_DESTROY = 'destroy';

/*
cdk CLI always wants ./ to be writable in order to write cdk.context.json.
This should really be an environment variable or something, but this function
@@ -54,17 +51,9 @@ const createWritableEnv = () => {
};

export const handler = async (event: any) => {
if (!event.action) {
console.log(`action not provided in ${JSON.stringify(event)}`);
throw new Error('action not provided');
} else if ( ![ACTION_DESTROY, ACTION_DEPLOY].includes(event.action) ) {
console.log(`Invalid action ${event.action}`);
throw new Error(`Invalid action ${event.action}`);
}

if (!event.modelConfig) {
console.log(`modelConfig not provided in ${JSON.stringify(event)}`);
throw new Error('modeConfig not provided');
throw new Error('modelConfig not provided');
}
const modelConfig = event.modelConfig;
process.env['LISA_MODEL_CONFIG'] = JSON.stringify(modelConfig);
@@ -79,65 +68,42 @@ export const handler = async (event: any) => {

const ret = spawnSync('./node_modules/aws-cdk/bin/cdk', ['synth', '-o', '/tmp/cdk.out']);

let stdout = String(ret.output[1]);
let stderr = String(ret.output[2]);
const stderr = String(ret.output[2]);
if ( ret.status !== 0 ) {
console.log(`cdk synth failed with stderr: ${stderr}`);
throw new Error('Stack failed to synthesize');
}


const stackName = `${config.deploymentName}-${modelConfig.modelId}`;
if ( event.action === ACTION_DEPLOY ) {
const deploy_promise: Promise<ChildProcess | undefined> = new Promise( (resolve) => {
const cp = spawn('./node_modules/aws-cdk/bin/cdk', ['deploy', stackName, '-o', '/tmp/cdk.out']);

cp.on('close', (code) => {
console.log(`cdk deploy exited early, code ${code}`);
resolve(cp);
});

cp.stdout.on('data', (data) => {
console.log(`Got data: ${data}`);
});

cp.stderr.on('data', (data) => {
console.log(`Got err data: ${data}`);
});

setTimeout(() => {
console.log('180 second timeout');
resolve(undefined);
}, 180 * 1000);
const deploy_promise: Promise<ChildProcess | undefined> = new Promise( (resolve) => {
const cp = spawn('./node_modules/aws-cdk/bin/cdk', ['deploy', stackName, '-o', '/tmp/cdk.out']);

cp.on('close', (code) => {
console.log(`cdk deploy exited early, code ${code}`);
resolve(cp);
});

const cp = await deploy_promise;
if ( cp ) {
if ( cp.exitCode !== 0 ) {
throw new Error('Stack failed to deploy');
}
}
} else if ( event.action === ACTION_DESTROY ) {
const deploy_promise: Promise<Number> = new Promise( (resolve) => {
const cp = spawn('./node_modules/aws-cdk/bin/cdk', ['destroy', '-f', stackName, '-o', '/tmp/cdk.out']);

cp.on('close', (code) => {
resolve(code ?? -1);
});

setTimeout(() => {
console.log('60 second timeout');
resolve(0);
}, 180 * 1000);
cp.stdout.on('data', (data) => {
console.log(`Got data: ${data}`);
});

cp.stderr.on('data', (data) => {
console.log(`Got err data: ${data}`);
});

const exitCode = await deploy_promise;
stdout = String(ret.output[1]);
stderr = String(ret.output[2]);
if ( exitCode !== 0 ) {
console.log(`cdk destroy failed with stdout: ${stdout}, stderr: ${stderr}`);
throw new Error('Stack failed to destroy');
setTimeout(() => {
console.log('180 second timeout');
resolve(undefined);
}, 180 * 1000);
});

const cp = await deploy_promise;
if ( cp ) {
if ( cp.exitCode !== 0 ) {
throw new Error('Stack failed to deploy');
}
}
return stackName;

return {stackName: stackName};
};
1 change: 0 additions & 1 deletion ecs_model_deployer/src/lib/ecs-model.ts
@@ -68,7 +68,6 @@ export class EcsModel extends Construct {
environment: this.getEnvironmentVariables(config, modelConfig),
identifier: getModelIdentifier(modelConfig),
instanceType: modelConfig.instanceType,
internetFacing: false,
loadBalancerConfig: modelConfig.loadBalancerConfig,
},
securityGroup,
14 changes: 5 additions & 9 deletions ecs_model_deployer/src/lib/ecsCluster.ts
@@ -271,7 +271,7 @@ export class ECSCluster extends Construct {
// Create application load balancer
const loadBalancer = new ApplicationLoadBalancer(this, createCdkId([ecsConfig.identifier, 'ALB']), {
deletionProtection: config.removalPolicy !== RemovalPolicy.DESTROY,
internetFacing: ecsConfig.internetFacing,
internetFacing: false,
loadBalancerName: createCdkId([config.deploymentName, ecsConfig.identifier], 32, 2),
dropInvalidHeaderFields: true,
securityGroup,
@@ -280,18 +280,14 @@

// Add listener
const listenerProps: BaseApplicationListenerProps = {
port: ecsConfig.loadBalancerConfig.sslCertIamArn ? 443 : 80,
open: ecsConfig.internetFacing,
certificates: ecsConfig.loadBalancerConfig.sslCertIamArn
? [{ certificateArn: ecsConfig.loadBalancerConfig.sslCertIamArn }]
: undefined,
port: 80,
open: false,
};

const listener = loadBalancer.addListener(
createCdkId([ecsConfig.identifier, 'ApplicationListener']),
listenerProps,
);
const protocol = listenerProps.port === 443 ? 'https' : 'http';

// Add targets
const loadBalancerHealthCheckConfig = ecsConfig.loadBalancerConfig.healthCheckConfig;
@@ -311,7 +307,7 @@
// ALB metric for ASG to use for auto scaling EC2 instances
// TODO: Update this to step scaling for embedding models??
const requestCountPerTargetMetric = new Metric({
metricName: ecsConfig.autoScalingConfig.metricConfig.AlbMetricName,
metricName: ecsConfig.autoScalingConfig.metricConfig.albMetricName,
namespace: 'AWS/ApplicationELB',
dimensionsMap: {
TargetGroup: targetGroup.targetGroupFullName,
@@ -332,7 +328,7 @@
ecsConfig.loadBalancerConfig.domainName !== null
? ecsConfig.loadBalancerConfig.domainName
: loadBalancer.loadBalancerDnsName;
const endpoint = `${protocol}://${domain}`;
const endpoint = `http://${domain}`;
this.endpointUrl = endpoint;

// Update
6 changes: 1 addition & 5 deletions ecs_model_deployer/src/lib/schema.ts
@@ -379,12 +379,10 @@ const HealthCheckConfigSchema = z.object({
/**
* Configuration schema for the load balancer.
*
* @property {string} [sslCertIamArn=null] - SSL certificate IAM ARN for load balancer.
* @property {HealthCheckConfig} healthCheckConfig - Health check configuration for the load balancer.
* @property {string} domainName - Domain name to use instead of the load balancer's default DNS name.
*/
const LoadBalancerConfigSchema = z.object({
sslCertIamArn: z.string().optional().nullable().default(null),
healthCheckConfig: HealthCheckConfigSchema,
domainName: z.string().optional().nullable().default(null),
});
@@ -400,7 +398,7 @@ const LoadBalancerConfigSchema = z.object({
*
*/
const MetricConfigSchema = z.object({
AlbMetricName: z.string(),
albMetricName: z.string(),
targetValue: z.number(),
duration: z.number().default(60),
estimatedInstanceWarmup: z.number().min(0).default(180),
@@ -439,7 +437,6 @@ const AutoScalingConfigSchema = z.object({
* @property {Record<string,string>} environment - Environment variables set on the task container
* @property {identifier} modelType - Unique identifier for the cluster which will be used when naming resources
* @property {string} instanceType - EC2 instance type for running the model.
* @property {boolean} [internetFacing=false] - Whether or not the cluster will be configured as internet facing
* @property {LoadBalancerConfig} loadBalancerConfig - Configuration for load balancer settings.
*/
const EcsBaseConfigSchema = z.object({
@@ -451,7 +448,6 @@
environment: z.record(z.string()),
identifier: z.string(),
instanceType: z.enum(VALID_INSTANCE_KEYS),
internetFacing: z.boolean().default(false),
loadBalancerConfig: LoadBalancerConfigSchema,
});

2 changes: 0 additions & 2 deletions example_config.yaml
@@ -81,9 +81,7 @@ dev:
targetValue: 1000
duration: 60
estimatedInstanceWarmup: 30
internetFacing: true
loadBalancerConfig:
sslCertIamArn: arn:aws:iam::012345678901:server-certificate/lisa-self-signed-dev
healthCheckConfig:
path: /health
interval: 60
19 changes: 15 additions & 4 deletions lambda/dockerimagebuilder/__init__.py
@@ -28,10 +28,21 @@
mkdir /home/ec2-user/docker_resources
aws --region ${AWS_REGION} s3 sync s3://{{BUCKET_NAME}} /home/ec2-user/docker_resources
cd /home/ec2-user/docker_resources/{{LAYER_TO_ADD}}
docker build -t {{IMAGE_ID}} --build-arg BASE_IMAGE={{BASE_IMAGE}} --build-arg MOUNTS3_DEB_URL={{MOUNTS3_DEB_URL}} .
docker tag {{IMAGE_ID}} {{ECR_URI}}:{{IMAGE_ID}}
aws --region ${AWS_REGION} ecr get-login-password | docker login --username AWS --password-stdin {{ECR_URI}}
docker push {{ECR_URI}}:{{IMAGE_ID}}
while [ 1 ]; do
shutdown -c;
sleep 5;
done &
function buildTagPush() {
docker build -t {{IMAGE_ID}} --build-arg BASE_IMAGE={{BASE_IMAGE}} --build-arg MOUNTS3_DEB_URL={{MOUNTS3_DEB_URL}} . && \
docker tag {{IMAGE_ID}} {{ECR_URI}}:{{IMAGE_ID}} && \
aws --region ${AWS_REGION} ecr get-login-password | docker login --username AWS --password-stdin {{ECR_URI}} && \
docker push {{ECR_URI}}:{{IMAGE_ID}}
return $?
}
(r=3;while ! buildTagPush ; do ((--r))||exit;sleep 10; done)
"""


1 change: 0 additions & 1 deletion lambda/models/clients/litellm_client.py
@@ -43,7 +43,6 @@ def list_models(self) -> List[Dict[str, Any]]:
self._base_uri + "/model/info",
headers=self._headers,
timeout=self._timeout,
verify=self._verify,
)
all_models = resp.json()
models_list: List[Dict[str, Any]] = all_models["data"]