Merge branch 'develop' into dependabot/pip/lib/serve/rest-api/src/cryptography-43.0.1
estohlmann authored Sep 6, 2024
2 parents 472ece0 + 84fb90c commit 42666b5
Showing 31 changed files with 215 additions and 493 deletions.
59 changes: 15 additions & 44 deletions README.md
@@ -155,7 +155,7 @@ permissions to the "REST-Role" that was created in the IAM stack:
```

After adding those permissions and access in the VPC, LiteLLM will now be able to route traffic to those entities, and
they will be accessible through the LISA ALB, using the OpenAI specification for programmatic access.
they will be accessible through the LISA API Gateway, using the OpenAI specification for programmatic access.

#### Recommended Configuration Options

@@ -206,26 +206,6 @@ dev:
model_type: embedding
```
### DEV ONLY: Create Self-Signed Certificates for ALB
**WARNING: THIS IS FOR DEV ONLY**
When deploying for dev and testing, you can use a self-signed certificate for the REST API ALB. You can create one using the `gen-certs.sh` script and upload it to IAM.

```
export REGION=<region>
./scripts/gen-certs.sh
aws iam upload-server-certificate --server-certificate-name <certificate-name> --certificate-body file://scripts/server.pem --private-key file://scripts/server.key
```
You will then need to update the ALB certificate path in the config.yaml file:
```yaml
restApiConfig:
loadBalancerConfig:
sslCertIamArn: arn:aws:iam::<account-number>:server-certificate/<certificate-name>
```

### Customize Configuration
The [config.yaml](./config.yaml) file has many parameters; most can be left at their defaults, but a few key ones are worth discussing.
@@ -347,11 +327,11 @@ pytest lisa-sdk/tests --url <rest-url-from-cdk-output> --verify <path-to-server.

## Programmatic API Tokens

The LISA Serve ALB can be used for programmatic access outside the example Chat application.
The LISA API Gateway can be used for programmatic access outside the example Chat application.
An example use case is allowing LISA to serve LLM requests that originate from the [Continue VSCode Plugin](https://www.continue.dev/).
To facilitate communication directly with the LISA Serve ALB, a user with sufficient DynamoDB PutItem permissions may add
To facilitate communication directly with the LISA API Gateway, a user with sufficient DynamoDB PutItem permissions may add
API keys to the APITokenTable, and once created, a user may make requests by including the `Authorization: Bearer ${token}`
header or the `Api-Key: ${token}` header with that token. If using any OpenAI-compatible library, the `api_key` fields
header with that token. If using any OpenAI-compatible library, the `api_key` fields
will use the `Authorization: Bearer ${token}` format automatically, so there is no need to include additional headers
when using those libraries.
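The DynamoDB step described above can be sketched in Python. This is a hedged example, not code from the repository: the item's attribute names (`token`, `tokenExpiration`) are assumptions, so verify them against your deployment's APITokenTable schema before use.

```python
# Hypothetical sketch of creating a LISA API token in the APITokenTable.
# Attribute names ("token", "tokenExpiration") are assumptions -- check
# your deployment's table schema before relying on this.
import time


def build_token_item(token: str, ttl_days: int = 30) -> dict:
    """Build the DynamoDB item for an API token with an expiration timestamp."""
    return {
        "token": {"S": token},
        "tokenExpiration": {"N": str(int(time.time()) + ttl_days * 86400)},
    }


def create_token(token: str, table_name: str = "APITokenTable") -> None:
    """Write the token item; the caller needs dynamodb:PutItem on the table."""
    import boto3  # imported here so the sketch stays importable without AWS credentials

    boto3.client("dynamodb").put_item(TableName=table_name, Item=build_token_item(token))
```

Once the item exists, requests carrying `Authorization: Bearer <token>` should authenticate as described above.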

@@ -492,9 +472,8 @@ window.env = {
ADMIN_GROUP: '<The admin group you would like LISA to check the JWT token for>',
CUSTOM_SCOPES:[<add your optional list of custom scopes to pull groups from your IdP here>],
// Alternatively you can set this to be your REST api elb endpoint
RESTAPI_URI: 'http://localhost:8080/',
API_BASE_URL: 'https://${deployment_id}.execute-api.${regional_domain}/${deployment_stage}',
RESTAPI_VERSION: 'v2',
SESSION_REST_API_URI: '<API GW session endpoint>',
"MODELS": [
{
"model": "streaming-textgen-model",
@@ -546,33 +525,33 @@ routes as long as your underlying models can also respond to them.
By supporting the OpenAI spec, we can more easily allow users to integrate their collection of models into their LLM applications and workflows. In LISA, users can authenticate
using their OpenID Connect Identity Provider, or with an API token created through the DynamoDB token workflow as described [here](#programmatic-api-tokens). Once the token
is retrieved, users can use that in direct requests to the LISA Serve REST API. If using the IdP, users must set the 'Authorization' header, otherwise if using the API token,
users can set either the 'Api-Key' header or the 'Authorization' header. After that, requests to `https://${lisa_serve_alb}/v2/serve` will handle the OpenAI API calls. As an example, the following call can list all
models that LISA is aware of, assuming usage of the API token. If you are using a self-signed cert, you must also provide the `--cacert $path` option to specify a CA bundle to trust for SSL verification.
users can set the 'Authorization' header. After that, requests to `https://${lisa_api_gateway}/llm/v2/serve` will handle the OpenAI API calls. As an example, the following call can list all
models that LISA is aware of, assuming usage of the API token.

```shell
curl -s -H 'Api-Key: your-token' -X GET https://${lisa_serve_alb}/v2/serve/models
curl -s -H 'Authorization: Bearer your-api-key' -X GET https://${lisa_api_gateway}/llm/v2/serve/models
```

If using the IdP, the request would look like the following:

```shell
curl -s -H 'Authorization: Bearer your-token' -X GET https://${lisa_serve_alb}/v2/serve/models
curl -s -H 'Authorization: Bearer your-token' -X GET https://${lisa_api_gateway}/llm/v2/serve/models
```

When using a library that requests an OpenAI-compatible base_url, you can provide `https://${lisa_serve_alb}/v2/serve` here. All of the OpenAI routes will
When using a library that requests an OpenAI-compatible base_url, you can provide `https://${lisa_api_gateway}/llm/v2/serve` here. All of the OpenAI routes will
automatically be added to the base URL, just as we appended `/models` to the `/v2/serve` route for listing all models tracked by LISA.
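The same listing call can be made from plain Python without an SDK; a minimal sketch, assuming the base URL and token placeholders from the curl examples above and an OpenAI-style response body with a `data` list:

```python
# Hedged sketch: list models from an OpenAI-compatible LISA endpoint.
# The base URL and token are placeholders; the "data" key assumes an
# OpenAI-style list response, as described in the text above.
import json
import urllib.request


def build_request(base_url: str, token: str) -> urllib.request.Request:
    """Build the GET /models request with a Bearer token header."""
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_models(base_url: str, token: str) -> list:
    """Call the endpoint and return the model entries from the response."""
    with urllib.request.urlopen(build_request(base_url, token)) as resp:
        return json.loads(resp.read())["data"]
```

For example, `list_models("https://<lisa_api_gateway>/llm/v2/serve", "your-token")` mirrors the curl calls shown above.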

#### Continue JetBrains and VS Code Plugin

For developers who want an LLM assistant to help with programming tasks, we support adding LISA as an LLM provider for the [Continue plugin](https://www.continue.dev).
To add LISA as a provider, open up the Continue plugin's `config.json` file and locate the `models` list. In this list, add the following block, replacing the placeholder URL
with your own REST API domain or ALB. The `/v2/serve` is required at the end of the `apiBase`. This configuration requires an API token as created through the [DynamoDB workflow](#programmatic-api-tokens).
with your own REST API domain. The `/llm/v2/serve` is required at the end of the `apiBase`. This configuration requires an API token as created through the [DynamoDB workflow](#programmatic-api-tokens).

```json
{
"model": "AUTODETECT",
"title": "LISA",
"apiBase": "https://<lisa_serve_alb>/v2/serve",
"apiBase": "https://<lisa_api_gateway>/llm/v2/serve",
"provider": "openai",
"apiKey": "your-api-token" // pragma: allowlist-secret
}
@@ -600,27 +579,19 @@ client.models.list()

To use the models being served by LISA, the client needs only a few changes:

1. Specify the `base_url` as the LISA Serve ALB, using the /v2/serve route at the end, similar to the apiBase in the [Continue example](#continue-jetbrains-and-vs-code-plugin)
1. Specify the `base_url` as the LISA API Gateway, using the /llm/v2/serve route at the end, similar to the apiBase in the [Continue example](#continue-jetbrains-and-vs-code-plugin)
2. Add the API key that you generated from the [token generation steps](#programmatic-api-tokens) as your `api_key` field.
3. If using a self-signed cert, you must provide a certificate path for validating SSL. If you're using an ACM or public cert, then this may be omitted.
1. We provide a convenience function in the `lisa-sdk` for generating a cert path from an IAM certificate ARN if one is provided in the `RESTAPI_SSL_CERT_ARN` environment variable.

The code block will now look like this, and you can continue to use the library without any other modifications.

```python
# for self-signed certificates
import boto3
from lisapy.utils import get_cert_path
# main client library
from openai import DefaultHttpxClient, OpenAI

iam_client = boto3.client("iam")
cert_path = get_cert_path(iam_client)

client = OpenAI(
api_key="my_key", # pragma: allowlist-secret not a real key
base_url="https://<lisa_serve_alb>/v2/serve",
http_client=DefaultHttpxClient(verify=cert_path), # needed for self-signed certs on your ALB, can be omitted otherwise
base_url="https://<lisa_api_gw>/llm/v2/serve",
http_client=DefaultHttpxClient(),
)
client.models.list()
```
86 changes: 26 additions & 60 deletions ecs_model_deployer/src/index.ts
@@ -17,9 +17,6 @@
import { spawnSync, spawn, ChildProcess } from 'child_process';
import { readdirSync, symlinkSync, rmSync } from 'fs';

const ACTION_DEPLOY = 'deploy';
const ACTION_DESTROY = 'destroy';

/*
cdk CLI always wants ./ to be writable in order to write cdk.context.json.
This should really be an environment variable or something, but this function
@@ -54,17 +51,9 @@ const createWritableEnv = () => {
};

export const handler = async (event: any) => {
if (!event.action) {
console.log(`action not provided in ${JSON.stringify(event)}`);
throw new Error('action not provided');
} else if ( ![ACTION_DESTROY, ACTION_DEPLOY].includes(event.action) ) {
console.log(`Invalid action ${event.action}`);
throw new Error(`Invalid action ${event.action}`);
}

if (!event.modelConfig) {
console.log(`modelConfig not provided in ${JSON.stringify(event)}`);
throw new Error('modeConfig not provided');
throw new Error('modelConfig not provided');
}
const modelConfig = event.modelConfig;
process.env['LISA_MODEL_CONFIG'] = JSON.stringify(modelConfig);
@@ -79,65 +68,42 @@ export const handler = async (event: any) => {

const ret = spawnSync('./node_modules/aws-cdk/bin/cdk', ['synth', '-o', '/tmp/cdk.out']);

let stdout = String(ret.output[1]);
let stderr = String(ret.output[2]);
const stderr = String(ret.output[2]);
if ( ret.status !== 0 ) {
console.log(`cdk synth failed with stderr: ${stderr}`);
throw new Error('Stack failed to synthesize');
}


const stackName = `${config.deploymentName}-${modelConfig.modelId}`;
if ( event.action === ACTION_DEPLOY ) {
const deploy_promise: Promise<ChildProcess | undefined> = new Promise( (resolve) => {
const cp = spawn('./node_modules/aws-cdk/bin/cdk', ['deploy', stackName, '-o', '/tmp/cdk.out']);

cp.on('close', (code) => {
console.log(`cdk deploy exited early, code ${code}`);
resolve(cp);
});

cp.stdout.on('data', (data) => {
console.log(`Got data: ${data}`);
});

cp.stderr.on('data', (data) => {
console.log(`Got err data: ${data}`);
});

setTimeout(() => {
console.log('180 second timeout');
resolve(undefined);
}, 180 * 1000);
const deploy_promise: Promise<ChildProcess | undefined> = new Promise( (resolve) => {
const cp = spawn('./node_modules/aws-cdk/bin/cdk', ['deploy', stackName, '-o', '/tmp/cdk.out']);

cp.on('close', (code) => {
console.log(`cdk deploy exited early, code ${code}`);
resolve(cp);
});

const cp = await deploy_promise;
if ( cp ) {
if ( cp.exitCode !== 0 ) {
throw new Error('Stack failed to deploy');
}
}
} else if ( event.action === ACTION_DESTROY ) {
const deploy_promise: Promise<Number> = new Promise( (resolve) => {
const cp = spawn('./node_modules/aws-cdk/bin/cdk', ['destroy', '-f', stackName, '-o', '/tmp/cdk.out']);

cp.on('close', (code) => {
resolve(code ?? -1);
});

setTimeout(() => {
console.log('60 second timeout');
resolve(0);
}, 180 * 1000);
cp.stdout.on('data', (data) => {
console.log(`Got data: ${data}`);
});

cp.stderr.on('data', (data) => {
console.log(`Got err data: ${data}`);
});

const exitCode = await deploy_promise;
stdout = String(ret.output[1]);
stderr = String(ret.output[2]);
if ( exitCode !== 0 ) {
console.log(`cdk destroy failed with stdout: ${stdout}, stderr: ${stderr}`);
throw new Error('Stack failed to destroy');
setTimeout(() => {
console.log('180 second timeout');
resolve(undefined);
}, 180 * 1000);
});

const cp = await deploy_promise;
if ( cp ) {
if ( cp.exitCode !== 0 ) {
throw new Error('Stack failed to deploy');
}
}
return stackName;

return {stackName: stackName};
};
1 change: 0 additions & 1 deletion ecs_model_deployer/src/lib/ecs-model.ts
@@ -68,7 +68,6 @@ export class EcsModel extends Construct {
environment: this.getEnvironmentVariables(config, modelConfig),
identifier: getModelIdentifier(modelConfig),
instanceType: modelConfig.instanceType,
internetFacing: false,
loadBalancerConfig: modelConfig.loadBalancerConfig,
},
securityGroup,
14 changes: 5 additions & 9 deletions ecs_model_deployer/src/lib/ecsCluster.ts
@@ -271,7 +271,7 @@ export class ECSCluster extends Construct {
// Create application load balancer
const loadBalancer = new ApplicationLoadBalancer(this, createCdkId([ecsConfig.identifier, 'ALB']), {
deletionProtection: config.removalPolicy !== RemovalPolicy.DESTROY,
internetFacing: ecsConfig.internetFacing,
internetFacing: false,
loadBalancerName: createCdkId([config.deploymentName, ecsConfig.identifier], 32, 2),
dropInvalidHeaderFields: true,
securityGroup,
@@ -280,18 +280,14 @@

// Add listener
const listenerProps: BaseApplicationListenerProps = {
port: ecsConfig.loadBalancerConfig.sslCertIamArn ? 443 : 80,
open: ecsConfig.internetFacing,
certificates: ecsConfig.loadBalancerConfig.sslCertIamArn
? [{ certificateArn: ecsConfig.loadBalancerConfig.sslCertIamArn }]
: undefined,
port: 80,
open: false,
};

const listener = loadBalancer.addListener(
createCdkId([ecsConfig.identifier, 'ApplicationListener']),
listenerProps,
);
const protocol = listenerProps.port === 443 ? 'https' : 'http';

// Add targets
const loadBalancerHealthCheckConfig = ecsConfig.loadBalancerConfig.healthCheckConfig;
@@ -311,7 +307,7 @@
// ALB metric for ASG to use for auto scaling EC2 instances
// TODO: Update this to step scaling for embedding models??
const requestCountPerTargetMetric = new Metric({
metricName: ecsConfig.autoScalingConfig.metricConfig.AlbMetricName,
metricName: ecsConfig.autoScalingConfig.metricConfig.albMetricName,
namespace: 'AWS/ApplicationELB',
dimensionsMap: {
TargetGroup: targetGroup.targetGroupFullName,
@@ -332,7 +328,7 @@
ecsConfig.loadBalancerConfig.domainName !== null
? ecsConfig.loadBalancerConfig.domainName
: loadBalancer.loadBalancerDnsName;
const endpoint = `${protocol}://${domain}`;
const endpoint = `http://${domain}`;
this.endpointUrl = endpoint;

// Update
6 changes: 1 addition & 5 deletions ecs_model_deployer/src/lib/schema.ts
@@ -379,12 +379,10 @@ const HealthCheckConfigSchema = z.object({
/**
* Configuration schema for the load balancer.
*
* @property {string} [sslCertIamArn=null] - SSL certificate IAM ARN for load balancer.
* @property {HealthCheckConfig} healthCheckConfig - Health check configuration for the load balancer.
* @property {string} domainName - Domain name to use instead of the load balancer's default DNS name.
*/
const LoadBalancerConfigSchema = z.object({
sslCertIamArn: z.string().optional().nullable().default(null),
healthCheckConfig: HealthCheckConfigSchema,
domainName: z.string().optional().nullable().default(null),
});
@@ -400,7 +398,7 @@ const LoadBalancerConfigSchema = z.object({
*
*/
const MetricConfigSchema = z.object({
AlbMetricName: z.string(),
albMetricName: z.string(),
targetValue: z.number(),
duration: z.number().default(60),
estimatedInstanceWarmup: z.number().min(0).default(180),
@@ -439,7 +437,6 @@ const AutoScalingConfigSchema = z.object({
* @property {Record<string,string>} environment - Environment variables set on the task container
* @property {identifier} modelType - Unique identifier for the cluster which will be used when naming resources
* @property {string} instanceType - EC2 instance type for running the model.
* @property {boolean} [internetFacing=false] - Whether or not the cluster will be configured as internet facing
* @property {LoadBalancerConfig} loadBalancerConfig - Configuration for load balancer settings.
*/
const EcsBaseConfigSchema = z.object({
@@ -451,7 +448,6 @@
environment: z.record(z.string()),
identifier: z.string(),
instanceType: z.enum(VALID_INSTANCE_KEYS),
internetFacing: z.boolean().default(false),
loadBalancerConfig: LoadBalancerConfigSchema,
});

2 changes: 0 additions & 2 deletions example_config.yaml
@@ -81,9 +81,7 @@ dev:
targetValue: 1000
duration: 60
estimatedInstanceWarmup: 30
internetFacing: true
loadBalancerConfig:
sslCertIamArn: arn:aws:iam::012345678901:server-certificate/lisa-self-signed-dev
healthCheckConfig:
path: /health
interval: 60
19 changes: 15 additions & 4 deletions lambda/dockerimagebuilder/__init__.py
@@ -28,10 +28,21 @@
mkdir /home/ec2-user/docker_resources
aws --region ${AWS_REGION} s3 sync s3://{{BUCKET_NAME}} /home/ec2-user/docker_resources
cd /home/ec2-user/docker_resources/{{LAYER_TO_ADD}}
docker build -t {{IMAGE_ID}} --build-arg BASE_IMAGE={{BASE_IMAGE}} --build-arg MOUNTS3_DEB_URL={{MOUNTS3_DEB_URL}} .
docker tag {{IMAGE_ID}} {{ECR_URI}}:{{IMAGE_ID}}
aws --region ${AWS_REGION} ecr get-login-password | docker login --username AWS --password-stdin {{ECR_URI}}
docker push {{ECR_URI}}:{{IMAGE_ID}}
while [ 1 ]; do
shutdown -c;
sleep 5;
done &
function buildTagPush() {
docker build -t {{IMAGE_ID}} --build-arg BASE_IMAGE={{BASE_IMAGE}} --build-arg MOUNTS3_DEB_URL={{MOUNTS3_DEB_URL}} . && \
docker tag {{IMAGE_ID}} {{ECR_URI}}:{{IMAGE_ID}} && \
aws --region ${AWS_REGION} ecr get-login-password | docker login --username AWS --password-stdin {{ECR_URI}} && \
docker push {{ECR_URI}}:{{IMAGE_ID}}
return $?
}
(r=3;while ! buildTagPush ; do ((--r))||exit;sleep 10; done)
"""


1 change: 0 additions & 1 deletion lambda/models/clients/litellm_client.py
@@ -43,7 +43,6 @@ def list_models(self) -> List[Dict[str, Any]]:
self._base_uri + "/model/info",
headers=self._headers,
timeout=self._timeout,
verify=self._verify,
)
all_models = resp.json()
models_list: List[Dict[str, Any]] = all_models["data"]