Skip to content

Latest commit

 

History

History
284 lines (186 loc) · 16.3 KB

README.md

File metadata and controls

284 lines (186 loc) · 16.3 KB

Project for Machine Learning Operations (MLOps)

GitHub Actions Cloud Deployment

The challenge requested to build an fully automated end-to-end Machine Learning infrastructure, including reading data from a feature store, automated model creation with hyperoptimization tuning and automatically finding the most efficient model, storing models in a model registry, building a CI/CD pipeline for deploying the registered model into a production environment, serving the model using HTTP API or Streaming interfaces, implementing monitoring, writing tests and linters, creating regular automated reporting.

Note: this is a personal project to demonstrate an automated ML infrastructure. Security was not in the scope of this project. For deploying this infrastructure in a production environment please ensure proper credentials are set for each service and the infrastructure is not exposing unwanted endpoints to the public internet.

Final architecture:

image

There is a Prefect orchestrator to run 2 flows on a scheduler basis. The Hyperoptimization deployment flow is executed by a Prefect Agent by pulling training data from the AWS S3 feature store, runs hyperoptimization ML model builds and save each model (around 50 models per run) in the MLFlow model registry. On each run it also finds the most efficient model and it registers it in MlFlow to be ready for deployment.

An engineer must decide when and which models should be deployed. First copies the RUN_ID of the selected deployment model from ML Flow and update .env.local (or .env.cloud for cloud deployment) with the new RUN_ID field.

Once the RUN_ID is updated, Github Actions triggers a new pipeline-run which will run tests and restarts the 2 servers (http-api and kinesis-streams servers). The servers will start by automatically loading the new model from ML Flow.

The Business Simulation using AWS Kinesis Streams simulates business regularly (every 60sec) sending events to Kinesis stream for prediction. ML Model Serving Kinesis Stream service is a ML serving server using Kinesis stream as input and output for running predictions is realtime.

The Business Simulation using HTTP API simulates business regularly (every 60sec) sending http requests. ML Model Serving Flask HTTP API service is a ML serving server using http APIs for running predictions in realtime. On each prediction request, input and prediction is saved in MongoDB for later processing and also sent to Evidently for monitoring.

Evidently is calculating data drift to understand if the running predictions are degrading over time.

Prometheus is storing monitoring data and Grafana is providing a dashboard UI to monitor the prediction data drift in realtime.

The Batch reporting flow is running regularly (every 3 hours) on the MongoDB data to generate data drift reports. These reports are saved as html and the File server - Nginx gives access to the report files.

Project progress

Input data

Data: Credit Card Churn Prediction

Description: A business manager of a consumer credit card bank is facing the problem of customer attrition. They want to analyze the data to find out the reason behind this and leverage the same to predict customers who are likely to drop off.

Source: https://www.kaggle.com/datasets/anwarsan/credit-card-bank-churn

Implementation plan:

  • cleanup data
  • exploratory data analysis
  • train model
  • ml pipeline for hyperparameter tuning
  • model registry
  • ML-serve API server
  • ML-serve Stream server (optional)
  • tests (partial)
  • linters
  • Makefile and CI/CD
  • deploy to cloud
  • logging and monitoring
  • batch reporting
  • docker and docker-compose everything
  • reporting server

To do reminders:

  • deploy on filechange only. Maybe with version as tag.

Data cleanup

  • Removes rows with "Unknown" records, removes irellevant columns, lowercase column names, lowercase categoriacal values.

  • Categories that can be ordered hiarachically are converted into ints, like "income" or "education level".

prepareData.ipynb

Exploratory data analysis

Train Model

  • Prepare a model using XGBoost.

  • Input data is split into 66%, 33%, 33% for training, validation and test data.

  • Measures MAE, MSE, RMSE.

  • Measures % of deviated predictions based on month threshold.

  • model_train.ipynb

Model Registry with MLFlow

image

image

Automated hyperoptimization tuning

System is using Prefect to orchestrate DAGs. Every few hours, Prefect Agent will start and read the training data from S3, it will build models using XGBoost by running hyperparameterization on the configurations, generating 50 models and calculating accuracy (rmse) for each of them. All 50 models are registered in the MLFlow model registry experiments. At the end of each run, the best model will be registered in MLFlow as ready for deployment.

image

Model serving HTTP API and Stream

There are 2 ML service servers. One serving predictions using HTTP API build in Python with Flask. Second serving predictions using AWS Kinesis streams, both consuming and publishing results back.

Simulation_business: sending data for realtime prediction

There are 2 Python scripts to simulate business requesting predictions from ML servers. One request data from HTTP API server and another one sending events to predictions Kinesis stream and receiving results to results Kinesis stream.

Monitoring

There are 3 services for monitoring the model predictions is realtime:

  • Evidently AI for calculating data drift. Evidently UI:
  • Prometheus for collecting monitoring data. Prometheus UI:
  • Grafana for Dashboards UI. Grafana UI: http://localhost:3000 (default user/pass: admin, admin)

image

Reporting

There is a Prefect flow to generate reporting using Evidently: create_report.py. This will generate reports every few hours save them in MongoDB and also generate static html pages with all data charts.

Report file example: ...

There is also an Nginx server to expose these html reports.

  • Nginx server: nginx
  • Nginx address: http://localhost:8888/

Report example

Deployment

All containers are put together in docker compose files for easy deployment of the entire infrastructure. Docker-compose if perfect for this project, for a more advanced production environment where each service is deployed in different VM, I recommend using more advance tools.

All deployment commands are grouped using the Makefile for simplicity of use.

The environment variables should be in .env file. The Makefile will use one of these: .env.local or .env.cloud.

$> make help

Commands:

run: make run_tests   to run tests locally
run: make reset_all   to delete all containers and cleanup volumes
run: make setup-model-registry env=local (or env=cloud)   to start model registry and training containers
run: make init_aws  to setup and initialize AWS services (uses localstack container)
run: make apply-model-train-flow   to apply the automated model training DAG 
run: make setup-model-serve env=local (or env=cloud)   to start the model serving containers
run: make apply-prediction-reporting   to apply the automated prediction reporting DAG
run: make stop-serve   to stop the model servers (http api and Stream)
run: make start-serve env=local   to start the model servers (http api and Stream)

CI/CD in Cloud

The continuos deployment is done using Github actions. Once a change is made to the repo, the deployment pipeline is triggered. This will restart the model servers to load a new model from the MLFlow model registry. The deployed model is always specified in .env.cloud file under RUN_ID environment variable.

The pipeline will:

Start infrastructure locally or cloud

To deploy in the cloud, the steps are similar except: use you cloud VM domain instead of localhost to access the UIs and replace env=local with env=cloud

  • install docker, docker compose, make

  • run make reset_all to ensure any existing containers are removed

  • run make setup-model-registry env=local to start model training infrastructure

  • open http://localhost:5051 to see MLFlow UI.

  • run make init_aws to setup training data and streams in AWS

  • run make apply-model-train-flow to apply model training script to the orchestrator. This will run trainings regularly.

  • open http://localhost:4200/#deployments, it will show the model_tuning_and_uploading deployment scheduled. Start a Quick Run to not wait for the scheduler. This will run the model training pipeline and upload a bunch of models to MLFlow server and register the best model.

  • from the MLFlow UI decide which model you want to deploy. Get the Run Id of the model and update RUN_ID in .env.local file (or .env.cloud for cloud deployment)

  • run make setup-model-serve env=local to start prediction servers

  • request a prediction using http API:

$> curl -X POST -H 'Content-Type: application/json' http://127.0.0.1:9696/predict -d '{"customer_age":50,"gender":"M","dependent_count":2,"education_level":3,"marital_status":"married","income_category":2,"card_category":"blue","months_on_book":4,"total_relationship_count":3,"credit_limit":4000,"total_revolving_bal":2511}'

{"churn chance":0.5,"model_run_id":"70cc813fa2d64c598e3f5cd93ad674af"}
  • run make apply-prediction-reporting to apply reporting script to the orchestrator. This will generate reports regularly.
  • open , it will show evidently_data_reporting deployment. This runs every 3 hours. The system needs to collect 3+ hours of predictions data first before generating any report. Running the reporting manually at this time will not generate reports yet.
  • open http://localhost:8888/ to see generated reports after 3+ hours.
  • open http://localhost:8085/metrics to see prometheus data.
  • open http://localhost:3000/dashboards to see Grafana realtime monitoring dashboard of data drift. (default user/pass: admin, admin)

Optionally:

  • publish to Kinesis. data is the request json payload base64 encoded
aws kinesis put-record \
    --stream-name predictions --endpoint-url=http://localhost:4566 \
    --partition-key 1 \
    --data "ewogICAgICAiY3VzdG9tZXJfYWdlIjogMTAwLAogICAgICAiZ2VuZGVyIjogIkYiLAogICAgICAiZGVwZW5kZW50X2NvdW50IjogMiwKICAgICAgImVkdWNhdGlvbl9sZXZlbCI6IDIsCiAgICAgICJtYXJpdGFsX3N0YXR1cyI6ICJtYXJyaWVkIiwKICAgICAgImluY29tZV9jYXRlZ29yeSI6IDIsCiAgICAgICJjYXJkX2NhdGVnb3J5IjogImJsdWUiLAogICAgICAibW9udGhzX29uX2Jvb2siOiA2LAogICAgICAidG90YWxfcmVsYXRpb25zaGlwX2NvdW50IjogMywKICAgICAgImNyZWRpdF9saW1pdCI6IDQwMDAsCiAgICAgICJ0b3RhbF9yZXZvbHZpbmdfYmFsIjogMjUwMAogICAgfQ=="
  • consume from Kinesis
aws kinesis get-shard-iterator  --shard-id shardId-000000000000 --endpoint-url=http://localhost:4566 --shard-iterator-type TRIM_HORIZON --stream-name results --query 'ShardIterator'
# copy shard iterator without quotes

aws kinesis get-records --endpoint-url=http://localhost:4566 --shard-iterator {SHARD_ITERATOR_HERE}
# decode Data based64
  • run docker logs -t {container} to see logs for now

All running containers: image

Cloud Deployment

The entire infrastructure was deployed in the cloud using virtual machine provided by Digital Ocean.

Links:

curl -X POST -H 'Content-Type: application/json' http://188.166.115.79:9696/predict -d '{"customer_age":50,"gender":"M","dependent_count":2,"education_level":3,"marital_status":"married","income_category":2,"card_category":"blue","months_on_book":4,"total_relationship_count":3,"credit_limit":4000,"total_revolving_bal":2511}' 
{"churn chance":0.5,"model_run_id":"8a19dc8026dc4e4cb972ad84194940fd"}
  • publish to Kinesis. data is the request json payload base64 encoded
aws kinesis put-record \
    --stream-name predictions --endpoint-url=http://188.166.115.79:4566 \
    --partition-key 1 \
    --data "ewogICAgICAiY3VzdG9tZXJfYWdlIjogMTAwLAogICAgICAiZ2VuZGVyIjogIkYiLAogICAgICAiZGVwZW5kZW50X2NvdW50IjogMiwKICAgICAgImVkdWNhdGlvbl9sZXZlbCI6IDIsCiAgICAgICJtYXJpdGFsX3N0YXR1cyI6ICJtYXJyaWVkIiwKICAgICAgImluY29tZV9jYXRlZ29yeSI6IDIsCiAgICAgICJjYXJkX2NhdGVnb3J5IjogImJsdWUiLAogICAgICAibW9udGhzX29uX2Jvb2siOiA2LAogICAgICAidG90YWxfcmVsYXRpb25zaGlwX2NvdW50IjogMywKICAgICAgImNyZWRpdF9saW1pdCI6IDQwMDAsCiAgICAgICJ0b3RhbF9yZXZvbHZpbmdfYmFsIjogMjUwMAogICAgfQ=="
  • consume from Kinesis
aws kinesis get-shard-iterator  --shard-id shardId-000000000000 --endpoint-url=http://188.166.115.79:4566 --shard-iterator-type TRIM_HORIZON --stream-name results --query 'ShardIterator'
# copy shard iterator without quotes

aws kinesis get-records --endpoint-url=http://188.166.115.79:4566 --shard-iterator {SHARD_ITERATOR_HERE}
# decode Data based64

Other useful links: