The Hitchhiker's Guide to the Machine Learning Engineering Galaxy

A presentation on how to start with MLOps for ML Conf EU 2020.

Updated for LINKIT Virtual Roadtrip "Defrost Your Data".

Intro

Are you a software engineer who has been tasked with deploying a machine learning model for the first time? Are you wondering what steps to take and how AI-powered software differs from traditional software? Then this is the right presentation to attend.

The internet offers thousands of articles and free courses showing how easy it is to train and deploy a simple ML model. In reality, however, it is difficult to integrate a real model into your existing infrastructure and to debug, test, deploy, and monitor it properly. In this presentation, I will guide you through this process by sharing tips, tricks, and favourite open-source tools that will make your life much easier. By the end of the presentation, you will know where to start your deployment journey, which tools to use, and which questions to ask.

Table of contents

  • Traditional vs AI-powered software
  • MLOps
  • ML serving pipelines

Presentation level

Great for software engineers who have been tasked with ML model deployment for the first time. No ML knowledge is assumed.

Prerequisites 

No Machine Learning background is assumed.

How to use

  1. Start with this presentation.
  2. Continue with the ML serving pipelines part below.

ML serving pipelines

To answer the question "How do we deploy a model?", we need to understand how end users are going to interact with our model:

  • interactive or non-interactive
  • single record or batch
  • synchronous or asynchronous
  • real-time or non-real-time

Today we will explore three flavours of model deployment:

  • Batch serving
  • Online serving (near real-time)
  • Real-time serving with embedded model

Each flavour has its own dedicated repository.


Batch serving

Batch inference is about using distributed data processing infrastructure to carry out inference asynchronously on a large number of instances at once.

What to optimize: throughput; the workload is not latency-sensitive

End user: usually no direct interaction with the model. The user consumes the predictions that the batch jobs write to a data store.

Validation: offline
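
A minimal batch-scoring sketch, assuming a scikit-learn model saved with joblib and a parquet-based data store; the file paths and feature names are hypothetical:

```python
# Hypothetical batch scoring job: load a trained model, score all instances,
# and persist the predictions for downstream consumers.
import joblib
import pandas as pd

def run_batch_job(model_path: str, input_path: str, output_path: str) -> None:
    model = joblib.load(model_path)              # previously trained model
    batch = pd.read_parquet(input_path)          # full batch of instances
    batch["prediction"] = model.predict(batch[["feature_a", "feature_b"]])
    batch.to_parquet(output_path, index=False)   # predictions land in the data store

if __name__ == "__main__":
    run_batch_job("model.joblib", "instances.parquet", "predictions.parquet")
```

In practice this job is scheduled (e.g. nightly) and distributed over something like Spark, but the shape stays the same: read a batch, score it, write the results.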

To GitHub repo


Online serving (near real-time)

Online inference is more challenging than batch inference. Why? Because of the latency restrictions on our systems.

Online inference is about responding to an end user's request with a prediction at low latency.

What to optimize: latency

End user: usually interacts with the model directly through an API

Validation: offline and online via A/B testing
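
A minimal online-serving sketch, assuming a scikit-learn model saved with joblib and exposed through a FastAPI endpoint; the model file and feature names are hypothetical, and a production setup would add input validation, batching, and monitoring:

```python
# Hypothetical prediction API: load the model once at startup,
# then score one record per request.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # loaded once, reused for every request

class Instance(BaseModel):
    feature_a: float
    feature_b: float

@app.post("/predict")
def predict(instance: Instance) -> dict:
    # score a single record synchronously and return the prediction
    prediction = model.predict([[instance.feature_a, instance.feature_b]])
    return {"prediction": float(prediction[0])}

# Run with: uvicorn serve:app --port 8000  (assuming this file is saved as serve.py)
```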

To GitHub repo


Real-time serving with embedded model

Real-time serving with an embedded model is about distributed event-at-a-time processing with millisecond latency and high throughput.

What to optimize: latency and throughput

End user: usually no direct interaction with the model

Validation: offline and online via A/B testing
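
A minimal event-at-a-time sketch using kafka-python, assuming a Kafka broker on localhost and a scikit-learn model saved with joblib; the topic names and feature layout are hypothetical:

```python
# Hypothetical stream processor: the model is embedded in the consumer itself,
# so each event is scored as it arrives and the prediction is published downstream.
import json
import joblib
from kafka import KafkaConsumer, KafkaProducer

model = joblib.load("model.joblib")  # model lives inside the stream processor

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for event in consumer:  # one event at a time
    features = [[event.value["feature_a"], event.value["feature_b"]]]
    prediction = float(model.predict(features)[0])
    producer.send("predictions", {"id": event.value["id"], "prediction": prediction})
```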

To GitHub repo


Favourite open-source tools

  • Miniconda - Package, dependency and environment management for any language: Python, R, Ruby, Scala, Java, JavaScript, C/C++
  • MLflow - Experiment tracking, model registry, Conda-based projects (see the sketch after this list)
  • DVC - Data versioning, Version control system for ML Projects
  • Pachyderm - Data lineage, e2e pipelines on k8s
  • Seldon Core - ML model serving as REST/gRPC microservices running on k8s
  • TensorFlow Extended - e2e production ML pipelines
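
As an example of the kind of workflow these tools enable, here is a minimal MLflow experiment-tracking sketch; the experiment name, parameter, metric, and artifact file are hypothetical stand-ins for a real training run:

```python
# Hypothetical tracking call: log hyperparameters, metrics, and the model file
# for one training run so it can be compared and reproduced later.
import mlflow

mlflow.set_experiment("demo-experiment")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.01)   # hyperparameter used for this run
    mlflow.log_metric("accuracy", 0.93)       # evaluation result of this run
    mlflow.log_artifact("model.joblib")       # attach the trained model file
```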

Where to go next?
