Real-time serving with an embedded model means distributed, event-at-a-time processing with millisecond latency and high throughput.
- What to optimize: latency and throughput
- End user: usually no direct interaction with the model
- Validation: offline, and online via A/B testing
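As a concrete illustration, below is a minimal sketch of event-at-a-time scoring with a model embedded in a Kafka Streams topology. The topic names (`input-events`, `predictions`) and the `Model` interface are hypothetical placeholders; any model that can be loaded into the JVM (e.g. a TensorFlow SavedModel or an ONNX graph) would slot in the same way.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EmbeddedModelApp {

    /** Hypothetical stand-in for any model loaded into the JVM process. */
    interface Model {
        double predict(String features);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "embedded-model-scoring");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // The model is loaded once and lives inside the stream processor:
        // no remote model server, no network hop per event.
        Model model = features -> 0.0; // placeholder for a real loaded model

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("input-events");

        // Event-at-a-time scoring: each record is mapped to a prediction
        // in-process, which is what keeps latency in the millisecond range.
        events
            .mapValues(value -> String.valueOf(model.predict(value)))
            .to("predictions");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```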
First, learn general MLOps concepts. Then learn more about real-time serving with embedded ML models:
- Machine Learning and Real-Time Analytics in Apache Kafka Applications
- Kafka Streams machine learning examples
- Streaming Machine Learning at Scale from 100,000 IoT Devices with HiveMQ, Apache Kafka and TensorFlow
- Streaming ML Model Deployment
This workshop is a work in progress (WIP).
It will cover a real-life use case of embedding a machine learning model into a streaming application, and troubleshooting it.
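The exact troubleshooting material is still WIP. As one plausible sketch of the kind of pattern involved, the snippet below guards the inference call and routes failed records to a dead-letter topic so one bad event cannot crash the stream thread. The `predict` helper and the topic names (including `predictions-dead-letter`) are assumptions for illustration, not the workshop's actual code.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class DeadLetterTopology {

    /** Hypothetical model call that may throw on malformed input. */
    static double predict(String features) {
        if (features == null || features.isEmpty()) {
            throw new IllegalArgumentException("empty feature vector");
        }
        return 0.0; // placeholder for a real model's output
    }

    static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> events = builder.stream("input-events");

        // Guard the inference call: tag failures instead of throwing,
        // so the stream thread keeps processing subsequent events.
        KStream<String, String> scored = events.mapValues(value -> {
            try {
                return String.valueOf(predict(value));
            } catch (RuntimeException e) {
                return "ERROR:" + e.getMessage();
            }
        });

        // Route successes and failures to separate topics; the dead-letter
        // topic can be inspected or replayed while the app keeps running.
        scored.filter((key, value) -> !value.startsWith("ERROR:")).to("predictions");
        scored.filter((key, value) -> value.startsWith("ERROR:")).to("predictions-dead-letter");
    }
}
```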