Welcome to the MLOps Repository! This repository shares reading materials, labs, and exercises for the MLOps (Machine Learning Operations) course at Northeastern University. Its primary goal is to provide a centralized platform where students, instructors, and anyone interested in MLOps can access and collaborate on course-related materials. You can learn more about machine learning topics by watching my videos on YouTube or visiting my website.
MLOps is an emerging discipline that focuses on collaboration and communication between data scientists and IT professionals while automating and streamlining the machine learning lifecycle. It bridges the gap between machine learning development and production deployment, ensuring that machine learning models are scalable, reproducible, and maintainable. This repository serves as a resource hub for students and instructors of Northeastern University's MLOps course.
The MLOps course at Northeastern University is designed to provide students with a comprehensive understanding of the MLOps field. Throughout the course, students will learn how to:
- Build end-to-end machine learning pipelines
- Deploy machine learning models to production
- Monitor and maintain ML systems
- Implement CI/CD/CM/CT (Continuous Integration/Continuous Deployment/Continuous Monitoring/Continuous Training) for ML
- Containerize and orchestrate ML workloads
- Handle data drift and model retraining
This repository hosts the labs, code samples, and documentation related to these topics.
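As an illustration of the data drift topic listed above, here is a minimal sketch of one common drift check, the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against its distribution in production. It is not taken from the course labs: the function name `population_stability_index`, the default of 10 equal-width bins, and the `eps` floor for empty bins are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,      # assumed default, not a course requirement
    eps: float = 1e-6,     # assumed floor so empty bins do not cause log(0)
) -> float:
    """Return the PSI between two samples of a single numeric feature."""
    if len(reference) == 0 or len(current) == 0:
        raise ValueError("both samples must be non-empty")

    # Bin edges are derived from the reference (training-time) sample only.
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread to bin")
    width = (hi - lo) / n_bins

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)

    # PSI = sum over bins of (current - reference) * ln(current / reference)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))
```

A PSI of 0 means the two binned distributions are identical, and larger values indicate a larger shift. The value at which retraining should be triggered is a judgment call that depends on the feature and the model, so any threshold used with this function should be treated as a tunable assumption.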
The labs in this repository are organized according to the topics covered in the MLOps course. Each lab may include code examples, Jupyter notebooks, configuration files, and relevant documentation. Some of the key topics covered in the labs include:
- Data preprocessing and feature engineering
- Model training and evaluation
- Model deployment using containerization (e.g., Docker) and orchestration (e.g., Kubernetes)
- Monitoring and logging of deployed models
- CI/CD for ML pipelines
- Data labeling with Snorkel
- Handling data drift and retraining models
And more...
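To give a flavor of the monitoring and logging topic, the following is a minimal sketch of wrapping a model's prediction function so that every call emits one structured log line with the model version, the prediction, and the latency. It uses only the Python standard library and is not the approach of any particular lab; the logger name `model_monitor`, the helper `with_prediction_logging`, and the JSON field names are hypothetical.

```python
import json
import logging
import time
from typing import Any, Callable, Sequence

# Hypothetical logger name; handlers and level are left to the application.
logger = logging.getLogger("model_monitor")


def with_prediction_logging(
    predict_fn: Callable[[Sequence[float]], Any],
    model_version: str,
) -> Callable[[Sequence[float]], Any]:
    """Wrap predict_fn so each call is logged as one JSON line."""

    def wrapper(features: Sequence[float]) -> Any:
        start = time.perf_counter()
        prediction = predict_fn(features)
        latency_ms = (time.perf_counter() - start) * 1000.0

        # default=str keeps non-JSON-serializable predictions from raising.
        logger.info(
            json.dumps(
                {
                    "model_version": model_version,
                    "n_features": len(features),
                    "prediction": prediction,
                    "latency_ms": round(latency_ms, 3),
                },
                default=str,
            )
        )
        return prediction

    return wrapper
```

For example, `predict = with_prediction_logging(model_predict, "v1")` (where `model_predict` stands in for your own function) returns a drop-in replacement that behaves like the original but also logs each call. Emitting one JSON object per line keeps the logs easy to parse by whatever aggregation tool is used downstream.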
To get started with the labs and exercises in this repository, please follow these steps:
- Clone this repository to your local machine.
- Navigate to the specific lab you are interested in.
- Read the lab instructions and review any accompanying documentation.
- Follow the provided code samples and examples to complete the lab exercises.
- Feel free to explore, modify, and experiment with the code to deepen your understanding.
For more detailed information on each lab and its prerequisites, please refer to the lab's README or documentation.
Contributions to this repository are welcome! If you are a student or instructor and would like to contribute your own labs, improvements, or corrections, please follow these guidelines:
- Fork this repository.
- Create a branch for your changes.
- Make your changes and commit them with clear, concise messages.
- Test your changes to ensure they work as expected.
- Submit a pull request to the main repository.
Your contributions will help improve the overall quality of the labs and benefit the entire MLOps community.
The reading materials in this repository were collected from Coursera under the Creative Commons License.
This repository is open-source and is distributed under the Creative Commons License. Please review the license for more details on how you can use and share the content within this repository.