diff --git a/README.md b/README.md index 0f1208f..bb3648f 100644 --- a/README.md +++ b/README.md @@ -5,20 +5,32 @@
-:orange_book: List of AI infrastructures (a.k.a., machine learning systems, pipelines, and platforms) for machine/deep learning training and/or inference in production :electric_plug:. Feel free to contribute / star / fork / pull request. Any recommendations and suggestions are welcome :tada:. +:orange_book: List of real-world AI infrastructures (a.k.a., **machine learning +systems, pipelines, workflows**, and **platforms**) for machine/deep learning training +and/or inference **in production** :electric_plug:. This usually includes the technology +stack necessary to enable machine learning algorithms to run in production +environments in a stable, scalable, and reliable way. The list is for my own +learning purposes, but feel free to **contribute** / star / fork / pull request. Any +recommendations and suggestions are welcome :tada:.
*** # Introduction -This list contains some popular actively-maintained AI infrastructures that focus on one or more of the following topics: +This list contains some popular, actively maintained AI infrastructures that +focus on one or more of the following topics: -- Architecture of **end-to-end** machine learning **pipelines** -- **Deployment** at scale in production on Cloud :cloud: or on end devices :iphone: -- Novel ideas of efficient large-scale distributed **training** +- Architecture of **end-to-end** machine learning training **pipelines**. +- **Inference** at scale in production on Cloud :cloud: or on end devices :iphone:. +- **Compiler and optimization** stacks for deployments on a variety of devices. +- Novel ideas of efficient large-scale **distributed training**. -in **no specific order**. This list cares more about overall architectures of AI solutions in production instead of individual machine/deep learning training or inference frameworks. +in **no specific order**. This list cares more about overall architectures of AI +solutions in production instead of individual machine/deep learning training or +inference frameworks. My learning goal is to understand the workflows and +principles of how to build (large-scale) systems that can enable machine +learning in production. # End-to-End Machine Learning Platforms @@ -30,7 +42,7 @@ in **no specific order**. This list cares more about overall architectures of AI #### Architecture: - + #### Components: @@ -62,6 +74,32 @@ in **no specific order**. This list cares more about overall architectures of AI - **Multi-Framework**: includes [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), [MXNet](https://mxnet.apache.org/), [Chainer](https://chainer.org/), and more. 
+### RAPIDS - Open GPU Data Science ([NVIDIA](https://www.nvidia.com/en-us/)) + +> The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. + +> RAPIDS is the result of contributions from the machine learning community and [GPU Open Analytics Initiative (GOAI)](http://gpuopenanalytics.com/) partners, such as [Anaconda](https://www.anaconda.com/), [BlazingDB](https://blazingdb.com/), [__Gunrock__](https://github.com/gunrock/gunrock), etc. + +| [__homepage__](https://rapids.ai/) | [__blog__](https://medium.com/rapids-ai) | [__github__](https://github.com/RAPIDSai) | + +#### Architecture: + + + +#### Components: + +- **[Apache Arrow](https://arrow.apache.org/)**: a columnar, in-memory data structure that delivers efficient and fast data interchange with flexibility to support complex data models. + +- **cuDF**: a DataFrame manipulation library based on [Apache Arrow](https://arrow.apache.org/) that accelerates loading, filtering, and manipulation of data for model training data preparation. The Python bindings of the core-accelerated CUDA DataFrame manipulation primitives mirror the pandas interface for seamless onboarding of pandas users. + +- **cuML**: a collection of GPU-accelerated machine learning libraries that will provide GPU versions of all machine learning algorithms available in [scikit-learn](https://scikit-learn.org/). + +- **cuGRAPH**: a framework and collection of graph analytics libraries that seamlessly integrate into the RAPIDS data science platform. 
+ +- **Deep Learning Libraries**: data stored in [Apache Arrow](https://arrow.apache.org/) can be seamlessly pushed to deep learning frameworks that accept array_interface such as [PyTorch](https://pytorch.org/) and [Chainer](https://chainer.org/). + +- **Visualization Libraries**: RAPIDS will include tightly integrated data visualization libraries based on [Apache Arrow](https://arrow.apache.org/). Native GPU in-memory data format provides high-performance, high-FPS data visualization, even with very large datasets. + ### Michelangelo - Uber's Machine Learning Platform ([Uber](https://www.uber.com/)) > Michelangelo, an internal ML-as-a-service platform that democratizes machine learning and makes scaling AI to meet the needs of business as easy as requesting a ride. @@ -72,7 +110,7 @@ in **no specific order**. This list cares more about overall architectures of AI #### Architecture: - + #### Components: @@ -83,31 +121,23 @@ in **no specific order**. This list cares more about overall architectures of AI - Make predictions - Monitor predictions -### RAPIDS - Open GPU Data Science ([NVIDIA](https://www.nvidia.com/en-us/)) +### COTA: Improving Uber Customer Care with NLP & Machine Learning ([Uber](https://www.uber.com/)) -> The RAPIDS suite of open source software libraries gives you the freedom to execute end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. +> COTA, our Customer Obsession Ticket Assistant, a tool that uses machine learning and natural language processing (NLP) techniques to help agents deliver better customer support. 
Leveraging our Michelangelo machine learning-as-a-service platform on top of our customer support platform -> RAPIDS is the result of contributions from the machine learning community and [GPU Open Analytics Initiative (GOAI)](http://gpuopenanalytics.com/) partners, such as [Anaconda](https://www.anaconda.com/), [BlazingDB](https://blazingdb.com/), [__Gunrock__](https://github.com/gunrock/gunrock), etc. - -| [__homepage__](https://rapids.ai/) | [__blog__](https://medium.com/rapids-ai) | [__github__](https://github.com/RAPIDSai) | +| [__blog__](https://eng.uber.com/cota/) | #### Architecture: - + #### Components: -- **[Apache Arrow](https://arrow.apache.org/)**: a columnar, in-memory data structure that delivers efficient and fast data interchange with flexibility to support complex data models. +- Preprocessing +- Topic modeling +- Feature engineering +- Pointwise ranking algorithm -- **cuDF**: a DataFrame manipulation library based on [Apache Arrow](https://arrow.apache.org/) that accelerates loading, filtering, and manipulation of data for model training data preparation. The Python bindings of the core-accelerated CUDA DataFrame manipulation primitives mirror the pandas interface for seamless onboarding of pandas users. - -- **cuML**: a collection of GPU-accelerated machine learning libraries that will provide GPU versions of all machine learning algorithms available in [scikit-learn](https://scikit-learn.org/). - -- **cuGRAPH**: a framework and collection of graph analytics libraries that seamlessly integrate into the RAPIDS data science platform. - -- **Deep Learning Libraries**: data stored in [Apache Arrow](https://arrow.apache.org/) can be seamlessly pushed to deep learning frameworks that accept array_interface such as [PyTorch](https://pytorch.org/) and [Chainer](https://chainer.org/). - -- **Visualization Libraries**: RAPIDS will include tightly integrated data visualization libraries based on [Apache Arrow](https://arrow.apache.org/). 
Native GPU in-memory data format provides high-performance, high-FPS data visualization, even with very large datasets. ### FBLearner ([Facebook](https://www.facebook.com/)) @@ -117,7 +147,7 @@ in **no specific order**. This list cares more about overall architectures of AI #### Architecture: - + #### Components: @@ -136,7 +166,7 @@ up for easy, fast, and scalable distributed training. #### Architecture: - + #### Components: @@ -157,7 +187,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q #### Architecture: - + #### Components: @@ -177,7 +207,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q #### Architecture: - + #### Components: @@ -195,7 +225,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q #### Architecture: - + ### TransmogrifAI ([Salesforce](https://www.salesforce.com/)) @@ -205,7 +235,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q #### Architecture: - + #### Components: @@ -223,7 +253,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q #### Architecture: - + #### Components: @@ -241,9 +271,9 @@ allows them to upload and browse the code assets, submit distributed jobs, and q | [__h2o__](https://www.h2o.ai/products/h2o/) | [__h2o4gpu__](https://www.h2o.ai/products/h2o4gpu/) | - + -# Machine Learning Model Deployment +# Machine Learning Model Inference/Deployment ### Apple's CoreML @@ -253,7 +283,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q | [__documentation__](https://developer.apple.com/documentation/coreml) | - + ### Greengrass ([Amazon Web Service](https://aws.amazon.com/?nc2=h_lg)) @@ -261,7 +291,7 @@ allows them to upload and browse the code assets, submit distributed jobs, and q | [__blog__](https://aws.amazon.com/greengrass/) | - + ### GraphPipe ([Oracle](https://www.oracle.com/index.html)) @@ -269,7 +299,7 @@ allows them to upload 
and browse the code assets, submit distributed jobs, and q | [__homepage__](https://oracle.github.io/graphpipe/#/) | [__github__](https://github.com/oracle/graphpipe) | [__documentation__](https://oracle.github.io/graphpipe/#/guide/user-guide/overview) | - + ### PocketFlow ([Tencent](https://www.tencent.com/en-us/)) @@ -277,7 +307,52 @@ allows them to upload and browse the code assets, submit distributed jobs, and q | [__homepage__](https://pocketflow.github.io/) | [__github__](https://github.com/Tencent/PocketFlow) | - + + +### TVM - End to End Deep Learning Compiler Stack ([Amazon](https://aws.amazon.com/?nc2=h_lg)) + +> TVM is a compiler stack for deep learning systems. It is designed to close the gap between the productivity-focused deep learning frameworks, and the performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end to end compilation to different backends. Checkout the tvm stack homepage for more information. + +| [__homepage__](https://tvm.ai/) | [__github__](https://github.com/dmlc/tvm) | [__documentation__](https://docs.tvm.ai/) | [__paper__](https://arxiv.org/abs/1802.04799) | + +#### Architecture: + + + +#### Components: + +- **Compilation of deep learning models** in Keras, MXNet, PyTorch, TensorFlow, CoreML, and DarkNet into minimal deployable modules on diverse hardware backends. + +- **Infrastructure to automatically generate and optimize tensor operators** on more backends with better performance. + +### Glow - A community-driven approach to AI infrastructure ([Facebook](https://www.facebook.com/)) + +> Glow is a machine learning compiler that accelerates the performance of deep learning frameworks on different hardware platforms. It enables the ecosystem of hardware developers and researchers to focus on building next gen hardware accelerators that can be supported by deep learning frameworks like PyTorch.
+ +| [__homepage__](https://facebook.ai/developers/tools/glow) | [__github__](https://github.com/pytorch/glow) | [__blog__](https://code.fb.com/ml-applications/glow-a-community-driven-approach-to-ai-infrastructure/) | [__paper__](https://arxiv.org/abs/1805.00907) | + +#### Architecture: + + + +#### Components: + +- **High-level intermediate representation** allows the optimizer to perform domain-specific optimizations. + +- **Lower-level intermediate representation**, an instruction-based, address-only representation, allows the compiler to perform memory-related optimizations, such as instruction scheduling, static memory allocation, and copy elimination. + +- The **optimizer** then performs machine-specific code generation to take advantage of specialized hardware features. + + +### ONNX - Open Neural Network Exchange + +> ONNX is a open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners. + +| [__homepage__](https://onnx.ai/) | [__documentation__](https://onnx.ai/getting-started) | [__github__](https://github.com/onnx) | + +#### Architecture: + + # Large-Scale Distributed AI Training Efforts @@ -339,3 +414,19 @@ batch size during training - Distributed batch normalization - Input pipeline optimization: dataset sharding and caching, prefetch, fused JPEG decoding and cropping, parallel data parsing - Communication: 2D gradient summation + +# Machine Learning System Lectures + +#### CSE 599W: Systems for ML (University of Washington) + +> Over the past few years, deep learning has become an important technique to successfully solve problems in many different fields, such as vision, NLP, robotics.
An important ingredient that is driving this success is the development of deep learning systems that efficiently support the task of learning and inference of complicated models using many devices and possibly using distributed resources. The study of how to build and optimize these deep learning systems is now an active area of research and commercialization, and yet there isn’t a course that covers this topic. + +> This course is designed to fill this gap. We will be covering various aspects of deep learning systems, including: basics of deep learning, programming models for expressing machine learning models, automatic differentiation, memory optimization, scheduling, distributed learning, hardware acceleration, domain specific languages, and model serving. Many of these topics intersect with existing research directions in databases, systems and networking, architecture and programming languages. The goal is to offer a comprehensive picture on how deep learning systems works, discuss and execute on possible research opportunities, and build open-source software that will have broad appeal. + +| [__link__](http://dlsys.cs.washington.edu/) | [__github__](https://github.com/dlsys-course/dlsys-course.github.io) | [__materials__](http://dlsys.cs.washington.edu/schedule) | + +#### CSCE 790: Machine Learning Systems (University of South Carolina) + +> When we talk about Machine Learning (ML), we typically refer to a technique or an algorithm that gives the computer systems the ability to learn and to reason with data. However, there is a lot more to ML than just implementing an algorithm or a technique. In this course, we will learn the fundamental differences between ML as a technique versus ML as a system in production. A machine learning system involves a significant number of components and it is important that they remain responsive in the face of failure and changes in load. This course covers several strategies to keep ML systems responsive, resilient, and elastic.
Machine learning systems are different than other computer systems when it comes to building, testing, deploying, delivering, and evolving. ML systems also have unique challenges when we need to change the architecture or behavior of the system. Therefore, it is essential to learn how to deal with such unique challenges that only may happen when building real-world production-ready ML systems (e.g., performance issues, memory leaking, communication issues, multi-GPU issues, etc). The focus of this course will be primarily on deep learning systems, but the principles will remain similar across all ML systems. + +| [__link__](https://pooyanjamshidi.github.io/mls/) | [__github__](https://github.com/pooyanjamshidi/mls) | [__materials__](https://pooyanjamshidi.github.io/mls/lectures/) | diff --git a/images/amazon-tvm-arch.png b/images/amazon-tvm-arch.png new file mode 100644 index 0000000..b90ceab Binary files /dev/null and b/images/amazon-tvm-arch.png differ diff --git a/images/facebook-glow-arch.gif b/images/facebook-glow-arch.gif new file mode 100644 index 0000000..269db2d Binary files /dev/null and b/images/facebook-glow-arch.gif differ diff --git a/images/onnx-arch.png b/images/onnx-arch.png new file mode 100644 index 0000000..9a25a6a Binary files /dev/null and b/images/onnx-arch.png differ diff --git a/images/uber-cota-arch.png b/images/uber-cota-arch.png new file mode 100644 index 0000000..dc351db Binary files /dev/null and b/images/uber-cota-arch.png differ