Samples for Notebook to Kubeflow Deployment using TensorFlow 2.0 Keras

This directory builds Kubeflow use-case samples around TensorFlow 2.0 Keras training code, demonstrating a 'Customer User Journey' (CUJ) in the process. The samples hosted here are listed below; both demonstrate Kubeflow functionality on NLP tasks. The samples assume you have a Kubeflow instance running; for more information on how to set up Kubeflow, please follow this getting-started tutorial. They also assume you have Jupyter notebooks integrated with your Kubeflow instance; for an overview of Jupyter notebooks in Kubeflow and instructions on how to set them up, please go through this documentation.

Text Classification

This TensorFlow tutorial explains how to classify IMDB movie reviews with TensorFlow. We take the code from that tutorial, modify it to suit Kubeflow's needs, and add Kubeflow code to demonstrate how Kubeflow can leverage containerization and cloud technologies to efficiently manage machine learning workflows across multiple compute nodes. This sample holds the following files; details about each are given next to its name.

  1. text_classification_with_rnn.py - This is the core training code upon which all subsequent examples showing Kubeflow functionalities are based. Please go through this first to understand the machine learning task the subsequent notebooks manage; a rough sketch of the model it trains is shown below.
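
For orientation, the model that tutorial builds is essentially an embedding layer feeding a bidirectional LSTM. A minimal sketch of that kind of model follows; the vocabulary size and layer widths here are illustrative, not necessarily the exact values in the training script:

```python
import tensorflow as tf

VOCAB_SIZE = 10000  # assumed vocabulary size, for illustration only

# Binary sentiment classifier: embedding -> bidirectional LSTM -> dense head.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1),  # single logit: positive vs. negative review
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
```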

  2. distributed_text_classification_with_rnn.py - To truly take advantage of multiple compute nodes, the training code has to be modified to support distributed training. This file is the code from the file above, adapted to TensorFlow's distributed training strategy; a sketch of the kind of change involved is shown below.
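
A minimal sketch of what that modification looks like, assuming tf.distribute's MultiWorkerMirroredStrategy (the file itself may differ in details):

```python
import tensorflow as tf

# TFJob sets the TF_CONFIG environment variable on every replica; the
# strategy reads it to discover the other workers.
strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are mirrored and kept in sync
    # across workers during training.
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(10000, 64),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  optimizer='adam', metrics=['accuracy'])

# model.fit(...) then runs synchronous data-parallel training across workers.
```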

  3. Dockerfile - This is the Dockerfile used to build the Docker image of the training code. Some Kubeflow functionalities require that a Docker image of the training code be built and hosted in a container registry. This Docker 101 tutorial is a good starting point for hands-on Docker practice. For those completely new to containerization, this introduction is a good place to start.

  4. fairing-with-python-sdk.ipynb - Fairing is a Kubeflow functionality that lets you run model training tasks remotely. This Jupyter notebook deploys a model training task to the cloud using Kubeflow Fairing. Fairing does not require you to build a Docker image of the training code first, so the training code resides in the notebook itself; a rough sketch of the Fairing calls involved is shown below. To know more about Kubeflow Fairing, please visit Fairing's official documentation.
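
A rough sketch of the kind of Fairing calls involved, using Fairing's config-style API with a placeholder registry (the notebook's exact calls may differ):

```python
from kubeflow import fairing

DOCKER_REGISTRY = 'gcr.io/<your-gcp-project>/fairing-job'  # placeholder

def train():
    # The training code from text_classification_with_rnn.py goes here:
    # build the dataset, build the model, call model.fit(), ...
    ...

# Fairing packages `train` (plus the notebook context) into an image,
# pushes it to the registry, and runs it as a job on the cluster.
fairing.config.set_builder('append', registry=DOCKER_REGISTRY, push=True)
fairing.config.set_deployer('job')
remote_train = fairing.config.fn(train)
remote_train()
```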

  5. katib-with-python-sdk.ipynb - Katib is a Kubeflow functionality that lets you run hyperparameter tuning experiments and reports the best set of hyperparameters based on a provided metric. This Jupyter notebook launches Katib hyperparameter tuning experiments using Katib's Python SDK. Katib requires you to build a Docker image of your training code and host it in a container registry; for this sample, we used gcloud builds to build the required Docker image of the training code (along with the training data) and hosted it on gcr.io. A sketch of what a Katib experiment spec looks like is shown below.
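
In outline, a Katib experiment is described by an objective, a search algorithm, and the parameter ranges to explore. A hedged sketch of such a spec, written as a plain Kubernetes-style dict (the notebook builds the equivalent object with Katib's Python SDK; the API version, metric name, and parameter names below are placeholders):

```python
# Hedged sketch of a Katib Experiment spec; the notebook submits the
# equivalent object through Katib's Python SDK (KatibClient).
experiment = {
    "apiVersion": "kubeflow.org/v1alpha3",  # may differ on your cluster
    "kind": "Experiment",
    "metadata": {"name": "text-classification-hp-tuning", "namespace": "kubeflow"},
    "spec": {
        "objective": {
            "type": "maximize",
            "goal": 0.95,
            "objectiveMetricName": "accuracy",  # metric the training code logs
        },
        "algorithm": {"algorithmName": "random"},
        "maxTrialCount": 12,
        "parallelTrialCount": 3,
        "parameters": [
            {"name": "--learning_rate", "parameterType": "double",
             "feasibleSpace": {"min": "0.0001", "max": "0.01"}},
            {"name": "--batch_size", "parameterType": "int",
             "feasibleSpace": {"min": "32", "max": "128"}},
        ],
        # The trialTemplate (omitted here) points at the Docker image
        # built with gcloud builds and hosted on gcr.io.
    },
}
```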

  6. tfjob-with-python-sdk.ipynb - TFJobs are used to run distributed training jobs on Kubernetes. With multiple workers, a TFJob truly leverages your code's support for distributed training. This Jupyter notebook demonstrates how to use TFJob; the Docker image built from the distributed version of our core training code is used here. A sketch of an equivalent TFJob submission is shown below.
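
A TFJob is a Kubernetes custom resource, so an equivalent submission can be sketched with the generic Kubernetes Python client (the notebook itself uses the TFJob Python SDK, and the image name below is a placeholder):

```python
from kubernetes import client, config

# TFJob manifest with three workers running the image built from
# distributed_text_classification_with_rnn.py (placeholder name below).
tfjob = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "TFJob",
    "metadata": {"name": "text-classification-dist", "namespace": "kubeflow"},
    "spec": {
        "tfReplicaSpecs": {
            "Worker": {
                "replicas": 3,
                "restartPolicy": "OnFailure",
                "template": {"spec": {"containers": [{
                    # The container must be named "tensorflow" so the
                    # operator injects TF_CONFIG for distributed training.
                    "name": "tensorflow",
                    "image": "gcr.io/<your-project>/distributed-text-classification:latest",
                }]}},
            }
        }
    },
}

config.load_kube_config()  # or load_incluster_config() from a notebook pod
client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubeflow.org", version="v1", plural="tfjobs",
    namespace="kubeflow", body=tfjob)
```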

  7. tekton-pipeline-with-python-sdk.ipynb - Kubeflow Pipelines is a platform that lets you build, manage, and deploy end-to-end machine learning workflows. This Jupyter notebook bundles Katib hyperparameter tuning and TFJob distributed training into one Kubeflow pipeline. The pipeline uses Tekton as its backend; Tekton is a Kubernetes-native framework for creating efficient continuous integration and delivery (CI/CD) systems. An outline of the pipeline structure is sketched below.
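
In outline, the pipeline chains a Katib step and a TFJob step and compiles to Tekton instead of the default Argo backend. A hedged sketch of that structure, using the KFP DSL and the kfp-tekton compiler (the bare ContainerOps and their images are placeholders for the launcher components the notebook actually wires in):

```python
import kfp.dsl as dsl
from kfp_tekton.compiler import TektonCompiler

@dsl.pipeline(name="katib-then-tfjob",
              description="Hyperparameter tuning followed by distributed training")
def katib_then_tfjob_pipeline():
    # Placeholder steps: the notebook uses Katib and TFJob launcher
    # components here rather than bare ContainerOps.
    katib_step = dsl.ContainerOp(
        name="katib-hp-tuning",
        image="gcr.io/<your-project>/katib-launcher:latest")
    tfjob_step = dsl.ContainerOp(
        name="tfjob-distributed-training",
        image="gcr.io/<your-project>/tfjob-launcher:latest")
    tfjob_step.after(katib_step)  # train only after tuning has finished

# Compile to a Tekton PipelineRun instead of an Argo workflow.
TektonCompiler().compile(katib_then_tfjob_pipeline, "pipeline.yaml")
```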

Neural Machine Translation

This other TensorFlow tutorial explains how to translate Spanish text to English using TensorFlow. As with the previous sample, we take the code from that tutorial, modify it to suit Kubeflow's needs, and add Kubeflow code to demonstrate how Kubeflow can leverage containerization and cloud technologies to efficiently manage machine learning workflows across multiple compute nodes. This sample holds the following files; details about each are given next to its name.

  1. nmt_with_attention.py - This is the core training code upon which all subsequent examples showing Kubeflow functionalities are based. Please go through this first to understand the machine learning task the subsequent notebooks manage.

  2. distributed_nmt_with_attention.py - To truly take advantage of multiple compute nodes, the training code has to be modified to support distributed training. This file is the code from the file above, adapted to TensorFlow's distributed training strategy in the same way as sketched for the text classification sample.

  3. Dockerfile - This is the Dockerfile used to build the Docker image of the training code. Some Kubeflow functionalities require that a Docker image of the training code be built and hosted in a container registry. This Docker 101 tutorial is a good starting point for hands-on Docker practice. For those completely new to containerization, this introduction is a good place to start.

  4. fairing-with-python-sdk.ipynb - Fairing is a Kubeflow functionality that lets you run model training tasks remotely. This Jupyter notebook deploys a model training task to the cloud using Kubeflow Fairing. As noted above, Fairing does not require you to build an image yourself; you do, however, have to expose a class for your ML model. In this notebook we import the NeuralMachineTranslation class defined in nmt_with_attention.py and hand it to Fairing, which builds an image on its own; a rough sketch of that hand-off is shown below. To know more about Kubeflow Fairing, please visit Fairing's official documentation.
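
A hedged sketch of one plausible shape of that hand-off, using Fairing's high-level TrainJob helper (the registry, backend, and keyword names here are assumptions; the notebook's exact calls may differ):

```python
from kubeflow.fairing import TrainJob
from kubeflow.fairing.backends import KubeflowGKEBackend
from nmt_with_attention import NeuralMachineTranslation

# Fairing builds an image containing the class and this notebook's context,
# pushes it to the (placeholder) registry, and runs the class's training
# entry point as a job on the Kubeflow cluster.
job = TrainJob(NeuralMachineTranslation,
               docker_registry='gcr.io/<your-gcp-project>/fairing-job',
               backend=KubeflowGKEBackend())
job.submit()
```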

  5. katib-with-python-sdk.ipynb - Katib is a Kubeflow functionality that lets you run hyperparameter tuning experiments and reports the best set of hyperparameters based on a provided metric. This Jupyter notebook launches Katib hyperparameter tuning experiments using Katib's Python SDK. Katib requires you to build a Docker image of your training code and host it in a container registry; for this sample, we used gcloud builds to build the required Docker image of the training code (along with the training data) and hosted it on gcr.io. This example uses the Tree-structured Parzen Estimator (TPE) hyperparameter optimization algorithm.

  6. tfjob-with-python-sdk.ipynb - TFJobs are used to run distributed training jobs on Kubernetes. With multiple workers, a TFJob truly leverages your code's support for distributed training. This Jupyter notebook demonstrates how to use TFJob, in the same way as sketched for the text classification sample; the Docker image built from the distributed version of our core training code is used here.

  7. tekton-pipeline-with-python-sdk.ipynb - Kubeflow Pipelines is a platform that lets you build, manage, and deploy end-to-end machine learning workflows. This Jupyter notebook bundles Katib hyperparameter tuning and TFJob distributed training into one Kubeflow pipeline. The pipeline uses Tekton as its backend; Tekton is a Kubernetes-native framework for creating efficient continuous integration and delivery (CI/CD) systems.