This repository has been archived by the owner on Oct 11, 2023. It is now read-only.

Shell script to create env variables in case of cloud shell timeout #55

Open · wants to merge 19 commits into `master`
25 changes: 25 additions & 0 deletions 2-kubernetes/README.md
@@ -233,6 +233,31 @@ spec:
restartPolicy: OnFailure # restart the pod if it fails
```

For non-GPU clusters:
```yaml
apiVersion: batch/v1
kind: Job # Our training should be a Job since it is supposed to terminate at some point
metadata:
  name: module2-ex1 # Name of our job
spec:
  template: # Template of the Pod that is going to be run by the Job
    metadata:
      name: module2-ex1 # Name of the pod
    spec:
      containers: # List of containers that should run inside the pod, in our case there is only one.
        - name: tensorflow
          image: ${DOCKER_USERNAME}/tf-mnist:cpu # The image to run; you can replace it with your own.
          args: ["--max_steps", "500"] # Optional arguments to pass to our command. By default the command is defined by ENTRYPOINT in the Dockerfile
      restartPolicy: OnFailure # restart the pod if it fails
```

Save this template somewhere and deploy it with:

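For example, assuming you saved the manifest as `module2-ex1.yaml` (the filename is just an example):

```console
kubectl create -f module2-ex1.yaml
```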
31 changes: 30 additions & 1 deletion 4-kubeflow/README.md
@@ -31,6 +31,25 @@ Kubeflow uses [`ksonnet`](https://github.com/ksonnet/ksonnet) templates as a way

First, install ksonnet version [0.13.1](https://ksonnet.io/#get-started), or you can [download a prebuilt binary](https://github.com/ksonnet/ksonnet/releases/tag/v0.13.1) for your OS.

Pull down ksonnet in Cloud Shell:

```bash
wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz
```

Untar the archive:

```bash
tar -zxvf ks_0.13.1_linux_amd64.tar.gz
```

Add the ksonnet CLI to the Cloud Shell path:

```bash
PATH=$PATH:~/ks_0.13.1_linux_amd64/
```
> NOTE: You may have to run this again if your Cloud Shell times out, as it will not persist across sessions.
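To avoid re-running this after a timeout (a minimal sketch, assuming the tarball was extracted into your home directory), you can append the export to `~/.bashrc`, which Cloud Shell normally persists across sessions:

```bash
# append the ksonnet directory to PATH for future sessions (assumes ~/ks_0.13.1_linux_amd64/)
echo 'export PATH=$PATH:~/ks_0.13.1_linux_amd64/' >> ~/.bashrc
source ~/.bashrc
```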


Then run the following commands to download Kubeflow:

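A sketch of the v0.4-era download flow that the `kfctl.sh` commands below assume (`KUBEFLOW_SRC` and `KUBEFLOW_TAG` are placeholder values you choose; adjust the tag to the release the lab targets):

```bash
export KUBEFLOW_SRC=~/kubeflow   # directory that will hold the Kubeflow source (placeholder)
export KUBEFLOW_TAG=v0.4.1       # Kubeflow release tag (placeholder)
mkdir -p ${KUBEFLOW_SRC} && cd ${KUBEFLOW_SRC}
curl https://raw.githubusercontent.com/kubeflow/kubeflow/${KUBEFLOW_TAG}/scripts/download.sh | bash
```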
@@ -65,6 +84,16 @@ ${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s

`kubectl get pods -n kubeflow`

To make the `kubeflow` namespace the default for your context, first list your contexts:
```bash
kubectl config get-contexts
```
Find the name of your existing context, which is the name of the cluster, then set its default namespace:
```bash
kubectl config set-context aks-ejv --namespace kubeflow
```
Use your own cluster name instead of `aks-ejv`.
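To double-check that the default namespace took effect:

```bash
kubectl config view --minify --output 'jsonpath={..namespace}'
```

This should print `kubeflow`.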

`kubectl get pods -n kubeflow` should return something like this:

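An abridged, illustrative listing (pod names, hashes, and ages will differ on your cluster):

```
NAME                                       READY   STATUS    RESTARTS   AGE
jupyter-0                                  1/1     Running   0          5m
tf-job-operator-v1beta1-5949f668f7-j5zrn   1/1     Running   0          5m
workflow-controller-cf79dfbff-lv7jk        1/1     Running   0          5m
```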
@@ -100,7 +129,7 @@ kubeflow workflow-controller-cf79dfbff-lv7jk 1/1

The most important components for the purpose of this lab are `jupyter-0`, which is the JupyterHub spawner running on your cluster, and `tf-job-operator-v1beta1-5949f668f7-j5zrn`, which is a controller that monitors your cluster for new TensorFlow training job specifications (called `TFJobs`) and manages the training. We will look at these two components later.

### Remove Kubeflow (only if you are done with the labs!)

If you want to remove the Kubeflow deployment, you can run the following to remove the namespace and installed components:

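If you deployed with the `kfctl.sh` flow above, removal typically looks like this (a sketch; `KUBEFLOW_SRC` is the directory you downloaded Kubeflow into):

```bash
${KUBEFLOW_SRC}/scripts/kfctl.sh delete k8s
```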
10 changes: 7 additions & 3 deletions 5-jupyterhub/README.md
@@ -68,15 +68,19 @@ Then navigate to JupyterHub: http://localhost:8080/hub
To update the default service created for JupyterHub, run the following commands to change the service to type LoadBalancer:

```bash
cd kubeflow/mykubeflowapp/ks_app
ks param set jupyter serviceType LoadBalancer
cd ..
~/kubeflow/scripts/kfctl.sh apply k8s
```
Wait for the public IP of the Jupyter service:
```bash
kubectl get svc -w
```
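Once an external IP shows up, you can also read it directly with a jsonpath query (this assumes the LoadBalancer service is named `tf-hub-lb`, the name referenced below; substitute the name shown by `kubectl get svc` if yours differs):

```bash
kubectl get svc tf-hub-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```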

Create a new Jupyter Notebook instance:

- open `http://<PublicIP_OF_JUPYTER_SVC>/hub/` in your browser (this is the public IP of the `tf-hub-lb` service)
- log in using any username and password
- click the "Start My Server" button to spawn a new Jupyter notebook
- from the image dropdown, select a tensorflow image for your notebook
138 changes: 138 additions & 0 deletions 6-tfjob/README.md
@@ -102,6 +102,24 @@ spec:
restartPolicy: OnFailure
```

For Non-GPU clusters:
```yaml
apiVersion: kubeflow.org/v1beta1
kind: TFJob
metadata:
  name: module6-ex1
spec:
  tfReplicaSpecs:
    MASTER:
      replicas: 1
      template:
        spec:
          containers:
            - image: <DOCKER_USERNAME>/tf-mnist:cpu # From module 1
              name: tensorflow
          restartPolicy: OnFailure
```

Save the template that applies to you in a file, and create the `TFJob`:

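For example, assuming the template you picked was saved as `module6-ex1.yaml` (the filename is just an example):

```console
kubectl create -f module6-ex1.yaml
```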
@@ -183,6 +201,78 @@ Be aware of a few details first:
- PVC are namespaced so be sure to create it on the same namespace that is launching the TFJob objects
- If you are using RBAC, you might need to create the cluster role and binding: [see docs here](https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv#create-a-cluster-role-and-binding)

Create an `azurefiles-rbac.yaml` file:
```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:azure-cloud-provider
rules:
  - apiGroups: ['']
    resources: ['secrets']
    verbs: ['get', 'create']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:azure-cloud-provider
roleRef:
  kind: ClusterRole
  apiGroup: rbac.authorization.k8s.io
  name: system:azure-cloud-provider
subjects:
  - kind: ServiceAccount
    name: persistent-volume-binder
    namespace: kube-system
```

Apply the RBAC manifest:
```bash
kubectl apply -f azurefiles-rbac.yaml
```

Create an `azurefiles-class.yaml` file:
```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1000
  - gid=1000
parameters:
  skuName: Standard_LRS
```

Apply the storage class to the cluster:
```bash
kubectl apply -f azurefiles-class.yaml
```
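To confirm the storage class was registered:

```bash
kubectl get storageclass azurefile
```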

Create an `azurefiles-pvc.yaml` file:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 5Gi
```

Apply the PVC to the cluster:
```bash
kubectl apply -f azurefiles-pvc.yaml
```
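Provisioning the underlying Azure file share can take a minute; check that the claim is bound before moving on:

```bash
kubectl get pvc azurefile
```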

Once you have completed all the steps, run:

@@ -219,6 +309,22 @@ Turns out mounting an Azure File share into a container is really easy, we simpl
claimName: azurefile
```

For non-GPU Clusters:
```yaml
[...]
containers:
  - image: <IMAGE>
    name: tensorflow
    volumeMounts:
      - name: azurefile
        subPath: module6-ex2
        mountPath: /tmp/tensorflow
volumes:
  - name: azurefile
    persistentVolumeClaim:
      claimName: azurefile
```

Update your template from exercise 1 to mount the Azure File share into your container, and create your new job.

Once the container starts running, if you go to the Azure Portal, into your storage account, and browse your `tensorflow` file share, you should see something like this:
@@ -265,6 +371,38 @@ spec:
claimName: azurefile
```

For non-GPU clusters:
```yaml
apiVersion: kubeflow.org/v1beta1
kind: TFJob
metadata:
  name: module6-ex2
spec:
  tfReplicaSpecs:
    MASTER:
      replicas: 1
      template:
        spec:
          containers:
            - image: <DOCKER_USERNAME>/tf-mnist:cpu
              name: tensorflow
              volumeMounts:
                # By default our classifier saves the summaries in /tmp/tensorflow,
                # so that's where we want to mount our Azure File Share.
                - name: azurefile
                  # The subPath allows us to mount a subdirectory within the azure file share instead of root,
                  # this is useful so that we can save the logs for each run in a different subdirectory
                  # instead of overwriting what was done before.
                  subPath: module6-ex2
                  mountPath: /tmp/tensorflow
          restartPolicy: OnFailure
          volumes:
            - name: azurefile
              persistentVolumeClaim:
                claimName: azurefile
```
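To run it, assuming you saved this manifest as `module6-ex2.yaml` (any filename works):

```console
kubectl create -f module6-ex2.yaml
```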


</details>

## Next Step