Restructured repository
arueth committed Sep 17, 2024
1 parent 36b66dc commit 8dad0cb
Showing 176 changed files with 126 additions and 128 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -7,7 +7,10 @@ __pycache__/
venv/

# Terraform
.terraform
*.terraform/
*.terraform-*/
*.terraform.lock.hcl

# Test
test/log/*.log
test/scripts/environment_files/*
7 changes: 4 additions & 3 deletions README.md
@@ -1,5 +1,6 @@
# Google Cloud AI/ML Platform Reference Architectures
# Google Cloud Accelerated Platform Reference Architectures

This repository is a collection of AI/ML platform reference architectures and use cases for Google Cloud.
This repository is a collection of accelerated platform reference architectures and use cases for Google Cloud.

- [GKE ML Platform for enabling ML Ops](/docs/gke-ml-platform.md)
- [GKE AI/ML Platform for enabling AI/ML Ops](/docs/platforms/gke-aiml/README.md)
- [Model Fine Tuning Pipeline](/docs/use-cases/model-fine-tuning-pipeline/README.md)
@@ -17,13 +17,13 @@ This process can also be automated utilizing Continuous Integration (CI) tools s

To pair the notebook, simply use the pair function in the File menu:

![jupyter-pairing](../images/notebook/jupyter-pairing.png)
![jupyter-pairing](images/jupyter-pairing.png)

In this example we use the file [gpt-j-online.ipynb](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/examples/notebooks/gpt-j-online.ipynb):![jupyter-gpt-j-online-ipynb](/docs/images/notebook/jupyter-gpt-j-online-ipynb.png)
In this example we use the file [gpt-j-online.ipynb](https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/ray-on-gke/examples/notebooks/gpt-j-online.ipynb):![jupyter-gpt-j-online-ipynb](images/jupyter-gpt-j-online-ipynb.png)

1. After pairing, we get the generated raw python:

![jupyter-gpt-j-online-py](../images/notebook/jupyter-gpt-j-online-py.png)
![jupyter-gpt-j-online-py](images/jupyter-gpt-j-online-py.png)

**NOTE**: This conversion can also be performed via the `jupytext` cli with the following command:

@@ -45,7 +45,7 @@ This process can also be automated utilizing Continuous Integration (CI) tools s

The following is an example output:

![jupyter-generate-requirements](../images/notebook/jupyter-generate-requirements.png)
![jupyter-generate-requirements](images/jupyter-generate-requirements.png)
**NOTE**: (the `!cat requirements.txt` line is an example of the generated `requirements.txt`)

1. Create the Dockerfile
@@ -91,4 +91,5 @@ The nbconvert tool is available inside your Jupyter notebook environment in Goog
```

Below is an example of the commands
![jupyter-nbconvert](../images/notebook/jupyter-nbconvert.png)

![jupyter-nbconvert](images/jupyter-nbconvert.png)
File renamed without changes
File renamed without changes
File renamed without changes
Binary file removed docs/images/use-case/TensorBoard.png
Binary file not shown.
49 changes: 19 additions & 30 deletions docs/gke-ml-platform.md → docs/platforms/gke-aiml/README.md
@@ -1,16 +1,16 @@
# GKE Machine learning platform (MLP) reference architecture for enabling Machine Learning Operations (MLOps)
# GKE AI/ML Platform reference architecture for enabling Machine Learning Operations (MLOps)

## Platform Principles

This reference architecture demonstrates how to build a GKE platform that facilitates Machine Learning. The reference architecture is based on the following principles:

- The platform admin will create the GKE platform using an IaC tool like [Terraform][terraform]. The IaC will come with reusable modules that can be referenced to create more resources as demand grows.
- The platform will be based on [GitOps][gitops].
- After the GKE platform has been created, cluster scoped resources on it will be created through [Config Sync][config-sync] by the admins.
- The platform admin will create the GKE platform using an IaC tool like [Terraform](https://www.terraform.io/). The IaC will come with reusable modules that can be referenced to create more resources as demand grows.
- The platform will be based on [GitOps](https://about.gitlab.com/topics/gitops/).
- After the GKE platform has been created, cluster scoped resources on it will be created through [Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview) by the admins.
- Platform admins will create a namespace per application and give the application team members full access to it.
- The namespace scoped resources will be created by the Application/ML teams either via [Config Sync][config-sync] or through a deployment tool like [Cloud Deploy][cloud-deploy]
- The namespace scoped resources will be created by the Application/ML teams either via Config Sync or through a deployment tool like [Cloud Deploy](https://cloud.google.com/deploy)

For an outline of products and features used in the platform, see the [Platform Products and Features](/docs/gke-ml-platform/products-and-features.md) document.
For an outline of products and features used in the platform, see the [Platform Products and Features](products-and-features.md) document.

## Critical User Journeys (CUJs)

@@ -47,36 +47,25 @@ For an outline of products and features used in the platform, see the [Platform

- This guide is meant to be run on [Cloud Shell](https://shell.cloud.google.com) which comes preinstalled with the [Google Cloud SDK](https://cloud.google.com/sdk) and other tools that are required to complete this tutorial.
- Familiarity with the following
- [Google Kubernetes Engine][gke]
- [Terraform][terraform]
- [git][git]
- [Google Configuration Management root-sync][root-sync]
- [Google Configuration Management repo-sync][repo-sync]
- [GitHub][github]
- [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine)
- [Terraform](https://www.terraform.io/)
- [git](https://git-scm.com/)
- [Google Configuration Management root-sync](https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields)
- [Google Configuration Management repo-sync](https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields)
- [GitHub](https://github.com/)

## Deploy the platform

[Playground Reference Architecture](/examples/platform/playground/README.md): Set up an environment to familiarize yourself with the architecture and get an understanding of the concepts.
[Playground Reference Architecture](/platforms/gke-aiml/playground/README.md): Set up an environment to familiarize yourself with the architecture and get an understanding of the concepts.

## Use cases

- [Distributed Data Processing with Ray](/examples/use-case/data-processing/ray/README.md): Run a distributed data processing job using Ray.
- [Dataset Preparation for Fine Tuning Gemma IT With Gemini Flash](/examples/use-case/data-preparation/gemma-it/README.md): Generate prompts for fine tuning Gemma Instruction Tuned model with Vertex AI Gemini Flash
- [Fine Tuning Gemma2 9B IT model With FSDP](/examples/use-case/fine-tuning/pytorch/README.md): Fine tune Gemma2 9B IT model with PyTorch FSDP
- [Model Fine Tuning Pipeline](/docs/use-cases/model-fine-tuning-pipeline/README.md)
- [Distributed Data Processing with Ray](/use-cases/model-fine-tuning-pipeline/data-processing/ray/README.md): Run a distributed data processing job using Ray.
- [Dataset Preparation for Fine Tuning Gemma IT With Gemini Flash](/use-cases/model-fine-tuning-pipeline/data-preparation/gemma-it/README.md): Generate prompts for fine tuning Gemma Instruction Tuned model with Vertex AI Gemini Flash
- [Fine Tuning Gemma2 9B IT model With FSDP](/use-cases/model-fine-tuning-pipeline/fine-tuning/pytorch/README.md): Fine tune Gemma2 9B IT model with PyTorch FSDP
- [Model evaluation and validation](/use-cases/model-fine-tuning-pipeline/model-eval/README.md): Evaluation and validation of the fine tuned Gemma2 9B IT model

## Resources

- [Packaging Jupyter notebooks](/docs/notebook/packaging.md): Patterns and tools to get your ipynb's ready for deployment in a container runtime.

[gitops]: https://about.gitlab.com/topics/gitops/
[repo-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
[root-sync]: https://cloud.google.com/anthos-config-management/docs/reference/rootsync-reposync-fields
[config-sync]: https://cloud.google.com/anthos-config-management/docs/config-sync-overview
[cloud-deploy]: https://cloud.google.com/deploy?hl=en
[terraform]: https://www.terraform.io/
[gke]: https://cloud.google.com/kubernetes-engine?hl=en
[git]: https://git-scm.com/
[github]: https://github.com/
[gcp-project]: https://cloud.google.com/resource-manager/docs/creating-managing-projects
[personal-access-token]: https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens
[machine-user-account]: https://docs.github.com/en/get-started/learning-about-github/types-of-github-accounts
- [Packaging Jupyter notebooks](/docs/guides/packaging-jupyter-notebooks/README.md): Patterns and tools to get your ipynb's ready for deployment in a container runtime.
@@ -1,6 +1,6 @@
# Playground Machine learning platform (MLP) on GKE: Architecture
# Playground AI/ML Platform on GKE: Architecture

![Playground Architecture](/docs/images/platform/playground/mlp_playground_architecture.svg)
![Playground Architecture](/docs/platforms/gke-aiml/playground/images/architecture.svg)

## Platform

@@ -20,7 +20,7 @@
- CPU system node pool
- GPU on-demand node pool
- GPU spot node pool
- Google Kubernetes Engine (GKE) Enterprise ([docs])(https://cloud.google.com/kubernetes-engine/enterprise/docs)
- [Google Kubernetes Engine (GKE) Enterprise](https://cloud.google.com/kubernetes-engine/enterprise/docs)
- Configuration Management
- Config Sync
- Policy Controller
@@ -38,7 +38,7 @@
- [Classic SSL Certificate](https://console.cloud.google.com/security/ccm/list/lbCertificates)
- Gateway SSL Certificate
- Ray dashboard
- Identity-Aware Proxy (IAP) ([docs])(https://cloud.google.com/iap/docs/concepts-overview)
- [Identity-Aware Proxy (IAP)](https://cloud.google.com/iap/docs/concepts-overview)
- Ray head Backend Service
- [Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccount)
- Default
File renamed without changes