Commit d2b73c6 (1 parent: 34ea7c6). Showing 19 changed files with 344 additions and 172 deletions.
9 changes: 4 additions & 5 deletions
latest/ug/workloads/inferentia-support.adoc → latest/ug/ml/inferentia-support.adoc
@@ -0,0 +1,68 @@
//!!NODE_ROOT <chapter>
include::../attributes.txt[]
[.topic]
[[machine-learning-on-eks,machine-learning-on-eks.title]]
= Overview of Machine Learning on Amazon EKS
:doctype: book
:sectnums:
:toc: left
:icons: font
:experimental:
:idprefix:
:idseparator: -
:sourcedir: .
:info_doctype: chapter
:info_title: Machine Learning on Amazon EKS Overview
:info_titleabbrev: Machine Learning on EKS
:keywords: Machine Learning, Amazon EKS, Artificial Intelligence
:info_abstract: Learn to manage containerized applications with Amazon EKS

[abstract]
--
Complete guide for running Machine Learning applications on Amazon EKS. This includes everything from provisioning infrastructure to choosing and deploying Machine Learning workloads on Amazon EKS.
--

[[ml-features,ml-features.title]]

Machine Learning (ML) is an area of Artificial Intelligence (AI) where machines process large amounts of data to look for patterns and make connections between the data. This can expose new relationships and help predict outcomes that might not have been apparent otherwise.

For large-scale ML projects, data centers must be able to store large amounts of data, process data quickly, and integrate data from many sources. The platforms running ML applications must be reliable and secure, but also offer resiliency to recover from data center outages and application failures. Amazon Elastic Kubernetes Service (Amazon EKS), running in the {aws} cloud, is particularly suited for ML workloads.

The primary goal of this section of the EKS User Guide is to help you put together the hardware and software components you need to build platforms for running Machine Learning workloads in an EKS cluster.
We start by explaining the features and services available to you in EKS and the {aws} cloud, then provide you with tutorials to help you work with ML platforms, frameworks, and models.

=== Advantages of Machine Learning on EKS and the {aws} cloud

Amazon Elastic Kubernetes Service (EKS) is a powerful, managed Kubernetes platform that has become a cornerstone for deploying and managing AI/ML workloads in the cloud. With its ability to handle complex, resource-intensive tasks, Amazon EKS provides a scalable and flexible foundation for running AI/ML models, making it an ideal choice for organizations aiming to harness the full potential of machine learning.

Key advantages of AI/ML platforms on Amazon EKS include:

* *Scalability and Flexibility*
Amazon EKS enables organizations to scale AI/ML workloads seamlessly. Whether you're training large language models that require vast amounts of compute power or deploying inference pipelines that need to handle unpredictable traffic patterns, EKS scales up and down efficiently, optimizing resource use and cost.

* *High Performance with GPUs and Neuron Instances*
Amazon EKS supports a wide range of compute options, including GPUs and {aws} Neuron instances, which are essential for accelerating AI/ML workloads. This support allows for high-performance training and low-latency inference, ensuring that models run efficiently in production environments.

* *Integration with AI/ML Tools*
Amazon EKS integrates seamlessly with popular AI/ML tools and frameworks like TensorFlow, PyTorch, and Ray, providing a familiar and robust ecosystem for data scientists and engineers. These integrations enable users to leverage existing tools while benefiting from the scalability and management capabilities of Kubernetes.

* *Automation and Management*
Kubernetes on Amazon EKS automates many of the operational tasks associated with managing AI/ML workloads. Features like automatic scaling, rolling updates, and self-healing ensure that your applications remain highly available and resilient, reducing the overhead of manual intervention.

* *Security and Compliance*
Running AI/ML workloads on Amazon EKS provides robust security features, including fine-grained IAM roles, encryption, and network policies, ensuring that sensitive data and models are protected. EKS also adheres to various compliance standards, making it suitable for enterprises with strict regulatory requirements.

=== Why Choose Amazon EKS for AI/ML?

Amazon EKS offers a comprehensive, managed environment that simplifies the deployment of AI/ML models while providing the performance, scalability, and security needed for production workloads. With its ability to integrate with a variety of AI/ML tools and its support for advanced compute resources, EKS empowers organizations to accelerate their AI/ML initiatives and deliver innovative solutions at scale.

By choosing Amazon EKS, you gain access to a robust infrastructure that can handle the complexities of modern AI/ML workloads, allowing you to focus on innovation and value creation rather than managing underlying systems. Whether you are deploying simple models or complex AI systems, Amazon EKS provides the tools and capabilities needed to succeed in a competitive and rapidly evolving field.

=== Start using Machine Learning on EKS

To begin planning for and using Machine Learning platforms and workloads on EKS in the {aws} cloud, proceed to the <<ml-get-started>> section.

include::ml-get-started.adoc[leveloffset=+1]

include::ml-prepare-for-cluster.adoc[leveloffset=+1]

include::ml-tutorials.adoc[leveloffset=+1]
@@ -0,0 +1,87 @@
//!!NODE_ROOT <section>
[.topic]
[[ml-eks-optimized-ami,ml-eks-optimized-ami.title]]
= Create nodes with EKS optimized accelerated Amazon Linux AMIs
:info_titleabbrev: Run GPU AMIs

include::../attributes.txt[]

The Amazon EKS optimized accelerated Amazon Linux AMI is built on top of the standard Amazon EKS optimized Amazon Linux AMI. For details on these AMIs, see <<gpu-ami>>.
The following text describes how to enable {aws} Neuron-based workloads.

.To enable {aws} Neuron (ML accelerator) based workloads
For details on training and inference workloads using [.noloc]`Neuron` in Amazon EKS, see the following references:

* https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/kubernetes-getting-started.html[Containers - Kubernetes - Getting Started] in the _{aws} [.noloc]`Neuron` Documentation_
* https://github.com/aws-neuron/aws-neuron-eks-samples/blob/master/README.md#training[Training] in {aws} [.noloc]`Neuron` EKS Samples on GitHub
* <<inferentia-support,Deploy ML inference workloads with {aws} Inferentia on Amazon EKS>>
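
As an illustration only (not taken from the [.noloc]`Neuron` references above), the following is a minimal sketch of a [.noloc]`Pod` that requests a Neuron device. It assumes that the Neuron device plugin is installed on your cluster and advertises the `aws.amazon.com/neuron` resource; the image name `my-neuron-app` is a placeholder for your own Neuron-enabled container image.

[source,yaml,subs="verbatim,attributes,quotes"]
----
apiVersion: v1
kind: Pod
metadata:
  name: neuron-example
spec:
  restartPolicy: OnFailure
  containers:
  - name: app
    image: my-neuron-app            # placeholder: your Neuron-enabled container image
    resources:
      limits:
        aws.amazon.com/neuron: 1    # assumes the Neuron device plugin exposes this resource
----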

The following procedure describes how to run a workload on a GPU-based instance with the Amazon EKS optimized accelerated AMI.

. After your GPU nodes join your cluster, you must apply the https://github.com/NVIDIA/k8s-device-plugin[NVIDIA device plugin for Kubernetes] as a [.noloc]`DaemonSet` on your cluster. Replace [.replaceable]`vX.X.X` with your desired https://github.com/NVIDIA/k8s-device-plugin/releases[NVIDIA/k8s-device-plugin] version before running the following command.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/vX.X.X/deployments/static/nvidia-device-plugin.yml
----
. You can verify that your nodes have allocatable GPUs with the following command.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
----
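+
As a hypothetical illustration (the node names and Region here are made up), the output might look like the following, where `1` indicates one allocatable GPU and `<none>` indicates a node without GPUs.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
NAME                                          GPU
ip-192-168-XX-XX.us-west-2.compute.internal   1
ip-192-168-YY-YY.us-west-2.compute.internal   <none>
----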
. Create a file named `nvidia-smi.yaml` with the following contents. Replace [.replaceable]`tag` with your desired tag for https://hub.docker.com/r/nvidia/cuda/tags[nvidia/cuda]. This manifest launches an https://developer.nvidia.com/cuda-zone[NVIDIA CUDA] container that runs `nvidia-smi` on a node.
+
[source,yaml,subs="verbatim,attributes,quotes"]
----
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:tag
    args:
    - "nvidia-smi"
    resources:
      limits:
        nvidia.com/gpu: 1
----
. Apply the manifest with the following command.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
kubectl apply -f nvidia-smi.yaml
----
. After the [.noloc]`Pod` has finished running, view its logs with the following command.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
kubectl logs nvidia-smi
----
+
An example output is as follows.
+
[source,bash,subs="verbatim,attributes,quotes"]
----
Mon Aug  6 20:23:31 20XX
+-----------------------------------------------------------------------------+
| NVIDIA-SMI XXX.XX                 Driver Version: XXX.XX                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1C.0 Off |                    0 |
| N/A   46C    P0    47W / 300W |      0MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
----
@@ -0,0 +1,51 @@
//!!NODE_ROOT <section>

[.topic]
[[ml-get-started,ml-get-started.title]]
= Get started with ML
:info_doctype: section
:info_title: Get started deploying Machine Learning tools on EKS
:info_titleabbrev: Get started with ML
:info_abstract: Choose the Machine Learning on EKS tools and platforms that best suit your needs, then use quick start procedures to deploy them to the {aws} cloud.

include::../attributes.txt[]

[abstract]
--
Choose the Machine Learning on EKS tools and platforms that best suit your needs, then use quick start procedures to deploy ML workloads and EKS clusters to the {aws} cloud.
--

To jump into Machine Learning on EKS, start by choosing from these prescriptive patterns to quickly get an EKS cluster and ML software and hardware ready to begin running ML workloads. Most of these patterns are based on Terraform blueprints that are available from the https://awslabs.github.io/data-on-eks/docs/introduction/intro[Data on Amazon EKS] site (a hypothetical deployment sketch follows the list below). Before you begin, here are a few things to keep in mind:

* GPUs or Neuron instances are required to run these procedures. Lack of availability of these resources can cause these procedures to fail during cluster creation or node autoscaling.
* The Neuron SDK (used with Trainium and Inferentia-based instances) can save money, and those instances can be more readily available than NVIDIA GPUs. So, when your workloads permit it, we recommend that you consider using Neuron for your Machine Learning workloads (see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/[Welcome to {aws} Neuron]).
* Some of the getting started experiences here require that you get data via your own https://huggingface.co/[Hugging Face] account.
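
Most of these patterns follow the same general workflow: clone the https://github.com/awslabs/data-on-eks[Data on EKS] repository, set any required credentials, and apply the blueprint's Terraform configuration. The following is a minimal, hypothetical sketch only; the blueprint directory (`ai-ml/jupyterhub` here) and any required variables (such as the Hugging Face token shown) differ by pattern, so follow the instructions linked for the pattern you choose.

[source,bash,subs="verbatim,attributes,quotes"]
----
# Clone the Data on EKS blueprints repository
git clone https://github.com/awslabs/data-on-eks.git
cd data-on-eks/ai-ml/jupyterhub       # placeholder: directory of the blueprint you chose

# Hypothetical variable name: some blueprints need credentials such as a Hugging Face token
export TF_VAR_huggingface_token=<your-token>

# Provision the EKS cluster and supporting ML infrastructure
terraform init
terraform apply
----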

To get started, choose from the following selection of patterns that are designed to get you started setting up infrastructure to run your Machine Learning workloads:

* *https://awslabs.github.io/data-on-eks/docs/blueprints/ai-ml/jupyterhub[JupyterHub on EKS]*: Explore the https://awslabs.github.io/data-on-eks/docs/blueprints/ai-ml/jupyterhub[JupyterHub blueprint], which showcases Time Slicing and MIG features, as well as multi-tenant configurations with profiles. This is ideal for deploying large-scale JupyterHub platforms on EKS.
* *https://aws.amazon.com/ai/machine-learning/neuron/[Large Language Models on {aws} Neuron and RayServe]*: Use https://aws.amazon.com/ai/machine-learning/neuron/[{aws} Neuron] to run large language models (LLMs) on Amazon EKS and {aws} Trainium and {aws} Inferentia accelerators. See https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/Neuron/vllm-ray-inf2[Serving LLMs with RayServe and vLLM on {aws} Neuron] for instructions on setting up a platform for making inference requests, with components that include:
+
** {aws} Neuron SDK toolkit for deep learning
** {aws} Inferentia and Trainium accelerators
** vLLM, a high-throughput inference and serving engine for LLMs (see the https://docs.vllm.ai/en/latest/[vLLM] documentation site)
** RayServe scalable model serving library (see the https://docs.ray.io/en/latest/serve/index.html[Ray Serve: Scalable and Programmable Serving] site)
** Llama-3 language model, using your own https://huggingface.co/[Hugging Face] account
** Observability with {aws} CloudWatch and Neuron Monitor
** Open WebUI
* *https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/GPUs/vLLM-NVIDIATritonServer[Large Language Models on NVIDIA and Triton]*: Deploy multiple large language models (LLMs) on Amazon EKS and NVIDIA GPUs. See https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/GPUs/vLLM-NVIDIATritonServer[Deploying Multiple Large Language Models with NVIDIA Triton Server and vLLM] for instructions for setting up a platform for making inference requests, with components that include:
+
** NVIDIA Triton Inference Server (see the https://github.com/triton-inference-server/server[Triton Inference Server] GitHub site)
** vLLM, a high-throughput inference and serving engine for LLMs (see the https://docs.vllm.ai/en/latest/[vLLM] documentation site)
** Two language models: mistralai/Mistral-7B-Instruct-v0.2 and meta-llama/Llama-2-7b-chat-hf, using your own https://huggingface.co/[Hugging Face] account
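
After you deploy one of these inference platforms, you typically interact with it over HTTP. As an illustration only (endpoints, ports, and model names vary by blueprint), the following assumes a vLLM server exposing its OpenAI-compatible API, port-forwarded to `localhost:8000`:

[source,bash,subs="verbatim,attributes,quotes"]
----
# Hypothetical request to a vLLM OpenAI-compatible completions endpoint
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "Explain Kubernetes in one sentence.",
        "max_tokens": 64
      }'
----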

=== Continuing with ML on EKS

Along with choosing from the blueprints described on this page, there are other ways you can proceed through the ML on EKS documentation if you prefer. For example, you can:

* *Try tutorials for ML on EKS* – Run other end-to-end tutorials for building and running your own Machine Learning models on EKS. See <<ml-tutorials>>.

To improve your work with ML on EKS, refer to the following:

* *Prepare for ML* – Learn how to prepare for ML on EKS with features like custom AMIs and GPU reservations. See <<ml-prepare-for-cluster>>.
@@ -0,0 +1,44 @@
//!!NODE_ROOT <section>

[.topic]
[[ml-prepare-for-cluster,ml-prepare-for-cluster.title]]
= Prepare for ML clusters
:info_doctype: section
:info_title: Prepare to create an EKS cluster for Machine Learning
:info_titleabbrev: Prepare for ML
:info_abstract: Learn how to make decisions about CPU, AMIs, and tooling before creating an EKS cluster for ML.

include::../attributes.txt[]

[abstract]
--
Learn how to make decisions about CPU, AMIs, and tooling before creating an EKS cluster for ML.
--

There are ways that you can enhance your Machine Learning on EKS experience. The following pages in this section will help you:

* Understand your choices for using ML on EKS.
* Prepare your EKS and ML environment.

In particular, they will help you:

* *Choose AMIs*: {aws} offers multiple customized AMIs for running ML workloads on EKS, and you can further modify them to add other software and drivers needed for your particular use cases. See <<ml-eks-optimized-ami>>.
* *Reserve GPUs*: Because of the demand for GPUs, to ensure that the GPUs you need are available when you need them, you can reserve the GPUs you need in advance. See <<capacity-blocks>>.
* *Taint GPU nodes*: Apply taints to managed node groups so that only your ML workloads are scheduled onto GPU nodes. See <<node-taints-managed-node-groups>>.
* *Add EFA*: Add the Elastic Fabric Adapter to improve network performance for inter-node cluster communications, as shown in the sketch after this list. See <<node-efa>>.
* *Use {aws} Inferentia workloads*: Create an EKS cluster with Amazon EC2 Inf1 instances. See <<inferentia-support>>.
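
For example, the following is a minimal, untested sketch of enabling EFA on a managed node group through an `eksctl` cluster config, assuming eksctl's `efaEnabled` setting; the cluster name, Region, and instance type are placeholders:

[source,yaml,subs="verbatim,attributes,quotes"]
----
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-ml-cluster              # placeholder cluster name
  region: us-west-2                # placeholder Region
managedNodeGroups:
  - name: gpu-efa-nodes
    instanceType: p4d.24xlarge     # placeholder: an EFA-capable instance type
    minSize: 2
    maxSize: 2
    efaEnabled: true               # attaches EFA network interfaces to the nodes
----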
[.topiclist]

include::ml-eks-optimized-ami.adoc[leveloffset=+1]

include::capacity-blocks.adoc[leveloffset=+1]

include::node-taints-managed-node-groups.adoc[leveloffset=+1]

include::node-efa.adoc[leveloffset=+1]

include::inferentia-support.adoc[leveloffset=+1]