- GitHub Repo: Awsome Distributed Training Architectures
- Talk @ AWS Startups 2024: SageMaker HyperPod: Supercharge Your Healthcare and Life Sciences Workflows
- AWS ML Blog: Accelerate your generative AI distributed training workloads with the NVIDIA NeMo Framework on Amazon EKS
- AWS On Air: AWS EFA & NVIDIA Nsight Systems: Distributed Training Performance Analysis
- Talk @ NVIDIA GTC 2024: Achieving Higher Performance From Your Data Center and Cloud Application [S62388]
- Talk @ NVIDIA GTC 2023: Training and Productionizing PyTorch Large Language Models on AWS [S52388]
- AWS HPC Blog: Accelerate drug discovery with NVIDIA BioNeMo Framework on Amazon EKS
- AWS HPC Blog: Protein language model training with NVIDIA BioNeMo framework on AWS ParallelCluster
- AWS ML Blog: Accelerate Hyperparameter Grid Search for Sentiment Analysis with BERT Models using Weights & Biases, EKS and TorchElastic
- AWS ML Blog: Running ML inference of PyTorch based OpenFold protein folding model on AWS using Amazon EKS
- AWS ML Blog: Achieve four times higher ML inference throughput at three times lower cost per inference with Amazon EC2 G5 instances for NLP and CV PyTorch models
- AWS ML Blog: Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances
- Talk @ AWS and HuggingFace Startup Event 2022: Accelerating NLP with Transformers: AWS & HuggingFace
Distributed Training and Inference @ AWS
- LinkedIn: in/ankur-srivastava-6b628521
Pinned
- aws-samples/awsome-distributed-training: Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
- aws-samples/aws-distributed-training-workshop-eks: Create an Amazon EKS cluster and run a distributed training example
- aws-samples/aws-do-openfold-inference: OpenFold inference architecture for Amazon EKS
- aws-samples/best-practices-for-fastapi-on-inferentia
- aws-samples/aws-distributed-training-workshop-pcluster