
Esm2 on Sagemaker Hyperpod #387

Open · wants to merge 8 commits into main

Conversation

awsankur (Contributor)

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: Ankur Srivastava <[email protected]>
KeitaW (Collaborator) commented Jul 25, 2024

Do we have any SMHP-specific features in this test case? If not, we could organize the test case per scheduler:

23.esm
├── kubernetes
└── slurm

see also #381


| Model | device_batch_size | num_nodes | torch.compile | Instance | Throughput |
|:------:|:-----------------:|:---------:|:-------------:| :------------: | :------------: |
| ESM2 | 8 | 2 | No | g5.12xlarge | 160 samples/s |
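
For context, a quick sanity check of what the reported throughput implies per optimizer step, assuming 4 GPUs per g5.12xlarge node and pure data parallelism (both assumptions, not stated in the table):

```python
# Back-of-the-envelope check of the reported throughput.
device_batch_size = 8
gpus = 2 * 4                               # num_nodes * 4 A10G GPUs per g5.12xlarge
global_batch = device_batch_size * gpus    # 64 samples per optimizer step
throughput = 160                           # samples/s, from the table above

steps_per_sec = throughput / global_batch
print(steps_per_sec)                       # 2.5 optimizer steps/s
```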
Collaborator:

The setup instructions advise using a 24xlarge instance, but a 12xlarge was actually used?

## What is ESM-2?
[ESM-2](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1) is a protein language model (pLM) trained with unsupervised masked language modelling on 250 million protein sequences by researchers at [Facebook AI Research (FAIR)](https://www.biorxiv.org/content/10.1101/2022.07.20.500902v1). It is available in several sizes, ranging from 8 million to 15 billion parameters. The smaller models are suitable for various sequence and token classification tasks. The FAIR team also adapted the 3 billion parameter version into the ESMFold protein structure prediction algorithm, and has since used ESMFold to predict the structure of [more than 700 million metagenomic proteins](https://esmatlas.com/about).
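
As a quick illustration of the masked-language-modelling objective at inference time, here is a minimal sketch using the Hugging Face transformers API. The checkpoint ID `facebook/esm2_t6_8M_UR50D` is the published 8M-parameter variant on the Hub; the protein sequence is an arbitrary example, not one from this test case.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "facebook/esm2_t6_8M_UR50D"  # smallest public ESM-2 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)
model.eval()

# Mask one residue in a short (arbitrary) protein sequence and predict it.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
masked = sequence[:10] + tokenizer.mask_token + sequence[11:]
inputs = tokenizer(masked, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the masked position.
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_token = logits[0, mask_idx].argmax(dim=-1)
print("Predicted residue:", tokenizer.decode(predicted_token))
```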

ESM-2 is a powerful pLM. We will demonstrate how to use QLoRA to fine-tune ESM-2 on g5.24xlarge instances, training it to predict [subcellular localization](https://academic.oup.com/nar/article/50/W1/W228/6576357?login=false). Knowing where proteins appear in cells can help us understand their role in disease and identify new drug targets.
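
The QLoRA recipe quantizes the frozen base model to 4-bit and trains only small low-rank adapters on top of it. Below is a minimal sketch of that setup, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the 650M checkpoint, the 10-class label count, and the adapter hyperparameters are illustrative assumptions, not the exact configuration in this PR.

```python
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base model to 4-bit NF4 (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative choices: the 650M ESM-2 checkpoint and 10 localization classes.
model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/esm2_t33_650M_UR50D",
    num_labels=10,
    quantization_config=bnb_config,
    device_map="auto",
)

# Cast norms/head to a stable dtype and set up gradient-checkpointing hooks.
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters to the attention projections; "query"/"value"
# match the module names in the transformers ESM implementation.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only adapter + head weights will train
```

From here the model can be passed to a standard Trainer; because only the adapters and classification head receive gradients, memory stays within reach of g5-class GPUs.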
Collaborator:

Is this test case demonstrating pretraining or fine-tuning? I believe the latter, but the title states the former.

perifaws (Contributor)

@awsankur @KeitaW are we good on this?
