Benchmarks: model benchmarks - change torch.distributed.launch to torchrun #556
@microsoft-github-policy-service agree company="AMD" |
Thanks for the PR. Please also replace all occurrences of `python3 -m torch.distributed.launch --use_env` with `torchrun` in `tests/` so that the unit tests pass.
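The requested replacement can be sketched as a plain string rewrite. This is only an illustration, not the code in this PR: the helper name `to_torchrun` and the script name `train.py` are hypothetical, and it assumes the legacy prefix appears exactly as quoted in the comment above.

```python
# Legacy launcher prefix, exactly as quoted in the review comment.
LEGACY_PREFIX = "python3 -m torch.distributed.launch --use_env"


def to_torchrun(command: str) -> str:
    """Swap the legacy launcher prefix for torchrun (hypothetical helper).

    Only the first occurrence is replaced; all remaining arguments
    are left untouched.
    """
    return command.replace(LEGACY_PREFIX, "torchrun", 1)


# train.py is a placeholder script name used for illustration only.
old = "python3 -m torch.distributed.launch --use_env --nproc_per_node=8 train.py"
print(to_torchrun(old))
```

Commands that already use `torchrun` pass through unchanged, so running the rewrite over a test file more than once should be harmless.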
/azp run |
Azure Pipelines successfully started running 3 pipeline(s). |
Codecov Report
@@ Coverage Diff @@
## main #556 +/- ##
==========================================
- Coverage 86.96% 86.32% -0.64%
==========================================
Files 93 93
Lines 6268 6268
==========================================
- Hits 5451 5411 -40
- Misses 817 857 +40
Thanks for your PR. We will test it soon to check whether it will impact the final performance or not. If not, we will merge it. |
Regarding functionality: the change works for both single-node and multi-node runs on MI200.
Regarding performance: no significant difference was observed on a single A100 node.
Model | Precision | Previous Throughput | New Throughput |
---|---|---|---|
bert/pytorch-bert-base | fp32 | 380.13 | 379.61 |
bert/pytorch-bert-base | fp16 | 614.74 | 614.43 |
bert/pytorch-bert-large | fp32 | 130.85 | 130.77 |
bert/pytorch-bert-large | fp16 | 224.03 | 223.17 |
densenet/pytorch-densenet169 | fp32 | 268.70 | 264.50 |
densenet/pytorch-densenet169 | fp16 | 274.66 | 266.47 |
densenet/pytorch-densenet201 | fp32 | 219.88 | 219.15 |
densenet/pytorch-densenet201 | fp16 | 219.61 | 218.44 |
gpt/pytorch-gpt2-small | fp32 | 179.65 | 179.04 |
gpt/pytorch-gpt2-small | fp16 | 188.58 | 189.43 |
gpt/pytorch-gpt2-large | fp32 | 35.37 | 35.48 |
gpt/pytorch-gpt2-large | fp16 | 59.36 | 59.25 |
lstm/pytorch-lstm | fp32 | 4975.33 | 5026.24 |
lstm/pytorch-lstm | fp16 | 7895.35 | 7981.03 |
resnet/pytorch-resnet50 | fp32 | 945.86 | 945.61 |
resnet/pytorch-resnet50 | fp16 | 1273.37 | 1317.63 |
resnet/pytorch-resnet101 | fp32 | 607.28 | 611.11 |
resnet/pytorch-resnet101 | fp16 | 887.07 | 913.76 |
resnet/pytorch-resnet152 | fp32 | 436.23 | 435.34 |
resnet/pytorch-resnet152 | fp16 | 652.38 | 660.70 |
vgg/pytorch-vgg11 | fp32 | 760.03 | 757.03 |
vgg/pytorch-vgg11 | fp16 | 1130.59 | 1139.74 |
vgg/pytorch-vgg13 | fp32 | 554.00 | 552.60 |
vgg/pytorch-vgg13 | fp16 | 858.53 | 885.61 |
vgg/pytorch-vgg16 | fp32 | 482.54 | 481.02 |
vgg/pytorch-vgg16 | fp16 | 777.03 | 785.29 |
vgg/pytorch-vgg19 | fp32 | 422.60 | 422.29 |
vgg/pytorch-vgg19 | fp16 | 693.83 | 696.07 |
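To make "no significant difference" easier to judge, the relative change per row can be computed from the table. The sketch below uses three rows copied from the table above; the helper name `relative_change_pct` is hypothetical, and the choice of rows is only illustrative.

```python
# (model, precision, previous throughput, new throughput), copied from the table.
rows = [
    ("bert/pytorch-bert-base", "fp32", 380.13, 379.61),
    ("densenet/pytorch-densenet169", "fp16", 274.66, 266.47),
    ("resnet/pytorch-resnet50", "fp16", 1273.37, 1317.63),
]


def relative_change_pct(previous: float, new: float) -> float:
    """Return the change from previous to new as a percentage of previous."""
    return (new - previous) / previous * 100.0


for model, precision, previous, new in rows:
    change = relative_change_pct(previous, new)
    print(f"{model} {precision}: {change:+.2f}%")
```

For these rows the change works out to roughly -0.1%, -3.0%, and +3.5% respectively, so the differences go in both directions.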
/azp run |
Azure Pipelines successfully started running 3 pipeline(s). |
This PR has the following changes