Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MV-473: [MV-MIOpen] Benchmark & Port Interpolate #41

Closed

Conversation

hieule88
Copy link
Collaborator

@hieule88 hieule88 commented Jul 16, 2024

  • Added Interpolate forward and backward with multi-mode operation and kernel.

  • Added driver test and gtest for Interpolate .

  • New API is guarded by MIOPEN_BETA_API macro.

  • Compared to ROCm pytorch: Link

  • Average over all cases:

  • Mode Nearest

Type Forward 3D Backward 3D Forward 4-5D Backward 4D Backward 5D
float16 1.70 5.42 Not Improved 3.56 1.51
float32 1.67 5.36 Not Improved 2.69 1.41
bfloat16 1.68 18.38 Not Improved 3.54 1.56
  • Mode Linear
Type Forward Backward
float16 22.50 217.65
float32 25.98 152.73
bfloat16 22.67 218.43
  • Mode Bilinear
Type Forward Backward
float16 6.63 99.29
float32 6.78 2.59
bfloat16 6.82 32.17
  • Mode Bicubic
Type Forward Backward
float16 2.85 10.99
float32 2.99 7.17
bfloat16 3.02 10.76
  • Mode Trilinear
Type Forward Backward
float16 Not Improved 20.82
float32 Not Improved 4.70
bfloat16 Not Improved 26.27

@littlecutebird
Copy link
Collaborator

littlecutebird commented Jul 17, 2024

Wow the improvement is very big. Could you provide script you used to benchmark rocm vs modnn + script benchmark rocm vs miopen to here or comment on your Jira ticket (recommend this method because recently our workflow required uploading benchmark script to comment of Jira ticket). Did you find any specific reasons that the improvement in some cases is very big? For example I saw linear backward is around 200x improvement compared to ROCm

@hieule88
Copy link
Collaborator Author

Wow the improvement is very big. Could you provide script you used to benchmark rocm vs modnn + script benchmark rocm vs miopen to here or comment on your Jira ticket (recommend this method because recently our workflow required uploading benchmark script to comment of Jira ticket). Did you find any specific reasons that the improvement in some cases is very big? For example I saw linear backward is around 200x improvement compared to ROCm

Very nice of you to remind me. I added the script to JIRA ticket. There are no special reasons; my MIOpen's performance is similar to Modnn's performance, and I made some constraints to eliminate low-performance cases.

Copy link
Collaborator

@littlecutebird littlecutebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

src/CMakeLists.txt Outdated Show resolved Hide resolved
include/miopen/miopen.h Outdated Show resolved Hide resolved
include/miopen/miopen.h Show resolved Hide resolved
include/miopen/miopen.h Outdated Show resolved Hide resolved
src/include/miopen/interpolate/problem_description.hpp Outdated Show resolved Hide resolved
@kyeonghwanryu
Copy link

kyeonghwanryu commented Jul 18, 2024

Wow the numbers look very impressive. But Some of that don't make sense to me, for example 99.29x in fp16 and 2.59x in fp32 (Bilinear bwd). Can you figure out why this much difference happen?

It looks like one case with 1800x improvement ruined the geomean but please check.

driver/interpolate_driver.hpp Outdated Show resolved Hide resolved
driver/interpolate_driver.hpp Outdated Show resolved Hide resolved
driver/interpolate_driver.hpp Outdated Show resolved Hide resolved
@littlecutebird
Copy link
Collaborator

I read benchmark number in 5d backward nearest and it looks like the last (very) small tests contributed too much to your average improvement. Overall big tests produced slower results (around 0.8 - 1x improvement) but the last very small tests with only 2048 and 4096 threads have 10-14x improvement, and then average improvement for this case is 2.66x, which I think is not fair. Could you create more tests for that cases with total threads between 100k and 5M to see if it is faster or not?
image

Copy link
Collaborator

@littlecutebird littlecutebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some reviews I got from AMD and I think it will fit to your PR too

src/include/miopen/interpolate.hpp Show resolved Hide resolved
test/gtest/interpolate.cpp Outdated Show resolved Hide resolved
test/gtest/interpolate.hpp Outdated Show resolved Hide resolved
test/gtest/interpolate.hpp Outdated Show resolved Hide resolved
test/gtest/interpolate.hpp Outdated Show resolved Hide resolved
test/gtest/interpolate.hpp Outdated Show resolved Hide resolved
src/solver/interpolate/bwd_bicubic_interpolate.cpp Outdated Show resolved Hide resolved
@littlecutebird
Copy link
Collaborator

Run make analyze and fix the warnings existed in your file in this PR.

@littlecutebird
Copy link
Collaborator

littlecutebird commented Jul 29, 2024

Please don't update this branch to newest commit in upstream repo. Because in this PR you choose to merge to develop-moreh branch, now it will show '752 file changed' and I cannot review your code =))) (I clicked Files changed button and my Edge crashed after that 😡) You can consider creating a new PR which merge to moreh-dev:develop if you want
image

Copy link
Collaborator

@littlecutebird littlecutebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

src/include/miopen/interpolate/problem_description.hpp Outdated Show resolved Hide resolved
src/include/miopen/interpolate/problem_description.hpp Outdated Show resolved Hide resolved
src/include/miopen/interpolate/problem_description.hpp Outdated Show resolved Hide resolved
src/include/miopen/interpolate/problem_description.hpp Outdated Show resolved Hide resolved
src/interpolate/problem_description.cpp Outdated Show resolved Hide resolved
src/include/miopen/interpolate/utils.hpp Show resolved Hide resolved
src/include/miopen/interpolate/solvers.hpp Outdated Show resolved Hide resolved
@hieule88
Copy link
Collaborator Author

hieule88 commented Jul 29, 2024

Please don't update this branch to newest commit in upstream repo. Because in this PR you choose to merge to develop-moreh branch, now it will show '752 file changed' and I cannot review your code =))) (I clicked Files changed button and my Edge crashed after that 😡) You can consider creating a new PR which merge to moreh-dev:develop if you want image

If I don't merge to develop, I can't fix issues which you mentioned above about MIOPEN_INTERNALS_EXPORT or Gtest new convention. You reap what you sow, my dear. Love your thoughtful reviews btw <3 !

@littlecutebird
Copy link
Collaborator

Please don't update this branch to newest commit in upstream repo. Because in this PR you choose to merge to develop-moreh branch, now it will show '752 file changed' and I cannot review your code =))) (I clicked Files changed button and my Edge crashed after that 😡) You can consider creating a new PR which merge to moreh-dev:develop if you want image

If I don't merge to develop, I can't fix issues which you mentioned above about MIOPEN_INTERNALS_EXPORT or Gtest new convention. You reap what you sow, my dear. Love your thoughtful reviews btw <3 !

Ok bro 🥲 if that is you can ignore my comments which can only fix when updating to new branch, i thought that macro existed from older branch, my bad 🥹 please revert back to previous commit before you updated to latest commit on upstream repo because now I have no ways to review your PR 😂😂 And i rethink it’s not a good idea to open a new PR to moreh-dev:develop because that branch is not synced up to latest commit on upstream repo, if your last commit you update to branch develop on moreh-dev then it is ok to open a new PR, but if you update to branch develop on rocm then it will show up many new files not related to your operation when opening new PR to moreh-dev:develop

Copy link
Collaborator

@littlecutebird littlecutebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for solvers + kernels (without backward bicubic because this kernel is more complicated than others, I will review it later). Overall it is all good

src/kernels/MIOpenInterpolate.cpp Show resolved Hide resolved
src/kernels/MIOpenInterpolate.cpp Show resolved Hide resolved
src/kernels/MIOpenInterpolate.cpp Outdated Show resolved Hide resolved
@moreh-dev moreh-dev deleted a comment from littlecutebird Jul 30, 2024
Copy link
Collaborator

@littlecutebird littlecutebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gtest + other things

src/kernels/MIOpenInterpolate.cpp Show resolved Hide resolved
test/gtest/interpolate.hpp Outdated Show resolved Hide resolved
test/gtest/interpolate.hpp Show resolved Hide resolved
test/gtest/interpolate.hpp Outdated Show resolved Hide resolved
test/gtest/interpolate.hpp Show resolved Hide resolved
Copy link
Collaborator

@littlecutebird littlecutebird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor things in driver

driver/interpolate_driver.hpp Show resolved Hide resolved
driver/interpolate_driver.hpp Show resolved Hide resolved
driver/interpolate_driver.hpp Show resolved Hide resolved
driver/interpolate_driver.hpp Outdated Show resolved Hide resolved
driver/interpolate_driver.hpp Outdated Show resolved Hide resolved
@littlecutebird
Copy link
Collaborator

Overall it looks good now. Make sure to check this comment #41 (comment) and update 5D backward nearest benchmark result after you create more test cases. Also I found that in the newest commit d15fbc5 you commented out IsOverRocmNearestBwd and IsOverRocmNearestFwd, is this accidental or on purpose?

@hieule88
Copy link
Collaborator Author

hieule88 commented Aug 5, 2024

IsOverRocmNearestBwd

I commented for re-run benchmark, updated 5D backward, check it out

@littlecutebird
Copy link
Collaborator

LGTM! Thank you for your hard work in this PR, and make sure to ask for review from other reviewers.

@hieule88
Copy link
Collaborator Author

hieule88 commented Aug 5, 2024

@kyeonghwanryu @DuongQLee Do you want to add more reviews? If not, can I close this PR?

@kyeonghwanryu
Copy link

Sorry it seems I missed the comment. I'm closing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants