-
Notifications
You must be signed in to change notification settings - Fork 228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement MultiMarginLoss #3146
Conversation
const auto solvers = solver::SolverContainer<solver::multimarginloss::MultiMarginLossForward>{}; | ||
|
||
auto pair_size_vector = solvers.GetWorkspaceSizes(ctx, problem); | ||
return pair_size_vector.empty() ? static_cast<size_t>(-1) : pair_size_vector.front().second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed that Get*WorkspaceSize
may return size_t(-1)
value in case of empty solvers.
I guess it's not quite a good solution because it may lead to a crash while allocating 2^64 bytes of GPU memory.
It's better either to return 0 or to throw an exception that no solution found.
It's just a notice and has to be fixed in a separated PR across all the other affected solvers like Sum or t5layernorm.
@seungmanhan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not return 0 because 0 means "we need 0 workspace size but it didn't mean this is a case of empty solvers". For example in my operation when reduction mode = None calling this function will return 0. Maybe throw an exception is a better solution and I agree that this need to be discussed more and fixed in a separated PR.
auto reduce_out = | ||
static_cast<Data_t>(static_cast<char*>(params.workspace) + | ||
size * get_data_size(deref(params.oDesc).GetType())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially reduce_out
may not be aligned properly (or at least efficiently).
MultiBufferWorkspaceTraits
is a good option to use in this case.
auto elem = problem.GetiDesc().GetLengths()[0]; | ||
return (elem + (elem + LOCAL_SIZE_REDUCE) / LOCAL_SIZE_REDUCE) * | ||
get_data_size(problem.GetoDesc().GetType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it does not take proper alignment into account.
MultiBufferWorkspaceTraits
is a good option to use in this case.
if(problem.GetiDesc().GetLengths()[1] <= 30) | ||
return true; | ||
if((problem.GetiDesc().GetType() == miopenHalf || | ||
problem.GetiDesc().GetType() == miopenBFloat16) && | ||
problem.GetiDesc().IsContiguous() && problem.GetiDesc().GetLengths()[1] <= 40) | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for clarification.
Am I right that if .GetLengths()[1] <= 30
this kernel is always faster for any datatype and even for layouts which are not .IsContiguous()
, but for (B)FP16 with .IsContiguous()
layout it can be faster up to 40 elements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is, I have edited the condition when kernel is faster in PR description for better clarification. I will add more test cases for (B)FP16 with .GetLengths()[1] between 31 and 40 to the benchmark result in PR description to demonstrate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some test cases to fp16 and bfp16 dropdown. Please have a look at it. Should I or you click 'resolve conversation'?
test/gtest/multimarginloss.cpp
Outdated
INSTANTIATE_TEST_SUITE_P(MultiMarginLossTestSet, | ||
MultiMarginLossForwardTestFloat, | ||
testing::ValuesIn(MultiMarginLossTestConfigs())); | ||
INSTANTIATE_TEST_SUITE_P(MultiMarginLossTestSet, | ||
MultiMarginLossForwardTestHalf, | ||
testing::ValuesIn(MultiMarginLossFp16TestConfigs())); | ||
INSTANTIATE_TEST_SUITE_P(MultiMarginLossTestSet, | ||
MultiMarginLossForwardTestBFloat16, | ||
testing::ValuesIn(MultiMarginLossTestConfigs())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I ask you to rename the tests according to the new naming scheme described on the wiki?
Also check this document and properly adjust test assertion. Some of EXCEPT_
assertions should be replaced with ASSERT_
. Some of the assertion conditions should be properly used, for example EXCEPT_TRUE(a == b);
should be replaced with EXCEPT_EQ(a, b);
.
@seungmanhan @long10024070 @BuiChiTrung @et16kr could you also take into account this wiki page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the test suite name format following the wiki page. However, a warning is raised: "Avoid using "_" in test suite name "GPU_SigmoidFocalLoss_fwd_FP32" according to Googletest FAQ". Can we update the naming scheme or just ignore this warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BuiChiTrung I've also noticed that, but let's ignore this warning for now.
We've already started that renaming and have already done some work.
I'll update the scheme and rename the test a bit later, once we've got regexp checker for the test names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
test/gtest/multimarginloss.hpp
Outdated
margin, | ||
reduction_mode); | ||
if(ws_sizeInBytes == static_cast<size_t>(-1)) | ||
GTEST_SKIP() << "Call GetMultiMarginLossForwardWorkspaceSize failed!"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it is FAIL()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anyway I'm not sure if it should be failed or skipped. I prefer skip because basically there are nothing wrong with the test input or data validation, it just does not have any solvers to handle that cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or probably we just removed the solver and got the tests passed.
@CAHEK7 I closed this PR because this PR is created from a branch in a fork repo and I noticed that the CI/CD will not run full tests. I created a new PR in #3166 , please have a look. I have fixed all except UPDATED: I have added |
Unreduced:
fp32
fp16
bfp16
Reduced:
fp32
fp16
bfp16