Implement MultiMarginLoss #3146

Closed
wants to merge 16 commits into from

littlecutebird (Collaborator) commented Jul 24, 2024

  • Add the MultiMarginLoss forward operation. For an input tensor of shape (N, C), MIOpen is faster when C <= 30, for every data type and for both contiguous and non-contiguous tensors. For contiguous float16 and bfloat16 tensors the limit is slightly higher: C <= 40. The backward operation is not faster in general.
  • Compared to ROCm:

Unreduced:

type Forward
float32 6.84
float16 13.33
bfloat16 13.82
fp32
input_size num_class cont ROCm MIOpen Improvement
1234567 18 uncont 2736031 1823520 1.50
1234567 18 cont 2483891 978455 2.54
845729 13 uncont 1819759 708077 2.57
845729 13 cont 1681666 249187 6.75
1974532 24 uncont 4458704 4219790 1.06
1974532 24 cont 4000088 3142670 1.27
763492 6 uncont 1588180 100389 15.82
763492 6 cont 1516405 55920 27.12
1293850 20 uncont 2887276 2201450 1.31
1293850 20 cont 2607842 1378610 1.89
987654 7 uncont 2050747 190443 10.77
987654 7 cont 1957373 81095 24.14
345678 10 uncont 745427 164980 4.52
345678 10 cont 693732 58318 11.90
1654783 21 uncont 3721822 3039910 1.22
1654783 21 cont 3342149 1971750 1.70
234567 9 uncont 510567 88844 5.75
234567 9 cont 473656 38723 12.23
1892345 12 uncont 3999049 1354490 2.95
1892345 12 cont 3732094 416875 8.95
574839 15 uncont 1266874 616905 2.05
574839 15 cont 1148460 252875 4.54
1495832 25 uncont 3432324 3518130 0.98
1495832 25 cont 3035611 2590030 1.17
934750 8 uncont 1943294 243038 8.00
934750 8 cont 1841760 107171 17.19
847293 22 uncont 1933358 1645010 1.18
847293 22 cont 1723602 1092990 1.58
1639204 19 uncont 3641152 2624590 1.39
1639204 19 cont 3296039 1530090 2.15
215678 14 uncont 486488 190374 2.56
215678 14 cont 451624 85781 5.26
1274835 11 uncont 2687633 789673 3.40
1274835 11 cont 2518452 234340 10.75
1346789 5 uncont 2752544 116485 23.63
1346789 5 cont 2644498 63096 41.91
1765432 17 uncont 3891244 2401560 1.62
1765432 17 cont 3532131 1156920 3.05
263748 16 uncont 583974 260365 2.24
263748 16 cont 536647 137267 3.91
1498765 23 uncont 3401829 3117330 1.09
1498765 23 cont 3034908 2182460 1.39
1728394 18 uncont 3813535 2559770 1.49
1728394 18 cont 3472084 1386990 2.50
459283 20 uncont 1043054 769673 1.36
459283 20 cont 934768 458607 2.04
1583642 7 uncont 3274808 300505 10.90
1583642 7 cont 3113979 122190 25.48
983472 13 uncont 2114547 830775 2.55
983472 13 cont 1954917 288469 6.78
1862345 24 uncont 4195471 3978520 1.05
1862345 24 cont 3790247 2962250 1.28
712345 6 uncont 1478424 93957 15.74
712345 6 cont 1412536 52516 26.90
1456789 15 uncont 3158216 1609560 1.96
1456789 15 cont 2898056 640847 4.52
298376 10 uncont 654681 140607 4.66
298376 10 cont 600040 52729 11.38
1872345 21 uncont 4204781 3438470 1.22
1872345 21 cont 3782457 2244320 1.69
fp16
input_size num_class cont ROCm MIOpen Improvement
1534567 35 cont 3286684 2237740 1.47
765432 33 cont 1636918 880785 1.86
1987654 38 cont 4265762 3758800 1.13
1234567 18 uncont 2770542 953708 2.91
1234567 18 cont 2589537 280351 9.24
845729 13 uncont 1885278 250485 7.53
845729 13 cont 1782992 102149 17.45
1974532 24 uncont 4522927 2879150 1.57
1974532 24 cont 4147142 870065 4.77
763492 6 uncont 1650466 60471 27.29
763492 6 cont 1592787 37285 42.72
1293850 20 uncont 2921260 1319450 2.21
1293850 20 cont 2711519 354060 7.66
987654 7 uncont 2135050 83424 25.59
987654 7 cont 2058300 56718 36.29
345678 10 uncont 756323 59047 12.81
345678 10 cont 727251 34421 21.13
1654783 21 uncont 3748893 1884360 1.99
1654783 21 cont 3471346 527952 6.58
234567 9 uncont 521591 40164 12.99
234567 9 cont 495352 24909 19.89
1892345 12 uncont 4175606 422635 9.88
1892345 12 cont 3958634 196871 20.11
574839 15 uncont 1287945 251523 5.12
574839 15 cont 1214155 93767 12.95
1495832 25 uncont 3432836 2407830 1.43
1495832 25 cont 3151945 775702 4.06
934750 8 uncont 2018860 109838 18.38
934750 8 cont 1944830 98779 19.69
847293 22 uncont 1950558 1043660 1.87
847293 22 cont 1783041 309192 5.77
1639204 19 uncont 3685872 1469910 2.51
1639204 19 cont 3436612 414511 8.29
215678 14 uncont 493448 86688 5.69
215678 14 cont 463240 40304 11.49
1274835 11 uncont 2807439 237078 11.84
1274835 11 cont 2668594 117374 22.74
1346789 5 uncont 2884926 68305 42.24
1346789 5 cont 2804704 52766 53.15
1765432 17 uncont 3940028 1107510 3.56
1765432 17 cont 3700896 344600 10.74
263748 16 uncont 595366 138832 4.29
263748 16 cont 559302 71913 7.78
1498765 23 uncont 3427829 2069380 1.66
1498765 23 cont 3149082 606082 5.20
1728394 18 uncont 3889197 1343220 2.90
1728394 18 cont 3618386 387175 9.35
459283 20 uncont 1056206 443371 2.38
459283 20 cont 971055 135879 7.15
1583642 7 uncont 3412214 123399 27.65
1583642 7 cont 3295832 84748 38.89
983472 13 uncont 2179909 290069 7.52
983472 13 cont 2065448 116785 17.69
1862345 24 uncont 4255988 2711720 1.57
1862345 24 cont 3912746 821653 4.76
712345 6 uncont 1548953 56961 27.19
712345 6 cont 1482665 35911 41.29
1456789 15 uncont 3237951 636225 5.09
1456789 15 cont 3057486 215630 14.18
298376 10 uncont 654425 53672 12.19
298376 10 cont 629128 31200 20.16
1872345 21 uncont 4235705 2142240 1.98
1872345 21 cont 3923990 594676 6.60
bfp16
input_size num_class cont ROCm MIOpen Improvement
1534567 35 cont 3403528 2228260 1.53
765432 33 cont 1698116 878278 1.93
1987654 38 cont 4417182 3760470 1.17
1234567 18 uncont 2867228 946064 3.03
1234567 18 cont 2684832 278982 9.62
845729 13 uncont 1948669 250022 7.79
845729 13 cont 1840799 101741 18.09
1974532 24 uncont 4648364 2871730 1.62
1974532 24 cont 4301426 866846 4.96
763492 6 uncont 1717569 60009 28.62
763492 6 cont 1655650 37374 44.30
1293850 20 uncont 3018138 1295110 2.33
1293850 20 cont 2813822 352332 7.99
987654 7 uncont 2210089 82659 26.74
987654 7 cont 2131146 56647 37.62
345678 10 uncont 781282 59118 13.22
345678 10 cont 752674 34190 22.01
1654783 21 uncont 3869212 1890230 2.05
1654783 21 cont 3595072 525530 6.84
234567 9 uncont 530918 39897 13.31
234567 9 cont 512471 25068 20.44
1892345 12 uncont 4305588 420766 10.23
1892345 12 cont 4101240 195893 20.94
574839 15 uncont 1332040 250491 5.32
574839 15 cont 1257514 93501 13.45
1495832 25 uncont 3554850 2402840 1.48
1495832 25 cont 3268662 773227 4.23
934750 8 uncont 2087515 109749 19.02
934750 8 cont 2013965 98797 20.38
847293 22 uncont 2001453 1047739 1.91
847293 22 cont 1851424 307947 6.01
1639204 19 uncont 3813230 1471720 2.59
1639204 19 cont 3559714 412163 8.64
215678 14 uncont 502679 85284 5.89
215678 14 cont 475848 39841 11.94
1274835 11 uncont 2896046 236544 12.24
1274835 11 cont 2762464 117001 23.61
1346789 5 uncont 2983292 68624 43.47
1346789 5 cont 2900686 53388 54.33
1765432 17 uncont 4078969 1109770 3.68
1765432 17 cont 3834782 342644 11.19
263748 16 uncont 616565 138600 4.45
263748 16 cont 578374 72038 8.03
1498765 23 uncont 3532211 2072120 1.70
1498765 23 cont 3268280 604392 5.41
1728394 18 uncont 3999691 1341120 2.98
1728394 18 cont 3757055 385504 9.75
459283 20 uncont 1082045 442909 2.44
459283 20 cont 1004479 135150 7.43
1583642 7 uncont 3539592 122616 28.87
1583642 7 cont 3414588 84108 40.60
983472 13 uncont 2257385 289322 7.80
983472 13 cont 2137531 116554 18.34
1862345 24 uncont 4394361 2713810 1.62
1862345 24 cont 4056669 819216 4.95
712345 6 uncont 1604379 57138 28.08
712345 6 cont 1536123 35911 42.78
1456789 15 uncont 3349318 634286 5.28
1456789 15 cont 3166837 214777 14.74
298376 10 uncont 678089 53387 12.70
298376 10 cont 653993 30987 21.11
1872345 21 uncont 4372007 2141560 2.04
1872345 21 cont 4071267 591635 6.88

Reduced:

type Forward
float32 5.50
float16 10.12
bfloat16 10.28
fp32
input_size num_class cont ROCm MIOpen Improvement
1234567 18 uncont 2756526 1853070 1.49
1234567 18 cont 2505107 1013640 2.47
845729 13 uncont 1842239 736450 2.50
845729 13 cont 1703170 283871 6.00
1974532 24 uncont 4508367 4299520 1.05
1974532 24 cont 5639611 3182010 1.77
763492 6 uncont 1604643 128696 12.47
763492 6 cont 1527509 83959 18.19
1293850 20 uncont 2908540 2235730 1.30
1293850 20 cont 2632593 1408350 1.87
987654 7 uncont 2076539 219726 9.45
987654 7 cont 1978717 115517 17.13
345678 10 uncont 762306 190832 3.99
345678 10 cont 715635 83352 8.59
1654783 21 uncont 3749277 3080300 1.22
1654783 21 cont 3366740 2012690 1.67
234567 9 uncont 523559 117807 4.44
234567 9 cont 494791 63135 7.84
1892345 12 uncont 4024937 1395050 2.89
1892345 12 cont 3757646 455118 8.26
574839 15 uncont 1288873 644854 2.00
574839 15 cont 1169340 280770 4.16
1495832 25 uncont 3451571 3554570 0.97
1495832 25 cont 3064090 2621360 1.17
934750 8 uncont 1967774 268178 7.34
934750 8 cont 1867775 136062 13.73
847293 22 uncont 1953470 1674770 1.17
847293 22 cont 1748225 1126130 1.55
1639204 19 uncont 4908330 2660810 1.84
1639204 19 cont 3318390 1568600 2.12
215678 14 uncont 512855 212775 2.41
215678 14 cont 462248 109072 4.24
1274835 11 uncont 2716928 821781 3.31
1274835 11 cont 2543524 265932 9.56
1346789 5 uncont 2782336 148878 18.69
1346789 5 cont 2674498 95684 27.95
1765432 17 uncont 3913532 2441770 1.60
1765432 17 cont 3556162 1197030 2.97
263748 16 uncont 607126 281681 2.16
263748 16 cont 554662 161392 3.44
1498765 23 uncont 3426581 3155590 1.09
1498765 23 cont 3060379 2216410 1.38
1728394 18 uncont 3847342 2596630 1.48
1728394 18 cont 3492244 1426420 2.45
459283 20 uncont 1061486 793852 1.34
459283 20 cont 959712 485595 1.98
1583642 7 uncont 3293495 338426 9.73
1583642 7 cont 3140634 154884 20.28
983472 13 uncont 2136548 860500 2.48
983472 13 cont 1971223 321839 6.12
1862345 24 uncont 4223018 4012330 1.05
1862345 24 cont 3799152 2994980 1.27
712345 6 uncont 1506617 119664 12.59
712345 6 cont 1434248 78899 18.18
1456789 15 uncont 3181691 1640260 1.94
1456789 15 cont 2920026 680581 4.29
298376 10 uncont 674761 164980 4.09
298376 10 cont 620520 76285 8.13
1872345 21 uncont 4232803 3478490 1.22
1872345 21 cont 3804127 2278100 1.67
fp16
input_size num_class cont ROCm MIOpen Improvement
1534567 35 cont 3305867 2286430 1.45
765432 33 cont 1669093 916144 1.82
1987654 38 cont 4298065 3821870 1.12
1234567 18 uncont 2803902 983237 2.85
1234567 18 cont 2616913 310999 8.41
845729 13 uncont 1910797 280493 6.81
845729 13 cont 1799488 129438 13.90
1974532 24 uncont 4523518 2918170 1.55
1974532 24 cont 4176581 907306 4.60
763492 6 uncont 1677234 87978 19.06
763492 6 cont 1616275 63672 25.38
1293850 20 uncont 2944891 1363530 2.16
1293850 20 cont 2737663 386116 7.09
987654 7 uncont 2154537 115499 18.65
987654 7 cont 2093962 86535 24.20
345678 10 uncont 1600659 84828 18.87
345678 10 cont 747826 58745 12.73
1654783 21 uncont 3765949 1923530 1.96
1654783 21 cont 3491234 562069 6.21
234567 9 uncont 544966 64557 8.44
234567 9 cont 519223 47187 11.00
1892345 12 uncont 4203350 459900 9.14
1892345 12 cont 3983866 232536 17.13
574839 15 uncont 1311177 285570 4.59
574839 15 cont 1233226 119868 10.29
1495832 25 uncont 3463507 2443650 1.42
1495832 25 cont 3179048 816699 3.89
934750 8 uncont 2040972 137058 14.89
934750 8 cont 1968478 126373 15.58
847293 22 uncont 1964830 1079250 1.82
847293 22 cont 1807104 337531 5.35
1639204 19 uncont 3707055 1512520 2.45
1639204 19 cont 3461204 448308 7.72
215678 14 uncont 519975 112468 4.62
215678 14 cont 478888 63131 7.59
1274835 11 uncont 2829615 267888 10.56
1274835 11 cont 2693041 149322 18.04
1346789 5 uncont 2908573 102458 28.39
1346789 5 cont 2828319 84128 33.62
1765432 17 uncont 3969851 1157480 3.43
1765432 17 cont 3724528 383197 9.72
263748 16 uncont 618869 161641 3.83
263748 16 cont 581542 94705 6.14
1498765 23 uncont 3450501 2104600 1.64
1498765 23 cont 3176249 642883 4.94
1728394 18 uncont 3895085 1384550 2.81
1728394 18 cont 3645953 421452 8.65
459283 20 uncont 1066110 469079 2.27
459283 20 cont 997359 160680 6.21
1583642 7 uncont 3441654 155844 22.08
1583642 7 cont 3322346 116803 28.44
983472 13 uncont 2201607 323279 6.81
983472 13 cont 2091002 146439 14.28
1862345 24 uncont 4281854 2749480 1.56
1862345 24 cont 3945155 853244 4.62
712345 6 uncont 1575546 83166 18.94
712345 6 cont 1505194 64658 23.28
1456789 15 uncont 3261474 679727 4.80
1456789 15 cont 3079921 249284 12.36
298376 10 uncont 682713 76019 8.98
298376 10 cont 649161 54365 11.94
1872345 21 uncont 4257535 2186150 1.95
1872345 21 cont 3945419 631476 6.25
bfp16
input_size num_class cont ROCm MIOpen Improvement
1534567 35 cont 3429736 2283930 1.50
765432 33 cont 1723268 917158 1.88
1987654 38 cont 4446733 3806780 1.17
1234567 18 uncont 2896940 982188 2.95
1234567 18 cont 3536000 311497 11.35
845729 13 uncont 1974732 281009 7.03
845729 13 cont 1862318 129882 14.34
1974532 24 uncont 4684028 2914930 1.61
1974532 24 cont 4335346 902775 4.80
763492 6 uncont 1728321 85239 20.28
763492 6 cont 1667858 68739 24.26
1293850 20 uncont 3048665 1356060 2.25
1293850 20 cont 2836814 386682 7.34
987654 7 uncont 2233544 111569 20.02
987654 7 cont 2154634 87086 24.74
345678 10 uncont 804562 82569 9.74
345678 10 cont 775618 57909 13.39
1654783 21 uncont 3895259 1928070 2.02
1654783 21 cont 3619216 565781 6.40
234567 9 uncont 556422 63686 8.74
234567 9 cont 534103 48164 11.09
1892345 12 uncont 4329507 461266 9.39
1892345 12 cont 4129607 232037 17.80
574839 15 uncont 1353752 280378 4.83
574839 15 cont 1280041 120365 10.63
1495832 25 uncont 3573377 2445960 1.46
1495832 25 cont 3297878 813761 4.05
934750 8 uncont 2110139 139404 15.14
934750 8 cont 2039388 125590 16.24
847293 22 uncont 2035949 1079900 1.89
847293 22 cont 1869919 336712 5.55
1639204 19 uncont 3835373 1514920 2.53
1639204 19 cont 3586082 449462 7.98
215678 14 uncont 525175 111952 4.69
215678 14 cont 497559 63256 7.87
1274835 11 uncont 2919757 267745 10.90
1274835 11 cont 2785568 149038 18.69
1346789 5 uncont 3017932 101764 29.66
1346789 5 cont 2925197 87132 33.57
1765432 17 uncont 4109449 1151830 3.57
1765432 17 cont 3863501 378147 10.22
263748 16 uncont 639525 161908 3.95
263748 16 cont 601670 94527 6.37
1498765 23 uncont 3562003 2104510 1.69
1498765 23 cont 3289848 637690 5.16
1728394 18 uncont 4028171 1384450 2.91
1728394 18 cont 3780239 420313 8.99
459283 20 uncont 1105581 468545 2.36
459283 20 cont 1026015 160164 6.41
1583642 7 uncont 3558443 158031 22.52
1583642 7 cont 3438494 120803 28.46
983472 13 uncont 2275354 321714 7.07
983472 13 cont 2164461 147541 14.67
1862345 24 uncont 4421699 2745370 1.61
1862345 24 cont 4086630 852620 4.79
712345 6 uncont 1618908 82454 19.63
712345 6 cont 1562475 63254 24.70
1456789 15 uncont 3370569 677594 4.97
1456789 15 cont 3189081 249107 12.80
298376 10 uncont 712394 76836 9.27
298376 10 cont 672761 61992 10.85
1872345 21 uncont 4410621 2187640 2.02
1872345 21 cont 4093784 628240 6.52

driver/multimarginloss_driver.hpp (outdated, resolved)
driver/multimarginloss_driver.hpp (outdated, resolved)
src/kernels/MIOpenMultiMarginLoss.cpp (outdated, resolved)
src/kernels/warp_shuffle.hpp (resolved)
const auto solvers = solver::SolverContainer<solver::multimarginloss::MultiMarginLossForward>{};

auto pair_size_vector = solvers.GetWorkspaceSizes(ctx, problem);
return pair_size_vector.empty() ? static_cast<size_t>(-1) : pair_size_vector.front().second;
Contributor

I've noticed that Get*WorkspaceSize may return size_t(-1) when there are no applicable solvers.
I don't think that is a good solution, because it may lead to a crash when the caller tries to allocate 2^64 bytes of GPU memory.

It would be better either to return 0 or to throw an exception saying that no solution was found.

This is just a note; it has to be fixed in a separate PR across all the other affected solvers, such as Sum or t5layernorm.
@seungmanhan

Collaborator Author

@littlecutebird littlecutebird Jul 29, 2024

It should not return 0, because 0 means "this operation needs no workspace", not "no solver was found". For example, in my operation, calling this function with reduction mode None returns 0. Throwing an exception may be a better solution, and I agree that this needs more discussion and should be fixed in a separate PR.
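
The two alternatives discussed in this thread can be sketched as below. This is only an illustration under stated assumptions: SizePairs is a hypothetical stand-in for whatever solvers.GetWorkspaceSizes(ctx, problem) returns, and TryGetWorkspaceSize and GetWorkspaceSizeOrThrow are made-up names, not MIOpen APIs.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for the (solver id, workspace size) pairs.
using SizePairs = std::vector<std::pair<int, std::size_t>>;

// Option A: an empty optional means "no applicable solver",
// while a value of 0 means "a solver exists and needs no workspace".
std::optional<std::size_t> TryGetWorkspaceSize(const SizePairs& sizes)
{
    if(sizes.empty())
        return std::nullopt;
    return sizes.front().second;
}

// Option B: throw instead of returning a sentinel such as size_t(-1).
std::size_t GetWorkspaceSizeOrThrow(const SizePairs& sizes)
{
    if(sizes.empty())
        throw std::runtime_error("no applicable solver found");
    return sizes.front().second;
}
```

Either form keeps 0 available as a legitimate answer, which is the concern raised for the reduction mode None case, and neither hands the caller a huge value that could be passed to an allocator by mistake.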

Comment on lines +185 to +187
auto reduce_out =
    static_cast<Data_t>(static_cast<char*>(params.workspace) +
                        size * get_data_size(deref(params.oDesc).GetType()));
Contributor

Potentially reduce_out may not be aligned properly (or at least efficiently).
MultiBufferWorkspaceTraits is a good option to use in this case.

Comment on lines +228 to +230
auto elem = problem.GetiDesc().GetLengths()[0];
return (elem + (elem + LOCAL_SIZE_REDUCE) / LOCAL_SIZE_REDUCE) *
       get_data_size(problem.GetoDesc().GetType());
Contributor

Probably it does not take proper alignment into account.
MultiBufferWorkspaceTraits is a good option to use in this case.
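
The alignment concern in the two comments above (the offset of the second buffer and the total workspace size) can be illustrated with a small standalone sketch. It does not use MultiBufferWorkspaceTraits, whose interface is not shown in this thread; AlignUp, TwoBufferLayout, MakeLayout, and the default alignment of 256 bytes are all assumptions made for illustration.

```cpp
#include <cstddef>

// Round n up to the next multiple of alignment (alignment must be non-zero).
inline std::size_t AlignUp(std::size_t n, std::size_t alignment)
{
    return (n + alignment - 1) / alignment * alignment;
}

// Hypothetical layout of two consecutive buffers inside one workspace.
struct TwoBufferLayout
{
    std::size_t second_offset; // byte offset at which the second buffer starts
    std::size_t total_size;    // number of bytes to request for the workspace
};

inline TwoBufferLayout MakeLayout(std::size_t first_bytes,
                                  std::size_t second_bytes,
                                  std::size_t alignment = 256) // assumed value
{
    // Pad the first buffer so that the second one starts on an aligned offset.
    const std::size_t second_offset = AlignUp(first_bytes, alignment);
    return {second_offset, second_offset + second_bytes};
}
```

The point of the design is that the same layout is used in both places: the size query returns total_size and the kernel launch code uses second_offset, so the padding cannot be counted in one place and forgotten in the other.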

Comment on lines +49 to +54
if(problem.GetiDesc().GetLengths()[1] <= 30)
    return true;
if((problem.GetiDesc().GetType() == miopenHalf ||
    problem.GetiDesc().GetType() == miopenBFloat16) &&
   problem.GetiDesc().IsContiguous() && problem.GetiDesc().GetLengths()[1] <= 40)
    return true;
Contributor

Just for clarification.
Am I right that if .GetLengths()[1] <= 30 this kernel is always faster for any datatype and even for layouts which are not .IsContiguous(), but for (B)FP16 with .IsContiguous() layout it can be faster up to 40 elements?

Collaborator Author

@littlecutebird littlecutebird Jul 26, 2024

Yes, that is right. I have edited the condition under which the kernel is faster in the PR description to make it clearer. I will also add test cases for (B)FP16 with .GetLengths()[1] between 31 and 40 to the benchmark results in the PR description to demonstrate that.

Collaborator Author

@littlecutebird littlecutebird Jul 29, 2024

I added some test cases to the fp16 and bfp16 dropdowns. Please take a look. Should I or you click 'Resolve conversation'?
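
The condition confirmed in this thread can be restated as a standalone predicate. This is a minimal sketch: the DataType enum is a hypothetical stand-in for the MIOpen data type values, and IsKernelPreferred is a made-up name; only the thresholds 30 and 40 come from the snippet under review.

```cpp
#include <cstddef>

// Hypothetical stand-in for the MIOpen data type enumeration.
enum class DataType
{
    Float32,
    Float16,
    BFloat16
};

// Returns true when, per the PR description, the new kernel is expected
// to be faster for an input of shape (N, num_classes).
bool IsKernelPreferred(DataType type, bool is_contiguous, std::size_t num_classes)
{
    // Any data type and any layout: up to 30 classes.
    if(num_classes <= 30)
        return true;
    // Contiguous 16-bit floating-point inputs: up to 40 classes.
    const bool is_16bit = type == DataType::Float16 || type == DataType::BFloat16;
    return is_16bit && is_contiguous && num_classes <= 40;
}
```

For example, a contiguous float16 input with 35 classes qualifies, while a float32 input with 35 classes or a non-contiguous float16 input with 35 classes does not.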

Comment on lines 105 to 113
INSTANTIATE_TEST_SUITE_P(MultiMarginLossTestSet,
                         MultiMarginLossForwardTestFloat,
                         testing::ValuesIn(MultiMarginLossTestConfigs()));
INSTANTIATE_TEST_SUITE_P(MultiMarginLossTestSet,
                         MultiMarginLossForwardTestHalf,
                         testing::ValuesIn(MultiMarginLossFp16TestConfigs()));
INSTANTIATE_TEST_SUITE_P(MultiMarginLossTestSet,
                         MultiMarginLossForwardTestBFloat16,
                         testing::ValuesIn(MultiMarginLossTestConfigs()));
Contributor

May I ask you to rename the tests according to the new naming scheme described on the wiki?
Also check this document and adjust the test assertions accordingly. Some EXPECT_ assertions should be replaced with ASSERT_, and some assertion conditions should use the more specific macro; for example, EXPECT_TRUE(a == b); should be replaced with EXPECT_EQ(a, b);.

@seungmanhan @long10024070 @BuiChiTrung @et16kr could you also take into account this wiki page?

Collaborator

I have changed the test suite name format following the wiki page. However, a warning is raised: "Avoid using "_" in test suite name "GPU_SigmoidFocalLoss_fwd_FP32" according to Googletest FAQ". Can we update the naming scheme or just ignore this warning?

Contributor

@BuiChiTrung I've also noticed that, but let's ignore this warning for now.
We've already started that renaming and have already done some work.
I'll update the scheme and rename the test a bit later, once we've got regexp checker for the test names.

Collaborator Author

fixed it

    margin,
    reduction_mode);
if(ws_sizeInBytes == static_cast<size_t>(-1))
    GTEST_SKIP() << "Call GetMultiMarginLossForwardWorkspaceSize failed!";
Contributor

I guess it is FAIL().

Collaborator Author

fixed it

Collaborator Author

Anyway, I'm not sure whether it should fail or be skipped. I prefer skipping, because there is nothing wrong with the test input or the data validation; there is simply no solver that handles those cases.

Contributor

Or perhaps we have just removed the solver, and the tests still pass.

littlecutebird (Collaborator Author) commented Jul 29, 2024

@CAHEK7 I closed this PR because it was created from a branch in a forked repo, and I noticed that CI/CD does not run the full tests for it. I created a new PR, #3166; please have a look. I have fixed everything except the MultiBufferWorkspaceTraits and __hip_ds_swizzlef_N issues.

UPDATED: I have added MultiBufferWorkspaceTraits. Another member has already implemented __hip_ds_swizzlef_N in #3146 (comment), but the performance is worse.

3 participants