sse4.2: added the implementation for mm_cmpestra #295

masterchef2209 · 2020-05-16T09:30:38Z

sse4.2: added the implementation for mm_cmpestra

I have created this PR from a new branch because the old branch's commit history became very messy.
The implementation is the same as on the old branch.
I am trying to add tests in such a way that each block (case) of code is tested for 2 different values of imm8, and for both the size=8 and size=16 variants.
So for cmpestra there would be 14 cases, as written in a comment in the tests file.
I have named the test case functions to indicate which block of code they are sure to exercise.
I have only implemented 4/14 for now; I will implement the rest after getting feedback.
I could also merge the different cases into one big test function, but I think it looks cleaner this way. If anything needs to be changed, please let me know.
Thank you
PS: I am using generator code :)

mr-c · 2020-05-16T12:21:20Z

[7/94] gcc-10 -Itest/x86/40b6604@@simde-tests-x86-emul@sta -Itest/x86 -I../test/x86 -I. -I.. -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -std=c99 -g -march=native -DSIMDE_ENABLE_NATIVE_ALIASES -DSIMDE_NATIVE_ALIASES_TESTING -fPIC -fopenmp-simd -DSIMDE_ENABLE_OPENMP -Wno-psabi -DSIMDE_NO_NATIVE -MD -MQ 'test/x86/40b6604@@simde-tests-x86-emul@sta/sse4.2.c.o' -MF 'test/x86/40b6604@@simde-tests-x86-emul@sta/sse4.2.c.o.d' -o 'test/x86/40b6604@@simde-tests-x86-emul@sta/sse4.2.c.o' -c ../test/x86/sse4.2.c
../test/x86/sse4.2.c: In function ‘test_simde_mm_cmpestra_equalany_8’:
../test/x86/sse4.2.c:150:11: warning: implicit declaration of function ‘_mm_cmpestra_8_’; did you mean ‘_mm_cmpestra’? [-Wimplicit-function-declaration]
  150 |       r = _mm_cmpestra_8_(test_vec[i].a, test_vec[i].la, test_vec[i].b, test_vec[i].lb, 122);
      |           ^~~~~~~~~~~~~~~
      |           _mm_cmpestra
../test/x86/sse4.2.c: In function ‘test_simde_mm_cmpestra_equalany_16’:
../test/x86/sse4.2.c:231:11: warning: implicit declaration of function ‘_mm_cmpestra_16_’; did you mean ‘_mm_cmpestra’? [-Wimplicit-function-declaration]
  231 |       r = _mm_cmpestra_16_(test_vec[i].a, test_vec[i].la, test_vec[i].b, test_vec[i].lb, 123);
      |           ^~~~~~~~~~~~~~~~

https://github.com/nemequ/simde/pull/295/checks?check_run_id=680612596#step:7:10

simde/x86/sse4.2.h

mr-c · 2020-05-16T12:29:40Z

With regards to https://github.com/nemequ/simde/pull/295/checks?check_run_id=680612576#step:6:15 , was that supposed to be a negative test? @nemequ do you want negative (should fail to compile) tests?

nemequ · 2020-05-16T17:31:01Z

> With regards to https://github.com/nemequ/simde/pull/295/checks?check_run_id=680612576#step:6:15 , was that supposed to be a negative test? @nemequ do you want negative (should fail to compile) tests?

It would be nice, but it would require some work to get such tests to work, and I don't know that it's worth the effort. I can't remember if µnit supports negative tests or not, but for this type of thing that wouldn't be enough anyway; you'd need support in the build system.

I think this was just a bug that CI caught.

nemequ

Looks good, I think you're on the right track here.

Just make sure you don't throw away your test case generator code accidentally! You might want to push it to a branch just in case.

Sincerely,
Someone who has accidentally trashed code on many, many occasions

simde/x86/sse4.2.h

nemequ · 2020-05-16T17:46:13Z

> sse4.2: added the implementation for mm_cmpestra
>
> I have created this PR from a new branch because the old branch's commit history became very messy.
> The implementation is the same as on the old branch.

FWIW, you can force push to the branch to update the PR. The PR on GitHub will look a bit messy, but I really don't care about that. The history in the git repo will be nice and clean, and that's all I care about :)

If you don't know how to do that just let me know, I can walk you through it.

masterchef2209 · 2020-05-16T19:20:55Z

> Looks good, I think you're on the right track here.
>
> Just make sure you don't throw away your test case generator code accidentally! You might want to push it to a branch just in case.
>
> Sincerely,
> Someone who has accidentally trashed code on many, many occasions

Yes, I should push it in a branch, just in case.

masterchef2209 · 2020-05-16T19:22:28Z

>> sse4.2: added the implementation for mm_cmpestra
>> I have created this PR from a new branch because the old branch's commit history became very messy.
>> The implementation is the same as on the old branch.
>
> FWIW, you can force push to the branch to update the PR. The PR on GitHub will look a bit messy, but I really don't care about that. The history in the git repo will be nice and clean, and that's all I care about :)
>
> If you don't know how to do that just let me know, I can walk you through it.

I will remember that, thanks. I have been using force push for a while now.

masterchef2209 · 2020-05-16T19:25:29Z

I think I made a mistake while generating the tests, or there is a bug in my code, so some test functions are failing. I will try to fix it now.

simde/x86/sse4.2.h

masterchef2209 · 2020-05-17T09:09:32Z

I think everything is fine now. A couple of jobs are failing in Travis CI, but I think that is not related to this PR. I shall quickly implement all the tests.
This is the generator code I am using, with some changes for each test: Link

Fixes #295

nemequ · 2020-05-17T18:05:38Z

The clang -Wsign-conversion diagnostic on the OpenMP pragma is interesting. I've filed LLVM bug #45959. To work around it in SIMDe we can use something like:

diff --git a/simde/simde-common.h b/simde/simde-common.h
index 1f40c3a..7247011 100644
--- a/simde/simde-common.h
+++ b/simde/simde-common.h
@@ -690,6 +690,7 @@ typedef SIMDE_FLOAT64_TYPE simde_float64;
 #    if defined(SIMDE_ARCH_AARCH64)
 #      define SIMDE_BUG_CLANG_45541
 #    endif
+#    define SIMDE_BUG_CLANG_45959
 #  endif
 #  if defined(HEDLEY_EMSCRIPTEN_VERSION)
 #    define SIMDE_BUG_EMSCRIPTEN_MISSING_IMPL /* Placeholder for (as yet) unfiled issues. */
diff --git a/simde/x86/sse4.2.h b/simde/x86/sse4.2.h
index 6299cd5..e3ceb7c 100644
--- a/simde/x86/sse4.2.h
+++ b/simde/x86/sse4.2.h
@@ -167,11 +167,16 @@ simde_mm_cmpestra_8_(simde__m128i a, int la, simde__m128i b, int lb, const int i
       int_res_1 = 0xff;
       for(int i = 0 ; i < upper_bound ; i++){
         int k = i;
+        HEDLEY_DIAGNOSTIC_PUSH
+        #if defined(SIMDE_BUG_CLANG_45959)
+          #pragma clang diagnostic ignored "-Wsign-conversion"
+        #endif
         SIMDE_VECTORIZE_REDUCTION(&:int_res_1)
         for(int j = 0 ; j < (upper_bound-i) ; j++){
           int_res_1 &=  (((bool_res_.i8[k] >> j) & 1 ) << i) ;
           k += 1;
         }
+        HEDLEY_DIAGNOSTIC_POP
       }
       break;
   }
@@ -272,11 +277,16 @@ simde_mm_cmpestra_16_(simde__m128i a, int la, simde__m128i b, int lb, const int
       int_res_1 = 0xffff;
       for(int i = 0 ; i < upper_bound ; i++){
         int k = i;
+        HEDLEY_DIAGNOSTIC_PUSH
+        #if defined(SIMDE_BUG_CLANG_45959)
+          #pragma clang diagnostic ignored "-Wsign-conversion"
+        #endif
         SIMDE_VECTORIZE_REDUCTION(&:int_res_1)
         for(int j = 0 ; j < (upper_bound-i) ; j++){
           int_res_1 &= (((bool_res_.i16[k] >> j) & 1) << i) ;
           k += 1;
         }
+        HEDLEY_DIAGNOSTIC_POP
       }
       break;
   }
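
For illustration, the scoped-suppression pattern in the patch above can be shown without Hedley. This is a minimal sketch, assuming a plain `__clang__` guard in place of `SIMDE_BUG_CLANG_45959`; `all_have_bit` is a hypothetical helper and not SIMDe code, and the loop is only a stand-in for the reduction loops in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of SIMDe): returns 1 when bit `bit` is set in
 * every one of the first n elements, 0 otherwise. */
static int all_have_bit(const uint8_t *v, int n, int bit) {
  assert(n >= 0 && bit >= 0 && bit < 8);
  int acc = 1;
#if defined(__clang__)
#  pragma clang diagnostic push
#  pragma clang diagnostic ignored "-Wsign-conversion"
#endif
  /* The diagnostic is silenced only between push and pop. */
  for (int i = 0; i < n; i++) {
    acc &= (v[i] >> bit) & 1;
  }
#if defined(__clang__)
#  pragma clang diagnostic pop
#endif
  return acc;
}
```

Because the push/pop pair brackets only the loop, the warning stays enabled for the rest of the translation unit.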

I'm more concerned with the other errors on the native cases. I can reproduce them locally with CFLAGS='-msse4.2' CXXFLAGS='-msse4.2' cmake .. && make -j && ./run-tests /x86/sse4_2/mm_cmpestra. It looks like the native versions generate a different result for some tests.

Also, you should probably tweak your test case generation code to occasionally generate a 0 in the inputs. Hitting the end of the string is a common case, there should be tests for it.

masterchef2209 · 2020-05-17T18:09:09Z

> Also, you should probably tweak your test case generation code to occasionally generate a 0 in the inputs. Hitting the end of the string is a common case, there should be tests for it.

I will include those cases.

masterchef2209 · 2020-05-17T18:12:21Z

> It looks like the native versions generate a different result for some tests.

I am concerned about this; what can be the reason for it? Is there anything wrong with the implementation? I feel like the code does what is given in the IIG, but is there something that I am missing?

nemequ · 2020-05-17T22:17:17Z

This is why we have tests, and why it's so important that they be exhaustive. I didn't see anything wrong with the code on my previous reviews, either.

> I am concerned about this; what can be the reason for it? Is there anything wrong with the implementation?

If the native tests don't match the emul tests, there is either a problem with the test cases or the implementation. SIMDe should always match what the instructions actually do, which may not be exactly what the IIG shows.

In the past, the IIG has been a somewhat dubious source. My understanding is that the latest version of the IIG is actually verified against Intel's implementation or internal specifications, so it should be accurate. The only real exception is when an 8-bit immediate argument is outside the expected range.

I spent a bit of time investigating this. The first thing I did was figure out exactly which test was failing. For /x86/sse4_2/mm_cmpestra_equalordered_8, it's the 7th one. Decoding the flags, I get SIMDE_SIDD_SBYTE_OPS | SIMDE_SIDD_MASKED_NEGATIVE_POLARITY | SIMDE_SIDD_LEAST_SIGNIFICANT. So now I know which cases to take a closer look at.

Byte vs. word seemed okay. I verified that it was being dispatched to the correct function; you can just add a quick printf in the code, which is what I did. If you define SIMDE_NO_INLINE, SIMDe functions won't be inlined and you should be able to set a breakpoint in your debugger, but I usually don't resort to that for something as simple as this.

So, next was to look at the polarity. Taking a closer look, the problem jumped right out. The code looks like:

    if (polarity & 1) {
      if ((polarity >> 1) & 1) {

So, first we check if the least significant bit is set. If it is, we then right shift by one bit and see if the least significant bit is set (or, to put it another way, see if the second least significant bit is set).

But that's not right. Those two bits are used for the _OPS flags. We need the 5th and 6th bits (4th and 5th if we're counting from 0). If we change the code to

    if (polarity) {
      if (polarity & SIMDE_SIDD_MASKED_POSITIVE_POLARITY) {

It works. We've already masked off the irrelevant bits when you set polarity at the beginning of the function, so all we need to do for the first check is see if polarity is non-zero. For the second test, we don't need to bother with a shift, or hard-code values. Per the IIG, we're only trying to see if imm8[5] is set. imm8[5] maps to _SIDD_MASKED_POSITIVE_POLARITY, so just test polarity & SIMDE_SIDD_MASKED_POSITIVE_POLARITY.
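
To make the bit positions concrete, here is a minimal sketch of the polarity decoding. It assumes that `polarity` is computed as `imm8 & 0x30` (the exact mask used in the PR is not shown in this thread), and the `DEMO_` macros are local stand-ins carrying the numeric values of the corresponding `_SIDD_*` flags from `<nmmintrin.h>`; all function names are hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the _SIDD_* polarity flags (values as in <nmmintrin.h>). */
#define DEMO_SIDD_NEGATIVE_POLARITY        0x10 /* imm8[4] */
#define DEMO_SIDD_MASKED_POSITIVE_POLARITY 0x20 /* imm8[5] */
#define DEMO_SIDD_MASKED_NEGATIVE_POLARITY 0x30 /* imm8[5:4] */

/* Assumption: the function first masks imm8 down to the polarity field. */
static int polarity_field(int imm8) {
  assert(imm8 >= 0 && imm8 <= 0xff);
  return imm8 & DEMO_SIDD_MASKED_NEGATIVE_POLARITY;
}

/* The original tests inspect bits 0 and 1 of the masked value. Under the
 * masking assumption those bits are always clear, so this never returns 1. */
static int old_style_checks_fire(int imm8) {
  int polarity = polarity_field(imm8);
  return (polarity & 1) || ((polarity >> 1) & 1);
}

/* Testing against the named flag reads imm8[5] directly, with no shift. */
static int masked_bit_set(int imm8) {
  return (polarity_field(imm8) & DEMO_SIDD_MASKED_POSITIVE_POLARITY) != 0;
}

/* The same approach reads imm8[4]. */
static int negate_bit_set(int imm8) {
  return (polarity_field(imm8) & DEMO_SIDD_NEGATIVE_POLARITY) != 0;
}
```

For example, with `imm8 = 0x32` (signed bytes plus masked negative polarity) `old_style_checks_fire` returns 0 while `masked_bit_set` and `negate_bit_set` both return 1.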

Once you've fixed this (in both the 8 and 16 bit versions), you'll need to regenerate the test cases. I strongly suggest that you always make sure you're generating test cases using the native implementations, and make sure that you enable the relevant ISA extension during development (in this case, add -msse4.2 to your CFLAGS and CXXFLAGS).

masterchef2209 · 2020-05-18T02:38:56Z

> It works. We've already masked off the irrelevant bits when you set polarity at the beginning of the function, so all we need to do for the first check is see if polarity is non-zero. For the second test, we don't need to bother with a shift, or hard-code values. Per the IIG, we're only trying to see if imm8[5] is set. imm8[5] maps to _SIDD_MASKED_POSITIVE_POLARITY, so just test polarity & SIMDE_SIDD_MASKED_POSITIVE_POLARITY.

It seems so obvious now that I see it; I should have been more rigorous in looking for the bug. As I had already found one bug and couldn't spot more, I was biased into thinking that the cause of the error was something too complex. Clearly, I was wrong. Thank you for finding the bug.

> Once you've fixed this (in both the 8 and 16 bit versions), you'll need to regenerate the test cases. I strongly suggest that you always make sure you're generating test cases using the native implementations, and make sure that you enable the relevant ISA extension during development (in this case, add -msse4.2 to your CFLAGS and CXXFLAGS).

I will do that and make sure to use the output of the native implementations for the test cases.

nemequ · 2020-05-18T03:32:08Z

These things often seem obvious in hindsight; I've spent hours on bugs simpler than this. Often you just need a fresh set of eyes; another person is great, but usually just starting fresh the next day or even just stepping away for a few hours is enough.

masterchef2209 · 2020-05-19T10:19:07Z

>> Also, you should probably tweak your test case generation code to occasionally generate a 0 in the inputs. Hitting the end of the string is a common case, there should be tests for it.
>
> I will include those cases.

I am confused about this again. Isn't hitting the end of the string in mm_cmpestra decided by la and lb? Why do we need to have a 0 in the input?

nemequ · 2020-05-20T16:45:56Z

> I am confused about this again. Isn't hitting the end of the string in mm_cmpestra decided by la and lb? Why do we need to have a 0 in the input?

You're right, sorry. I was thinking there would be a case in the code for handling a null-terminated string regardless of la/lb, but there isn't.

So I guess it's just a matter of fixing the polarity thing, then this can be merged.

masterchef2209 · 2020-05-20T22:09:24Z

I have spotted more errors. I realised that setting a bit of a number is different from assigning it, and I also made a mistake in the limits of the for loops. I am actually surprised by the number of errors I found on looking carefully. I will complete this with all tests soon; I hope it is right after this.
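
The difference between setting and assigning a bit can be sketched as follows. This is an illustrative example only; the helper names are hypothetical and do not come from the PR.

```c
#include <assert.h>
#include <stdint.h>

/* Setting: OR can only turn bit `pos` on; all other bits are untouched. */
static uint32_t set_bit(uint32_t word, unsigned pos) {
  assert(pos < 32);
  return word | (UINT32_C(1) << pos);
}

/* Assigning: clear bit `pos` first, then write `value` (0 or 1) into it. */
static uint32_t assign_bit(uint32_t word, unsigned pos, unsigned value) {
  assert(pos < 32);
  word &= ~(UINT32_C(1) << pos);
  return word | ((uint32_t) (value & 1u) << pos);
}

/* Pitfall: AND-ing with a single shifted bit clears every OTHER bit too. */
static uint32_t and_shifted_bit(uint32_t word, unsigned pos, unsigned value) {
  assert(pos < 32);
  return word & ((uint32_t) (value & 1u) << pos);
}
```

`set_bit` cannot clear a bit that is already set, `assign_bit` can, and `and_shifted_bit` wipes the rest of the word, which is easy to overlook when accumulating a result mask in a loop.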

masterchef2209 · 2020-05-20T22:09:59Z

simde/x86/sse4.2.h

+      break;
+    case SIMDE_SIDD_CMP_EQUAL_EACH:
+      for(int i = 0 ; i <= upper_bound ; i++){
+        //SIMDE_VECTORIZE_REDUCTION(|:int_res_1)


@nemequ what should I do here, if I have to use 2 operators inside this loop?

nemequ · 2020-05-20

You can't have two reductions on the same variable.

That said, I don't see why you need this; int_res_1 is initialized to 0 (all bits unset), so you shouldn't need to unset the bits here. You only need to handle the case where the bits should be set but aren't.
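
A minimal sketch of that idea, using a single OR reduction over a zero-initialized accumulator. It is a simplified "equal each" comparison that ignores the invalid-element handling of the real instruction; `DEMO_VECTORIZE_REDUCTION` and `DEMO_ENABLE_OPENMP_SIMD` are hypothetical stand-ins for SIMDe's macros, and `equal_each_mask` is not the PR's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for SIMDE_VECTORIZE_REDUCTION: expands to an OpenMP
 * SIMD pragma only when DEMO_ENABLE_OPENMP_SIMD is defined. */
#if defined(DEMO_ENABLE_OPENMP_SIMD)
#  define DEMO_PRAGMA(x) _Pragma(#x)
#  define DEMO_VECTORIZE_REDUCTION(r) DEMO_PRAGMA(omp simd reduction(r))
#else
#  define DEMO_VECTORIZE_REDUCTION(r)
#endif

/* Bit i of the result is set when a[i] == b[i], for i in [0, n). */
static int equal_each_mask(const int8_t *a, const int8_t *b, int n) {
  assert(n >= 0 && n <= 16);
  int mask = 0; /* all bits start unset, so only setting is ever needed */
  DEMO_VECTORIZE_REDUCTION(|:mask)
  for (int i = 0; i < n; i++) {
    mask |= (a[i] == b[i]) << i;
  }
  return mask;
}
```

Since a mismatch contributes `0 << i`, the OR leaves the accumulator unchanged and no second operator is required.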

nemequ · 2020-05-20T23:29:52Z

> I have spotted more errors. I realised that setting a bit of a number is different from assigning it, and I also made a mistake in the limits of the for loops. I am actually surprised by the number of errors I found on looking carefully. I will complete this with all tests soon; I hope it is right after this.

Regarding the ranges, this would be a good opportunity to improve the test cases to cover these cases. Maybe after you set a and b to random values you could do something like (untested, but hopefully you get the idea):

int matching_chars = munit_rand_int_range(0, sizeof(a.i8) / sizeof(a.i8[0]));
for (int j = 0 ; j < matching_chars ; j++) {
  b.i8[j] = a.i8[j];
}

That way we have tests where the vectors match, either partially or fully. If you want to increase the number of elements in test_vec to increase coverage, feel free; 8 is just an arbitrary choice that is sufficient for most functions… it's much better to have too many tests than too few.
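
A self-contained variant of the same idea, for illustration. It assumes plain `rand()` from the C standard library in place of µnit's random helpers, and plain `int8_t` arrays in place of SIMDe's private vector types; `make_partial_match` and `prefix_matches` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_LANES 16

/* Fill a and b with random bytes, then copy a random-length prefix of a into
 * b so the two inputs match partially or fully. Returns the prefix length,
 * which lies in [0, DEMO_LANES]. */
static int make_partial_match(int8_t a[DEMO_LANES], int8_t b[DEMO_LANES]) {
  assert(a != NULL && b != NULL);
  for (int i = 0; i < DEMO_LANES; i++) {
    a[i] = (int8_t) (rand() % 256 - 128);
    b[i] = (int8_t) (rand() % 256 - 128);
  }
  int matching = rand() % (DEMO_LANES + 1);
  for (int j = 0; j < matching; j++) {
    b[j] = a[j];
  }
  return matching;
}

/* Returns 1 when the first n elements of a and b are equal. */
static int prefix_matches(const int8_t *a, const int8_t *b, int n) {
  assert(n >= 0 && n <= DEMO_LANES);
  for (int i = 0; i < n; i++) {
    if (a[i] != b[i]) return 0;
  }
  return 1;
}
```

Seeding with `srand` before calling `make_partial_match` keeps the generated vectors reproducible between runs.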

simde/simde-common.h

masterchef2209 · 2020-05-21T19:34:14Z

I have a few doubts here. I have added a single test which is giving 0 as output on native but 1 on emul. But I am not able to figure out why. Most probably this part is the problem. I have made some changes in switch part too to match the IIG.

case SIMDE_SIDD_CMP_RANGES:
      for(int i = 0 ; i <= upper_bound ; i++){
        SIMDE_VECTORIZE_REDUCTION(|:int_res_1)
        for(int j = 0 ; j <= upper_bound ; j++){
          int_res_1 |= ((((bool_res_.i8[i] >> j) & 1) & ((bool_res_.i8[i] >> (j + 1)) & 1)) << i);
          j += 2;
        }
      }

Apart from this, I feel there are some things in the intrinsics guide which logically don't make sense to me, for example how we should handle a_invalid and b_invalid. If we go by the IIG it will be like what I have done in the 8-bit version, but what I have done in the 16-bit version makes more sense.
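
For comparison, here is a simplified scalar sketch of what range aggregation is generally described as computing: bit i of the result is set when b[i] falls inside any (low, high) pair taken from a. This is an assumption-laden illustration, not the PR's code and not verified against native output here; it handles signed bytes only, skips the polarity step, and treats any pair that is not fully inside la as not matching. `ranges_mask_i8` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* a holds (low, high) pairs at indices (0,1), (2,3), ...; la and lb are the
 * explicit lengths. Elements at or beyond the lengths are never inspected. */
static int ranges_mask_i8(const int8_t *a, int la, const int8_t *b, int lb) {
  assert(la >= 0 && la <= 16 && lb >= 0 && lb <= 16);
  int mask = 0;
  for (int i = 0; i < lb; i++) {
    for (int j = 0; j + 1 < la; j += 2) {
      if (a[j] <= b[i] && b[i] <= a[j + 1]) {
        mask |= 1 << i; /* only ever set bits; mask starts at 0 */
      }
    }
  }
  return mask;
}
```

A small reference like this can serve as a third opinion when the emulated and native results disagree, provided its own assumptions are checked first.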

nemequ · 2020-05-22T04:34:15Z

I'm not seeing the error either. I even threw together a quick implementation from scratch, and I'm getting the same results, so either the IIG is wrong or we're both misreading it in the same way. I'll play around with this a bit more tomorrow and see if I can narrow down the issue.

In the meantime, you could work on the CRC functions instead. At least those are well-understood functions; they basically implement CRC-32C. Or, of course, the AVX2 functions should all be pretty straightforward.

mr-c · 2020-09-01T09:59:42Z

@masterchef2209 Can you refresh this PR ? Thanks!

masterchef2209 · 2020-09-01T11:11:36Z

I have added the commit of this branch on top of current master.
Thank You

test/x86/sse4.2.c

mr-c reviewed May 16, 2020: simde/x86/sse4.2.h (resolved)
masterchef2209 force-pushed the hidayat/may16_1 branch from 957045d to e8628a7, May 16, 2020 17:40
nemequ reviewed May 16, 2020: simde/x86/sse4.2.h (resolved)
masterchef2209 force-pushed the hidayat/may16_1 branch from e8628a7 to 50eb9aa, May 16, 2020 17:46
nemequ mentioned this pull request May 17, 2020 in "sse4.2: first attempt at implementing mm_cmpestra #280" (closed)
masterchef2209 commented May 17, 2020: simde/x86/sse4.2.h (outdated, resolved)
masterchef2209 force-pushed the hidayat/may16_1 branch 2 times, most recently from 1805269 to 60e10dc, May 17, 2020 08:18
masterchef2209 force-pushed the hidayat/may16_1 branch from 60e10dc to b73630c, May 17, 2020 10:53
nemequ pushed a commit that referenced this pull request May 17, 2020: "sse4.2: added the implementation for mm_cmpestra" (02d7f01, Fixes #295)
masterchef2209 force-pushed the hidayat/may16_1 branch from b73630c to dd9be1d, May 20, 2020 22:06
masterchef2209 commented May 20, 2020
mr-c reviewed May 21, 2020: simde/simde-common.h (outdated, resolved)
masterchef2209 force-pushed the hidayat/may16_1 branch 2 times, most recently from b46c1c3 to 7ebfef7, May 21, 2020 19:24
masterchef2209 force-pushed the hidayat/may16_1 branch from 7ebfef7 to 8d7f40b, September 1, 2020 11:05
mr-c reviewed Dec 10, 2020: test/x86/sse4.2.c (outdated, resolved)
mr-c force-pushed the hidayat/may16_1 branch 3 times, most recently from 2749ad5 to cdbf71e, December 10, 2020 17:10
jserv mentioned this pull request Oct 9, 2022 in "Implement SSE4.2 text processing intrinsics DLTcollab/sse2neon#308" (closed)
mr-c force-pushed the hidayat/may16_1 branch from 62232fb to 9eef8d9, January 17, 2023 07:45
masterchef2209 and others added 2 commits September 12, 2024 13:05: "sse4.2: added the implementation for mm_cmpestra" (e54152b) and "Adjust to new test style" (ad69848)
mr-c force-pushed the hidayat/may16_1 branch from 9eef8d9 to ad69848, September 12, 2024 11:05
