AVX512 and VPCLMULQDQ based CRC-32 and CRC-32C #90
base: main
Thank you, I was aware that this repository depends on something else, and now I know that it is https://github.com/awslabs/aws-c-common/. After cmake -DCMAKE_MODULE_PATH=/usr/local/lib/cmake /path/to/aws-checksums I got the CMake logic to work as apparently intended, and fixed the compilation failures in a0a8785. While working on this, I found the following:
The This compiler invocation indicates a problem with I think that it is clearer to define target attributes for those functions that make use of ISA extensions. The only limitation that I know of is that with GCC 4, you can only use the Intel intrinsic headers if you first enable the ISA extensions with
As far as I understand, I see that there are some branches in
warning: ISO C99 does not support ‘_Alignas’ [-Wpedantic] etc.
warning: pointer targets in passing argument 1 of ‘load512’ differ in signedness [-Wpointer-sign] etc.
Sorry, I somehow overlooked those warnings. Fixed in c69772c.
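The two warnings above come from compiling in C99 mode with -Wpedantic, where _Alignas is not available, and from passing a pointer whose pointee signedness differs from load512's parameter type. A minimal sketch of one common way to avoid both follows; it is an assumption, not necessarily what c69772c does. CRC_ALIGN, demo_constants, and load_u64 are hypothetical names, and load_u64 is only a scalar stand-in for load512.

```c
#include <stdint.h>
#include <string.h>

/* Alignment spelling that also works under -std=c99 -Wpedantic: C11
   provides _Alignas, while GCC and clang accept the aligned attribute
   in any language mode without a pedantic warning. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define CRC_ALIGN(n) _Alignas(n)
#elif defined(__GNUC__)
#define CRC_ALIGN(n) __attribute__((aligned(n)))
#else
#define CRC_ALIGN(n) /* no portable alignment available */
#endif

/* Hypothetical 64-byte-aligned constant table (e.g. fold constants). */
static CRC_ALIGN(64) const uint8_t demo_constants[64] = {0};

/* Taking 'const void *' accepts char, signed char and unsigned char
   (uint8_t) buffers without -Wpointer-sign warnings; memcpy avoids
   unaligned-access and strict-aliasing problems. */
static uint64_t load_u64(const void *p) {
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Routing the load through const void * keeps callers free to use whichever byte type they already have, instead of sprinkling casts at every call site.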
Right, we would need that for the zlib I will try to refactor the ISA extension detection. I think that
While refactoring the ISA extension detection in 7c6a87a, I found a bug in your existing implementation for the case where carry-less multiplication is not available:
diff --git a/source/intel/crc_hw.c b/source/intel/crc_hw.c
--- a/source/intel/crc_hw.c
+++ b/source/intel/crc_hw.c
@@ -95,13 +95,13 @@ uint32_t aws_checksums_crc32c_hw(const uint8_t *input, int length, uint32_t prev
crc = (uint32_t)_mm_crc32_u8(crc, *input++);
}
- if (detected_sse42 && detected_clmul) {
+ if (0) {
return aws_checksums_crc32c_sse42(input, length, crc);
}
/* Spin through remaining (aligned) 8-byte chunks using the CRC32Q quad word instruction */
while (length >= (int)sizeof(slice_ptr_int_type)) {
- crc = (uint32_t)crc_intrin_fn(crc, *input);
+ crc = (uint32_t)crc_intrin_fn(crc, *(const slice_ptr_int_type*)input);
input += sizeof(slice_ptr_int_type);
length -= (int)sizeof(slice_ptr_int_type);
}
Note: my CPU detection depends on the following patch to awslabs/aws-c-common@6ebf3bc:
diff --git a/source/arch/intel/cpuid.c b/source/arch/intel/cpuid.c
index 465fccd1..03c52993 100644
--- a/source/arch/intel/cpuid.c
+++ b/source/arch/intel/cpuid.c
@@ -88,9 +88,10 @@ static bool s_has_avx2(void) {
static bool s_has_avx512(void) {
uint32_t abcd[4];
- /* Check AVX512F:
- * CPUID.(EAX=07H, ECX=0H):EBX.AVX512[bit 16]==1 */
- uint32_t avx512_mask = (1 << 16);
+ /* Check AVX512 flags.
+ * CPUID.(EAX=07H, ECX=0H):EBX */
+ uint32_t avx512_mask = 1 << 16 /* AVX512F */ | 1 << 17 /* AVX512DQ */ |
+ 1U << 30 /* AVX512BW */ | 1U << 31 /* AVX512VL */;
aws_run_cpuid(7, 0, abcd);
if ((abcd[1] & avx512_mask) != avx512_mask) {
return false;
It could be clearer to rename
By the way, I see that
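The patched s_has_avx512() in the cpuid.c diff ORs together CPUID leaf 7 EBX bits 16 (AVX512F), 17 (AVX512DQ), 30 (AVX512BW), and 31 (AVX512VL). Below is a minimal sketch of the same check, extended with VPCLMULQDQ (leaf 7 ECX bit 10) and split into a pure function so it can be tested with synthetic register values. The function names are hypothetical and not part of aws-c-common. The host probe assumes GCC or clang on x86 with <cpuid.h>, and it deliberately omits the XGETBV/XCR0 check that a complete detector needs to confirm that the operating system saves the AVX512 register state.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=07H, ECX=0H):EBX feature bits, as in the cpuid.c patch. */
#define LEAF7_EBX_AVX512F  (1u << 16)
#define LEAF7_EBX_AVX512DQ (1u << 17)
#define LEAF7_EBX_AVX512BW (1u << 30)
#define LEAF7_EBX_AVX512VL (1u << 31)
/* CPUID.(EAX=07H, ECX=0H):ECX bit 10 reports VPCLMULQDQ. */
#define LEAF7_ECX_VPCLMULQDQ (1u << 10)

/* Pure check on already-read CPUID registers: all four AVX512 subsets
   and VPCLMULQDQ must be present. */
static bool crc_avx512_features_present(uint32_t ebx, uint32_t ecx) {
    const uint32_t ebx_mask = LEAF7_EBX_AVX512F | LEAF7_EBX_AVX512DQ |
                              LEAF7_EBX_AVX512BW | LEAF7_EBX_AVX512VL;
    return (ebx & ebx_mask) == ebx_mask &&
           (ecx & LEAF7_ECX_VPCLMULQDQ) != 0;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
/* Reads leaf 7 on the host CPU. Does NOT check XGETBV/XCR0, so it
   cannot tell whether the OS has enabled the ZMM and opmask state. */
static bool host_has_crc_avx512_features(void) {
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, 0) < 7)
        return false; /* leaf 7 not implemented */
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax;
    (void)edx;
    return crc_avx512_features_present(ebx, ecx);
}
#endif
```

Keeping the bit logic separate from the CPUID instruction makes the mask testable on any machine, including ones without AVX512.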
One more change needed:
Thank you. My bad, I do not currently have access to AVX512-capable hardware. Fixed in b06f512. I also fixed the ISA extension trouble in 5095452. I do not yet have a solution for skipping the compilation of the AVX512 code on older compilers (clang 7 or earlier, GCC 10 or older). The simplest solution that I can come up with would be to merge the AVX512 code to
The inaccurate
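One way to skip the AVX512 code on older compilers is a preprocessor gate on compiler versions. The sketch below is hypothetical: the macro names are invented, and the thresholds come from this thread (clang 7 and GCC 10 or older are treated as too old; the evex512 target is named only for GCC 14+ and clang 18+, as the squashed commit message states). It also assumes upstream clang version numbering, which Apple clang does not follow.

```c
/* Compiler-version gate for building the AVX512 CRC code path.
   Thresholds are assumptions taken from this PR discussion. */
#if defined(__clang__)
#define CRC_HAVE_AVX512_COMPILER (__clang_major__ >= 8)
#define CRC_NEED_EVEX512 (__clang_major__ >= 18)
#elif defined(__GNUC__)
#define CRC_HAVE_AVX512_COMPILER (__GNUC__ >= 11)
#define CRC_NEED_EVEX512 (__GNUC__ >= 14)
#else
#define CRC_HAVE_AVX512_COMPILER 0
#define CRC_NEED_EVEX512 0
#endif

/* Target string for __attribute__((target(...))): older compilers do
   not recognize "evex512", so it is only appended when needed. */
#if CRC_NEED_EVEX512
#define CRC_AVX512_TARGET "avx512f,avx512dq,avx512bw,avx512vl,vpclmulqdq,evex512"
#else
#define CRC_AVX512_TARGET "avx512f,avx512dq,avx512bw,avx512vl,vpclmulqdq"
#endif

/* Intended use on x86 (hypothetical kernel name):
 *
 *   #if CRC_HAVE_AVX512_COMPILER
 *   __attribute__((target(CRC_AVX512_TARGET)))
 *   uint32_t crc32_avx512(const uint8_t *input, int length, uint32_t crc);
 *   #endif
 */
```

With this kind of gate, the runtime dispatcher would also need the AVX512 branch wrapped in `#if CRC_HAVE_AVX512_COMPILER`, so that the build never references a function that was compiled out.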
Well, I guess that this is actually covered by the
I would like to hear your opinion on this. We can also ignore this and assume that the combination of checking for AVX512F and VPCLMULQDQ is adequate, that is, all processors that implement those will also implement AVX512DQ, AVX512BW, and AVX512VL.
If https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 can be trusted, I squashed all commits to 2e84a92.
This implementation is based on crc32_refl_by16_vclmul_avx512 in https://github.com/intel/intel-ipsec-mb/ with some optimizations. Changes to CMakeLists.txt and source/intel/asm/crc32c_sse42_asm.c are based on awslabs#72.
This also fixes a bug in aws_checksums_crc32c_hw() when 128-bit pclmul is not available: crc_intrin_fn was being invoked on single bytes instead of 32-bit or 64-bit words.
The aws-checksums-tests program was extended to cover all SIMD implementations.
Note: The availability of the Intel CRC-32C instructions is checked as part of testing AWS_CPU_FEATURE_SSE_4_2. Both ISA extensions were introduced in the Intel Nehalem microarchitecture.
To compile this, https://github.com/awslabs/aws-c-common must be installed and CMAKE_MODULE_PATH must point to it, e.g.: cmake -DCMAKE_MODULE_PATH=/usr/local/lib/cmake
The AWS_CPU_FEATURE_AVX512 check currently only tests for AVX512F, not for the other features that this implementation depends on: AVX512VL, AVX512BW, and AVX512DQ. According to https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512, there currently exist no CPUs that support VPCLMULQDQ without supporting all of those AVX512 features.
The architecture target evex512 was introduced as mandatory in GCC 14 and clang 18 as part of introducing the AVX10.1-512 target, which is essentially a new name for a set of AVX512 features. Older compilers do not recognize this target, but they do emit EVEX-encoded instructions.
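The byte-versus-word bug can be reproduced without SSE4.2 hardware. The sketch below replaces _mm_crc32_u8 and _mm_crc32_u64 with bitwise software models (an assumption made so that it runs anywhere). It contrasts the pre-fix expression, which widens the single byte *input to a 64-bit operand, with a real 8-byte little-endian load. All function names are hypothetical. The fixed path should reproduce the standard CRC-32C check value 0xE3069283 for "123456789".

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32C step for one byte (polynomial 0x82F63B78);
   a software model of what _mm_crc32_u8 computes. */
static uint32_t crc32c_u8(uint32_t crc, uint8_t b) {
    crc ^= b;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Software model of _mm_crc32_u64: consumes the 8 bytes of v
   least-significant byte first, as the CRC32 instruction does. */
static uint32_t crc32c_u64(uint32_t crc, uint64_t v) {
    for (int i = 0; i < 8; i++)
        crc = crc32c_u8(crc, (uint8_t)(v >> (8 * i)));
    return crc;
}

/* Little-endian 64-bit load, independent of host byte order. */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Word-at-a-time CRC-32C. When 'buggy' is nonzero, each chunk uses the
   pre-fix operand (uint64_t)*input, i.e. only the first byte. */
static uint32_t crc32c_words(const uint8_t *input, size_t length, int buggy) {
    uint32_t crc = 0xFFFFFFFFu;
    while (length >= 8) {
        uint64_t word = buggy ? (uint64_t)*input : load_le64(input);
        crc = crc32c_u64(crc, word);
        input += 8;
        length -= 8;
    }
    while (length-- > 0) /* remaining tail bytes */
        crc = crc32c_u8(crc, *input++);
    return ~crc;
}
```

With buggy set, each 8-byte chunk contributes only its first byte followed by seven zero bytes, so the result no longer matches the check value. A cross-check like this against a bytewise reference is also a cheap way to exercise every SIMD path in a test suite.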
I think that you should test your AVX512 detection logic by starting the Linux kernel with the
Issue #, if available: #72
Description of changes: This implementation in crc32_avx512.c is based on crc32_refl_by16_vclmul_avx512 in https://github.com/intel/intel-ipsec-mb/ with some optimizations. Some of the code is based on #72. Because this repository appears to depend on definitions in other repositories, I was not able to compile and test this locally. I merely tested that
gcc -O2 -std=c11 -c crc32_avx512.c
produces reasonable-looking output.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
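CLMUL-based CRC kernels such as the by16 one referenced above rest on a folding identity. For the generator P, H*x^64 + L ≡ H*(x^64 mod P) + L (mod P), so one carry-less multiplication by a precomputed constant moves a chunk 64 bit positions forward without long division. The scalar sketch below checks that identity for the CRC-32 polynomial in non-reflected bit order. It is a simplified illustration, not the intel-ipsec-mb algorithm, which works in bit-reflected order, folds many 128-bit lanes in parallel with VPCLMULQDQ, and needs a final reduction step. The helper names are hypothetical.

```c
#include <stdint.h>

/* CRC-32 generator in normal (non-reflected) form, x^32 term in bit 32. */
#define CRC32_POLY 0x104C11DB7ull

/* Carry-less (GF(2)) product of two 32-bit polynomials; a narrower
   scalar analogue of the 64x64 -> 128-bit PCLMULQDQ operation. */
static uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            r ^= (uint64_t)a << i;
    return r;
}

/* Remainder of a 64-bit polynomial modulo CRC32_POLY (degree 32). */
static uint32_t gf2_mod64(uint64_t v) {
    for (int i = 63; i >= 32; i--)
        if ((v >> i) & 1u)
            v ^= CRC32_POLY << (i - 32);
    return (uint32_t)v;
}

/* x^k mod P, by repeated multiplication by x with reduction. */
static uint32_t xpow_mod(unsigned k) {
    uint64_t r = 1;
    while (k--) {
        r <<= 1;
        if (r & (1ull << 32))
            r ^= CRC32_POLY;
    }
    return (uint32_t)r;
}

/* Reference: remainder of H*x^64 + L by bit-serial long division. */
static uint32_t mod_direct(uint32_t h, uint64_t l) {
    uint64_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((h >> i) & 1u);
        if (r & (1ull << 32))
            r ^= CRC32_POLY;
    }
    for (int i = 63; i >= 0; i--) {
        r = (r << 1) | ((l >> i) & 1u);
        if (r & (1ull << 32))
            r ^= CRC32_POLY;
    }
    return (uint32_t)r;
}

/* Folded: replace H*x^64 by H*(x^64 mod P), then reduce once. */
static uint32_t mod_folded(uint32_t h, uint64_t l) {
    return gf2_mod64(clmul32(h, xpow_mod(64)) ^ l);
}
```

In a real kernel the constants such as x^64 mod P are precomputed once, and the folds are applied to many independent lanes, so the expensive reduction runs only at the end.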