-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[zstd][dict] Ensure that dictionary training functions are fully reen… #4078
Conversation
…trant The two main functions used for dictionary training using the COVER algorithm require initialization of a COVER_ctx_t where a call to qsort() is performed. The issue is that the standard C99 qsort() function doesn't offer a way to pass an extra parameter for the comparison function callback (e.g. a pointer to a context) and currently zstd relies on a *global* static variable to hold a pointer to a context needed to perform the sort operation. If a zstd library user invokes either ZDICT_trainFromBuffer_cover or ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the global context may be overwritten before/during the call/execution to qsort() in the initialization of the COVER_ctx_t, thus yielding to crashes and other bad things (Tm) as reported on issue facebook#4045. Enters qsort_r(): it was designed to address precisely this situation, to quote from the documention [1]: "the comparison function does not need to use global variables to pass through arbitrary arguments, and is therefore reentrant and safe to use in threads." It is available with small variations for multiple OSes (GNU, BSD[2], Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as part of the <stdlib.h>. Let's hope that compilers eventually catch up with it. For now, we have to handle the small variations in function parameters for each platform. The current fix solves the problem by allowing each executing thread pass its own COVER_ctx_t instance to qsort_r(), removing the use of a global pointer and allowing the code to be reentrant. [1] https://man7.org/linux/man-pages/man3/qsort_r.3.html [2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r [3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170 [4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
I was curious, so I did a little research, focusing on FreeBSD. The version of FreeBSD that adds support for the POSIX standard Up to the FreeBSD 13 series Notably, FreeBSD was the first to introduce Also, I have not actually checked, but it is possible that NetBSD and OpenBSD do not have both illumos seems to have a GNU-compatible To make it portable without confusion and usable even in C99 (C89?), I think one option would be to implement it independently in zstd. |
@dearblue thanks a lot for the comments, I really appreciate it. I've validated the current patch with a VM running FreeBSD 14.1, since I believe new software (e.g. upcoming zstd) should be validated in new OSes first. I incorrectly assumed that all *BSD versions shared the same libc (note: I'm not a *BSD user or developer), but what you are reporting is that NetBSD + OpenBSD can have a diverging libc when compared to FreeBSD. While a a qsort() function can be easily copied & pasted to the current patch, validating it is a bit more tricky: I'm aware that some OS vendors (e.g. Apple) do ship heavily optimized system libraries in their products (e.g. iOS, OSX, etc). A replacement qsort() function would have to demonstrate similar or superior performance than what zstd currently uses (i.e. the default system libc on each OS/compiler) to avoid performance regressions. That validation effort would further delay landing this fix to a real SEGFAULT in zstd thanks to what appears to be a *BSD idiosincrasy. On top of that, I couldn't detect qsort() is a hotspot in a regular 'perf report'. If it was a real hotspot in zstd it would be worthwhile the effort in optimizing it further (e.g. using SIMD techniques exploiting AVX-512 as deployed on https://github.com/intel/x86-simd-sort). The current patch works fine on Linux, OSX and Windows (probably Android, I will test it soon), it seems that *BSD is the one giving us trouble at the moment. Question: since you mentioned that 'Apple brought qsort_r() from FreeBSD'. If I treat *BSD as APPLE would that work on the current stable versions of *BSD? |
Note that strict While it's allowed to use posix-specific symbols, it also means that some backup strategy must exist and be triggered automatically for other |
Aside VStudio compiler, a valid workflow is to build zstd with either gcc or clang on Windows. This minor change ensure that the dictionary fix will build fine on Windows with both gcc 13.3.0 and clang-11 on MSYS environment.
@Cyan4973 to clarify, you suggest that if we can't get something working on all versions of *BSD, we should provide some kind of 'fallback' (e.g. a version of qsort_r()) to allow it to build cleanly. Is that correct? |
int result = COVER_cmp(g_coverCtx, lp, rp); | ||
#if defined(_WIN32) && defined(_MSC_VER) | ||
static int WIN_CDECL COVER_strict_cmp(void* g_coverCtx, const void* lp, const void* rp) { | ||
#elif defined(__APPLE__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got test if treating *BSD as APPLE could do the trick...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't. Newer FreeBSD has the same syntax as GNU, but older has the same syntax as APPLE.
I will let go of FreeBSD for now and revisit the issue in a follow up patch.
lib/dictBuilder/cover.c
Outdated
static int WIN_CDECL COVER_strict_cmp(void* g_coverCtx, const void* lp, const void* rp) { | ||
#elif defined(__APPLE__) | ||
static int WIN_CDECL COVER_strict_cmp(void *g_coverCtx, const void *lp, const void *rp) { | ||
#elif defined(_GNU_SOURCE) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this won't work on Net/OpenBSDs and older FreeBSD.
#elif defined(_WIN32) && defined(_MSC_VER) | ||
qsort_s(ctx->suffix, ctx->suffixSize, sizeof(U32), | ||
(ctx->d <= 8 ? &COVER_strict_cmp8 : &COVER_strict_cmp), | ||
ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we will have to add some kind of fallback for C90.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. The fallback for no is the same code as is being shipped by zstd.
In a follow up patch we can try to provide a reentrant qsort_r() for platforms that don't have it.
Yes, |
@@ -21,8 +21,14 @@ | |||
/*-************************************* | |||
* Dependencies | |||
***************************************/ | |||
/* qsort_r is an extension. */ | |||
#if defined(__linux) || defined(__linux__) || defined(linux) || defined(__gnu_linux__) || defined(__CYGWIN__) || defined(__MSYS__) | |||
#define _GNU_SOURCE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only define it if not already defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
No. I actually checked, and |
This should enable building for C90 compilers.
There is a bot complaining about it, let's see if it will make it happy.
The script 'combine.sh' joins the whole zstd source code in a single file to allow easier integration of the zstd library in projects. The issue arrives that _GNU_SOURCE has to be defined before the inclusion of any standard header to activate usage of qsort_r(). This change will allow 'make all' to pass as it fixes the build for using the 'build_library_test.sh' script.
@Cyan4973 apparently all bots in the CI are green with the latest version of the patchset. I got two questions: b) Should I squash all the 5 commits in a single one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Adenilson !
This is great work, and I think it works as intended.
I've focused my comments on source code maintainability, given that the functionality itself is already in a good place.
*/ | ||
#if defined(_WIN32) && defined(_MSC_VER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way COVER_strict_cmp()
(and below COVER_strict_cmp8()
) are implemented "feels" unusual : they share the same (short) body but feature different signatures depending on target system, controlled through preprocessor branching.
Given that the function signature is a mandated parameter of the qsort_r()
function, and qsort_r()
(of course!) varies depending on target system, I don't have a better solution to propose. Rather, I just forward some minor comments :
- WIN_CDECL
is used even in non-Windows context. Granted, it's reduced to nothing in non-Windows environments, but seeing it listed for systems specifically tested as non Windows feels irrelevant
- _MSC_VER
and __APPLE__
environments seem to share the same signature, so maybe they could be merged (?)
Ultimately, this is a comment focused on "readability". The functionality itself looks good.
*/ | ||
g_coverCtx = ctx; | ||
#if defined(__OpenBSD__) | ||
#if defined(__APPLE__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qsort_r()
is invoked directly from within in COVER_ctx_init()
, resulting in some #if ... #elif ... #else ...
preprocessor logic in the function's body, because qsort_r()
is different depending on target system.
In a bid to lower preprocessor logic within COVER_ctx_init()
, I was wondering if this design could be simplified by an invocation to a single function signature, and this function would be implemented differently depending on target system.
Such a design requires a common signature across all systems, so there might be some limits there.
Yes, |
Github was refusing to accept a push of my local + rebased changes to remote, so I was forced to delete the remote branch to be able to push and as a result the merge request was purged. The new one is at: |
…trant
The two main functions used for dictionary training using the COVER algorithm require initialization of a COVER_ctx_t where a call to qsort() is performed.
The issue is that the standard C99 qsort() function doesn't offer a way to pass an extra parameter for the comparison function callback (e.g. a pointer to a context) and currently zstd relies on a global static variable to hold a pointer to a context needed to perform the sort operation.
If a zstd library user invokes either ZDICT_trainFromBuffer_cover or ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the global context may be overwritten before/during the call/execution to qsort() in the initialization of the COVER_ctx_t, thus yielding to crashes and other bad things (Tm) as reported on issue #4045.
Enters qsort_r(): it was designed to address precisely this situation, to quote from the documention [1]: "the comparison function does not need to use global variables to pass through arbitrary arguments, and is therefore reentrant and safe to use in threads."
It is available with small variations for multiple OSes (GNU, BSD[2], Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as part of the <stdlib.h>. Let's hope that compilers eventually catch up with it.
For now, we have to handle the small variations in function parameters for each platform.
The current fix solves the problem by allowing each executing thread pass its own COVER_ctx_t instance to qsort_r(), removing the use of a global pointer and allowing the code to be reentrant.
[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf