Releases: ridiculousfish/libdivide
libdivide-5.1
This is a maintenance release.
This release mainly fixes a C++ compilation failure with the upcoming GCC 15 compiler: #113
ChangeLog
- Simplify & clean up the AVR constant div test code by @adbancroft in #85
- Constant division templates by @adbancroft in #89
- Tester program: enable vector tests by @adbancroft in #92
- Fix GCC vector alignment and aliasing issues by @adbancroft in #93
- Implement 16-bit SSE2 & AVX2 vector division by @adbancroft in #94
- Fix compilation of primitive types by @ridiculousfish in #98
- Fix minor issue during porting(https://github.com/apache/incubator-nuttx) by @xiaoxiang781216 in #99
- Replace typeid(T).name() with type_tag::get_tag() by @xiaoxiang781216 in #100
- Try to fix the MSVC build by @ridiculousfish in #101
- Increase minimum CMake version to 3.5 by @qak in #106
- Add prefixes to CMake option names by @qak in #107
- Fix LIBDIVIDE_VERSION CMake variable by @qak in #108
- Include missing CTest module by @qak in #109
- Only build tests when libdivide is the main project by @qak in #110
- Fix a typo (division/divsion) in README.md by @musicinmybrain in #114
- Compile fix for divider::operator== by @masbug in #113
- Add a Constexpr zero-initializing constructor for divider by @sharkautarch in #115
New Contributors
- @xiaoxiang781216 made their first contribution in #99
- @qak made their first contribution in #106
- @musicinmybrain made their first contribution in #114
- @masbug made their first contribution in #113
- @sharkautarch made their first contribution in #115
Full Changelog: 5.0...v5.1
v5.0.0
- Reference code for narrowing division has been added.
- The C and C++ APIs have been extended to support 16-bit scalar integer division.
- Multiple enhancements to add support for 8-bit microcontrollers
- Compiles cleanly using avr-gcc, used by the Atmel AVR microcontroller family (popular on Arduino boards)
- Code base includes ATmega2560 test & benchmarking programs
- Adds predefined macros to speed up division by 16-bit constants: division by a 16-bit constant is not optimized by avr-gcc on 8-bit systems.
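To illustrate why precomputed constants help when the compiler does not optimize 16-bit constant division, here is a minimal sketch of the general multiply-and-shift idea for the divisor 10. It is not libdivide's macro set or its generated code; the helper names `div10_u16` and `div10_matches_builtin_division` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of libdivide: divide a 16-bit value by the
// constant 10 with one widening multiply and one shift.
// 52429 == ceil(2^19 / 10). The product fits in 32 bits for every 16-bit x.
static inline uint16_t div10_u16(uint16_t x) {
    return static_cast<uint16_t>((static_cast<uint32_t>(x) * 52429u) >> 19);
}

// Exhaustively compare the sketch against the built-in division operator
// for all 65536 possible 16-bit inputs.
static bool div10_matches_builtin_division() {
    for (uint32_t x = 0; x <= 0xFFFFu; ++x) {
        if (div10_u16(static_cast<uint16_t>(x)) != x / 10u) {
            return false;
        }
    }
    return true;
}
```

The explicit `uint32_t` cast matters on targets such as AVR, where `int` is only 16 bits wide and the multiplication would otherwise be truncated.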
v4.0.0
- All SIMD types may now be used simultaneously, instead of selecting one at compile time. For example, you may define all of LIBDIVIDE_SSE2, LIBDIVIDE_AVX2, and LIBDIVIDE_AVX512 and use them simultaneously.
- ARM NEON types are now supported. New functions take uint32x4_t, int32x4_t, uint64x2_t, and int64x2_t. Note: while libdivide is tested on both ARM32 and AArch64, NEON intrinsics have only been tested on AArch64.
- Breaking: To support multiple vector types, vector functions have been renamed according to their width (#52). Instead of libdivide_u32_do_vector, now use libdivide_u32_do_vec128 for SSE2 or NEON, libdivide_u32_do_vec256 for AVX2, and libdivide_u32_do_vec512 for AVX512.
- On non-x86 CPUs, generating 64-bit dividers is now faster than before. Previously libdivide used __uint128_t when available; however, libdivide's fallback code was shown to be several times faster, so the __uint128_t path has been removed. x86 and x86-64 CPUs are unaffected.
- Certain code sourced from StackOverflow has been reimplemented; this code had an ambiguous license. All code in libdivide is now covered under the zlib or boost license (at your option).
- libdivide.h no longer requires C++11 or later. The minimum language standards are C99 or C++98.
libdivide-3.0
This release adds C++ support for all 32-bit and 64-bit integer types (#58). Unfortunately this code change required C++11 instead of C++98, hence the major version had to be increased (even though this is a small release). This version also improves libdivide's CMake build system which should make it easier to package libdivide.
libdivide-2.0
I am happy to announce the release of libdivide-2.0 🎉
Libdivide finally supports AVX2 and AVX512 vector division on x86 CPUs. Libdivide now also works with the clang-cl compiler and the Intel C++ compiler on Windows. There have been many small incremental improvements which should provide minor speedups for many use cases.
Since libdivide is now nearly 10 years old and many features have been added over the years, it has become necessary to remove some rarely used functionality. I have removed the unswitch functionality since it was a large amount of code that, as far as I am aware, has never been used by anybody. So overall, even with the added support for AVX2 and AVX512, libdivide.h now contains fewer lines of code than the previous release and compiles faster using both C and C++.
- BREAKING
- Removed unswitch functionality (#46)
- Renamed macro LIBDIVIDE_USE_SSE2 to LIBDIVIDE_SSE2
- Renamed divider::recover_divisor() to divider::recover()
- BUG FIXES
- ENHANCEMENT
- TESTING
- tester.cpp: Convert to modern C++
- tester.cpp: Add more test cases
- benchmark_branchfree.cpp: Convert to modern C++
- benchmark.c: Prevent compilers from optimizing too much
- BUILD
- Automatically detect SSE2/AVX2/AVX512
- DOCS
- doc/C-API.md: Add C API reference
- doc/CPP-API.md: Add C++ API reference
- README.md: Add vector division and performance tips sections
libdivide-1.1
This release fixes two non-critical bugs and silences a few compiler warnings. The generation of libdivide divisors has been sped up for MSVC on x64 and for GCC/Clang on 64-bit CPU architectures other than x64. I have also done some general code cleanups; below is the complete changelog:
- BUG FIXES
- ENHANCEMENT
- libdivide_128_div_64_to_64(): optimize using _udiv128() for MSVC 2019 or later
- libdivide_128_div_64_to_64(): optimize using __uint128_t for GCC/Clang on 64-bit CPU architectures
- Add LIBDIVIDE_VERSION macro to libdivide.h
- Clean up SSE2 code in libdivide.h
- Increase runtime of test cases in primes_benchmark.cpp
- BUILD
- Remove windows directory with legacy Visual Studio project files
- Move test programs to test directory
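For readers unfamiliar with what a 128-bit by 64-bit narrowing division looks like when a 128-bit integer type is available, here is a rough sketch. It is not libdivide's actual libdivide_128_div_64_to_64() code; the function name `div_128_by_64_sketch` and its signature are hypothetical, and the sketch assumes a GCC or Clang target that provides `unsigned __int128`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the compiler is GCC or Clang on a 64-bit target, where the
// non-standard type unsigned __int128 exists. __extension__ silences
// pedantic warnings about that non-standard type.
__extension__ typedef unsigned __int128 u128;

// Hypothetical helper, not libdivide's implementation: divide the 128-bit
// value (hi:lo) by d, returning the 64-bit quotient and storing the
// remainder in *rem. The caller must ensure hi < d so the quotient fits
// in 64 bits (this also rules out d == 0).
static uint64_t div_128_by_64_sketch(uint64_t hi, uint64_t lo, uint64_t d,
                                     uint64_t* rem) {
    assert(hi < d);
    const u128 n = (static_cast<u128>(hi) << 64) | lo;
    *rem = static_cast<uint64_t>(n % d);
    return static_cast<uint64_t>(n / d);
}
```

For example, dividing the 128-bit value 2^64 (hi = 1, lo = 0) by 2 yields the quotient 2^63 with a remainder of 0.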
libdivide-1.0
I am happy to announce the 1.0 release of libdivide 🎉
A lot of effort has been spent to polish libdivide for the 1.0 release. It has also been tested extensively using a plethora of different compilers (GCC, Clang, MSVC, ICC, MinGW, Cygwin), OSes and CPU architectures (i386, x86-64, ARM, ARM64, PowerPC, PPC64) to ensure it passes all tests and compiles without warnings at a high warning level.
Have a look at the ChangeLog to see what's new.