The Ginkgo team is proud to announce the new Ginkgo minor release 1.9.0.
This release brings new features such as:
- Support for half precision (IEEE FP16). The type gko::half can now be selected in most instances as the value type of a matrix, solver, preconditioner, etc. If the selected backend supports FP16 as a native type, the native type is used within the kernels, otherwise an overhead might occur. The new behavior is enabled by default, but it can be turned off during configuration.
- New implementations of the ILU and IC factorization for CUDA, HIP, OpenMP, and Reference backends. These are available in addition to the existing implementations based on the vendor libraries cuSPARSE and hipSPARSE.
- New (S)SOR and Gauss-Seidel preconditioners.
- Simplified distributed matrix assembly by exchanging local rows between neighboring processes.
And more!
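As a rough illustration of what the Gauss-Seidel and (S)SOR methods named above compute, here is a minimal, library-independent sketch of a relaxation sweep on a small dense system. It does not use Ginkgo's API and says nothing about how Ginkgo implements these preconditioners; the type `dense_matrix` and the functions `sor_sweep` and `ssor_sweep` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dense row-major matrix, used only to keep the sketch short.
// Hypothetical helper type, not a Ginkgo matrix format.
using dense_matrix = std::vector<std::vector<double>>;

// One SOR sweep on A x = b, updating x in place.
// omega == 1.0 reduces the sweep to Gauss-Seidel.
// forward == true visits rows 0..n-1, otherwise n-1..0.
void sor_sweep(const dense_matrix& a, const std::vector<double>& b,
               std::vector<double>& x, double omega, bool forward = true)
{
    const std::size_t n = b.size();
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t i = forward ? k : n - 1 - k;
        assert(a[i][i] != 0.0);  // the sweep divides by the diagonal
        double sigma = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) {
                // Uses already-updated entries of x where available.
                sigma += a[i][j] * x[j];
            }
        }
        const double gs_value = (b[i] - sigma) / a[i][i];
        // Blend the old value with the Gauss-Seidel update.
        x[i] = (1.0 - omega) * x[i] + omega * gs_value;
    }
}

// Symmetric variant: a forward sweep followed by a backward sweep.
void ssor_sweep(const dense_matrix& a, const std::vector<double>& b,
                std::vector<double>& x, double omega)
{
    sor_sweep(a, b, x, omega, true);
    sor_sweep(a, b, x, omega, false);
}
```

Calling `sor_sweep(a, b, x, 1.0)` repeatedly on a symmetric positive definite system drives `x` toward the solution. When such a method is used as a preconditioner, typically only one or a few sweeps are applied per outer iteration.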
If you face an issue, please first check our known issues page and the open issues list. If you do not
find a solution, feel free to open a new issue or ask a question using GitHub Discussions.
Supported systems and requirements:
- For all platforms, CMake 3.16+
- C++17 compliant compiler
- Linux and macOS
  - GCC: 7.0+
  - clang: 5.0+
  - Intel compiler: 2019+
  - Apple Clang: 15.0 is tested. Earlier versions might also work.
  - NVHPC: 22.7+
  - Cray Compiler: 14.0.1+
  - CUDA module: CMake 3.18+, and CUDA 11.0+ or NVHPC 22.7+, Compute Capability 5.3+
  - HIP module: CMake 3.21+, and ROCm 4.5+
  - DPC++ module: Intel oneAPI 2023.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
  - MPI: standard version 3.1+, ideally GPU Aware, for best performance
- Windows
  - MinGW: GCC 7.0+
  - Microsoft Visual Studio: VS 2019+
  - CUDA module: CUDA 11.0+, Microsoft Visual Studio
  - OpenMP module: MinGW.
Version support changes
- Ginkgo now requires a compiler with C++17 support #1603
Deprecations
- The Executor::run overload taking in multiple functions without a name as first parameter has been deprecated #1667
- The master branch has been deprecated in favor of a new branch named main #1739.
Summary of previous deprecations
- The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface.
- The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL.
- The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation.
- The Permutation class' permute_mask functionality.
- Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc).
- gko::lend() is not necessary anymore.
- The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
- The class AmgxPgm is deprecated in favor of Pgm.
- Default constructors for the CSR load_balance and automatical strategies
- The PolymorphicObject's move-semantic copy_from variant
- The templated SolverBase class.
- The class MachineTopology is deprecated in favor of machine_topology.
- Logger constructors and create functions with the executor parameter.
- The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
- Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
- The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.
- array::get_num_elems() has been renamed to get_size()
- matrix_data::ensure_row_major_order() has been renamed to sort_row_major()
- device_matrix_data::get_num_elems() has been renamed to get_num_stored_elements()
- The CMake parameter GINKGO_COMPILER_FLAGS has been superseded by CMAKE_CXX_FLAGS, and GINKGO_CUDA_COMPILER_FLAGS has been superseded by CMAKE_CUDA_FLAGS
- The std::initializer_list overloads of matrix create methods and constructors are deprecated in favor of explicit array parameters
Added features
- Add Executor::get_description() for textual representation of the device #1615
- Add row and column scaling functionality to the distributed matrix #1640
- Add SolverProgress logger printing out or storing to disk the individual scalars (and vectors) of an iterative solver after each iteration #1620
- Add new ortho_method parameter for GMRES, with classical Gram-Schmidt and classical Gram-Schmidt with re-orthogonalization options in addition to previously-available modified Gram-Schmidt #1646
- Add file config support for Schwarz #1658
- Add overload for Executor::run which accepts a name and a closure for the ReferenceExecutor as the first two arguments #1667
- Add function to fill device_matrix_data with zeros #1683
- Add (S)SOR and Gauss-Seidel preconditioner #1633, #1634
- Add support for additive read_distributed for the distributed matrix #1650
- Add Ginkgo's own ILU and IC implementation #1684
- Add NVIDIA Ada architecture #1733
- Add half precision support #1706, #1708, #1711, #1712, #1713, #1716, #1710, #1736
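The three orthogonalization options mentioned for GMRES above (modified Gram-Schmidt, classical Gram-Schmidt, and classical Gram-Schmidt with re-orthogonalization) differ only in how a new vector is made orthogonal to the existing Krylov basis. The following is a minimal, library-independent sketch of that step. It does not use Ginkgo's API; the enum `ortho` and the functions `dot`, `axpy`, and `orthogonalize` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Inner product of two vectors of equal length.
double dot(const vec& x, const vec& y)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

// y += alpha * x
void axpy(double alpha, const vec& x, vec& y)
{
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Hypothetical option names for this sketch.
enum class ortho { mgs, cgs, cgs2 };

// Makes w orthogonal to the orthonormal vectors in basis (in place) and
// returns the projection coefficients, i.e. one Hessenberg column without
// its last entry.
vec orthogonalize(const std::vector<vec>& basis, vec& w, ortho method)
{
    vec h(basis.size(), 0.0);
    if (method == ortho::mgs) {
        // Modified Gram-Schmidt: project against the already-updated w,
        // one basis vector at a time.
        for (std::size_t i = 0; i < basis.size(); ++i) {
            h[i] = dot(basis[i], w);
            axpy(-h[i], basis[i], w);
        }
        return h;
    }
    // Classical Gram-Schmidt: compute all coefficients from the same w,
    // then subtract. cgs2 repeats the pass once to recover orthogonality
    // lost to rounding.
    const int passes = (method == ortho::cgs2) ? 2 : 1;
    for (int pass = 0; pass < passes; ++pass) {
        vec c(basis.size(), 0.0);
        for (std::size_t i = 0; i < basis.size(); ++i) {
            c[i] = dot(basis[i], w);
        }
        for (std::size_t i = 0; i < basis.size(); ++i) {
            axpy(-c[i], basis[i], w);
            h[i] += c[i];
        }
    }
    return h;
}
```

In the classical variants all inner products of a pass are independent of each other, so they can be batched into a single reduction, whereas modified Gram-Schmidt needs one dependent reduction per basis vector. The second pass in the re-orthogonalized variant trades extra work for numerical robustness.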
Improvements
- Add workspace in residual norm check #1687, which reduces the alloc/free and corresponding overhead.
- Add distributed VectorCache and use it as workspace in Schwarz #1688.
- Add example to show the file config usage #1662
- Improve compile time for batched solvers #1629
- Reduce conflicting thrust symbols when linking with different thrust libraries by adding a custom thrust namespace #1730
Fixes
- Fix the transposed triangular solver so that it uses the same algorithm as the original solver #1641
- Fix the inconsistent behavior on the zero diagonal value in scalar Jacobi #1642
- Fix an issue related to GCR and non-default strides in the rhs vector #1656
- Fix an issue related to triangular solvers with CUDA on Windows #1665
- Fix an issue where non-conforming MatrixMarket files were parsed without an error #1628
- Fix finding rocthrust if it's not installed in paths included by default #1668
- Fix an issue related to casting between vectors of different value types in the mixed-precision multigrid setup #1663
- Fix some test failures with ROCm 6.x #1670
- Fix a race condition in bicgstab #1676
- Fix an issue with MGS GMRES for complex numbers #1678
- Fix finding ROCm on recent ROCm versions (5.0+) #1673
- Fix a compiler error when using NVHPC with MPI enabled #1697
- Fix build issues of the OMP backend when using HIPCC as C++ compiler #1695
- Fix build issues for Intel oneAPI 2025.0 #1718
- Fix inconsistencies between declaration and definition of functions and classes/structs, which mainly fixes clang-cl #1725
- Fix undefined symbols in shared library in msys2/clang #1724
- Fix page fault issues when running on multiple Intel GPUs in parallel #1723
- Fix data races in several OMP kernels #1743