
Releases: ginkgo-project/ginkgo

Release 1.9.0

09 Dec 14:45
20cfd68

The Ginkgo team is proud to announce the new Ginkgo minor release 1.9.0.
This release brings new features such as:

  • Support for half precision (IEEE FP16). The type gko::half can now be selected in most instances as the value type
    of a matrix, solver, preconditioner, etc. If the selected backend supports FP16 as a native type, the native type is
    used within the kernels, otherwise an overhead might occur. The new behavior is enabled by default, but it can be
    turned off during configuration.
  • New implementations of the ILU and IC factorization for CUDA, HIP, OpenMP, and Reference backends. These are
    available in addition to the existing implementations based on the vendor libraries cuSPARSE and hipSPARSE.
  • New (S)SOR and Gauss-Seidel preconditioners.
  • Simplified distributed matrix assembly by exchanging local rows between neighboring processes.

And more!
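
To illustrate the idea behind the new (S)SOR and Gauss-Seidel preconditioners, here is a minimal sketch of one forward SOR sweep on a small CSR matrix. It is plain C++ and does not use Ginkgo's API; the `CsrMatrix` struct and the `sor_forward_sweep` function are hypothetical names chosen for this example, and the sketch assumes every row stores a nonzero diagonal entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR storage for illustration; not Ginkgo's gko::matrix::Csr.
struct CsrMatrix {
    std::size_t num_rows;
    std::vector<std::size_t> row_ptrs;  // num_rows + 1 offsets into values
    std::vector<std::size_t> col_idxs;  // column index of each stored value
    std::vector<double> values;
};

// One forward SOR sweep for A x = b, updating x in place.
// omega == 1.0 reduces to plain Gauss-Seidel.
void sor_forward_sweep(const CsrMatrix& a, const std::vector<double>& b,
                       std::vector<double>& x, double omega = 1.0)
{
    assert(b.size() == a.num_rows && x.size() == a.num_rows);
    for (std::size_t row = 0; row < a.num_rows; ++row) {
        double diag = 0.0;
        double sum = b[row];
        for (std::size_t k = a.row_ptrs[row]; k < a.row_ptrs[row + 1]; ++k) {
            const std::size_t col = a.col_idxs[k];
            if (col == row) {
                diag = a.values[k];
            } else {
                // Columns below `row` already hold values from this sweep.
                sum -= a.values[k] * x[col];
            }
        }
        assert(diag != 0.0);  // assumption: the diagonal is stored and nonzero
        x[row] = (1.0 - omega) * x[row] + omega * sum / diag;
    }
}
```

Starting from x = 0, one such sweep with omega = 1 is the same as solving with the lower triangular part of A. A symmetric variant (SSOR) follows the forward sweep with a backward sweep over the rows in reverse order.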

If you face an issue, please first check our known issues page and the open issues list. If you do not
find a solution, feel free to open a new issue or ask a question using GitHub discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++17 compliant compiler
  • Linux and macOS
    • GCC: 7.0+
    • clang: 5.0+
    • Intel compiler: 2019+
    • Apple Clang: 15.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 11.0+ or NVHPC 22.7+, Compute Capability 5.3+
    • HIP module: CMake 3.21+, and ROCm 4.5+
    • DPC++ module: Intel oneAPI 2023.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU Aware, for best performance
  • Windows
    • MinGW: GCC 7.0+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 11.0+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version support changes

  • Ginkgo now requires a compiler with C++17 support #1603

Deprecations

  • The Executor::run overload taking in multiple functions without a name as first parameter has been deprecated #1667
  • The master branch has been deprecated in favor of a new branch named main #1739.

Summary of previous deprecations

  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface.
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL.
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation.
  • The Permutation class' permute_mask functionality.
  • Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc).
  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.
  • array::get_num_elems() has been renamed to get_size()
  • matrix_data::ensure_row_major_order() has been renamed to sort_row_major()
  • device_matrix_data::get_num_elems() has been renamed to get_num_stored_elements()
  • The CMake parameter GINKGO_COMPILER_FLAGS has been superseded by CMAKE_CXX_FLAGS, and GINKGO_CUDA_COMPILER_FLAGS has been superseded by CMAKE_CUDA_FLAGS
  • The std::initializer_list overloads of matrix create methods and constructors are deprecated in favor of explicit array parameters

Added features

  • Add Executor::get_description() for textual representation of the device #1615
  • Add row and column scaling functionality to the distributed matrix #1640
  • Add SolverProgress logger printing out or storing to disk the individual scalars (and vectors) of an iterative solver after each iteration #1620
  • Add new ortho_method parameter for GMRES, with classical Gram-Schmidt and classical Gram-Schmidt with re-orthogonalization options in addition to previously-available modified Gram-Schmidt #1646
  • Add file config support for Schwarz #1658
  • Add overload for Executor::run which accepts a name and a closure for the ReferenceExecutor as the first two arguments #1667
  • Add function to fill device_matrix_data with zeros #1683
  • Add (S)SOR and Gauss-Seidel preconditioner #1633, #1634
  • Add support for additive read_distributed for the distributed matrix #1650
  • Add Ginkgo's own ILU and IC implementation #1684
  • Add NVIDIA Ada architecture #1733
  • Add half precision support #1706, #1708, #1711, #1712, #1713, #1716, #1710, #1736
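
The three orthogonalization options named for the GMRES ortho_method parameter differ in when the projection coefficients are computed. The following is a minimal sketch in plain C++ under the assumption of an orthonormal basis stored as a list of dense vectors; it does not use Ginkgo's API, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b)
{
    assert(a.size() == b.size());
    double result = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        result += a[i] * b[i];
    }
    return result;
}

// w <- w - alpha * q
void subtract_scaled(double alpha, const Vec& q, Vec& w)
{
    assert(q.size() == w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] -= alpha * q[i];
    }
}

// Classical Gram-Schmidt: every coefficient is computed from the original w,
// so the dot products are independent of each other.
void classical_gs(const std::vector<Vec>& basis, Vec& w)
{
    std::vector<double> coeffs(basis.size());
    for (std::size_t j = 0; j < basis.size(); ++j) {
        coeffs[j] = dot(basis[j], w);
    }
    for (std::size_t j = 0; j < basis.size(); ++j) {
        subtract_scaled(coeffs[j], basis[j], w);
    }
}

// Modified Gram-Schmidt: each coefficient uses the partially updated w.
void modified_gs(const std::vector<Vec>& basis, Vec& w)
{
    for (const auto& q : basis) {
        subtract_scaled(dot(q, w), q, w);
    }
}

// Classical Gram-Schmidt with re-orthogonalization: apply the pass twice.
void classical_gs_reorth(const std::vector<Vec>& basis, Vec& w)
{
    classical_gs(basis, w);
    classical_gs(basis, w);
}
```

In exact arithmetic all three produce the same vector. The classical variant allows the dot products to be grouped into a single reduction, at the price of weaker numerical orthogonality, which the second pass is meant to recover.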

Improvements

  • Add a workspace to the residual norm check, which reduces allocations and frees and the corresponding overhead #1687
  • Add distributed VectorCache and use it as workspace in Schwarz #1688.
  • Add example to show the file config usage #1662
  • Improve compile time for batched solvers #1629
  • Reduce conflicting thrust symbols when linking with different thrust libraries by adding a custom thrust namespace #1730

Fixes

  • Fix the transpose of a triangular solver so that it uses the same algorithm as the original solver #1641
  • Fix the inconsistent behavior on the zero diagonal value in scalar Jacobi #1642
  • Fix an issue related to GCR and non-default strides in the rhs vector #1656
  • Fix an issue related to triangular solvers with CUDA on Windows #1665
  • Fix an issue where non-conforming MatrixMarket files were parsed without an error #1628
  • Fix finding rocthrust if it is not installed in paths included by default #1668
  • Fix an issue related to casting between vectors of different value types in the mixed-precision multigrid setup #1663
  • Fix some test failures with ROCm 6.x #1670
  • Fix a race condition in bicgstab #1676
  • Fix an issue with MGS GMRES for complex numbers #1678
  • Fix finding ROCm on recent ROCm versions (5.0+) #1673
  • Fix a compiler error when using NVHPC with MPI enabled #1697
  • Fix build issues of OMP backend when using HIPCC as C++ compiler #1695
  • Fix build issues for Intel OneAPI 2025.0 #1718
  • Fix inconsistencies between declaration and definition of functions and classes/structs, which mainly fixes builds with clang-cl #1725
  • Fix undefined symbols in shared library in msys2/clang #1724
  • Fix page fault issues when running on multiple Intel GPUs in parallel #1723
  • Fix data races in several OMP kernels #1743

Release 1.8.0

13 Jun 09:55
586b175

The Ginkgo team is proud to announce the new Ginkgo minor release 1.8.0. This
release brings new features such as:

  • A brand new file-based configuration for Ginkgo objects: you can now construct
    Ginkgo objects (solvers, preconditioners, ...) from a JSON configuration file.
    This simplifies interfacing to Ginkgo as well as exploring different settings
    to solve a problem.
  • Expand the batched feature set with: the Batched CSR Matrix format, batched CG
    solver, batched (Block-)Jacobi preconditioner, usage example and other
    features such as scaling,
  • New Distributed Multigrid and the PGM coarsening method,
  • New CUDA and HIP kernels for Reverse Cuthill McKee (RCM) reordering
  • Better Ginkgo and Kokkos interaction thanks to a mapping from simple Ginkgo
    types to native Kokkos types

and more!

If you face an issue, please first check our known issues page and the open issues list. If you do not find a solution, feel free to open a new issue or ask a question using GitHub discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2019+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
    • HIP module: CMake 3.21+, and ROCm 4.5+
    • DPC++ module: Intel oneAPI 2023.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU Aware, for best performance
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 10.1+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version support changes

  • The Ginkgo license header now uses the SPDX format. #1404
  • Ginkgo now requires oneAPI 2023.1+ #1396
  • Ginkgo's HIP backend now requires CMake 3.21 #1334

Interface changes

  • The gko::dim single-parameter constructor is now explicit to avoid accidental conversion from integers #1474
  • The CMake option GINKGO_BUILD_HWLOC is now set to OFF by default, and if it is set to ON, then HWLOC is required to be available #1513.
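
The effect of making a single-parameter constructor explicit can be shown with a small stand-in type. This is a sketch only: `Dim2` and `num_entries` are hypothetical and are not Ginkgo's gko::dim, and the sketch assumes that a single size describes a square dimension.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical two-dimensional size type used only for illustration.
struct Dim2 {
    // explicit: Dim2{5} still works, but an integer no longer converts
    // silently where a Dim2 is expected.
    explicit constexpr Dim2(std::size_t size) : rows{size}, cols{size} {}
    constexpr Dim2(std::size_t r, std::size_t c) : rows{r}, cols{c} {}

    std::size_t rows;
    std::size_t cols;
};

constexpr std::size_t num_entries(Dim2 size) { return size.rows * size.cols; }

// With the explicit constructor, num_entries(5) does not compile;
// callers have to spell out the intent as num_entries(Dim2{5}).
static_assert(!std::is_convertible<int, Dim2>::value,
              "integers must not convert implicitly");
static_assert(num_entries(Dim2{5}) == 25, "a single size means square");
```

Code that previously passed a bare integer where a dimension was expected needs to construct the dimension object explicitly after this change.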

Behavior changes

  • gko::write_raw now defaults to writing sparse output unless otherwise specified #1533
  • Ginkgo now adheres to the --prefix option for cmake --install, instead of overwriting it #1534

Deprecations

  • array::get_num_elems() has been renamed to get_size() #1400
  • matrix_data::ensure_row_major_order() has been renamed to sort_row_major() #1400
  • device_matrix_data::get_num_elems() has been renamed to get_num_stored_elements() #1400
  • The CMake parameter GINKGO_COMPILER_FLAGS has been superseded by CMAKE_CXX_FLAGS, and GINKGO_CUDA_COMPILER_FLAGS has been superseded by CMAKE_CUDA_FLAGS #1535
  • The std::initializer_list overloads of matrix create methods and constructors are deprecated in favor of explicit array parameters #1433

Summary of previous deprecations

  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface.
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL.
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation.
  • The Permutation class' permute_mask functionality.
  • Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc).
  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.

Added features

  • Add a batched CG solver #1598, #1609
  • Add a batched Jacobi (scalar/block) preconditioner #1542, #1600
  • Add an example for batched iterative solver #1553
  • Add add_scaled_identity and scale_add for batch matrix formats. #1528
  • Add scaling for batch objects (matrix formats and multi-vectors). #1527
  • Add a batch::Csr matrix format (class and core) and support for batched SpMV kernels on CUDA, HIP and SYCL. #1450
  • Add a script for comparing benchmark JSON outputs #1467
  • Add an example for reordered preconditioned linear solver #1465
  • Add single-value access functions load_value and store_value to array #1485
  • Add the BlockOperator format to represent block-matrices #1435
  • Add CUDA and HIP kernels for Reverse Cuthill McKee (RCM) reordering #1503
  • Add FileConfig #1389, #1392, #1395, #1479, #1480, #1607
  • Add Distributed Multigrid #1269 and coarsening method PGM #1403
  • Add a mapping from simple Ginkgo types to native Kokkos types #1358
  • Add a segmented array class #1545
  • Add a class for mapping between global and local indexing #1543

Improvements

  • Ginkgo installation now has separate Ginkgo_Runtime and Ginkgo_Development components for easier packaging #1502
  • The HIP backend now supports complex number operations for sparse matrices based on hipSPARSE #1538
  • The create functions are now documented explicitly instead of using the EnableCreateMethod mixin #1433
  • The solver benchmark now supports Ginkgo's binary format for right-hand side vector inputs #1584
  • The build system now uses native HIP support for CMake, which also provides support for ROCm 6.0 #1334
  • The Multigrid solver generated from distributed::Matrix will use a global scalar Jacobi smoother and a GMRES solver as coarse grid solver #1612

Fixes

  • Compilation with libc++ was fixed #1463
  • Fix the C++ standard detection in MSVC by using _MSVC_LANG instead of __cplusplus #1496
  • Coo::read(const T&) and Csr::read(const T&) will no longer overwrite the locally stored arrays and instead copy directly into them #1476
  • Fix the interaction of ProfilerHook::create(_nested)_summary, executors and GPU timers, which led to the summary not being printed #1509
  • Fix compilation in environments where CPATH contains the current working directory #1531
  • Fix read from matrix-market files with CR line endings #1557
  • Fix undefined behavior that shows up with libstdc++ debug builds #1176
  • Fix for CUDA 12.4 bug and METIS detection #1569
  • Fix the pkgconfig installation with DESTDIR [#1597](https://github.com/gink...

Release 1.7.0

10 Nov 18:53
49242ff

The Ginkgo team is proud to announce the new Ginkgo minor release 1.7.0. This release brings new features such as:

  • Complete GPU-resident sparse direct solvers feature set and interfaces,
  • Improved Cholesky factorization performance,
  • A new MC64 reordering,
  • Batched iterative solver support with the BiCGSTAB solver with batched Dense and ELL matrix types,
  • MPI support for the SYCL backend,
  • Improved ParILU(T)/ParIC(T) preconditioner convergence,

and more!

If you face an issue, please first check our known issues page and the open issues list. If you do not find a solution, feel free to open a new issue or ask a question using GitHub discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.16+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2019+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CMake 3.18+, and CUDA 10.1+ or NVHPC 22.7+
    • HIP module: ROCm 4.5+
    • DPC++ module: Intel oneAPI 2022.1+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp or icpx.
    • MPI: standard version 3.1+, ideally GPU Aware, for best performance
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 10.1+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version support changes

  • CUDA 9.2 is no longer supported and 10.0 is untested #1382
  • Ginkgo now requires CMake version 3.16 (and 3.18 for CUDA) #1368

Interface changes

  • const Factory parameters can no longer be modified through with_* functions, as this breaks const-correctness #1336 #1439

New Deprecations

  • The device_reset parameter of CUDA and HIP executors no longer has an effect, and its allocation_mode parameters have been deprecated in favor of the Allocator interface. #1315
  • The CMake parameter GINKGO_BUILD_DPCPP has been deprecated in favor of GINKGO_BUILD_SYCL. #1350
  • The gko::reorder::Rcm interface has been deprecated in favor of gko::experimental::reorder::Rcm based on Permutation. #1418
  • The Permutation class' permute_mask functionality. #1415
  • Multiple functions with typos (set_complex_subpsace(), range functions such as conj_operaton etc). #1348

Summary of previous deprecations

  • gko::lend() is not necessary anymore.
  • The classes RelativeResidualNorm and AbsoluteResidualNorm are deprecated in favor of ResidualNorm.
  • The class AmgxPgm is deprecated in favor of Pgm.
  • Default constructors for the CSR load_balance and automatical strategies
  • The PolymorphicObject's move-semantic copy_from variant
  • The templated SolverBase class.
  • The class MachineTopology is deprecated in favor of machine_topology.
  • Logger constructors and create functions with the executor parameter.
  • The virtual, protected, Dense functions compute_norm1_impl, add_scaled_impl, etc.
  • Logger events for solvers and criterion without the additional implicit_tau_sq parameter.
  • The global gko::solver::default_krylov_dim, use instead gko::solver::gmres_default_krylov_dim.

Added features

  • Adds a batch::BatchLinOp class that forms a base class for batched linear operators such as batched matrix formats, solver and preconditioners #1379
  • Adds a batch::MultiVector class that enables operations such as dot, norm, scale on batched vectors #1371
  • Adds a batch::Dense matrix format that stores batched dense matrices and provides gemv operations for these dense matrices. #1413
  • Adds a batch::Ell matrix format that stores batched Ell matrices and provides spmv operations for these batched Ell matrices. #1416 #1437
  • Add a batch::Bicgstab solver (class, core, and reference kernels) that enables iterative solution of batched linear systems #1438.
  • Add device kernels (CUDA, HIP, and DPCPP) for batch::Bicgstab solver. #1443.
  • New MC64 reordering algorithm which optimizes the diagonal product or sum of a matrix by permuting the rows, and computes additional scaling factors for equilibration #1120
  • New interface for (non-symmetric) permutation and scaled permutation of Dense and Csr matrices #1415
  • LU and Cholesky Factorizations can now be separated into their factors #1432
  • New symbolic LU factorization algorithm that is optimized for matrices with an almost-symmetric sparsity pattern #1445
  • Sorting kernels for SparsityCsr on all backends #1343
  • Allow passing pre-generated local solver as factory parameter for the distributed Schwarz preconditioner #1426
  • Add DPCPP kernels for Partition #1034, and CSR's check_diagonal_entries and add_scaled_identity functionality #1436
  • Adds a helper function to create a partition based on either local sizes, or local ranges #1227
  • Add function to compute arithmetic mean of dense and distributed vectors #1275
  • Adds icpx compiler support #1350
  • All backends can be built simultaneously #1333
  • Emits a CMake warning in downstream projects that use different compilers than the installed Ginkgo #1372
  • Reordering algorithms in sparse_blas benchmark #1354
  • Benchmarks gained an -allocator parameter to specify device allocators #1385
  • Benchmarks gained an -input_matrix parameter that initializes the input JSON based on the filename #1387
  • Benchmark inputs can now be reordered as a preprocessing step #1408

Improvements

  • Significantly improve Cholesky factorization performance #1366
  • Improve parallel build performance #1378
  • Allow constrained parallel test execution using CTest resources #1373
  • Use arithmetic type more inside mixed precision ELL #1414
  • Most factory parameters of factory type no longer need to be constructed explicitly via .on(exec) #1336 #1439
  • Improve ParILU(T)/ParIC(T) convergence by using more appropriate atomic operations #1434

Fixes

  • Fix an over-allocation for OpenMP reductions #1369
  • Fix DPCPP's common-kernel reduction for empty input sizes #1362
  • Fix several typos in the API and documentation #1348
  • Fix inconsistent Threads between generations #1388
  • Fix benchmark median condition #1398
  • Fix HIP 5.6.0 compilation #1411
  • Fix missing destruction of rand_generator from cuda/hip #1417
  • Fix PAPI logger destruction order #1419
  • Fix TAU logger compilation #1422
  • Fix relative criterion to not iterate if the residual is already zero #1079
  • Fix memory_order invocations with C++20 changes #1402
  • Fix check_diagonal_entries_exist to report correctly when the only missing diagonal values are in the last rows. #1440
  • Fix checking OpenMPI version in cross-compilation settings #1446
  • Fix false-positive deprecation warnings in Ginkgo, especially for the old Rcm (it doesn't emit deprecation warnings anymore as a result but is still considered deprecated) [#1444](https://github.com/gi...

Release 1.6.0

16 Jun 11:16
1f1ed46

The Ginkgo team is proud to announce the new Ginkgo minor release 1.6.0. This release brings new features such as:

  • Several building blocks for GPU-resident sparse direct solvers like symbolic
    and numerical LU and Cholesky factorization, ...,
  • A distributed Schwarz preconditioner,
  • New FGMRES and GCR solvers,
  • Distributed benchmarks for the SpMV operation, solvers, ...
  • Support for non-default streams in the CUDA and HIP backends,
  • Mixed precision support for the CSR SpMV,
  • A new profiling logger which integrates with NVTX, ROCTX, TAU and VTune to
    provide internal Ginkgo knowledge to most HPC profilers!

and much more.

If you face an issue, please first check our known issues page and the open issues list. If you do not find a solution, feel free to open a new issue or ask a question using GitHub discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.13+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2018+
    • Apple Clang: 14.0 is tested. Earlier versions might also work.
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CUDA 9.2+ or NVHPC 22.7+
    • HIP module: ROCm 4.5+
    • DPC++ module: Intel OneAPI 2021.3+ with oneMKL and oneDPL. Set the CXX compiler to dpcpp.
  • Windows
    • MinGW: GCC 5.5+
    • Microsoft Visual Studio: VS 2019+
    • CUDA module: CUDA 9.2+, Microsoft Visual Studio
    • OpenMP module: MinGW.

Version Support Changes

  • ROCm 4.0+ -> 4.5+ after #1303
  • Removed Cygwin pipeline and support #1283

Interface Changes

  • Due to internal changes, ConcreteExecutor::run will now always throw if the corresponding module for the ConcreteExecutor is not built #1234
  • The constructor of experimental::distributed::Vector was changed to only accept local vectors as std::unique_ptr #1284
  • The default parameters for the solver::MultiGrid were improved. In particular, the smoother defaults to one iteration of Ir with Jacobi preconditioner, and the coarse grid solver uses the new direct solver with LU factorization. #1291 #1327
  • The iteration_complete event gained a more expressive overload with additional parameters, the old overloads were deprecated. #1288 #1327

Deprecations

  • Deprecated less expressive iteration_complete event. Users are advised to now implement the function void iteration_complete(const LinOp* solver, const LinOp* b, const LinOp* x, const size_type& it, const LinOp* r, const LinOp* tau, const LinOp* implicit_tau_sq, const array<stopping_status>* status, bool stopped) #1288

Added Features

  • A distributed Schwarz preconditioner. #1248
  • A GCR solver #1239
  • Flexible Gmres solver #1244
  • Enable Gmres solver for distributed matrices and vectors #1201
  • An example that uses Kokkos to assemble the system matrix #1216
  • A symbolic LU factorization allowing the gko::experimental::factorization::Lu and gko::experimental::solver::Direct classes to be used for matrices with non-symmetric sparsity pattern #1210
  • A numerical Cholesky factorization #1215
  • Symbolic factorizations in host-side operations are now wrapped in a host-side Operation to make their execution visible to loggers. This means that profiling loggers and benchmarks are no longer missing a separate entry for their runtime #1232
  • Symbolic factorization benchmark #1302
  • The ProfilerHook logger allows annotating the Ginkgo execution (apply, operations, ...) for profiling frameworks like NVTX, ROCTX and TAU. #1055
  • ProfilerHook::create(_nested)_summary allows the generation of a lightweight runtime profile over all Ginkgo functions, written to a user-defined stream #1270, with both host and device timing functionality #1313
  • It is now possible to enable host buffers for MPI communications at runtime even if the compile option GINKGO_FORCE_GPU_AWARE_MPI is set. #1228
  • A stencil matrices generator (5-pt, 7-pt, 9-pt, and 27-pt) for benchmarks #1204
  • Distributed benchmarks (multi-vector blas, SpMV, solver) #1204
  • Benchmarks for CSR sorting and lookup #1219
  • A timer for MPI benchmarks that reports the longest time #1217
  • A timer_method=min|max|average|median flag for benchmark timing summary #1294
  • Support for non-default streams in CUDA and HIP executors #1236
  • METIS integration for nested dissection reordering #1296
  • SuiteSparse AMD integration for fill-in-reducing reordering #1328
  • Csr mixed-precision SpMV support #1319
  • A with_loggers function for all Factory parameters #1337

Improvements

  • Improve naming of kernel operations for loggers #1277
  • Annotate solver iterations in ProfilerHook #1290
  • Allow using the profiler hooks and inline input strings in benchmarks #1342
  • Allow passing smart pointers in place of raw pointers to most matrix functions. This means that things like vec->compute_norm2(x.get()) or vec->compute_norm2(lend(x)) can be simplified to vec->compute_norm2(x) #1279 #1261
  • Catch overflows in prefix sum operations, which makes Ginkgo's operations much less likely to crash. This also improves the performance of the prefix sum kernel #1303
  • Make the installed GinkgoConfig.cmake file relocatable and follow more best practices #1325
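
The overflow check in prefix sums mentioned above can be sketched as follows. This is a sequential, plain C++ illustration and not Ginkgo's kernel; the function name `checked_exclusive_prefix_sum` is hypothetical, and the sketch assumes non-negative counts such as per-row nonzero counts.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// In-place exclusive prefix sum that throws instead of wrapping around
// when the running total no longer fits into IndexType.
template <typename IndexType>
void checked_exclusive_prefix_sum(std::vector<IndexType>& counts)
{
    IndexType partial_sum{};
    for (auto& entry : counts) {
        const IndexType count = entry;
        assert(count >= IndexType{});  // assumption: counts are non-negative
        entry = partial_sum;
        // Compare against the remaining headroom before adding, so the
        // addition itself can never overflow.
        if (count > std::numeric_limits<IndexType>::max() - partial_sum) {
            throw std::overflow_error("prefix sum exceeds index type range");
        }
        partial_sum += count;
    }
}
```

Applied to row counts, the result is the usual row-pointer array of a CSR matrix. With a 32-bit index type, a matrix with too many stored entries then produces an exception rather than corrupted offsets.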

Fixes

  • Fix OpenMPI version check #1200
  • Fix the MPI C++ type binding by using the C binding #1306
  • Fix runtime failures for one-sided MPI wrapper functions observed on some OpenMPI versions #1249
  • Disable thread pinning with GPU executors due to poor performance #1230
  • Fix hwloc version detection #1266
  • Fix PAPI detection in non-implicit include directories #1268
  • Fix PAPI support for newer PAPI versions: #1321
  • Fix pkg-config file generation for library paths outside prefix #1271
  • Fix various build failures with ROCm 5.4, CUDA 12 and OneAPI 6 #1214, #1235, #1251
  • Fix incorrect read for skew-symmetric MatrixMarket files with explicit diagonal entries #1272
  • Fix handling of missing diagonal entries in symbolic factorizations #1263
  • Fix segmentation fault in benchmark matrix construction #1299
  • Fix the stencil matrix creation for benchmarking #1305
  • Fix the additional residual check in IR #1307
  • Fix the cuSPARSE CSR SpMM issue on single strided vector with CUDA >= 11.6 #1322 #1331
  • Fix Isai generation for large sparsity powers #1327
  • Fix Ginkgo compilation and test with NVHPC >= 22.7 #1331
  • Fix Ginkgo compilation of 32 bit binaries with MSVC #1349

Release 1.5.0

13 Nov 14:30
234594c

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:

  • MPI-based multi-node support for all matrix formats and most solvers,
  • full DPC++/SYCL support,
  • functionality and interface for GPU-resident sparse direct solvers,
  • an interface for wrapping solvers with scaling and reordering applied,
  • a new algebraic Multigrid solver/preconditioner,
  • improved mixed-precision support,
  • support for device matrix assembly,

and much more.

If you face an issue, please first check our known issues page and the open issues list. If you do not find a solution, feel free to open a new issue or ask a question using GitHub discussions.

Supported systems and requirements:

  • For all platforms, CMake 3.13+
  • C++14 compliant compiler
  • Linux and macOS
    • GCC: 5.5+
    • clang: 3.9+
    • Intel compiler: 2018+
    • Apple LLVM: 8.0+
    • NVHPC: 22.7+
    • Cray Compiler: 14.0.1+
    • CUDA module: CUDA 9.2+ or NVHPC 22.7+
    • HIP module: ROCm 4.0+
    • DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to dpcpp.
  • Windows
    • MinGW and Cygwin: GCC 5.5+
    • Microsoft Visual Studio: VS 2019
    • CUDA module: CUDA 9.2+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:

  • Add MPI-based multi-node support for all matrix formats and solvers (except GMRES and IDR). (#676, #908, #909, #932, #951, #961, #971, #976, #985, #1007, #1030, #1054, #1100, #1148)
  • Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance (#896, #924, #928, #929, #933, #943, #960, #1057, #1110, #1142)
  • Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more (#957, #1058, #1072, #1082)
  • Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings (#1059)
  • Add a Multigrid solver and improve the aggregation based PGM coarsening scheme (#542, #913, #980, #982, #986)
  • Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels (#833, #910, #926)
  • Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface (#904, #973, #1044, #1117)
  • Add a device_matrix_data type for device-side matrix assembly (#886, #963, #965)
  • Add support for mixed real/complex BLAS operations (#864)
  • Add a FFT LinOp for all but DPC++/SYCL (#701)
  • Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP (#775)
  • Add CSR scaling (#848)
  • Add array::const_view and equivalent to create constant matrices from non-const data (#890)
  • Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows (#901)
  • Add mixed precision SparsityCsr SpMV support (#970)
  • Allow creating CSR submatrix including from (possibly discontinuous) index sets (#885, #964)
  • Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense (#942)
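
The scaled identity addition M <- aI + bM listed above can be written out for a dense matrix as a short sketch. It is plain C++ and does not use Ginkgo's interface; `DenseMatrix` and this free function `add_scaled_identity` are hypothetical stand-ins for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix used only for this example.
struct DenseMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> values;  // rows * cols entries

    double& at(std::size_t r, std::size_t c) { return values[r * cols + c]; }
};

// Computes M <- a * I + b * M in place.
void add_scaled_identity(double a, double b, DenseMatrix& m)
{
    assert(m.values.size() == m.rows * m.cols);
    // First scale every entry by b ...
    for (auto& value : m.values) {
        value *= b;
    }
    // ... then add a to each diagonal entry.
    const std::size_t diag_len = m.rows < m.cols ? m.rows : m.cols;
    for (std::size_t i = 0; i < diag_len; ++i) {
        m.at(i, i) += a;
    }
}
```

For a sparse format the same update only works without changing the sparsity pattern if every diagonal entry is already stored.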

Deprecations and important changes:

  • Deprecate AmgxPgm in favor of the new Pgm name. (#1149).
  • Deprecate specialized residual norm classes in favor of a common ResidualNorm class (#1101)
  • Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) (#1031, #1052)
  • Bug fix: restrict gko::share to rvalue references (possible interface break) (#1020)
  • Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter num_rhs is now required when solving for more than one right-hand side, otherwise an exception is thrown (#1184).
  • Drop official support for old CUDA < 9.2 (#887)

Improved performance additions:

  • Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers (#1013, #1028)
  • Add HIP unsafe atomic option for AMD (#1091)
  • Prefer vendor implementations for Dense dot, conj_dot and norm2 when available (#967).
  • Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS (#809)

Fixes:

  • Fix various compilation warnings (#1076, #1183, #1189)
  • Fix issues with hwloc-related tests (#1074)
  • Fix include headers for GCC 12 (#1071)
  • Fix for simple-solver-logging example (#1066)
  • Fix for potential memory leak in Logger (#1056)
  • Fix logging of mixin classes (#1037)
  • Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones (#753)
  • Fix some matrix SpMV and conversion corner cases (#905, #978)
  • Fix uninitialized data (#958)
  • Fix CUDA version requirement for cusparseSpSM (#953)
  • Fix several issues within bash-script (#1016)
  • Fixes for NVHPC compiler support (#1194)

Other additions:


1.4.0 minor release

23 Aug 17:04
f811917

The Ginkgo team is proud to announce the new Ginkgo minor release 1.4.0. This
release brings most of the Ginkgo functionality to the Intel DPC++ ecosystem
which enables Intel-GPU and CPU execution. The only Ginkgo features which have
not been ported yet are some preconditioners.

Ginkgo's mixed-precision support is greatly enhanced thanks to:

  1. The new Accessor concept, which allows writing kernels featuring on-the-fly
    memory compression, among other features. The accessor can be used as
    header-only, see the accessor BLAS benchmarks repository as a usage example.
  2. All LinOps now transparently support mixed-precision execution. By default,
    this is done through a temporary copy which may have a performance impact but
    already allows mixed-precision research.

Native mixed-precision ELL kernels are implemented, which do not incur this cost.
The accessor is also leveraged in a new CB-GMRES solver which allows for
performance improvements by compressing the Krylov basis vectors. Many other
features have been added to Ginkgo, such as reordering support, a new IDR
solver, Incomplete Cholesky preconditioner, matrix assembly support (only CPU
for now), machine topology information, and more!
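
The idea behind storing data in a compact type while computing in a wider one can be sketched as follows. This is a simplified illustration only: the class `reduced_storage_view` and the function `dot` are hypothetical names invented for this example and do not reproduce Ginkgo's accessor interface or the CB-GMRES implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not Ginkgo's accessor interface: values are kept
// in a compact StorageType and converted to ArithmeticType on every access.
template <typename ArithmeticType, typename StorageType>
class reduced_storage_view {
public:
    explicit reduced_storage_view(std::size_t size) : data_(size) {}

    std::size_t size() const { return data_.size(); }

    // Reads convert from the storage type to the arithmetic type.
    ArithmeticType read(std::size_t i) const
    {
        assert(i < data_.size());
        return static_cast<ArithmeticType>(data_[i]);
    }

    // Writes round the arithmetic value to the storage type.
    void write(std::size_t i, ArithmeticType value)
    {
        assert(i < data_.size());
        data_[i] = static_cast<StorageType>(value);
    }

private:
    std::vector<StorageType> data_;
};

// Dot product that reads compressed data but accumulates in ArithmeticType.
template <typename ArithmeticType, typename StorageType>
ArithmeticType dot(const reduced_storage_view<ArithmeticType, StorageType>& x,
                   const reduced_storage_view<ArithmeticType, StorageType>& y)
{
    assert(x.size() == y.size());
    ArithmeticType sum{};
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x.read(i) * y.read(i);
    }
    return sum;
}
```

With `reduced_storage_view<double, float>`, the vectors occupy half the memory of a `double` array while the arithmetic is still carried out in `double`; the price is the rounding applied on each write.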

Supported systems and requirements:

  • For all platforms, cmake 3.13+
  • C++14 compliant compiler
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • clang: 3.9+
    • Intel compiler: 2018+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
    • HIP module: ROCm 3.5+
    • DPC++ module: Intel OneAPI 2021.3. Set the CXX compiler to dpcpp.
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • Microsoft Visual Studio: VS 2019
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

Algorithm and important feature additions:

  • Add a new DPC++ Executor for SYCL execution and other base utilities
    #648, #661, #757, #832
  • Port matrix formats, solvers and related kernels to DPC++. For some kernels,
    also make use of a shared kernel implementation for all executors (except
    Reference). #710, #799, #779, #733, #844, #843, #789, #845, #849, #855, #856
  • Add accessors which allow multi-precision kernels, among other things.
    #643, #708
  • Add support for mixed precision operations through apply in all LinOps. #677
  • Add incomplete Cholesky factorizations and preconditioners as well as some
    improvements to ILU. #672, #837, #846
  • Add an AMGX implementation and kernels on all devices but DPC++.
    #528, #695, #860
  • Add a new mixed-precision capability solver, Compressed Basis GMRES
    (CB-GMRES). #693, #763
  • Add the IDR(s) solver. #620
  • Add a new fixed-size block CSR matrix format (for the Reference executor).
    #671, #730
  • Add native mixed-precision support to the ELL format. #717, #780
  • Add Reverse Cuthill-McKee reordering #500, #649
  • Add matrix assembly support on CPUs. #644
  • Extend ISAI from triangular to general and SPD matrices. #690

Other additions:

  • Add the possibility to apply real matrices to complex vectors.
    #655, #658
  • Add functions to compute the absolute value of a matrix format. #636
  • Add symmetric permutation and improve existing permutations.
    #684, #657, #663
  • Add a MachineTopology class with HWLOC support #554, #697
  • Add an implicit residual norm criterion. #702, #818, #850
  • Row-major accessor is generalized to more than 2 dimensions and a new
    "block column-major" accessor has been added. #707
  • Add a heat equation example. #698, #706
  • Add ccache support in CMake and CI. #725, #739
  • Allow tuning and benchmarking variables non-intrusively. #692
  • Add triangular solver benchmark #664
  • Add benchmarks for BLAS operations #772, #829
  • Add support for different precisions and consistent index types in benchmarks.
    #675, #828
  • Add a Github bot system to facilitate development and PR management.
    #667, #674, #689, #853
  • Add Intel (DPC++) CI support and enable CI on HPC systems. #736, #751, #781
  • Add ssh debugging for Github Actions CI. #749
  • Add pipeline segmentation for better CI speed. #737

Changes:

  • Add a Scalar Jacobi specialization and kernels. #808, #834, #854
  • Add implicit residual log for solvers and benchmarks. #714
  • Change handling of the conjugate in the dense dot product. #755
  • Improved Dense stride handling. #774
  • Multiple improvements to the OpenMP kernels performance, including COO,
    an exclusive prefix sum, and more. #703, #765, #740
  • Allow specialization of submatrix and other dense creation functions in solvers. #718
  • Improved Identity constructor and treatment of rectangular matrices. #646
  • Allow CUDA/HIP executors to select allocation mode. #758
  • Check if executors share the same memory. #670
  • Improve test install and smoke testing support. #721
  • Update the JOSS paper citation and add publications in the documentation.
    #629, #724
  • Improve the version output. #806
  • Add some utilities for dim and span. #821
  • Improved solver and preconditioner benchmarks. #660
  • Improve benchmark timing and output. #669, #791, #801, #812
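
For readers unfamiliar with the scalar Jacobi preconditioner mentioned in the list above: it applies the inverse of the matrix diagonal, z = D^{-1} r. The sketch below shows this on a CSR matrix. The function names and the plain `std::vector` CSR layout are assumptions for illustration, not Ginkgo's classes or kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Ginkgo's API: collects 1 / a_ii for every row of a
// square CSR matrix. A missing or zero diagonal entry triggers the assert.
std::vector<double> inverse_diagonal(const std::vector<std::size_t>& row_ptrs,
                                     const std::vector<std::size_t>& col_idxs,
                                     const std::vector<double>& values)
{
    assert(!row_ptrs.empty());
    const std::size_t n = row_ptrs.size() - 1;
    std::vector<double> inv_diag(n, 0.0);
    for (std::size_t row = 0; row < n; ++row) {
        bool found = false;
        for (std::size_t k = row_ptrs[row]; k < row_ptrs[row + 1]; ++k) {
            if (col_idxs[k] == row && values[k] != 0.0) {
                inv_diag[row] = 1.0 / values[k];
                found = true;
            }
        }
        assert(found);
        (void)found;  // avoids an unused-variable warning when NDEBUG is set
    }
    return inv_diag;
}

// Applies the preconditioner: z_i = r_i / a_ii.
std::vector<double> apply_scalar_jacobi(const std::vector<double>& inv_diag,
                                        const std::vector<double>& r)
{
    assert(inv_diag.size() == r.size());
    std::vector<double> z(r.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        z[i] = inv_diag[i] * r[i];
    }
    return z;
}
```

Computing the inverse diagonal once and reusing it keeps each application to a single elementwise multiplication.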

Fixes:


1.3.0 minor release

27 Aug 09:04
4678668

The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.3.0. This release brings CUDA 11 support, changes the default C++ standard to
be C++14 instead of C++11, adds a new Diagonal matrix format and capacity for
diagonal extraction, significantly improves the CMake configuration output
format, adds the Ginkgo paper which got accepted into the Journal of Open Source
Software (JOSS), and fixes multiple issues.

Supported systems and requirements:

  • For all platforms, cmake 3.9+
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • clang: 3.9+
    • Intel compiler: 2017+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
    • HIP module: ROCm 2.8+
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • Microsoft Visual Studio: VS 2017 15.7+
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

The current known issues can be found in the known issues page.

Additions

  • Add paper for Journal of Open Source Software (JOSS). #479
  • Add a DiagonalExtractable interface. #563
  • Add a new diagonal Matrix Format. #580
  • Add CUDA 11 support. #603
  • Add information output after CMake configuration. #610
  • Add a new preconditioner export example. #595
  • Add a new cuda-memcheck CI job. #592

Changes

  • Use unified memory in CUDA debug builds. #621
  • Improve BENCHMARKING.md with more detailed info. #619
  • Use C++14 standard instead of C++11. #611
  • Update the Ampere sm information and CudaArchitectureSelector. #588

Fixes

  • Fix documentation warnings and errors. #624
  • Fix warnings for diagonal matrix format. #622
  • Fix criterion factory parameters in CUDA. #586
  • Fix the norm-type in the examples. #612
  • Fix the WAW race in OpenMP is_sorted_by_column_index. #617
  • Fix the example's exec_map by creating the executor only if requested. #602
  • Fix some CMake warnings. #614
  • Fix Windows building documentation. #601
  • Warn when CXX and CUDA host compiler do not match. #607
  • Fix reduce_add, prefix_sum, and doc-build. #593
  • Fix find_library(cublas) issue on machines with multiple CUDA installations. #591
  • Fix allocator in sellp read. #589
  • Fix the CAS with HIP and NVIDIA backends. #585

Deletions

  • Remove unused preconditioner parameter in LowerTrs. #587

1.2.0 release

07 Jul 14:15
b4be2be

The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.2.0. This release brings full HIP support to Ginkgo, new preconditioners
(ParILUT, ISAI), conversion between double and float for all LinOps, and many
more features and fixes.

Supported systems and requirements:

  • For all platforms, cmake 3.9+
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • clang: 3.9+
    • Intel compiler: 2017+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
    • HIP module: ROCm 2.8+
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
    • Microsoft Visual Studio: VS 2017 15.7+
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

The current known issues can be found in the known issues page.

Additions

Here are the main additions to the Ginkgo library. Other thematic additions are listed below.

  • Add full HIP support to Ginkgo #344, #357, #384, #373, #391, #396, #395, #393, #404, #439, #443, #567
  • Add a new ISAI preconditioner #489, #502, #512, #508, #520
  • Add support for ParILUT and ParICT factorization with ILU preconditioners #400
  • Add a new BiCG solver #438
  • Add a new permutation matrix format #352, #469
  • Add CSR SpGEMM support #386, #398, #418, #457
  • Add CSR SpGEAM support #556
  • Make all solvers and preconditioners transposable #535
  • Add CsrBuilder and CooBuilder for intrusive access to matrix arrays #437
  • Add a standard-compliant allocator based on the Executors #504
  • Support conversions for all LinOp between double and float #521
  • Add a new boolean to the CUDA and HIP executors to control DeviceReset (default off) #557
  • Add a relaxation factor to IR to represent Richardson Relaxation #574
  • Add two new stopping criteria, for relative (to norm(b)) and absolute residual norm #577
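
The relaxation factor and the two residual-norm stopping criteria from the list above can be illustrated with a plain relaxed Richardson iteration, x <- x + omega * (b - A x). The sketch below assumes a dense row-major matrix and an identity inner solver; the function names and parameters are hypothetical and do not correspond to Ginkgo's IR solver or criterion classes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Euclidean norm of a vector.
double norm2(const vec& v)
{
    double sum = 0.0;
    for (double value : v) {
        sum += value * value;
    }
    return std::sqrt(sum);
}

// Hypothetical sketch, not Ginkgo's IR solver: relaxed Richardson iteration
// for a dense row-major n x n matrix a. Stops when the residual norm drops
// below rel_tol * ||b|| (relative criterion) or below abs_tol (absolute
// criterion), or after max_iters updates. Returns the number of updates.
std::size_t richardson(const vec& a, const vec& b, vec& x, double omega,
                       double rel_tol, double abs_tol, std::size_t max_iters)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);
    const double b_norm = norm2(b);
    vec r(n);
    for (std::size_t iter = 0; iter < max_iters; ++iter) {
        // r = b - A x
        for (std::size_t i = 0; i < n; ++i) {
            double sum = b[i];
            for (std::size_t j = 0; j < n; ++j) {
                sum -= a[i * n + j] * x[j];
            }
            r[i] = sum;
        }
        const double r_norm = norm2(r);
        if (r_norm <= rel_tol * b_norm || r_norm <= abs_tol) {
            return iter;
        }
        // x <- x + omega * r
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += omega * r[i];
        }
    }
    return max_iters;
}
```

The iteration converges only if the spectral radius of I - omega * A is below one, so the relaxation factor has to be chosen to match the spectrum of the matrix.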

Example additions

  • Templatize all examples to simplify changing the precision #513
  • Add a new adaptive precision block-Jacobi example #507
  • Add a new IR example #522
  • Add a new Mixed Precision Iterative Refinement example #525
  • Add a new example on iterative trisolves in ILU preconditioning #526, #536, #550

Compilation and library changes

  • Auto-detect compilation settings based on environment #435, #537
  • Add SONAME to shared libraries #524
  • Add clang-cuda support #543

Other additions

  • Add sorting, searching and merging kernels for GPUs #403, #428, #417, #455
  • Add gko::as support for smart pointers #493
  • Add setters and getters for criterion factories #527
  • Add a new method to check whether a solver uses x as an initial guess #531
  • Add contribution guidelines #549

Fixes

Algorithms

  • Improve the classical CSR strategy's performance #401
  • Improve the CSR automatical strategy #407, #559
  • Memory and speed improvements to the ELL kernel #411
  • Multiple improvements and fixes to ParILU #419, #427, #429, #456, #544
  • Fix multiple issues with GMRES #481, #523, #575
  • Optimize OpenMP matrix conversions #505
  • Ensure the linearity of the ILU preconditioner #506
  • Fix IR's use of the advanced apply #522
  • Fix empty matrices conversions and add tests #560

Other core functionalities

  • Fix complex number support in our math header #410
  • Fix CUDA compatibility of the main ginkgo header #450
  • Fix isfinite issues #465
  • Fix the Array::view memory leak and the array/view copy/move #485
  • Fix typos preventing use of some interface functions #496
  • Fix gko::dim to abide by the C++ standard #498
  • Simplify the executor copy interface #516
  • Optimize intermediate storage for Composition #540
  • Provide an initial guess for relevant Compositions #561
  • Better management of nullptr as criterion #562
  • Fix the norm calculations for complex support #564

CUDA and HIP specific

  • Use the return value of the atomic operations in our wrappers #405
  • Improve the portability of warp lane masks #422
  • Extract thread ID computation into a separate function #464
  • Reorder kernel parameters for consistency #474
  • Fix the use of pragma unroll in HIP #492

Other

  • Fix the Ginkgo CMake installation files #414, #553
  • Fix the Windows compilation #415
  • Always use demangled types in error messages #434, #486
  • Add CUDA header dependency to appropriate tests #452
  • Fix several sonarqube or compilation warnings #453, #463, #532, #569
  • Add shuffle tests #460
  • Fix MSVC C2398 error #490
  • Fix missing interface tests in test install #558

Tools and ecosystem

Benchmarks


Minor release v1.1.1

02 Dec 11:27
v1.1.1
08d2c52

This version of Ginkgo provides a few fixes in Ginkgo's core routines. The
supported systems and requirements are unchanged from version 1.1.0.

Fixes

  • Improve Ginkgo's installation and fix the test_install step (#406),
  • Fix some documentation issues (#406),
  • Fix multiple code issues reported by sonarqube (#406),
  • Update the git-cmake-format repository (#399),
  • Improve the global update header script (#390),
  • Fix broken bounds checks (#388),
  • Fix CSR strategies and improve performance (#379),
  • Fix a small typo in the stencil examples (#381),
  • Fix ELL error on small matrices (#375),
  • Fix SellP read function (#374),
  • Add factorization support in create_new_algorithm.sh (#371).

Ginkgo version 1.1.0

20 Oct 20:48
b9bec82

The Ginkgo team is proud to announce the new minor release of Ginkgo version
1.1.0. This release brings several performance improvements, adds Windows support,
adds support for factorizations inside Ginkgo and a new ILU preconditioner
based on ParILU algorithm, among other things. For detailed information, check the respective issue.

Supported systems and requirements:

  • For all platforms, cmake 3.9+
  • Linux and MacOS
    • gcc: 5.3+, 6.3+, 7.3+, 8.1+
    • clang: 3.9+
    • Intel compiler: 2017+
    • Apple LLVM: 8.0+
    • CUDA module: CUDA 9.0+
  • Windows
    • MinGW and Cygwin: gcc 5.3+, 6.3+, 7.3+, 8.1+
    • Microsoft Visual Studio: VS 2017 15.7+
    • CUDA module: CUDA 9.0+, Microsoft Visual Studio
    • OpenMP module: MinGW or Cygwin.

The current known issues can be found in the known issues page.

Additions:

  • Upper and lower triangular solvers (#327, #336, #341, #342)
  • New factorization support in Ginkgo, and addition of the ParILU
    algorithm (#305, #315, #319, #324)
  • New ILU preconditioner (#348, #353)
  • Windows MinGW and Cygwin support (#347)
  • Windows Visual Studio support (#351)
  • New example showing how to use ParILU as a preconditioner (#358)
  • New example on using loggers for debugging (#360)
  • Add two new 9pt and 27pt stencil examples (#300, #306)
  • Allow benchmarking CuSPARSE spmv formats through Ginkgo's benchmarks (#303)
  • New benchmark for sparse matrix format conversions (#312, #317)
  • Add conversions between CSR and Hybrid formats (#302, #310)
  • Support for sorting rows in the CSR format by column indices (#322)
  • Addition of a CUDA COO SpMM kernel for improved performance (#345)
  • Addition of a LinOp to handle perturbations of the form (identity + scalar *
    basis * projector) (#334)
  • New sparsity matrix representation format with Reference and OpenMP
    kernels (#349, #350)
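
The perturbation operator of the form (identity + scalar * basis * projector) from the list above can be applied to a vector without ever forming the full n x n matrix. The following sketch assumes a dense row-major n x k basis and k x n projector; `apply_perturbation` is a hypothetical name for illustration and not the Ginkgo LinOp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// Hypothetical sketch, not Ginkgo's LinOp: computes
// y = (I + scalar * basis * projector) * x, where basis is n x k and
// projector is k x n, both dense and stored in row-major order.
vec apply_perturbation(double scalar, const vec& basis, const vec& projector,
                       const vec& x, std::size_t k)
{
    const std::size_t n = x.size();
    assert(basis.size() == n * k && projector.size() == k * n);
    // t = projector * x (length k)
    vec t(k, 0.0);
    for (std::size_t i = 0; i < k; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            t[i] += projector[i * n + j] * x[j];
        }
    }
    // y = x + scalar * basis * t
    vec y(x);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < k; ++j) {
            y[i] += scalar * basis[i * k + j] * t[j];
        }
    }
    return y;
}
```

Evaluating the projector first costs O(nk) operations instead of the O(n^2) needed for an explicitly assembled matrix, which pays off when k is much smaller than n.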

Fixes:

  • Accelerate GMRES solver for CUDA executor (#363)
  • Fix BiCGSTAB solver convergence (#359)
  • Fix CGS logging by reporting the residual for every sub iteration (#328)
  • Fix CSR,Dense->Sellp conversion's memory access violation (#295)
  • Accelerate CSR->Ell,Hybrid conversions on CUDA (#313, #318)
  • Fix slowdown of COO SpMV on OpenMP (#340)
  • Fix gcc 6.4.0 internal compiler error (#316)
  • Fix compilation issue on Apple clang++ 10 (#322)
  • Make Ginkgo able to compile on Intel 2017 and above (#337)
  • Make the benchmarks spmv/solver use the same matrix formats (#366)
  • Fix self-written isfinite function (#348)
  • Fix Jacobi issues shown by cuda-memcheck

Tools and ecosystem:

  • Multiple improvements to the CI system and tools (#296, #311, #365)
  • Multiple improvements to the Ginkgo containers (#328, #361)
  • Add sonarqube analysis to Ginkgo (#304, #308, #309)
  • Add clang-tidy and iwyu support to Ginkgo (#298)
  • Improve Ginkgo's support of xSDK M12 policy by adding the TPL_ arguments
    to CMake (#300)
  • Add support for the xSDK R7 policy (#325)
  • Fix examples in html documentation (#367)