Change Log

3.7.01 (2022-12-01)

Full Changelog

Bug Fixes:

  • Use CRS matrix sort, instead of Kokkos::sort on each row #1553
  • Change template type for StaticCrsGraph in BsrMatrix #1531
  • Remove listing of undefined TPL deps #1568
  • Fix using SpGEMM with nonstandard scalar type, with MKL enabled #1591
  • Move destroying dense vector descriptors out of cuSparse sptrsv handle #1590
  • Fix cuda_data_type_from to return CUDA_C_64F for Kokkos::complex<double> #1604
  • Disable compile-time check in cuda_data_type_from on supported scalar types for cuSPARSE #1605
  • Reduce register pressure in batched dense algorithms #1588

Implemented enhancements:

  • Use new cusparseSpSV TPL for SPTRSV when cuSPARSE is enabled with CUDA >= 11.3 #1574

3.7.00 (2022-08-18)

Full Changelog

Features:

Final Bsr algorithms implemented for multigrid:

  • Sparse: bsr transpose algorithm #1477
  • BSR block SpGEMM implementation #1099

Adding batched dense linear and non-linear system solvers:

  • Add batched GESV #1384
  • Newton solver: serial on device implementation of Newton's method #1479

Add sparse matrix conversion:

  • Add csc2csr #1342
  • csc2csr: update Kokkos_Numeric.hpp header inclusion #1449
  • sparse: Remove csc2csr copy #1375
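For readers unfamiliar with the conversion named in these entries, the following is a minimal, host-only sketch of turning a compressed sparse column (CSC) matrix into compressed sparse row (CSR) form with a counting sort on the row index. It does not reproduce the Kokkos Kernels implementation or its interface: the `CsrMatrix` struct and the `csc_to_csr` function are hypothetical names used only for illustration, and plain `std::vector` stands in for Kokkos views.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for illustration; not a Kokkos Kernels type.
struct CsrMatrix {
  std::vector<std::size_t> row_ptr;  // size num_rows + 1
  std::vector<int> col_idx;          // size nnz
  std::vector<double> values;        // size nnz
};

// Convert CSC arrays (col_ptr, row_idx, vals) to CSR using a counting sort
// keyed on the row index. Hypothetical helper, serial and host-only.
CsrMatrix csc_to_csr(int num_rows, int num_cols,
                     const std::vector<std::size_t>& col_ptr,
                     const std::vector<int>& row_idx,
                     const std::vector<double>& vals) {
  assert(col_ptr.size() == static_cast<std::size_t>(num_cols) + 1);
  const std::size_t nnz = row_idx.size();
  assert(vals.size() == nnz && col_ptr.back() == nnz);

  CsrMatrix csr;
  csr.row_ptr.assign(static_cast<std::size_t>(num_rows) + 1, std::size_t{0});
  csr.col_idx.resize(nnz);
  csr.values.resize(nnz);

  // Pass 1: count the entries in each row, shifted by one slot so that the
  // prefix sum below turns the counts directly into row offsets.
  for (std::size_t k = 0; k < nnz; ++k) {
    ++csr.row_ptr[static_cast<std::size_t>(row_idx[k]) + 1];
  }

  // Pass 2: inclusive prefix sum, so row_ptr[r] is where row r starts.
  for (int r = 0; r < num_rows; ++r) {
    csr.row_ptr[r + 1] += csr.row_ptr[r];
  }

  // Pass 3: scatter. Visiting columns in increasing order means the entries
  // of every row end up sorted by column index.
  std::vector<std::size_t> next(csr.row_ptr.begin(), csr.row_ptr.end() - 1);
  for (int c = 0; c < num_cols; ++c) {
    for (std::size_t k = col_ptr[c]; k < col_ptr[c + 1]; ++k) {
      const std::size_t dst = next[static_cast<std::size_t>(row_idx[k])]++;
      csr.col_idx[dst] = c;
      csr.values[dst] = vals[k];
    }
  }
  return csr;
}
```

For the 3x3 matrix with rows (1 0 2), (0 3 0), (4 0 5), the CSC input is `col_ptr = {0, 2, 3, 5}`, `row_idx = {0, 2, 1, 0, 2}`, `vals = {1, 4, 3, 2, 5}`, and the sketch returns `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}`, `values = {1, 2, 3, 4, 5}`.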

New documentation in readthedocs

Fix issues with TPLs for multivector SPMV

  • Add cuSparse TPL files for CrsMatrix-multivector product #1427

Deprecations:

  • Add template params to forwarding calls in deprecated KokkosKernels::… #1441

Implemented enhancements:

  • SPILUK: Move host allocations to symbolic #1480
  • trsv: remove assumptions about entry order within rows #1463

Hierarchical BLAS algorithms, added and moved from batched:

  • Blas serial axpy and nrm2 #1460
  • Move Set/Scale unit test to KokkosBlas #1455
  • Move {Serial,Team,TeamVector} Set to KokkosBlas #1454
  • Move {Serial,Team,TeamVector}Scale to KokkosBlas #1448

Code base organization and clean-ups:

  • Common Utils: removing dependency on Sparse Utils in Common Utils #1436
  • Common cleanup #1431
  • Clean-up src: re-organizing the src directory #1398
  • Sparse utils namespace #1439

perf tests updates, fixes and clean-ups:

  • dot perf test: adding support for HIP and SYCL backend #1453
  • Add verbosity parameter to GMRES example. Turn off for testing. #1385
  • KokkosSparse_spiluk.cpp perf test: add int-int guards to cusparse codes #1369
  • perf_test/blas: Check ARMPL build version #1352
  • Clean-up batched block tridiag perf test #1343
  • Reduce lots of macro duplication in sparse unit tests #1340

Infrastructure changes: ETI and testing upgrades, minor fixes

  • sycl: re-enabling test now that dpcpp has made progress #1473
  • Only instantiate Kokkos's default Cuda mem space #1361
  • Sparse and CI updates #1411
  • Newer sparse tests were not following the new testing pattern #1356
  • Add ETI for D1 coloring #1401
  • Add ETI to SpAdd (symbolic and numeric) #1399
  • Reformat example/fenl files changed in 1382 #1464
  • Change Controls::getParameter error message from stdout to stderr #1416

Kokkos alignment: update our implementations to use newer Kokkos features

  • Arith traits integral nan #1438
  • Kokkos_ArithTraits: re-implementation using Kokkos Core #1406
  • Value-initialize result of MaxLoc reduction to avoid maybe uninitialized warning #1383
  • Remove volatile qualifiers in reducer join(), init(), and operator+= methods #1382

BLAS and batched algorithms updates

  • Update Batched GMRES #1392
  • GEMV: accumulate in float for scalar = bhalf_t #1360
  • Restore BLAS-1 MV paths for 1 column #1354

Sparse and Graph updates

  • Minor updates to cluster Gauss-Seidel #1372
  • Add unit test for BsrMatrix and BlockCrsMatrix spmv #1338
  • Refactor SPGEMM MKL Impl #1244
  • D1 coloring: remove unused but set variable #1403

half precision paper

  • Minor changes for half precision paper #1429
  • Add benchmarks for us-rse escience 2022 half precision paper #1422

Bug Fixes:

  • TPLs: adding CUBLAS in the list of dependencies #1482
  • Fix MKL build errors #1478
  • Fixup drop layout template param in rank-0 views #1476
  • BLAS: fixing test that access results before synching #1472
  • Fix D1 color ETI with both CudaSpace and UVM #1471
  • Fix arithtraits warning #1468
  • Fix build when double not instantiated #1467
  • Fix -Werror #1466
  • Fix GitHub CI failing on broken develop #1461
  • HIP: fix warning from ExecSpaceUtils and GEMV #1459
  • Removes a duplicate cuda_data_type_from when KOKKOS_HALF_T_IS_FLOAT #1456
  • Fix incorrect function call in KokkosBatched::TeamGEMV unit test #1444
  • Fix SYCL nightly test #1419
  • Fix issues with cuSparse TPL availability for BsrMatrix SpMV #1418
  • SpMV: fixing issues with unit-tests tolerance #1412
  • Address 1409 #1410
  • Fix colliding include guards (copy-paste mistake) #1408
  • src/sparse: Fix & check for fence post errors #1405
  • Bspgemm fixes #1396
  • Fix unused parameter warnings in GEMM test. #1381
  • Fixes code deprecation warnings. #1379
  • Fix sign-compare warning in SPMV perf test #1371
  • Minor MKL fixes #1365
  • perf_test/batched: Temporarily disable tests #1359
  • Fix nightly builds following promotion of the math functions in Kokkos #1339

3.6.01 (2022-05-23)

Full Changelog

Bug Fixes and Improvements:

  • Improve spiluk numeric phase to avoid race conditions and processing in chunks #1390
  • Improve sptrsv symbolic phase performance (level scheduling) #1380
  • Restore BLAS-1 MV paths for 1 column #1354
  • Fix check that view has const type #1370
  • Fix check that view has const type part 2 #1394

3.6.00 (2022-02-18)

Full Changelog

Features:

Batched Sparse Linear algebra

  • Kokkos Kernels is adding a new component to the library: batched sparse linear algebra. Like the existing dense batched algorithms, the new algorithms are called from the GPU and provide Team and TeamVector levels of parallelism; SpMV also provides a Serial call on the GPU.

  • Add Batched CG and Batched GMRES #1155

  • Add Jacobi Batched preconditioner #1219

Bsr and Tensor core algorithm for sparse linear algebra

  • Following the introduction of the BsrMatrix in release 3.5.0, new algorithms now support this format. Release 3.6.0 adds matrix-vector (matvec) multiplication and Gauss-Seidel, as well as an implementation of matvec that leverages tensor cores on Nvidia GPUs. More kernels are expected to support the Bsr format in future releases.

  • Add Spmv for BsrMatrix #1255

  • Add BLAS to SpMV operations for BsrMatrix #1297

  • BSR format support in block Gauss-Seidel #1232

  • Experimental tensor-core SpMV for BsrMatrix #1090

Improved AMD math libraries support

  • rocBLAS and rocSPARSE TPLs are now officially supported and can be enabled at configure time. The initial kernels that can call rocBLAS are GEMV, GEMM, IAMAX and SCAL, while rocSPARSE can be called for matrix-vector multiplication. Further support for TPL calls can be requested on Slack or through GitHub issues.

  • Tpl rocBLAS and rocSPARSE #1153

  • Add rocBLAS GEMV wrapper #1201

  • Add rocBLAS wrappers for GEMM, IAMAX, and SCAL #1230

  • SpMV: adding support for rocSPARSE TPL #1221

Additional new features

  • bhalf: Unit test Batched GEMM #1251
  • bhalf: Demonstrate GMRES example convergence with bhalf_t #1300
  • Stream interface: adding stream support in GEMV and GEMM #1131
  • Improve double buffering batched gemm performance #1217
  • Allow choosing coloring algorithm in multicolor GS #1199
  • Batched: Add armpl dgemm support #1256

Deprecations:

  • Deprecation warning: SpaceAccessibility move out of impl, see #1140 #1141

Backends and Archs Enhancements:

SYCL:

  • Full Blas support on SYCL #1270
  • Get sparse tests enabled and working for SYCL #1269
  • Changes to make graph run on SYCL #1268
  • Allow querying free/total memory for SYCL #1225
  • Use KOKKOS_IMPL_DO_NOT_USE_PRINTF instead of printf in kernels #1162

HIP:

  • Work around hipcc size_t/int division with remainder bug #1262

Other Improvements:

  • Replace std::abs with ArithTraits::abs #1312
  • Batched/dense: Add Gemm_DblBuf LayoutLeft operator #1299
  • KokkosKernels: adding variable that returns version as a single number #1295
  • Add KOKKOSKERNELS_FORCE_SIMD macro (Fix #1040) #1290
  • Rename KOKKOS_IF_{HOST,DEVICE} -> KOKKOS_IF_ON_{HOST,DEVICE} #1278
  • Algo::Level{2,3}::Blocked::mb() #1265
  • Batched: Use SerialOpt2 for 33 to 39 square matrices #1261
  • Prune extra dependencies #1241
  • Improve double buffering batched gemm perf for matrix sizes >64x64 #1239
  • Improve graph color perf test #1229
  • Add custom implementation for strcasecmp #1227
  • Replace restrict with KOKKOS_RESTRICT #1223
  • Replace array reductions in BLAS-1 MV reductions #1204
  • Update MIS-2 and aggregation #1143
  • perf_test/blas/blas3: Update SHAs for benchmarking #1139

Implemented enhancements BuildSystem:

  • Bump ROCm version 4.2 -> 4.5 in nightly Jenkins CI build #1279
  • scripts/cm_test_all_sandia: Add A64FX ci checks #1276
  • github/workflows: Add osx CI #1254
  • Update SYCL compiler version in CI #1247
  • Do not set Kokkos variables when exporting CMake configuration #1236
  • Add nightly CI check for SYCL #1190
  • Update cmake minimum version to 3.16 #866

Incompatibilities:

  • Kokkos::Impl: removing a few more instances of throw_runtime_exception #1320
  • Remove Kokkos::Impl::throw_runtime_exception from Kokkos Kernels #1294
  • Remove unused memory space utility #1283
  • Clean up Kokkos header includes #1282
  • Remove private Kokkos header include (Cuda/Kokkos_Cuda_Half.hpp) #1281
  • Avoid using #ifdef KOKKOS_ACTIVE_EXECUTION_MEMORY_SPACE_* macro guards #1266
  • Rename enumerator Impl::Exec_{PTHREADS -> THREADS} #1253
  • Remove all references to the Kokkos QThreads backend #1238
  • Replace more occurrences of Kokkos::Impl::is_view #1234
  • Do not use Kokkos::Impl::is_view #1214
  • Replace Kokkos::Impl::if_c -> std::conditional #1213

Bug Fixes:

  • Fix bug in spmv_mv_bsrmatrix() for Ampere GPU arch #1315
  • Fix std::abs calls for rocBLAS/rocSparse #1310
  • cast literal 0 to fragment scalar type #1307
  • Fix 1303: maintain correct #cols on A in twostage #1304
  • Add dimension checking to generic spmv interface #1301
  • Add missing barriers to TeamGMRES, fix vector len #1285
  • Examples: fixing some issues related to type checking #1267
  • Restrict BsrMatrix specialization for AMPERE and VOLTA to CUDA #1242
  • Fix compilation errors for multi-vectors in kk_print_1Dview() #1231
  • src/batched: Fixes #1224 #1226
  • Fix SpGEMM crashing on empty rows #1220
  • Fix issue #1212 #1218
  • example/gmres: Specify half_t namespace #1208
  • Check that ordinal types are signed #1188
  • Fixing a couple of small issues with tensor core spmv #1185
  • Fix #threads setting in pcg for OpenMP #1182
  • SpMV: fix catch all case to avoid compiler warnings #1179
  • using namespace should be scoped to prevent name clashes #1177
  • using namespace should be scoped to prevent name clashes, see issue #1170 #1171
  • Fix bug with mkl impl of spgemm #1167
  • Add missing $ to KOKKOS_HAS_TRILINOS in sparse_sptrsv_superlu check #1160
  • Small fixes to spgemm, and plug gaps in testing #1159
  • SpMV: mismatch in #ifdef check and kernel specialization #1151
  • Fix values dimension for block sparse matrices #1147

3.5.00 (2021-10-19)

Full Changelog

Features:

  • Batched serial SVD #1107
  • Batched: Add BatchedDblBufGemm #1095
  • feature/gemv rps test -- RAJAPerf Suite Version of the BLAS2 GEMV Test #1085
  • Add new bsrmatrix #1077
  • Adding Kokkos GMRES example #1028
  • Add fast two-level mode N GEMV (#926) #939
  • Batched: Add BatchedGemm interface #935
  • OpenMPTarget: adding ETI and CMake logic for OpenMPTarget backend #886

Implemented enhancements Algorithms and Archs:

  • Use float as accumulator for GEMV on half_t (Fix #1081) #1082
  • Supernodal SpTRSV: add option to use MAGMA TPL for TRTRI #1069
  • Updates for running GMRES example with half precision #1067
  • src/blas/impl: Explicitly cast to LHS type for ax #1073
  • Update BatchedGemm interface to match design proposal #1054
  • Move dot-based GEMM out of TPL CUBLAS #1050
  • Adding ArmPL option to spmv perf_test #1038
  • Add (right) preconditioning to GMRES #1078
  • Supernodal SpTRSV: perform TRMM only if TPL CuBLAS is enabled #1027
  • Supernodal SpTRSV: support SuperLU version < 5 #1012
  • perf_test/blas/blas3: Add dgemm armpl experiment #1005
  • Supernodal SpTRSV: run TRMM on device for setup #983
  • Merge pull request #951 from vqd8a/move_sort_ifpack2riluk #972
  • Point multicolor GS: faster handling of long/bulk rows #993
  • Make CRS sorting utils work with unmanaged #963
  • Add sort and make sure using host mirror on host memory in kspiluk_symbolic #951
  • GEMM: call GEMV instead in certain cases #948
  • SpAdd performance improvements, better perf test, fix mtx reader columns #930

Implemented enhancements BuildSystem:

  • Automate documentation generation #1116
  • Move the batched dense files to specific directories #1098
  • cmake: Update SUPERLU tpl option for Tribits #1066
  • cmake/Modules: Allow user to use MAGMA_DIR from env #1007
  • Supernodal SpTRSV: update TPLs requirements #997
  • cmake: Add MAGMA TPL support #982
  • Host only macro: adding macro to check for any device backend #940
  • Prevent redundant spmv kernel instantiations (reduce library size) #937
  • unit-test: refactor infrastructure to remove most *.cpp #906

Implemented enhancements Other:

  • Allow reading integer mtx files into floating-point matrices #1100
  • Warnings: remove -Wunused-parameter warnings in Kokkos Kernels #962
  • Clean up CrsMatrix raw pointer constructor #949
  • unit_test/batched: Remove *_half fns from gemm unit tests #943
  • Move sorting functionality out of Impl:: #932

Incompatibilities:

  • Deprecation warning: SpaceAccessibility move out of impl #1141
  • Rename CUDA_SAFE_CALL to KOKKOS_IMPL_CUDA_SAFE_CALL #1130
  • Workaround error with intel #1128
  • gmres: disable examples for builds with ibm/xl #1123
  • CrsMatrix: deprecate constructor without ncols input #1115
  • perf_test/blas/blas3: Disable simd verify for cuda/10.2.2 #1093
  • Replace impl/Kokkos_Timer.hpp includes with Kokkos_Timer.hpp #1074
  • Remove deprecated ViewAllocateWithoutInitializing #1058
  • src/sparse: spadd resolve deprecation warnings #1053
  • Give full namespace path for D2 coloring #999
  • Fix -Werror=deprecated errors with c++20 standard #964
  • Deprecation: a deprecated function is called in the SpADD perf_test #954

Enabled tests:

  • HIP: enabling all unit tests #968
  • Fix build and add CI coverage for LayoutLeft=OFF #965
  • Enable SYCL tests #927
  • Fixup HIP nightly builds #907

Fixed Bugs:

  • Fix SpGEMM for Nvidia Turing/Ampere #1118
  • Fix #1111: spmv tpl instantiations #1112
  • Fix C's numCols in spadd simplified interface #1102
  • Fix #1089 (failing batched UTV tests) #1096
  • Blas GEMM: fix early exit logic, see issue #1088 #1091
  • Fix #1048: handle mode C spmv correctly in serial/openmp #1084
  • src/batched: Fix multiple definitions of singleton #1072
  • Fix host accessing View in non-host space #1057
  • Fix559: Intel 18 has trouble with pointer in ternary expr #1042
  • Work around team size AUTO issue on kepler #1020
  • Supernodal SpTrsv: fix out-of-bound error #1019
  • Some fixes for MAGMA TPL and gesv #1008
  • Merge pull request #981 from Tech-XCorp/4005-winllvmbuild #984
  • This is a PR for 4005 vs2019build, which fixes a few things on Windows #981
  • Fix build for no-ETI build #977
  • Fix invalid mem accesses in new GEMV kernel #961
  • Kokkos_ArithTraits.hpp: Fix isInf and isNan with complex types #936

3.4.01 (2021-05-19)

Full Changelog

Fixed Bugs:

  • Windows: Fixes for Windows #981
  • Sycl: ArithTraits fixes for Sycl #959
  • Sparse: Added code to allow KokkosKernels coloring to accept partial colorings #938
  • Sparse: Include sorting within spiluk #972
  • Sparse: Fix CrsMatrix raw pointer constructor #971
  • Sparse: Fix spmv Serial beta==-1 code path #947

3.4.00 (2021-04-25)

Full Changelog

Features:

  • SYCL: adding ETI and CMake logic for SYCL backend #924

Implemented enhancements Algorithms and Archs:

  • Two-stage GS: add damping factors #921
  • Supernodal SpTRSV, improve symbolic performance #899
  • Add MKL SpMV wrapper #895
  • Serial code path for spmv #893

Implemented enhancements BuildSystem:

  • Cmake: Update ArmPL support #901
  • Cmake: Add ARMPL TPL support #880
  • IntelClang guarding __assume_aligned with !defined(clang) #878

Implemented enhancements Other:

  • Add static_assert/throw in batched eigendecomp #931
  • Workaround using new/delete in kernel code #925
  • Blas perf_test updates #892

Fixed bugs:

  • Fix ctor CrsMat mirror with CrsGraph mirror #918
  • Fix nrm1, removed cublas nrminf, improved blas tests #915
  • Fix and testing coverage mainly in graph coarsening #910
  • Fix KokkosSparse for nightly test failure #898
  • Fix view types across ternary operator #894
  • Make work_view_t typedef consistent #885
  • Fix supernodal SpTRSV build with serial+openmp+cuda #884
  • Construct SpGEMM C with correct ncols #883
  • Matrix Converter: fixing issue with deallocation after Kokkos::finalize #882
  • Fix >1024 team size error in sort_crs_* #872
  • Fixing seg fault with empty matrix in kspiluk #871

3.3.01 (2021-01-18)

Full Changelog

Fixed Bugs:

  • With CuSparse enabled too many variants of SPMV were instantiated even if not requested. Up to 1GB executable size increase.

3.3.00 (2020-12-16)

Full Changelog

Implemented enhancements:

  • Add permanent RCM reordering interface, and a basic serial implementation #854
  • Half_t explicit conversions #849
  • Add batched gemm performance tests #838
  • Add HIP support to src and perf_test #828
  • Factor out coarsening #827
  • Allow enabling/disabling components at configuration time #823
  • HIP: CMake work on tests and ETI #820
  • HIP: KokkosBatched - hip specialization #812
  • Distance-2 maximal independent set #801
  • Use batched TRTRI & TRMM for Supernode-sptrsv setup #797
  • Initial support for half precision #794

Fixed bugs:

  • Fix issue with HIP and Kokkos_ArithTraits #844
  • HIP: fixing round of issues on AMD #840
  • Throw an exception if BLAS GESV is not enabled #837
  • Fixes -Werror for gcc with c++20 #836
  • Add fallback condition to use spmv_native when cuSPARSE does not work #834
  • Fix install testing refactor for inline builds #811
  • HIP: fix ArithTraits to support HIP backend #809
  • cuSPARSE 11: fix spgemm and spmv_struct_tuning compilation error #804

Incompatibilities:

  • Remove pre-3.0 deprecated code #825

3.2.01 (2020-11-17)

Full Changelog

Fixed bugs:

3.2.00 (2020-08-19)

Full Changelog

Implemented enhancements:

  • Add CudaUVMSpace specializations for cuBLAS IAMAX and SCAL #758
  • Add wiki examples #735
  • Support complex_float, complex_double in cuSPARSE SPMV wrapper #726
  • Add performance tests for trmm and trtri #711
  • SpAdd requires output values to be zero-initialized, but this shouldn't be needed #694
  • SpAdd doesn't merge entries correctly #685
  • cusparse SpMV merge algorithm #670
  • TPL support for SpMV #614
  • Add two BLAS/LAPACK calls needed by: Sptrsv supernode #552 #589
  • HashmapAccumulator has several unused members, misnamed parameters #508

Fixed bugs:

  • Nightly test failure: spgemm unit tests failing on White (Power8) #780
  • supernodal does not build with UVM enabled #633

3.1.01 (2020-05-04)

Full Changelog

Fixed bugs:

  • KokkosBatched QR PR breaking nightly tests #691

3.1.00 (2020-04-14)

Full Changelog

Implemented enhancements:

  • Two-stage & Classical Gauss-Seidel #672
  • Test transpose utilities #664
  • cuSPARSE spmv wrapper doesn't actually use 'mode' #650
  • Distance-2 improvements #625
  • FindMKL module: which mkl versions to prioritize #480
  • Add SuperLU as optional CMake TPL #545
  • Revamp the ETI system #460

Fixed bugs:

  • 2-stage GS update breaking cuda/10+rdc build #673
  • Why CrsMatrix::staticcrsgraph_type uses execution_space and not device_type? #665
  • TRMM and TRTRI build failures with clang/7+cuda9+Cuda_OpenMP and gcc/5.3+OpenMP #657
  • cuSPARSE spmv wrapper doesn't actually use 'mode' #650
  • Block Gauss-Seidel test fails when cuSPARSE is enabled #648
  • cuda uvm test failures without launch blocking - expected behavior? #636
  • graph_color_d2_symmetric_double_int_int_TestExecSpace seg faults in cuda/10.1 + Volta nightly test on kokkos-dev-2 #634
  • Build failures on kokkos-dev with clang/7.0.1 cuda/9.2 and blas/cublas/cusparse tpls #629
  • Distance-2 improvements #625
  • trsv - internal compiler error with intel/19 #607
  • complex_double misalignment still breaking SPGEMM #598
  • PortableNumericCHASH can't align shared memory #587
  • Remove all references to Kokkos::Impl::is_same #586
  • Can I run KokkosKernels spgemm with float or int32 type? #583
  • Kokkos Blas: gemv segfaults #443
  • Generated kokkos-kernels file names are too long and are crashing cloning Trilinos on Windows #395

3.0.00 (2020-01-27)

Full Changelog

Implemented enhancements:

  • BuildSystem: Standalone Modern CMake support #491
  • Cluster GS and SGS: add cluster gauss-seidel implementation #455
  • spiluk: Add sparse ILUK implementation #459
  • BLAS gemm: Dot-based GEMM Cuda optimization for C = beta*C + alpha*A^T*B #490
  • Sorting utilities: #461
  • SGS: Support multiple rhs in SGS efficiently #488
  • BLAS trsm: Add support and interface for trsm #513
  • BLAS iamax: Implement iamax #87
  • BLAS gesv: #449
  • sptrsv supernodal: Add supernodal sparse triangular solver #552
  • sptrsv: Add cusparse tpl support for sparse triangular solve, cudagraphs to fallback #555
  • KokkosGraph: Output colors assigned during graph coloring #444
  • MatrixReader: Full matrix market support #466

Fixed bugs:

  • gemm: Fix bug for complex types in fallback impl #550
  • gemv: Fix degenerate matrix cases #514
  • spgemm: Fix cuda build with complex_double misaligned shared memory access #500
  • spgemm: Wrong team size heuristic used for SPGEMM when Kokkos deprecated=OFF #474
  • dot: Improve accuracy for float and complex_float #574
  • SpMV Struct: Fix bug with intel_17_0_1 #456
  • readmtx: Fix invalid read due to loop condition #453
  • spgemm: Fix hashmap accumulator bug yielding crashes and wrong results #402
  • KokkosGraph: Fix distance-1 graph coloring segfault #275
  • UniformMemoryPool: does not re-initialize chunks that are freed #530

2.9.00 (2019-06-24)

Full Changelog

Implemented enhancements:

  • KokkosBatched: Add specialization for float2, float4 and double4 #427
  • KokkosBatched: Reduce VectorLength (16 to 8) #432
  • KokkosBatched: Remove experimental name space for batched blas #371
  • Capability: Initial sparse triangular solve capability #435
  • Capability: Add support for MAGMA GESV TPL #409
  • cuBLAS: Add CudaUVMSpace specializations for GEMM #397

Fixed bugs:

  • Deprecated Code Fixes #411
  • BuildSystem: Compilation error on rzansel #401

2.8.00 (2019-02-05)

Full Changelog

Implemented enhancements:

  • Capability, Tests: C++14 Support and Testing #351
  • Capability: Batched getrs #332
  • More Kernel Labels for KokkosBlas #239
  • Name all parallel kernels and regions #124

Fixed bugs:

  • BLAS TPL: BLAS underscore mangling #369
  • BLAS TPL, Complex: Promotion 2.7.24 broke MV unit tests in Tpetra with complex types #360
  • GEMM: GEMM uses wrong function for computing shared memory allocation size #368
  • BuildSystem: BLAS TPL macro not properly enabled with MKL BLAS #347
  • BuildSystem: make clean - errors #353
  • Compiler Workaround: Internal compiler error in KokkosBatched::Experimental::TeamGemm #349
  • KokkosBlas: Some KokkosBlas kernels assume default execution space #14

2.7.24 (2018-11-04)

Full Changelog

Implemented enhancements:

  • Enhance test_all_sandia script to set scalar and ordinal types #315
  • Batched getri need #305
  • Deterministic Coloring #271
  • MKL - guard minor version for MKL v. 18 #268
  • TPL Support for all BLAS functions using CuBLAS #247
  • Add L1 variant to multithreaded Gauss-Seidel #240
  • Multithreaded Gauss-Seidel does not support damping #221
  • Guard 1-phase SpGEMM in Intel MKL #217
  • generate makefile with-spaces option #98
  • Add MKL version check #7

Fixed bugs:

  • Perf test failures w/ just CUDA enabled #257
  • Wrong signature for axpy blas functions #329
  • Failing unit tests with float - unit test error checking issue #322
  • cuda.graph_graph_color* COLORING_VBD test failures with cuda/9.2 + gcc/7.2 on White #317
  • KokkosBatched::Experimental::SIMD<T> does not build with T=complex<float> #316
  • simple test program fails using 3rdparty Eigen library #309
  • KokkosBlas::dot is broken for complex, due to incorrect assumptions about Fortran ABI #307
  • strides bug in kokkos tpl interface. #292
  • Failing spgemm unit test with MKL #289
  • Fix the block_pcg perf-test when offsets are size_t #287
  • spotcheck warnings from kokkos #284
  • Linking error in tpl things #282
  • Build failure with clang 3.9.0 #281
  • CMake modification for TPLs. #276
  • KokkosBatched warnings #259
  • KokkosBatched contraction length bug #258
  • Small error in KokkosBatched_Gemm_Serial_Imp.hpp with SerialGemm<Trans::Transpose,*,*> #147

2.7.00 (2018-05-24)

Full Changelog

Implemented enhancements:

  • Tests: add capability to build a unit test standalone #233
  • Make KokkosKernels work without KOKKOS_ENABLE_DEPRECATED_CODE #223
  • Replace KOKKOS_HAVE_* FLAGS with KOKKOS_ENABLE_* #219
  • Add team-based scal, mult, update, nrm2 #214
  • Add team based abs #209
  • Generated CPP files moving includes inside the ifdef's #199
  • Implement BlockCRS in Kokkoskernels #184
  • Spgemm hash promotion #171
  • Batched BLAS enhancement #170
  • Document & check CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS=ON in CUDA build #148

Fixed bugs:

  • Update drivers in perf_tests/graph to use Kokkos::initialize() #200
  • unit tests failing/hanging on Volta #188
  • Inner TRSM: SIMD build error; manifests in Ifpack2 #183
  • d2_graph_color doesn't have a default coloring mechanism #168
  • Unit tests do not build with Serial backend #154

2.6.00 (2018-03-07)

Full Changelog

Implemented enhancements:

  • Spgemm hash promotion #171
  • Batched BLAS enhancement #170

Fixed bugs:

  • d2_graph_color doesn't have a default coloring mechanism #168
  • Build error when MKL TPL is enabled #135

2.5.00 (2017-12-15)

Full Changelog

Implemented enhancements:

  • KokkosBlas: Add GEMM interface #105
  • KokkosBlas: Add GEMM default Kernel #125
  • KokkosBlas: Add GEMV that wraps BLAS (and cuBLAS) #16
  • KokkosSparse: Make SPMV test not print GBs of output if something goes wrong. #111
  • KokkosSparse: ETI SpGEMM and Gauss Seidel and take it out of Experimental namespace #74
  • BuildSystem: Fix Makesystem to correctly build library after aborted install #104
  • BuildSystem: Add option to generate_makefile.bash to define memoryspaces for instantiation #89
  • BuildSystem: generate makefile tpl option #66
  • BuildSystem: Add a simpler compilation script, README update etc #96

Fixed bugs:

  • Internal Compiler Error GCC in GEMM #129
  • Batched Team LU: bug for small team_size #110
  • Compiler BUG in IBM XL pragma unrolling #92
  • Fix Blas TPL enables build #77
  • Batched Gemm Failure #73
  • CUDA 7.5 (GCC 4.8.4) build errors #72
  • Cuda BLAS tests fail with UVM if CUDA_LAUNCH_BLOCKING=1 is not defined on Kepler #51
  • CrsMatrix: sumIntoValues and replaceValues incorrectly count the number of valid column indices. #11
  • findRelOffset test assumes UVM #32

0.10.03 (2017-09-11)

Implemented enhancements:

  • KokkosSparse: Fix unused variable warnings in spmv_impl_omp, spmv Test and graph color perf_test #63
  • KokkosBlas: dot: Add unit test #15
  • KokkosBlas: dot: Add special case for multivector * vector (or vector * multivector) #13
  • BuildSystem: Make KokkosKernels build independently of Trilinos #1
  • BuildSystem: Fix ETI System not to depend on Tpetra ETI #5
  • BuildSystem: Change CMake to work with new ETI system #19
  • BuildSystem: Fix TpetraKernels names to KokkosKernels #4
  • BuildSystem: Trilinos/KokkosKernels reports no ETI in almost any circumstance #29
  • General: Kokkos::ArithTraits<double>::nan() is very slow #35
  • General: Design and Define New UnitTest infrastructure #28
  • General: Move Tpetra::Details::OrdinalTraits to KokkosKernels #22
  • General: Rename files and NameSpace to KokkosKernels #12
  • General: PrepareStandalone: Get rid of Teuchos usage #2
  • General: Fix warning with char being either signed or unsigned in ArithTraits #60
  • Testing: Make all tests run with -Werror #68

Fixed bugs:

  • SPGEMM Test Fails for Cuda when compiled through Trilinos #49
  • Fix ArithTraits min for floating points #47
  • Pthread ETI error #25
  • Fix CMake Based ETI for Threads backend #46
  • KokkosKernels_ENABLE_EXPERIMENTAL causes build error #59
  • ArithTraits warnings in CUDA build #71
  • Graph coloring build warnings #3

This Change Log was automatically generated by github_changelog_generator