Improve Symbolic Cholesky performance #1758

upsj · 2024-12-19T20:44:57Z

This PR improves symbolic Cholesky performance by preprocessing the matrix on the GPU with a minimum spanning tree (MST) algorithm.

Example rgg_22 from SuiteSparse with METIS nested dissection on H100:

Before: 0.76 s
After: 0.5 s

The performance improvement is split between the device-to-host transfer (only a spanning tree is transferred instead of the full matrix) and the elimination tree computation (which operates on a sparser graph).
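
Why a spanning tree is enough can be sketched in a few lines of Python. This is not the GPU kernel from this PR: it is a sequential illustration that assumes each edge (i, j) is weighted by max(i, j) and uses Kruskal's algorithm. The helper names `elimination_tree` and `skeleton_forest` are hypothetical. Under that assumption, the spanning forest preserves the connectivity of every leading subgraph, so its elimination tree should match that of the full matrix.

```python
def elimination_tree(n, edges):
    """Liu's elimination tree algorithm with path compression.

    edges: undirected (i, j) pairs of the symmetric sparsity pattern.
    Returns parent[], with -1 marking a root.
    """
    # lower[i] holds the column indices j < i of row i (strict lower triangle)
    lower = [[] for _ in range(n)]
    for i, j in edges:
        if i != j:
            lower[max(i, j)].append(min(i, j))
    parent = [-1] * n
    ancestor = [-1] * n  # path-compressed shortcut towards the current root
    for i in range(n):
        for j in lower[i]:
            # climb from j to the root of its subtree, redirecting shortcuts to i
            while j != -1 and j < i:
                nxt = ancestor[j]
                ancestor[j] = i
                if nxt == -1:
                    parent[j] = i  # j was a root, so i becomes its parent
                j = nxt
    return parent


def skeleton_forest(n, edges):
    """Kruskal's algorithm with the ASSUMED edge weight max(i, j)."""
    root = list(range(n))

    def find(x):
        # union-find lookup with path halving
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    kept = []
    for i, j in sorted(edges, key=max):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            root[ri] = rj
            kept.append((i, j))
    return kept


# Small made-up pattern with 9 off-diagonal edges on 6 vertices
edges = [(0, 2), (0, 3), (1, 3), (2, 3), (2, 4), (1, 4), (3, 5), (4, 5), (0, 5)]
tree_edges = skeleton_forest(6, edges)
assert elimination_tree(6, tree_edges) == elimination_tree(6, edges)
```

In this toy case the forest keeps 5 of the 9 edges, which illustrates both effects described above: fewer entries to copy to the host, and fewer edges for the elimination tree pass to visit.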

upsj added 10 commits December 18, 2024 15:55

add ani4 to symbolic cholesky test set

9768d64

fix sparsity pattern output

f581bd2

fix incorrect include

4161ff0

add Cholesky skeleton tree computation kernel

3c5d2db

add more precise atomic operations

1d9c18d

fix matrix symmetry

2b20fb1

fix reference MST algorithm

7f47085

add GPU MST algorithm for Cholesky preprocessing

0ad691f

add AMD support

d9c6fe5

add benchmark for MST-enhanced symbolic Cholesky

c8f3ca1

upsj requested a review from a team December 19, 2024 20:44

upsj self-assigned this Dec 19, 2024

upsj added the 1:ST:ready-for-review label Dec 19, 2024
