WIP: Testing some `par_dispatch` stuff #1156

lroberts36 · 2024-08-21T05:35:22Z

PR Summary

This is just an attempt to see if some of the template magic in #1142 could be written in a slightly different way. It basically copies the ideas there but structures the code differently. It seems to be working both on CPU and on device.

All loops can be called either with sets of integers (as currently supported) or with sets of IndexRanges defining the loop bounds. This is enabled by LoopBoundTranslator.
All loop patterns should work for loops of any rank (the exception is LoopPatternTPTTR, which requires at least rank 2).
Things are set up so that different types of work partitioning can be tried in hierarchical loops like TPTTR.
If a loop pattern is requested that does not support a particular loop, it should automatically fall through to a pattern that does support that type of loop (generally FlatRange).
Adds ThreadVectorRange as an option for par_for_inner.
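A minimal, serial sketch of the ideas in the list above, assuming C++17. It is not the code in this PR: `IndexRange` here is a stand-in with inclusive `s`/`e` members, and `BoundTranslatorSketch`, `RankOf`, `TranslateBounds`, `ResolvedPattern`, `FlatRangeLoop` and the pattern tags are hypothetical names. It shows bounds given as any mix of `IndexRange`s and integer pairs being translated into per-dimension begin/end arrays, a requested TPTTR pattern resolving to FlatRange for rank-1 loops, and a flat-index loop that unpacks indices with the last dimension fastest.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for an IndexRange with inclusive bounds s..e.
struct IndexRange {
  int s = 0;
  int e = 0;
};

// Hypothetical pattern tags; the names echo the PR text, not the real API.
struct LoopPatternFlatRange {};
struct LoopPatternTPTTR {};

template <std::size_t Rank>
struct Bounds {
  static constexpr std::size_t rank = Rank;
  std::array<int, Rank> begin{};
  std::array<int, Rank> end{};  // inclusive upper bounds
};

// Each IndexRange contributes one dimension; each pair of ints contributes one.
template <typename... Ts>
constexpr std::size_t RankOf() {
  std::size_t n_range = (0 + ... + (std::is_same_v<Ts, IndexRange> ? 1 : 0));
  std::size_t n_int = (0 + ... + (std::is_integral_v<Ts> ? 1 : 0));
  return n_range + n_int / 2;
}

// Hypothetical translator: peels one dimension's bounds off the argument list
// per call, whether they arrive as an IndexRange or as a (lo, hi) int pair.
struct BoundTranslatorSketch {
  template <std::size_t Rank>
  static void Fill(Bounds<Rank> &, std::size_t) {}  // no bounds left

  template <std::size_t Rank, typename... Rest>
  static void Fill(Bounds<Rank> &b, std::size_t d, IndexRange r, Rest... rest) {
    b.begin[d] = r.s;
    b.end[d] = r.e;
    Fill(b, d + 1, rest...);
  }

  template <std::size_t Rank, typename... Rest>
  static void Fill(Bounds<Rank> &b, std::size_t d, int lo, int hi, Rest... rest) {
    b.begin[d] = lo;
    b.end[d] = hi;
    Fill(b, d + 1, rest...);
  }
};

template <typename... Args>
auto TranslateBounds(Args... args) {
  Bounds<RankOf<Args...>()> b;
  BoundTranslatorSketch::Fill(b, 0, args...);
  return b;
}

// Requesting TPTTR for a loop of rank < 2 falls through to FlatRange at compile time.
template <typename Pattern, std::size_t Rank>
using ResolvedPattern =
    std::conditional_t<std::is_same_v<Pattern, LoopPatternTPTTR> && (Rank < 2),
                       LoopPatternFlatRange, Pattern>;

// Serial FlatRange-style loop: one flat index, unpacked with the last dimension fastest.
template <std::size_t Rank, typename F>
void FlatRangeLoop(const Bounds<Rank> &b, F &&f) {
  std::array<long, Rank> ext{};
  long total = 1;
  for (std::size_t d = 0; d < Rank; ++d) {
    ext[d] = b.end[d] - b.begin[d] + 1;
    if (ext[d] <= 0) return;  // empty range: nothing to do
    total *= ext[d];
  }
  for (long flat = 0; flat < total; ++flat) {
    std::array<int, Rank> idx{};
    long rem = flat;
    for (std::size_t d = Rank; d-- > 0;) {
      idx[d] = b.begin[d] + static_cast<int>(rem % ext[d]);
      rem /= ext[d];
    }
    std::apply(f, idx);  // call f(idx[0], ..., idx[Rank - 1])
  }
}
```

For example, with `IndexRange kb{0, 1}, jb{0, 2}`, `TranslateBounds(kb, jb, 0, 3)` yields a rank-3 bound set, and `FlatRangeLoop` then visits all 2 × 3 × 4 = 24 `(k, j, i)` triples. Resolving the pattern with `std::conditional_t` keeps the fallback a compile-time decision; a real implementation would go on to dispatch on the resolved tag rather than always running a flat loop.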

PR Checklist

lroberts36 · 2024-08-22T22:30:42Z

Testing vectorization with this branch in Riot (gcc/9.4.0 on a skylake-platinum node, -O3 -march=skylake-avx512), I find that the simd for loop is vectorized in 191 places with the old version of par_dispatch and in 200 places with the new version. I'm not sure what to make of this difference; it is hard to determine more from the gcc vectorization report.
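For context on what is being counted, here is a hypothetical sketch of a "simd for" style loop nest, assuming (this is not confirmed by the PR) that the pattern runs the outer loops serially and places `#pragma omp simd` only on the innermost loop. `SimdFor3D` and `Scale` are illustrative names. The pragma is honored only when compiling with `-fopenmp` or `-fopenmp-simd`; with gcc, adding `-fopt-info-vec-optimized` to `-O3 -march=skylake-avx512` is one way to get a per-loop vectorization report like the one mentioned above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical "simd for" loop nest: k and j loops are plain serial loops and
// only the innermost i loop carries the simd hint. Without -fopenmp or
// -fopenmp-simd the pragma is ignored and the loop still runs correctly.
template <typename F>
void SimdFor3D(int kb, int ke, int jb, int je, int ib, int ie, const F &f) {
  for (int k = kb; k <= ke; ++k) {
    for (int j = jb; j <= je; ++j) {
#pragma omp simd
      for (int i = ib; i <= ie; ++i) {
        f(k, j, i);
      }
    }
  }
}

// Example kernel with unit-stride access in i, which gcc can typically
// vectorize once the lambda is inlined into the innermost loop.
void Scale(std::vector<double> &a, const std::vector<double> &b, double c,
           int nk, int nj, int ni) {
  SimdFor3D(0, nk - 1, 0, nj - 1, 0, ni - 1, [&](int k, int j, int i) {
    const int idx = (k * nj + j) * ni + i;  // row-major, i fastest
    a[idx] = c * b[idx];
  });
}
```

Whether a given call site is reported as vectorized depends on inlining and aliasing analysis at that site, which is one reason counts like 191 vs. 200 can shift when the dispatch machinery around the loop body changes.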

The run time difference between these two branches for a 3D AMR test problem is negligible. The old one wins by maybe a couple of percent, but I think the run-to-run variance is larger than that.

lroberts36 added 9 commits August 20, 2024 14:16

01e7406 extra stuff
3620a79 working for unit tests
6d82680 Add test for object being callable
a04698d Switch to IndexRange based bounds
f0dab9a revert test to original
727da4c remove stuff
e0c13a6 format and lint
1e992c8 Add fallbacks
edef10d fix constexpr issue on device

lroberts36 mentioned this pull request Aug 21, 2024

Unify par_dispatch, par_for_outer & par_for_inner overloads #1142 (Open, 12 tasks)

lroberts36 added 20 commits August 20, 2024 23:54

6bb150a make public
9b4d10b cleanup
df98b2a fix possible future bug
aafd266 generalize hierarchical loops
59f615b small
d3bdc34 make meshblock::par* calls just directly call parthenon::par*
a9c0db2 cleanup some cuda complaints
e9a8e0e remove all tuple stuff
a73ccf4 suppress warning
064fd4d switch to Kokkos array
d92a6d2 fix weird cuda issues
8d22898 revert to std::array in indexer
d68d4d1 format and lint
5e7f6a3 fix?
a701aea Remove some duplication
64f0f13 clarify intention
524d6dc format
a67aa91 cleanup
46af369 small
78d81a2 fix bug

lroberts36 added 2 commits August 22, 2024 13:58

1b76508 Merge branch 'develop' into lroberts36/generalize-par-dispatch
955fcb7 remove comment
8e9df23 Merge branch 'develop' into lroberts36/generalize-par-dispatch
