[DRAFT] GR1: Additional Vectorization Pass supporting more fusion potentials. #870
@philipportner @pdamme
Here is my work on the new vectorization capabilities in DAPHNE, reduced to the first set of changes. It adds more fusion potential by increasing the number of compatible operations and situations (mainly horizontal/sibling fusion).
To reproduce the unexpected behaviour of slower execution when horizontal fusion is used, you will find a Python script named run_horz.py in the root directory of the repo. It generates benchmark scripts and measures the performance of the current implementation with and without horizontal fusion (see the sketch after the flag list below).
e.g.
python3 run_horz.py --tool PAPI_STD --script ADD --verbose-output --explain --num-ops 10 --threads 1 --rows 30000 --cols 30000 --batchSize 0 --samples 2
--tool: selects the measuring tool: PAPI_STD, PAPI_L1, PAPI_MPLX, or NOW (these can be found and configured in shared.py). NOW measures with now() inside the generated DAPHNE script.
--script: selects one of the two script templates, ADD or ADD_SUM.
--num-ops: specifies the number of ops/pairs N.
--threads: how many threads should be used for the vectorized execution.
--rows and --cols: specify the size of the shared input matrix X.
--batchSize: number of rows per vectorized task; 0 means the normal behaviour of the MTWrapper (task size calculated based on 8 MB).
--samples: number of executions for each of the two settings (with and without horizontal fusion).
--verbose-output: prints the stdout and stderr of each run of the DAPHNE executable.
--explain: adds --explain=vectorized to the command used to run DAPHNE.
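To illustrate what the comparison does, here is a minimal sketch of the kind of measurement loop run_horz.py performs. This is not the actual implementation: the --no-hf and --explain=vectorized flags and the script name _horz.py are taken from the description above, while the bin/daphne path, the --vec flag for enabling the vectorized engine, and the flag placement are assumptions.

# Minimal sketch of the with/without horizontal-fusion comparison;
# not the actual run_horz.py, paths and flag placement are assumptions.
import subprocess
import time

def run_once(extra_flags):
    # Run the generated script through the daphne binary and return the wall-clock time.
    cmd = ["bin/daphne", "--vec", "--explain=vectorized"] + extra_flags + ["_horz.py"]
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start

samples = 2
with_hf = [run_once([]) for _ in range(samples)]              # horizontal fusion enabled
without_hf = [run_once(["--no-hf"]) for _ in range(samples)]  # horizontal fusion disabled
print(f"with horz. fusion:    {min(with_hf):.3f} s")
print(f"without horz. fusion: {min(without_hf):.3f} s")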
The generated DAPHNE script that gets executed is placed in the CWD from which the Python script was run; it is named _horz.py (a sketch of its rough structure follows below).
Needed packages: numpy, tabulate, pandas (the latest versions should work).
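To make the idea concrete, a hypothetical generator for the ADD case might look roughly like this. The exact DaphneDSL code that run_horz.py emits is not shown here, so the N independent additions on the shared matrix X and the rand()/print(sum()) calls are assumptions for illustration only.

# Hypothetical sketch of generating a benchmark script for the ADD case;
# the real generated code in run_horz.py may differ.
def write_add_script(path, num_ops, rows, cols):
    lines = [f"X = rand({rows}, {cols}, 0.0, 1.0, 1.0, 42);"]
    for i in range(num_ops):
        # N independent element-wise additions on the shared input X;
        # these are the candidates for horizontal/sibling fusion.
        lines.append(f"Y{i} = X + X;")
    for i in range(num_ops):
        # Consume the results so the ops are not optimized away (sketch only).
        lines.append(f"print(sum(Y{i}));")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

write_add_script("_horz.py", num_ops=10, rows=30000, cols=30000)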
For DAPHNE itself, we introduced the following arguments (an example invocation follows the list):
--vec-type (GREEDY_1 or DAPHNE): selects the vectorization strategy (DAPHNE is not tested).
--no-hf: deactivates the horizontal fusion pass.
--batchSize: allows experimenting with the task size of a vectorized execution.
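A possible invocation combining these flags would be, e.g. (the bin/daphne entry point and the existing --vec flag are assumptions, script.daphne is a placeholder; only the new flags themselves come from this PR):
bin/daphne --vec --vec-type GREEDY_1 --batchSize 0 --explain=vectorized script.daphne
Adding --no-hf to the same command disables the horizontal fusion pass for a comparison run.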
Let me know if you need anything else.