[DRAFT] GR1: Additional Vectorization Pass supporting more fusion potentials. #870

Draft: wants to merge 1 commit into main


sweetpellegrino

@philipportner @pdamme
Here is my work on the new vectorization capabilities in DAPHNE, reduced to the first set of changes. It exposes more fusion opportunities by increasing the number of compatible operations and situations (mainly horizontal/sibling fusion).

To reproduce the unexpected slowdown when horizontal fusion is enabled, you will find a Python script named run_horz.py in the root directory of the repo. It generates test scripts and measures the performance of the current implementation with and without horizontal fusion.

For example:

    python3 run_horz.py --tool PAPI_STD --script ADD --verbose-output --explain --num-ops 10 --threads 1 --rows 30000 --cols 30000 --batchSize 0 --samples 2

--tool: selects the measuring tool: PAPI_STD, PAPI_L1, PAPI_MPLX, or NOW (the tools are defined and configured in shared.py). NOW measures time with now() inside the DAPHNE script.

--script: selects one of two script templates, ADD or ADD_SUM:

  • ADD: generates N ops of the form v1 = X + 0.1, where X is the shared input and the scalar (0.1 here) differs across the N ops.
  • ADD_SUM: generates the same N additions as ADD, but also inserts a sum op after each addition: v1 = X + 0.1; s1 = sum(v1);

--num-ops: specifies the number N of ops (ADD) or op pairs (ADD_SUM).

--threads: number of threads used for vectorized execution.

--rows and --cols: specify the dimensions of the shared input matrix X.

--batchSize: number of rows per vectorized task. A value of 0 keeps the default behaviour of the MTWrapper, which computes the task size based on 8 MB.

--samples: number of runs for each of the two settings (with and without horizontal fusion).

--verbose-output: prints the stdout and stderr of each run of the DAPHNE executable.

--explain: adds --explain=vectorized to the command that runs DAPHNE.
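The --batchSize semantics (0 falls back to a default derived from 8 MB) can be illustrated with a rough estimate. This sketch assumes "8mb" means 8 MiB and that the default simply divides that budget by the byte size of one row of a single dense double input; the real MTWrapper heuristic may account for several inputs and outputs or use different units, so the helper names here are hypothetical and not taken from the DAPHNE code base.

```python
import math

# Assumption: "8mb" is read as 8 MiB, and X holds dense 8-byte (double) values.
BUDGET_BYTES = 8 * 1024 * 1024


def default_rows_per_task(cols: int, bytes_per_value: int = 8) -> int:
    """Rows of one dense row-major input that fit into the assumed budget (at least 1)."""
    return max(1, BUDGET_BYTES // (cols * bytes_per_value))


def rows_per_task(batch_size: int, cols: int) -> int:
    """Mirror the --batchSize semantics: 0 falls back to the default estimate."""
    return batch_size if batch_size > 0 else default_rows_per_task(cols)


def num_tasks(rows: int, batch_size: int, cols: int) -> int:
    """Number of tasks needed to cover all rows of X."""
    return math.ceil(rows / rows_per_task(batch_size, cols))
```

Under these assumptions, --rows 30000 --cols 30000 --batchSize 0 gives 34 rows per task, i.e. 883 tasks, so a much larger explicit --batchSize changes the task granularity considerably.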

The generated DAPHNE script that gets executed is written to the current working directory from which the Python script was run; it is named _horz.py.
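The ADD and ADD_SUM templates can be sketched as a small generator that emits DaphneDSL text. This is a hypothetical helper, not the code in run_horz.py: the rand(rows, cols, min, max, sparsity, seed) call used to create the shared input X and the scheme for choosing a distinct scalar per op (0.1, 0.2, ...) are assumptions.

```python
def generate_script(kind: str, num_ops: int, rows: int, cols: int) -> str:
    """Return DaphneDSL source for the ADD or ADD_SUM template (hypothetical layout)."""
    if kind not in ("ADD", "ADD_SUM"):
        raise ValueError(f"unknown script kind: {kind}")
    lines = [
        # Shared input X; the rand() argument order is an assumption.
        f"X = rand({rows}, {cols}, 0.0, 1.0, 1, -1);",
    ]
    for i in range(1, num_ops + 1):
        # A different scalar per op (assumed scheme), rounded to avoid 0.30000000000000004.
        scalar = round(0.1 * i, 10)
        lines.append(f"v{i} = X + {scalar};")
        if kind == "ADD_SUM":
            # ADD_SUM pairs every addition with a full aggregation of its result.
            lines.append(f"s{i} = sum(v{i});")
    return "\n".join(lines) + "\n"
```

All additions read the same X, which is the sibling pattern that horizontal fusion targets; ADD_SUM additionally gives each branch its own consumer.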

Required Python packages: numpy, tabulate, pandas (the latest versions should work).

For DAPHNE itself, we introduced the following arguments:

--vec-type (GREEDY_1 or DAPHNE): selects the vectorization strategy (DAPHNE is untested)
--no-hf: deactivates the horizontal fusion pass
--batchSize: sets the task size of a vectorized execution, for experimentation
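To compare the two settings, the same script is run once with and once without --no-hf. A minimal sketch of building both command lines, assuming the executable lives at bin/daphne, that --vec enables vectorized execution, and that the new options accept the --name=value spelling; the helper name is hypothetical, and the thread-count option is left out because its DAPHNE spelling is not given here.

```python
# Assumption: path of the DAPHNE executable relative to the repo root.
DAPHNE_BIN = "bin/daphne"


def daphne_command(script: str, horizontal_fusion: bool, batch_size: int = 0,
                   explain: bool = False) -> list:
    """Build the argument list for one run (hypothetical helper, not run_horz.py's code)."""
    # --vec is assumed to switch on vectorized execution; GREEDY_1 is the tested strategy.
    cmd = [DAPHNE_BIN, "--vec", "--vec-type=GREEDY_1"]
    if not horizontal_fusion:
        cmd.append("--no-hf")  # deactivate the horizontal fusion pass
    if batch_size > 0:
        cmd.append(f"--batchSize={batch_size}")  # 0 keeps the default task size
    if explain:
        cmd.append("--explain=vectorized")
    cmd.append(script)  # options go before the script path
    return cmd


if __name__ == "__main__":
    for hf in (True, False):
        print(" ".join(daphne_command("bench.daph", hf)))
```

Each list can be passed directly to subprocess.run, and repeating both commands --samples times keeps every other option identical between the two settings.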

Let me know if you need anything else.
