Allow merging graphs with multiple clients in `FusionOptimizer` #1237

ricardoV94 · 2022-10-05T14:43:32Z

Right now, Elemwise composite graphs will be split at nodes that have multiple clients. This can be avoided by allowing the fusion rewriter to replace multiple outputs. One example where this is very common is in value_and_grad functions.

import aesara
import aesara.tensor as at

x = at.vector("x")
y = at.exp(x/5)
w = y + 1
z = y * 2

f = aesara.function([x], [z, w])
aesara.dprint(f)

Elemwise{Mul}[(0, 1)] [id A] 2
 |TensorConstant{(1,) of 2.0} [id B]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0
   |TensorConstant{(1,) of 0.2} [id D]
   |x [id E]
Elemwise{add,no_inplace} [id F] 1
 |TensorConstant{(1,) of 1.0} [id G]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0

f = aesara.function([x], [z])
aesara.dprint(f)

Elemwise{Composite{(i0 * exp((i1 * i2)))}} [id A] 0
 |TensorConstant{(1,) of 2.0} [id B]
 |TensorConstant{(1,) of 0.2} [id C]
 |x [id D]

Discussed with @aseyboldt and @aesara-devs/aesara

See rationale and proposed changes in #1237 (comment)

The text was updated successfully, but these errors were encountered:

ricardoV94 · 2022-10-05T15:24:10Z

@brandonwillard Do you have any pointers for non-local rewriters that replace multiple nodes? It sounds like the functionality for multiple outputs is already present in the scalar Composite, the only thing we need is to ramp up the FusionOptimizer

brandonwillard · 2022-10-05T15:49:45Z

@brandonwillard Do you have any pointers for non-local rewriters that replace multiple nodes? It sounds like the functionality for multiple outputs is already present in the scalar Composite, the only thing we need is to ramp up the FusionOptimizer

Some of the dict-based replacement code from the non-global rewriters might come in handy; otherwise, there shouldn't be anything special about the local vs. global approaches.

brandonwillard · 2022-10-18T17:29:46Z

Elemwise{Composite{(i0 * exp((i1 * i2)))}} [id A] 0
|TensorConstant{(1,) of 2.0} [id B]
|TensorConstant{(1,) of 0.2} [id C]
|x [id D]

We should probably look into these kinds of fusions that turn scalar constants into inputs. It's not clear to me that this is any better than—say—making i0 and i1 constants and removing those two inputs.

In other words, why don't we produce the following instead?

Elemwise{Composite{(2.0 * exp((0.2 * i0)))}} [id A] 0
 |x [id B]

brandonwillard · 2022-10-18T17:33:01Z

Also, can you clarify the kinds of graphs you think we should be producing in these multi-client cases?

ricardoV94 · 2022-10-18T20:08:32Z

In other words, why don't we produce the following instead?

I was thinking the same. I can only think of wanting to reuse the same c_code across different Applys, but that's not as relevant for Composites

ricardoV94 · 2022-10-19T08:06:45Z

Clarifying what is the idea behind allowing Composites with multiple clients. The following example:

import aesara
import aesara.tensor as at

x = at.vector("x")
y = at.exp(x/5)
w = y + 1
z = y * 2

f1 = aesara.function([x], [z, w])
aesara.dprint(f1)

Right now produces this graph:

Elemwise{Mul}[(0, 1)] [id A] 2
 |TensorConstant{(1,) of 2.0} [id B]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0
   |TensorConstant{(1,) of 0.2} [id D]
   |x [id E]
Elemwise{add,no_inplace} [id F] 1
 |TensorConstant{(1,) of 1.0} [id G]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0

After #1242, it produces this graph:

Elemwise{Composite{(i2 * exp((i0 * i1))), (i3 + exp((i0 * i1)))}}.0 [id A] 0
 |TensorConstant{(1,) of 0.2} [id B]
 |x [id C]
 |TensorConstant{(1,) of 2.0} [id D]
 |TensorConstant{(1,) of 1.0} [id E]
Elemwise{Composite{(i2 * exp((i0 * i1))), (i3 + exp((i0 * i1)))}}.1 [id A] 0

The expected savings come from having to iterate only once over the data vector x, vs having to iterate 3 times before, once over x and twice over exp(i0 * i1).

Another example is the following graph:

x = at.dvector("x")
mu = at.dvector("mu")
logp = (- ((x - mu) **2) / 2)
grad = at.grad(logp.sum(), x)
f2 = aesara.function([mu, x], [logp, grad])
aesara.dprint(f2)

Before:

Elemwise{Composite{(i0 * sqr(i1))}}[(0, 1)] [id A] 2
 |TensorConstant{(1,) of -0.5} [id B]
 |Elemwise{sub,no_inplace} [id C] 0
   |x [id D]
   |mu [id E]
Elemwise{neg,no_inplace} [id F] 1
 |Elemwise{sub,no_inplace} [id C] 0

After:

Elemwise{Composite{(-(i0 - i1)), (i2 * sqr((i0 - i1)))}}.1 [id A] 0
 |x [id B]
 |mu [id C]
 |TensorConstant{(1,) of -0.5} [id D]
Elemwise{Composite{(-(i0 - i1)), (i2 * sqr((i0 - i1)))}}.0 [id A] 0

Again we replace 3 loops by 1.

Regarding CSE, I don't think anything is changed. The code generated by the scalar Composite seems to be clever enough and reuse repeated computations in intermediate variables. Here is the C code of the Composite scalar in the first function:

// function 1
{
npy_float64 V13_tmp1;
// i0 * i1
V13_tmp1 = V11_i * V9_i;

npy_float64 V13_tmp2;
// exp(i0 * i1)
V13_tmp2 = exp((npy_float64)V13_tmp1);

// First output
// i2 * exp(i0 * i1)
V1_i = V7_i * V13_tmp2;

// Second output
// i3 + exp(i0 * i1)
V3_i = V5_i + V13_tmp2;
}

// function 2
{
npy_float64 V11_tmp1;
// i0 - i1
V11_tmp1 = V9_i - V7_i;

// First output
// - (i0 - i1)
V1_i = -V11_tmp1;

npy_float64 V11_tmp2;
// sqr(i0 - i1)
V11_tmp2 = V11_tmp1 * V11_tmp1;

// Second output
// i2 * sqr(i0 - i1)
V3_i = V5_i * V11_tmp2;
}

You can see that it avoids recomputing the same sub-expressions in both cases. It does store more intermediate variables than needed, but hopefully the compiler will be able to get rid of those.

I think the idea of inplacing scalar constants is worth investigating, but deserves a separate issue. It was already relevant for the old Composite graphs. Edit: I see it was already separated into #1270

brandonwillard · 2022-10-19T18:55:34Z

Do you have a comparison like #1242 (comment) for these new CSE-related changes to your approach?

ricardoV94 transferred this issue from aesara-devs/aeppl Oct 5, 2022

brandonwillard added enhancement New feature or request help wanted Extra attention is needed performance concern Op implementation Involves the implementation of an Op labels Oct 5, 2022

ricardoV94 changed the title ~~Allow multiple outputs in Composite~~ Allow multiple outputs in FusionOptimizer Oct 5, 2022

ricardoV94 self-assigned this Oct 5, 2022

ricardoV94 changed the title ~~Allow multiple outputs in FusionOptimizer~~ Allow merging graphs with multiple outputs in FusionOptimizer Oct 5, 2022

brandonwillard changed the title ~~Allow merging graphs with multiple outputs in FusionOptimizer~~ Allow merging graphs with multiple outputs in FusionOptimizer Oct 5, 2022

ricardoV94 changed the title ~~Allow merging graphs with multiple outputs in FusionOptimizer~~ Allow merging graphs with multiple clients in FusionOptimizer Oct 6, 2022

brandonwillard mentioned this issue Oct 18, 2022

How should we handle scalar constants in Elemwise fusions? #1270

Open

brandonwillard added the request discussion label Oct 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow merging graphs with multiple clients in `FusionOptimizer` #1237

Allow merging graphs with multiple clients in `FusionOptimizer` #1237

ricardoV94 commented Oct 5, 2022 •

edited

Loading

ricardoV94 commented Oct 5, 2022

brandonwillard commented Oct 5, 2022

brandonwillard commented Oct 18, 2022 •

edited

Loading

brandonwillard commented Oct 18, 2022

ricardoV94 commented Oct 18, 2022 •

edited

Loading

ricardoV94 commented Oct 19, 2022 •

edited

Loading

brandonwillard commented Oct 19, 2022 •

edited

Loading

Allow merging graphs with multiple clients in FusionOptimizer #1237

Allow merging graphs with multiple clients in FusionOptimizer #1237

Comments

ricardoV94 commented Oct 5, 2022 • edited Loading

ricardoV94 commented Oct 5, 2022

brandonwillard commented Oct 5, 2022

brandonwillard commented Oct 18, 2022 • edited Loading

brandonwillard commented Oct 18, 2022

ricardoV94 commented Oct 18, 2022 • edited Loading

ricardoV94 commented Oct 19, 2022 • edited Loading

brandonwillard commented Oct 19, 2022 • edited Loading

Allow merging graphs with multiple clients in `FusionOptimizer` #1237

Allow merging graphs with multiple clients in `FusionOptimizer` #1237

ricardoV94 commented Oct 5, 2022 •

edited

Loading

brandonwillard commented Oct 18, 2022 •

edited

Loading

ricardoV94 commented Oct 18, 2022 •

edited

Loading

ricardoV94 commented Oct 19, 2022 •

edited

Loading

brandonwillard commented Oct 19, 2022 •

edited

Loading