Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow merging graphs with multiple clients in FusionOptimizer #1237

Open
ricardoV94 opened this issue Oct 5, 2022 · 7 comments · May be fixed by #1242
Open

Allow merging graphs with multiple clients in FusionOptimizer #1237

ricardoV94 opened this issue Oct 5, 2022 · 7 comments · May be fixed by #1242
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed Op implementation Involves the implementation of an Op performance concern request discussion

Comments

@ricardoV94
Copy link
Contributor

ricardoV94 commented Oct 5, 2022

Right now, Elemwise composite graphs will be split at nodes that have multiple clients. This can be avoided by allowing the fusion rewriter to replace multiple outputs. One example where this is very common is in value_and_grad functions.

import aesara
import aesara.tensor as at

x = at.vector("x")
y = at.exp(x/5)
w = y + 1
z = y * 2

f = aesara.function([x], [z, w])
aesara.dprint(f)
Elemwise{Mul}[(0, 1)] [id A] 2
 |TensorConstant{(1,) of 2.0} [id B]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0
   |TensorConstant{(1,) of 0.2} [id D]
   |x [id E]
Elemwise{add,no_inplace} [id F] 1
 |TensorConstant{(1,) of 1.0} [id G]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0
f = aesara.function([x], [z])
aesara.dprint(f)
Elemwise{Composite{(i0 * exp((i1 * i2)))}} [id A] 0
 |TensorConstant{(1,) of 2.0} [id B]
 |TensorConstant{(1,) of 0.2} [id C]
 |x [id D]

Discussed with @aseyboldt and @aesara-devs/aesara

See rationale and proposed changes in #1237 (comment)

@ricardoV94 ricardoV94 transferred this issue from aesara-devs/aeppl Oct 5, 2022
@brandonwillard brandonwillard added enhancement New feature or request help wanted Extra attention is needed performance concern Op implementation Involves the implementation of an Op labels Oct 5, 2022
@ricardoV94 ricardoV94 changed the title Allow multiple outputs in Composite Allow multiple outputs in FusionOptimizer Oct 5, 2022
@ricardoV94 ricardoV94 self-assigned this Oct 5, 2022
@ricardoV94 ricardoV94 changed the title Allow multiple outputs in FusionOptimizer Allow merging graphs with multiple outputs in FusionOptimizer Oct 5, 2022
@ricardoV94
Copy link
Contributor Author

@brandonwillard Do you have any pointers for non-local rewriters that replace multiple nodes? It sounds like the functionality for multiple outputs is already present in the scalar Composite, the only thing we need is to ramp up the FusionOptimizer

@brandonwillard
Copy link
Member

@brandonwillard Do you have any pointers for non-local rewriters that replace multiple nodes? It sounds like the functionality for multiple outputs is already present in the scalar Composite, the only thing we need is to ramp up the FusionOptimizer

Some of the dict-based replacement code from the non-global rewriters might come in handy; otherwise, there shouldn't be anything special about the local vs. global approaches.

@brandonwillard brandonwillard changed the title Allow merging graphs with multiple outputs in FusionOptimizer Allow merging graphs with multiple outputs in FusionOptimizer Oct 5, 2022
@ricardoV94 ricardoV94 changed the title Allow merging graphs with multiple outputs in FusionOptimizer Allow merging graphs with multiple clients in FusionOptimizer Oct 6, 2022
@brandonwillard
Copy link
Member

brandonwillard commented Oct 18, 2022

Elemwise{Composite{(i0 * exp((i1 * i2)))}} [id A] 0
|TensorConstant{(1,) of 2.0} [id B]
|TensorConstant{(1,) of 0.2} [id C]
|x [id D]

We should probably look into these kinds of fusions that turn scalar constants into inputs. It's not clear to me that this is any better than—say—making i0 and i1 constants and removing those two inputs.

In other words, why don't we produce the following instead?

Elemwise{Composite{(2.0 * exp((0.2 * i0)))}} [id A] 0
 |x [id B]

@brandonwillard
Copy link
Member

Also, can you clarify the kinds of graphs you think we should be producing in these multi-client cases?

@ricardoV94
Copy link
Contributor Author

ricardoV94 commented Oct 18, 2022

In other words, why don't we produce the following instead?

I was thinking the same. I can only think of wanting to reuse the same c_code across different Applys, but that's not as relevant for Composites

@ricardoV94
Copy link
Contributor Author

ricardoV94 commented Oct 19, 2022

Clarifying what is the idea behind allowing Composites with multiple clients. The following example:

import aesara
import aesara.tensor as at

x = at.vector("x")
y = at.exp(x/5)
w = y + 1
z = y * 2

f1 = aesara.function([x], [z, w])
aesara.dprint(f1)

Right now produces this graph:

Elemwise{Mul}[(0, 1)] [id A] 2
 |TensorConstant{(1,) of 2.0} [id B]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0
   |TensorConstant{(1,) of 0.2} [id D]
   |x [id E]
Elemwise{add,no_inplace} [id F] 1
 |TensorConstant{(1,) of 1.0} [id G]
 |Elemwise{Composite{exp((i0 * i1))}} [id C] 0

After #1242, it produces this graph:

Elemwise{Composite{(i2 * exp((i0 * i1))), (i3 + exp((i0 * i1)))}}.0 [id A] 0
 |TensorConstant{(1,) of 0.2} [id B]
 |x [id C]
 |TensorConstant{(1,) of 2.0} [id D]
 |TensorConstant{(1,) of 1.0} [id E]
Elemwise{Composite{(i2 * exp((i0 * i1))), (i3 + exp((i0 * i1)))}}.1 [id A] 0

The expected savings come from having to iterate only once over the data vector x, vs having to iterate 3 times before, once over x and twice over exp(i0 * i1).

Another example is the following graph:

x = at.dvector("x")
mu = at.dvector("mu")
logp = (- ((x - mu) **2) / 2)
grad = at.grad(logp.sum(), x)
f2 = aesara.function([mu, x], [logp, grad])
aesara.dprint(f2)

Before:

Elemwise{Composite{(i0 * sqr(i1))}}[(0, 1)] [id A] 2
 |TensorConstant{(1,) of -0.5} [id B]
 |Elemwise{sub,no_inplace} [id C] 0
   |x [id D]
   |mu [id E]
Elemwise{neg,no_inplace} [id F] 1
 |Elemwise{sub,no_inplace} [id C] 0

After:

Elemwise{Composite{(-(i0 - i1)), (i2 * sqr((i0 - i1)))}}.1 [id A] 0
 |x [id B]
 |mu [id C]
 |TensorConstant{(1,) of -0.5} [id D]
Elemwise{Composite{(-(i0 - i1)), (i2 * sqr((i0 - i1)))}}.0 [id A] 0

Again we replace 3 loops by 1.


Regarding CSE, I don't think anything is changed. The code generated by the scalar Composite seems to be clever enough and reuse repeated computations in intermediate variables. Here is the C code of the Composite scalar in the first function:

// function 1
{
npy_float64 V13_tmp1;
// i0 * i1
V13_tmp1 = V11_i * V9_i;

npy_float64 V13_tmp2;
// exp(i0 * i1)
V13_tmp2 = exp((npy_float64)V13_tmp1);

// First output
// i2 * exp(i0 * i1)
V1_i = V7_i * V13_tmp2;

// Second output
// i3 + exp(i0 * i1)
V3_i = V5_i + V13_tmp2;
}
// function 2
{
npy_float64 V11_tmp1;
// i0 - i1
V11_tmp1 = V9_i - V7_i;

// First output
// - (i0 - i1)
V1_i = -V11_tmp1;

npy_float64 V11_tmp2;
// sqr(i0 - i1)
V11_tmp2 = V11_tmp1 * V11_tmp1;

// Second output
// i2 * sqr(i0 - i1)
V3_i = V5_i * V11_tmp2;
}

You can see that it avoids recomputing the same sub-expressions in both cases. It does store more intermediate variables than needed, but hopefully the compiler will be able to get rid of those.


I think the idea of inplacing scalar constants is worth investigating, but deserves a separate issue. It was already relevant for the old Composite graphs. Edit: I see it was already separated into #1270

@brandonwillard
Copy link
Member

brandonwillard commented Oct 19, 2022

Do you have a comparison like #1242 (comment) for these new CSE-related changes to your approach?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed Op implementation Involves the implementation of an Op performance concern request discussion
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants