
Explore repeated terms in a function #22

odow opened this issue Sep 12, 2022 · 8 comments

odow commented Sep 12, 2022

A lot of problems have the form $\sum_i f(x_i)$, for example, machine learning problems where the sum is over observations in a dataset. We can improve performance if we identify the common term $f$.

This structure appears in the inverse Ising problems [private link]:
https://github.com/lanl-ansi/nlp-jump-examples/blob/47d6ccb26c6041390b0662b1d8c1041f28d787ed/inverse_ising/inverse_ising-logrise.jl#L37-L40
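A minimal JuMP sketch of the pattern (made-up data, not from the linked model; assumes a recent JuMP with the nonlinear-expression interface): every term of the objective applies the same scalar template to a different slice of variables and data.

```julia
using JuMP

data = randn(100)
model = Model()
@variable(model, x[1:100])
# 100 structurally identical terms: the common template is
# f(xi, di) = log(1 + (xi - di)^2), applied once per observation.
@objective(model, Min, sum(log(1 + (x[i] - data[i])^2) for i in 1:100))
```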


odow commented Feb 8, 2023

This also appears in @michel2323's GO models. At the very least, we should be able to deal with objective functions that are separable, even ones where only a subset of terms shares a similar structure.
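For instance (a hypothetical objective, sketched only to illustrate the point): a separable objective where the first `N` terms repeat one template and a final term is structurally distinct.

```julia
using JuMP

N = 10
a = rand(N)
model = Model()
@variable(model, x[1:N] >= 0.1)
# Separable: N repeated terms sharing one template, plus one one-off term.
@objective(model, Min, sum(log(a[i] + x[i]) for i in 1:N) + exp(x[1]))
```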


odow commented Feb 14, 2024

@baggepinnen

The experimental MadDiff does this well IIRC. Since they type everything, compilation time absolutely blows up if the tool fails to spot such a repeated pattern though.

@paraynaud

Hello!
I am uncertain about what you are trying to calculate.
If you are interested in partially separable functions, the $x_i$ (from the original post) are the variables parametrizing the element function $f_i$.
In order to fulfill the requirements of partial separability, $x_i$ must be a subset of the decision variables.
Note that in machine learning, the loss function applied to a minibatch sums (or averages) the loss function applied to labelled observations.
If the loss function has no partially separable structure, such as the negative log-likelihood, each term of the sum involves all the variables/weights parametrizing the neural network.
In that case, there is no partial separability.

A proper example of partial separability is $\|Ax - b\|^2$ where $A$ is a sparse matrix, i.e., each row of $A$ has at least one zero component.
Some loss functions are partially separable, but it generally requires reworking the neural network architecture to reduce the subset of variables/weights parametrizing each element function.
I wrote a paper with more details: Partially-separable loss to parallelize partitioned neural network training (a GERAD preprint).
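As a concrete sketch of that example (illustrative code, not ETF output): each element function $\hat{f}_i$ touches only the variables in the support of row $i$.

```julia
using SparseArrays, LinearAlgebra

# f(x) = ||A x - b||^2 = sum_i (A[i, :]' x - b[i])^2, with sparse rows,
# so element i depends only on the variables where row i is nonzero.
A = sprand(50, 200, 0.05)
b = randn(50)
support(i) = findall(!iszero, A[i, :])                 # indices U_i selects
f̂(i, y) = (dot(Vector(A[i, support(i)]), y) - b[i])^2  # f̂_i : R^{n_i} -> R
x = randn(200)
f = sum(f̂(i, x[support(i)]) for i in 1:50)
f ≈ norm(A * x - b)^2  # true
```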

ExpressionTreeForge (ETF) is able to detect the partial separability of a function as long as its operators are supported by ETF.
In practice, it retrieves the element functions $f_i$ and the matrices $U_i \in \mathbb{R}^{n_i \times n}$ that select the subset of variables for each $f_i$.
I usually write $\hat{f}_i : \mathbb{R}^{n_i} \to \mathbb{R}$ such that $\hat{f}_i(U_i x) = f_i(x)$.
From there, ETF considers several possibilities, such as:

  • apply automatic differentiation to the expression tree of $\hat{f}_i$;
  • build a JuMP model for each $\hat{f}_i$ and use JuMP's AD;
  • ...
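To make the $U_i$ notation above concrete, a small sketch (hypothetical indices; this is not ETF's API):

```julia
using SparseArrays

n = 6
idx = [2, 5]                       # variables used by this element (n_i = 2)
# U_i is the 0/1 selector matrix with U_i * x == x[idx].
Ui = sparse(1:length(idx), idx, ones(length(idx)), length(idx), n)
f̂(y) = log(y[1] + y[2]^2)          # element function on R^{n_i}
x = rand(n)
f̂(Ui * x) ≈ log(x[2] + x[5]^2)     # f̂_i(U_i x) == f_i(x)
```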

However, none of my attempts were able to compete with the gradient computation from JuMP.
I thought a lot about how to compute $\nabla \hat{f}_i$ during the computation of $\nabla f$, but it would require modifying the ReverseDiff tape, which is, unfortunately, out of my reach for now.

A final note: as it stands, ETF is not a particularly fast module.
Nonetheless, it is built on a tree interface that makes it "easy" to implement new expression tree data structures without many changes to the algorithms detecting partial separability (which is its original purpose).

I hope this helps!


odow commented Feb 15, 2024

Currently, MathOptSymbolicAD exploits problems in which there is repeated structure in the Jacobian. So if you have a bunch of constraints `log(a[i] + x[i]) <= 0`, then we create one symbolic derivative `J(ai, xi) = 1 / (ai + xi)`, and then we can map that function across the static data `a` and `x`.

But at the moment we don't recurse into a partially separable function. So if the objective is `sum(log(a[i] + x[i]) for i in 1:N)`, we currently try to build a symbolic gradient with `N` terms. It'd be much cheaper if we could identify the repeated structure and exploit that.
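A sketch of the "differentiate once, map across data" idea using Symbolics.jl (illustrative only, not MathOptSymbolicAD's internals):

```julia
using Symbolics

@variables a x
# One symbolic derivative for the repeated template log(a + x):
J = Symbolics.derivative(log(a + x), x)   # 1 / (a + x)
J_fn = Symbolics.build_function(J, a, x; expression = Val{false})
# ...then map the single compiled function across the static data:
as, xs = rand(5), rand(5)
J_fn.(as, xs)
```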

> The experimental MadDiff does this well IIRC

Yes. https://github.com/exanauts/ExaModels.jl does exactly what we do here, except they (ab)use the Julia type system instead of Symbolics.jl :) Their input format also forces the user to provide the repeated structure, instead of detecting it automatically like we do here.

@paraynaud

If I understand correctly, you use Symbolics.jl to compute the symbolic derivative `J(ai, xi)`.
In the case of `log(a[i] + x[i]^2)`, is $J = [1/(a_i + x_i^2),\ 2x_i/(a_i + x_i^2)]$?
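For what it's worth, that Jacobian can be checked quickly with Symbolics.jl (a side check, not part of either package under discussion):

```julia
using Symbolics

@variables a x
Symbolics.gradient(log(a + x^2), [a, x])
# [1 / (a + x^2), 2x / (a + x^2)]
```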

ETF uses Symbolics only to retrieve an `Expr` from a Julia function, onto which several tree manipulation routines can be applied.
If you want to detect partial separability in the objective, you must use several tree manipulation routines.
In my case, the detection of partial separability is done by PartiallySeparableSolvers.jl in the method `partitioned_structure`, with several routines from ETF.
At some point it returns the vector containing the expression tree of each distinct element function ($M \leq N$) and the indices $p_i,\ 1 \leq i \leq N$, such that $1 \leq p_i \leq M$.
For `sum(log(a[i] + x[i]) for i in 1:N)`, then $M = 1$ and `p = ones(N)`.
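Hypothetical shapes for that return value (illustrative; not the package's actual types):

```julia
N = 4
# M = 1 distinct element expression tree, shared by all N terms:
element_exprs = [:(log(a + x))]
p = ones(Int, N)   # p[i] gives the template index of term i, 1 <= p[i] <= M
```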
The returned expression tree can be an `Expr`, my handmade expression tree, or a JuMP model.
Does that fit your requirements?


odow commented Feb 15, 2024

Perhaps to clarify, I don't intend to use ETF in this package. It was just a link I was chatting to @amontoison about that seemed relevant and adjacent. It has some useful ideas 😄

@paraynaud

> Perhaps to clarify

I am glad you did; I misunderstood :')
Feel free to contact me if you need to use ETF in the future!
