
Explore repeated terms in a function #22

odow opened this issue Sep 12, 2022 · 8 comments

odow commented Sep 12, 2022

A lot of problems have the form $\sum_i f(x_i)$, for example, machine learning problems where the sum is over observations in a dataset. We can improve performance if we identify the common term $f$.

This structure appears in the inverse Ising problems [private link]:
https://github.com/lanl-ansi/nlp-jump-examples/blob/47d6ccb26c6041390b0662b1d8c1041f28d787ed/inverse_ising/inverse_ising-logrise.jl#L37-L40
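A minimal JuMP sketch of the pattern (made-up data, not from the linked model; assumes a recent JuMP with the nonlinear-expression interface): every term of the objective applies the same scalar template to a different slice of variables and data.

```julia
using JuMP

data = randn(100)
model = Model()
@variable(model, x[1:100])
# 100 structurally identical terms: the common template is
# f(xi, di) = log(1 + (xi - di)^2), applied once per observation.
@objective(model, Min, sum(log(1 + (x[i] - data[i])^2) for i in 1:100))
```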


odow commented Feb 8, 2023

This also appears in @michel2323's GO models. At the very least, we should be able to deal with objective functions that are separable, even ones where only a subset of terms shares a similar structure.
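For instance (a hypothetical objective, sketched only to illustrate the point): a separable objective where the first `N` terms repeat one template and a final term is structurally distinct.

```julia
using JuMP

N = 10
a = rand(N)
model = Model()
@variable(model, x[1:N] >= 0.1)
# Separable: N repeated terms sharing one template, plus one one-off term.
@objective(model, Min, sum(log(a[i] + x[i]) for i in 1:N) + exp(x[1]))
```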


odow commented Feb 14, 2024

@baggepinnen

The experimental MadDiff does this well IIRC. Since they type everything, compilation time absolutely blows up if the tool fails to spot such a repeated pattern though.

@paraynaud

Hello!
I am uncertain about what you are trying to calculate.
If you are interested in partially separable functions, the $x_i$ (from the original post) are the variables parametrizing the element function $f_i$.
In order to fulfill the requirements of partial separability, $x_i$ must be a subset of the decision variables.
Note that in machine learning, the loss function applied to a minibatch sums (or averages) the loss function applied to labelled observations.
If the loss function has no partially separable structure, such as the negative log-likelihood, each term of the sum involves all the variables/weights parametrizing the neural network.
In that case, there is no partial separability.

A proper example of partial separability is $\|Ax - b\|^2$ where $A$ is a sparse matrix, i.e., each row of $A$ has at least one zero component.
Some loss functions are partially separable, but it generally requires reworking the neural network architecture to reduce the subset of variables/weights parametrizing each element function.
I wrote a paper with more details: Partially-separable loss to parallelize partitioned neural network training (a GERAD preprint).
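As a concrete sketch of that example (illustrative code, not ETF output): each element function $\hat{f}_i$ touches only the variables in the support of row $i$.

```julia
using SparseArrays, LinearAlgebra

# f(x) = ||A x - b||^2 = sum_i (A[i, :]' x - b[i])^2, with sparse rows,
# so element i depends only on the variables where row i is nonzero.
A = sprand(50, 200, 0.05)
b = randn(50)
support(i) = findall(!iszero, A[i, :])                 # indices U_i selects
f̂(i, y) = (dot(Vector(A[i, support(i)]), y) - b[i])^2  # f̂_i : R^{n_i} -> R
x = randn(200)
f = sum(f̂(i, x[support(i)]) for i in 1:50)
f ≈ norm(A * x - b)^2  # true
```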

ExpressionTreeForge (ETF) is able to detect the partial separability of a function as long as its operators are supported by ETF.
In practice, it retrieves the element functions $f_i$ and the matrices $U_i \in \mathbb{R}^{n_i \times n}$ that select the subset of variables for each $f_i$.
I usually write $\hat{f}_i : \mathbb{R}^{n_i} \to \mathbb{R}$ such that $\hat{f}_i(U_i x) = f_i(x)$.
From there, ETF considers several possibilities, such as:

  • apply automatic differentiation to the expression tree of $\hat{f}_i$;
  • build a JuMP model for each $\hat{f}_i$ and use JuMP's AD;
  • ...
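To make the $U_i$ notation above concrete, a small sketch (hypothetical indices; this is not ETF's API):

```julia
using SparseArrays

n = 6
idx = [2, 5]                       # variables used by this element (n_i = 2)
# U_i is the 0/1 selector matrix with U_i * x == x[idx].
Ui = sparse(1:length(idx), idx, ones(length(idx)), length(idx), n)
f̂(y) = log(y[1] + y[2]^2)          # element function on R^{n_i}
x = rand(n)
f̂(Ui * x) ≈ log(x[2] + x[5]^2)     # f̂_i(U_i x) == f_i(x)
```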

However, none of my attempts were able to compete with the gradient computation from JuMP.
I thought a lot about how to compute $\nabla \hat{f}_i$ during the computation of $\nabla f$, but it would require modifying the ReverseDiff tape, which is, unfortunately, out of my reach for now.

A final note: as it stands, ETF is not a particularly fast module.
Nonetheless, it is built on a tree interface that makes it "easy" to implement new expression tree data structures without many changes to the algorithms detecting partial separability (which is its original purpose).

I hope this helps!


odow commented Feb 15, 2024

Currently, MathOptSymbolicAD exploits problems in which there is repeated structure in the Jacobian. So if you have a bunch of constraints `log(a[i] + x[i]) <= 0`, then we create one symbolic derivative `J(ai, xi) = 1 / (ai + xi)`, and then we can map that function across the static data `a` and `x`.

But at the moment we don't recurse into a partially separable function. So if the objective is `sum(log(a[i] + x[i]) for i in 1:N)`, we currently try to build a symbolic gradient with `N` terms. It'd be much cheaper if we could identify the repeated structure and exploit that.
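A sketch of the "differentiate once, map across data" idea using Symbolics.jl (illustrative only, not MathOptSymbolicAD's internals):

```julia
using Symbolics

@variables a x
# One symbolic derivative for the repeated template log(a + x):
J = Symbolics.derivative(log(a + x), x)   # 1 / (a + x)
J_fn = Symbolics.build_function(J, a, x; expression = Val{false})
# ...then map the single compiled function across the static data:
as, xs = rand(5), rand(5)
J_fn.(as, xs)
```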

> The experimental MadDiff does this well IIRC

Yes. https://github.com/exanauts/ExaModels.jl does exactly what we do here, except they (ab)use the Julia type system instead of Symbolics.jl :) Their input format also forces the user to provide the repeated structure, instead of detecting it automatically like we do here.

@paraynaud

If I understand correctly, you use Symbolics.jl to compute the symbolic derivative `J(ai, xi)`.
In the case of `log(a[i] + x[i]^2)`, is $J = [1/(a_i + x_i^2),\ 2x_i/(a_i + x_i^2)]$?
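For what it's worth, that Jacobian can be checked quickly with Symbolics.jl (a side check, not part of either package under discussion):

```julia
using Symbolics

@variables a x
Symbolics.gradient(log(a + x^2), [a, x])
# [1 / (a + x^2), 2x / (a + x^2)]
```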

ETF uses Symbolics only to retrieve an `Expr` from a Julia function, onto which several tree manipulation routines can be applied.
If you want to detect partial separability in the objective, you must use several tree manipulation routines.
In my case, the detection of partial separability is done by PartiallySeparableSolvers.jl in the method `partitioned_structure`, with several routines from ETF.
At some point it returns the vector containing the expression tree of each distinct element function ($M \leq N$) and the indices $p_i,\ 1 \leq i \leq N$, such that $1 \leq p_i \leq M$.
For `sum(log(a[i] + x[i]) for i in 1:N)`, then $M = 1$ and `p = ones(N)`.
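Hypothetical shapes for that return value (illustrative; not the package's actual types):

```julia
N = 4
# M = 1 distinct element expression tree, shared by all N terms:
element_exprs = [:(log(a + x))]
p = ones(Int, N)   # p[i] gives the template index of term i, 1 <= p[i] <= M
```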
The returned expression tree can be an `Expr`, my handmade expression tree, or a JuMP model.
Does that fit your requirements?


odow commented Feb 15, 2024

Perhaps to clarify, I don't intend to use ETF in this package. It was just a link I was chatting to @amontoison about that seemed relevant and adjacent. It has some useful ideas 😄

@paraynaud

> Perhaps to clarify

I am glad you did; I misunderstood :')
Feel free to contact me if you need to use ETF in the future!
