Rethinking AdvancedVI #24

Closed
theogf opened this issue Feb 12, 2021 · 19 comments

@theogf
Member

theogf commented Feb 12, 2021

Alright! It's time to seriously take care of AdvancedVI :D

Here are some of the things we talked about in the meeting back in October:

  • There should be two distinct optimization paths: one for when the variational distribution is given as a function (like update_q), and one for when it is a distribution whose parameters are updated directly.
  • Hyperparameter optimization should be cleanly supported. One proposal was:
    makelogπ(logπ, ::Nothing) = logπ
    makelogπ(logπ, hyperparams) = logπ(hyperparams)
    function vi(..., logπ; hyperparams = nothing)
        ...
        while not_converged
            logjoint = makelogπ(logπ, hyperparams)
            for i in 1:n_inner
                ...
            end
        end
    end
  • We should condense the updates of the variational parameters into a more "atomic" step! function.
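
A toy version of the hyperparameter proposal and the "atomic" step! idea could look as follows. This is only a minimal sketch under stated assumptions: `step!`, `optimize_sketch!`, the fixed step size, and the user-supplied gradient factory are hypothetical names that are not part of AdvancedVI.jl, and the inner loop does plain gradient ascent on the log joint rather than a real variational update.

```julia
# Hypothetical sketch; none of these names are part of AdvancedVI.jl.

# Without hyperparameters, the function is used as-is.
makelogπ(logπ, ::Nothing) = logπ
# With hyperparameters, `logπ` is treated as a factory that returns the target.
makelogπ(logπ, hyperparams) = logπ(hyperparams)

# One "atomic" in-place update of the parameters `θ`.
# `∇obj(θ)` returns the gradient of the objective (assumed user-supplied).
function step!(θ::AbstractVector, ∇obj, stepsize::Real)
    θ .+= stepsize .* ∇obj(θ)
    return θ
end

# The outer loop rebuilds the target from the current hyperparameters,
# the inner loop performs `n_inner` atomic steps.
function optimize_sketch!(θ, ∇logπ; hyperparams=nothing,
                          n_outer=10, n_inner=20, stepsize=0.1)
    for _ in 1:n_outer
        ∇target = makelogπ(∇logπ, hyperparams)
        for _ in 1:n_inner
            step!(θ, ∇target, stepsize)
        end
        # A hyperparameter update would go here; it is omitted in this sketch.
    end
    return θ
end

# Toy target: log π(θ) = -‖θ - m‖² / 2, so ∇log π(θ) = m - θ.
∇logπ_factory(m) = θ -> m .- θ
θ_opt = optimize_sketch!(zeros(2), ∇logπ_factory; hyperparams=[1.0, -2.0])
```

Keeping `step!` free of any looping or convergence logic is the point of the design: the same update can then be driven by different outer loops.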

And here are some more personal points (disclaimer: I will be happy to take care of these different points)

  • I don't think the current ELBO approach is good. The ELBO can always be split into an entropy term (depending only on the variational distribution) and an expectation of the log joint. Most VI methods take advantage of this by computing the entropy gradient analytically (and smartly!); see "Doubly Stochastic Variational Bayes for non-Conjugate Inference" by Titsias and Lázaro-Gredilla for instance. My proposal would be to split the gradient into two parts (grad_entropy + grad_expeclog), each of which can be specialized to the problem.
  • I would personally argue that update_q only makes sense with the current, obsolete implementation that uses distributions with immutable fields like TuringMvNormal. See again Titsias, who uses the reparameterization trick.
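
To illustrate the proposed split, here is a minimal sketch for a mean-field Gaussian family with the reparameterization trick. The names `grad_entropy` and `grad_expeclog` come from the proposal above; everything else (the log-scale parameterization `σ = exp.(ω)`, the user-supplied gradient `∇logπ`, the plain stochastic gradient ascent in `fit_sketch`) is an assumption made for illustration, not the package's implementation.

```julia
using Random

# Hypothetical sketch; not AdvancedVI.jl's API.
# Variational family: q(z) = N(μ, Diagonal(σ.^2)) with σ = exp.(ω).

# The entropy of q is sum(ω) + const, so its gradient is available in closed
# form: zero with respect to μ and one with respect to each ω[i].
grad_entropy(μ, ω) = (zero(μ), one.(ω))

# Single-sample reparameterization gradient of E_q[log π(z)],
# using z = μ + σ .* ε with ε ~ N(0, I). `∇logπ` is assumed user-supplied.
function grad_expeclog(∇logπ, μ, ω, ε)
    σ = exp.(ω)
    g = ∇logπ(μ .+ σ .* ε)
    return (g, g .* ε .* σ)
end

# ELBO gradient = entropy part + expectation part.
function grad_elbo(∇logπ, μ, ω, ε)
    hμ, hω = grad_entropy(μ, ω)
    eμ, eω = grad_expeclog(∇logπ, μ, ω, ε)
    return (hμ .+ eμ, hω .+ eω)
end

# Plain stochastic gradient ascent on the ELBO (illustrative only).
function fit_sketch(∇logπ, d; n_steps=5000, stepsize=0.01,
                    rng=MersenneTwister(1))
    μ, ω = zeros(d), zeros(d)
    for _ in 1:n_steps
        gμ, gω = grad_elbo(∇logπ, μ, ω, randn(rng, d))
        μ .+= stepsize .* gμ
        ω .+= stepsize .* gω
    end
    return μ, exp.(ω)
end

# Toy target N(m, Diagonal(s.^2)): ∇log π(z) = -(z - m) ./ s.^2.
m, s = [1.0, -1.0], [0.5, 2.0]
∇logπ_toy(z) = -(z .- m) ./ s .^ 2

μ_fit, σ_fit = fit_sketch(∇logπ_toy, 2)
```

Because the entropy gradient carries no Monte Carlo noise here, only `grad_expeclog` contributes variance, which is the practical motivation for treating the two terms separately.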
@theogf theogf self-assigned this Feb 12, 2021
@Red-Portal
Member

Hi, is there any update on a complete rewrite of AdvancedVI? Or even an expected time frame for release?

@theogf
Member Author

theogf commented Jun 6, 2022

Hey, there is no update and I would say that this has gone stale. I don't have the bandwidth for it anymore and neither does @torfjelde (I guess), so unless someone takes over...

@Red-Portal
Member

Hi @theogf, that's sad news. So for the time being, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be quite coupled to anything done to AdvancedVI.jl. Is there any timeline on that?

@theogf
Member Author

theogf commented Jun 6, 2022

I really hope @torfjelde has the time for it (we haven't talked in a while). If the package becomes easier to work with, I would definitely be happy to add a couple of algorithms like SVGD and others. But I generally think that a revamp is badly needed. The ML ecosystem has evolved a lot, and there are now new solutions like ParameterHandling.jl for problems we had here.

@Red-Portal
Member

Is there a straightforward way to deal with the covariance of a full-rank multivariate normal variational family though? I have been using AdvancedVI.jl as the basis of one of my recent research projects, but couldn't come up with a way to elegantly unpack/repack the parameters of the covariance. I think taking gradients independently for each symbolic variable a la Flux.jl could be a solution. Any thoughts on this?

@theogf
Member Author

theogf commented Jun 6, 2022

You should have a look at ParameterHandling.jl and the positive_definite function. However, there is no specific optimization for VI, but that's a topic on its own!
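
For readers unfamiliar with the underlying idea, the unpack/repack question for a full-rank Gaussian can be sketched with the standard library alone. The helpers `pack` and `unpack` below are hypothetical and are not ParameterHandling.jl's API; they only show one common unconstrained parameterization (a Cholesky factor with a log-scale diagonal). Note that `unpack` mutates an array, which some reverse-mode AD backends do not support.

```julia
using LinearAlgebra

# Hypothetical helpers; this is not ParameterHandling.jl's API.

# Pack (μ, L) into one unconstrained vector. L is the lower-triangular
# Cholesky factor of Σ; its diagonal is stored on the log scale.
function pack(μ::AbstractVector, L::LowerTriangular)
    d = length(μ)
    θ = copy(μ)
    for j in 1:d, i in j:d
        push!(θ, i == j ? log(L[i, j]) : L[i, j])
    end
    return θ
end

# Inverse of `pack`: any real vector of length d + d(d+1)/2 maps to a mean
# and a Cholesky factor with a strictly positive diagonal.
function unpack(θ::AbstractVector, d::Int)
    μ = θ[1:d]
    L = zeros(eltype(θ), d, d)
    k = d
    for j in 1:d, i in j:d
        k += 1
        L[i, j] = i == j ? exp(θ[k]) : θ[k]
    end
    return μ, LowerTriangular(L)
end

Σ = [2.0 0.5; 0.5 1.0]
μ = [0.3, -0.7]
θ = pack(μ, cholesky(Symmetric(Σ)).L)
μ2, L2 = unpack(θ, 2)   # L2 * L2' recovers Σ up to rounding
```

An optimizer can then take unconstrained steps on `θ`, and every iterate still corresponds to a valid covariance `L * L'`.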

@Red-Portal
Member

@theogf That looks great. I would really like to know about the future/current state of Turing.jl's model API before doing anything though.

@Red-Portal
Member

Red-Portal commented Jun 9, 2022

I will start a PhD this fall, which might give me some bandwidth to work full-time on AdvancedVI.jl. I personally think it has a lot of potential as a research platform for cutting-edge VI research. Some things are still missing and will need major work:

  • Support a diverse set of variational families like the convex update and structured normalizing flows for example. These need to inspect the probabilistic program.
  • User-defined structured variational families. I think it would be useful to use Turing to describe a probabilistic program of variational families. This would need additional functionalities like inferring the variational parameters that do not contain a prior. Not sure if this is easy to do with Turing at the moment.
  • Recently proposed diagnostics for VI.
  • A way to express factorizable likelihoods for minibatching and amortized inference.
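
As an illustration of the last point, a factorizable likelihood allows the log joint to be estimated from a minibatch by rescaling the minibatch sum. The sketch below assumes user-supplied `logprior` and `loglik` functions and a hypothetical `minibatch_logjoint` helper; it is not an existing AdvancedVI.jl or Turing.jl interface.

```julia
# Hypothetical sketch; `logprior` and `loglik` are assumed user-supplied.
# Unbiased minibatch estimate of the log joint
#   log p(θ) + Σᵢ log p(yᵢ | θ),
# obtained by rescaling the minibatch sum by N / B.
function minibatch_logjoint(logprior, loglik, θ, data, batch)
    N, B = length(data), length(batch)
    return logprior(θ) + (N / B) * sum(loglik(θ, data[i]) for i in batch)
end

# Toy model: θ ~ N(0, 1) and yᵢ | θ ~ N(θ, 1), up to additive constants.
logprior(θ) = -θ^2 / 2
loglik(θ, y) = -(y - θ)^2 / 2

data = [0.5, 1.5, -0.2, 2.0]

# Using every data point recovers the exact log joint.
full = minibatch_logjoint(logprior, loglik, 1.0, data, 1:4)

# Averaging the estimate over all size-1 minibatches gives the same value,
# which is what unbiasedness means here.
avg = sum(minibatch_logjoint(logprior, loglik, 1.0, data, [i]) for i in 1:4) / 4
```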

@theogf could you list the changes that you planned to introduce into AdvancedVI? I might be able to pick them up at some point.

@torfjelde
Member

Hey! I'm back now; been away for the past 4 months, so sorry for not being responsive here.

So for the time being, the VI ecosystem of Turing will not see much improvement? I heard early this year that @torfjelde is currently improving the Turing model APIs, which I think will be quite coupled to anything done to AdvancedVI.jl. Is there any timeline on that?

So it depends on what we're talking about here.

The work I'm doing on the model side of Turing.jl will be very useful for any interaction AdvancedVI.jl wants to have with Turing.jl models, e.g. performing VI on a Turing.jl model, using a Turing.jl model to define a variational approximation, etc. But for AdvancedVI.jl on its own, i.e. ignoring any relation to the rest of the Turing.jl ecosystem, we're still not happy with what we have set up thus far; the general API needs to improve, as partially outlined by @theogf above. There are also some significant improvements in the ecosystem that we might want to take advantage of here in AdvancedVI.jl:

And so on.

It requires a bit more thought and an outline of what we want here, but I'm keen on getting something rolling now! :)

@Red-Portal
Member

Hi @torfjelde, nice to have you back. In case you haven't noticed, I'm one of the people who attended the Turing.jl sales pitch at the University of Liverpool.

Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here. Its algorithmic structure is quite different from BBVI/MCVI such that I don't see good abstraction opportunities. And given that we'll not see a shortage of variational particle methods any time soon, I think it will be good to have a separate package like AdvancedParticles.jl or something.

@theogf
Member Author

theogf commented Jun 10, 2022

Some additional thoughts: People have been talking about SVGD in this repo for quite some time, but I don't think it will make a good fit here.

I don't agree; the representation is different but just as relevant.

Even if we move it to a different package, we would still need a common API. So it's probably preferable to think of this in one package before starting to split things up.

@Red-Portal
Member

@theogf Given that you already have #25 open, do you plan on coming back to #25 or how should we attack rewriting AdvancedVI?

@theogf
Member Author

theogf commented Jun 10, 2022

No, I think it's probably better to start again from scratch. You can still take ideas from there if you want.

@Red-Portal
Member

Okay. Thanks, @theogf @torfjelde the discussions were really helpful.

@Red-Portal
Member

Hi @torfjelde, I'm thinking about how to restructure the overall project. My current idea is the following layout:

  • estimators/
  • diagnostics/
  • algorithms/

Currently, AdvancedVI.jl has a separate notion of a variational objective (implemented in objectives.jl) and an algorithm for estimating the objective's gradient (implemented in advi.jl; I'm proposing to rename this to "estimator"), but I don't think this distinction is necessary. After all, most of the gradient estimators proposed in the literature target specific objectives, so I think an objective should be an attribute of an estimator rather than its own object.
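
One way to picture "an objective as an attribute of an estimator" is the type layout below. It is a sketch only: every type and function name here (`AbstractObjective`, `ELBO`, `ReparamEstimator`, `objective`, `describe`) is hypothetical and chosen for illustration, not taken from AdvancedVI.jl.

```julia
# Hypothetical type layout; none of these names exist in AdvancedVI.jl.
abstract type AbstractObjective end
struct ELBO <: AbstractObjective end

abstract type AbstractEstimator end

# The estimator owns its objective instead of receiving it separately.
struct ReparamEstimator{O<:AbstractObjective} <: AbstractEstimator
    objective::O
    n_samples::Int
end

# Convenience constructor that defaults to the ELBO.
ReparamEstimator(n_samples::Int) = ReparamEstimator(ELBO(), n_samples)

# Generic accessor for any estimator that stores an `objective` field.
objective(est::AbstractEstimator) = est.objective

# Higher-level code can dispatch on the estimator type alone, since the
# objective is encoded in its type parameter.
describe(est::ReparamEstimator{ELBO}) =
    "reparameterization gradient of the ELBO with $(est.n_samples) samples"

est = ReparamEstimator(10)
```

With this layout, an estimator that only makes sense for one objective simply restricts its type parameter, and no separate objective argument has to be threaded through the API.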

Under algorithms, I'm planning to put higher-level algorithms that utilize the output of the estimators. For example, stochastic variance-reduced gradient descent could be one, and methods for combining the output of multiple estimators, like [1,2], could also be considered.

For diagnostics, I'm thinking of the various VI-specific diagnostics that have been proposed over the years, like the ones in [3], and the R-hat diagnostics [4]. That said, [4] would need an online version of R-hat. I think I heard some hearsay about this, but I'm not sure what happened on that front.

[1] "A Rule for Gradient Estimator Selection, with an Application to Variational Inference," https://arxiv.org/abs/1911.01894
[2] "Using Large Ensembles of Control Variates for Variational Inference," https://arxiv.org/abs/1810.12482
[3] "Validated Variational Inference via Practical Posterior Error Bounds," http://proceedings.mlr.press/v108/huggins20a.html
[4] "Robust, Accurate Stochastic Optimization for Variational Inference," https://arxiv.org/abs/2009.00666

@yebai
Member

yebai commented Feb 27, 2023

Hi @Red-Portal, it looks like a sensible plan. I suggest we keep things simple until there is a genuine need for generalisation. For example, estimators and algorithms can be kept the same if they are always coupled in practice.

Some diagnostics are definitely helpful, but this is likely a challenging area, as we don't have good ways of checking how close the VI approximation is to the true target. One way is to run expensive MCMC simulations and compute the divergence between the VI approximation and the MCMC samples. But we don't have guarantees that MCMC converges either.

For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.

@Red-Portal
Member

Red-Portal commented Feb 27, 2023

Hi @yebai ,

For a concrete start, maybe you can focus on refactoring the current algorithms to improve clarity, documentation, and design consistency. We can add new algorithms or diagnostics at an advanced project stage.

Absolutely! With the talk around diagnostics and algorithms, I wanted to illustrate the potential uses of the new structure. The actual content would be a long-term goal, if feasible.

I'll start with refactoring the existing functionalities.

@Red-Portal
Member

Hi @yebai @torfjelde ,

What is the current policy on LogDensityProblems.jl? It seems AdvancedHMC.jl chose to go with it. Should AdvancedVI.jl also follow suit?

@yebai
Member

yebai commented Mar 7, 2023

That sounds good.

4 participants