Natural Gradients + Monte Carlo VI #1
Also relevant here: Lin, Wu, et al. "Tractable structured natural gradient descent using local parameterizations." arXiv preprint arXiv:2102.07405 (2021).
This is a little late for the discussion, but is there any evidence/personal experience that natural gradient descent is more robust/faster than regular ADVI?
Natural gradient descent is just a way of doing, well, gradient descent on the parameters of distributions :) ADVI is a very specific VI method. I'm not sure the two are even comparable?
I think many consider NGVI a competitor to good ol' ADVI. At least the literature certainly makes it look like that.
There's been quite a bit of interesting work recently on natural gradients for variational inference with exponential-family q-distributions and non-conjugate / non-exponential-family likelihoods and priors. See [1] (applied to GPs, but the important bits aren't really GP-specific) and [2]. These methods turn out to be quite straightforward to implement, so they would be a great target for us. As a starting point, you could imagine extending our current mean field implementation to do natural gradient descent in the parameters of the diagonal Gaussian q-distribution; a rough sketch follows below.
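To make that concrete, here's a minimal self-contained sketch of the recipe from [2], written in JAX rather than against this repo's API; `log_joint`, `elbo`, and `natgrad_step` are hypothetical names, and the target is a toy Gaussian. The key fact it exploits is that, for an exponential-family q, the natural gradient with respect to the natural parameters equals the ordinary gradient with respect to the expectation parameters, so a diagonal-Gaussian NGVI step only needs reparameterized Monte Carlo ELBO gradients plus a change of coordinates:

```python
# Hedged sketch, not a definitive implementation: natural-gradient VI for a
# diagonal Gaussian q, following the Khan & Nielsen (2018) identity that the
# natural gradient in the natural parameters equals the plain gradient in the
# expectation parameters m = (E[w], E[w^2]).
import jax
import jax.numpy as jnp

def log_joint(w):
    # Toy unnormalized log-posterior: isotropic Gaussian N(3, 0.5 I).
    return -0.5 * jnp.sum((w - 3.0) ** 2 / 0.5)

def elbo(params, key, n_samples=16):
    # Monte Carlo ELBO for q = N(mu, diag(var)) via the reparameterization trick.
    mu, log_var = params
    sigma = jnp.exp(0.5 * log_var)
    eps = jax.random.normal(key, (n_samples, mu.shape[0]))
    w = mu + sigma * eps
    entropy = 0.5 * jnp.sum(log_var + jnp.log(2.0 * jnp.pi) + 1.0)
    return jnp.mean(jax.vmap(log_joint)(w)) + entropy

@jax.jit
def natgrad_step(params, key, rho=0.1):
    mu, log_var = params
    var = jnp.exp(log_var)
    g_mu, g_logvar = jax.grad(elbo)(params, key)
    g_var = g_logvar / var  # chain rule: d/d(var) from d/d(log var)
    # Gradients w.r.t. the expectation parameters m1 = mu, m2 = mu^2 + var.
    g_m1 = g_mu - 2.0 * mu * g_var
    g_m2 = g_var
    # Ascend in natural-parameter space (lam1 = mu/var, lam2 = -1/(2 var));
    # for exponential families this is exactly the natural gradient step.
    lam1 = mu / var + rho * g_m1
    lam2 = -0.5 / var + rho * g_m2
    var_new = -0.5 / lam2  # requires lam2 < 0; shrink rho if this goes negative
    return (lam1 * var_new, jnp.log(var_new))

key = jax.random.PRNGKey(0)
params = (jnp.zeros(2), jnp.zeros(2))  # mu = 0, var = 1
for _ in range(200):
    key, sub = jax.random.split(key)
    params = natgrad_step(params, sub)
print(params[0], jnp.exp(params[1]))  # should approach mean 3.0, variance 0.5
```

One practical caveat: the step size rho has to stay small enough that the updated lam2 remains negative (equivalently, the variance stays positive); in practice you'd guard the step and back off when that check fails.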
There's even work moving slightly beyond exponential family distributions now [3], but this is quite early work. Might be nice to have though.
[1] - Salimbeni, Hugh, Stefanos Eleftheriadis, and James Hensman. "Natural gradients in practice: Non-conjugate variational inference in Gaussian process models." arXiv preprint arXiv:1803.09151 (2018).
[2] - Khan, Mohammad Emtiyaz, and Didrik Nielsen. "Fast yet simple natural-gradient descent for variational inference in complex models." 2018 International Symposium on Information Theory and Its Applications (ISITA). IEEE, 2018.
[3] - Lin, Wu, Mohammad Emtiyaz Khan, and Mark Schmidt. "Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations." arXiv preprint arXiv:1906.02914 (2019).