Estimating the noise level hyperparameter \sigma_n #451

Open · KukumavMozolo opened this issue Jan 7, 2016 · 3 comments
@KukumavMozolo (Contributor) commented:

As far as I know, there is no implemented method to estimate the autocorrelated noise (noise_variance) from points_sampled via maximum likelihood or leave-one-out cross-validation. I wonder why that is, and whether you are planning to implement it?
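
For concreteness, here is a minimal sketch of what joint maximum-likelihood noise estimation looks like. It uses scikit-learn rather than MOE (MOE has no such option, which is the point of this issue), so everything below is illustrative: the WhiteKernel noise_level plays the role of sigma_n^2 and is fit together with the length scale by maximizing the marginal likelihood.

```python
# Sketch (not MOE code): joint maximum-likelihood estimation of the noise
# level in scikit-learn. WhiteKernel's noise_level ~ sigma_n^2 and is
# optimized together with the RBF length scale, which is exactly the
# hyperparameter coupling discussed below.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.uniform(0.0, 1.0, size=(20, 1))
y = np.sin(6.0 * X).ravel() + 0.1 * rng.randn(20)  # true noise std = 0.1

kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5).fit(X, y)

# The fitted noise_level should approximate sigma_n^2 (~0.01 here).
print(gp.kernel_)
```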

@KukumavMozolo (Contributor, author) commented:

OK, I found part of the answer in another thread. As suntzu86 said:

> For each data point, you'd want to provide MOE with the mean & variance. For us, we were observing user clicks, so we computed mean/var as in a binomial distribution. I'm not sure what you're measuring, but you'd want either that or mean/var from a beta.
>
> P.S. I don't have the reference handy, but there are more scientific ways of picking this value (based on max likelihood). But implementing these isn't trivial b/c it introduces a dependency on the hyperparameters and you'd need to optimize everything together. Plus, these estimates seem to do pretty badly early on when you don't have much data. So I haven't done it here.
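
A minimal sketch of the binomial mean/variance computation described in that quote (the function name and numbers are made up for illustration):

```python
# For k clicks out of n impressions, the CTR estimate is k/n and the
# variance of that estimate under a binomial model is p*(1-p)/n; this
# (mean, variance) pair is what you would report to MOE per data point.
def ctr_mean_and_variance(clicks, impressions):
    p = clicks / float(impressions)       # sample CTR, estimate of p
    var = p * (1.0 - p) / impressions     # variance of the sample mean
    return p, var

mean, var = ctr_mean_and_variance(clicks=37, impressions=1000)
print(mean, var)  # 0.037, ~3.56e-05
```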

@KukumavMozolo (Contributor, author) commented:

So if I want to model the CTR, what you are suggesting is to use a beta distribution. But we are pretending that the CTR is normally distributed. I could imagine that the right-side tail of the beta might be much longer than that of a normal distribution. Don't we underestimate the variance of the normal in that case?
Another question: how would you update sigma_n with the new information we acquire while optimizing the objective? In other words, how do we avoid the GP explaining all parameter-dependent variation as noise?
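
To put a number on the tail question, here is an illustrative check (not from the thread) using scipy: compare a Beta posterior for a low CTR against a normal with the same mean and variance. The beta's heavier right tail carries more mass past mean + 3 sd than the matched normal does.

```python
from scipy import stats

# Beta(k+1, n-k+1) posterior for a low CTR, and a moment-matched normal.
clicks, impressions = 3, 100
beta = stats.beta(clicks + 1, impressions - clicks + 1)
normal = stats.norm(loc=beta.mean(), scale=beta.std())

# Right-tail mass beyond mean + 3 sd: the beta's exceeds the normal's ~0.00135.
threshold = beta.mean() + 3.0 * beta.std()
print(beta.sf(threshold), normal.sf(threshold))
```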

@suntzu86 (Contributor) commented:

Yeah, if there's a lot of demand for automatic noise estimation, I could add it as an option... don't hold your breath though :) But in our experience, the cases we ran across either had measured or estimated noise or were known to be noise-free. It seemed better to use those facts about the black box being optimized than to select an arbitrary-ish (and universal: the same noise at every point) noise value based on likelihood.

As for your new question, I'm not sure I totally understand. A couple of points that may help:

  • A GP is a Gaussian at "every point." But that does not mean the GP assumes the underlying system it is modeling is Gaussian or even Gaussian-like. (Deviate too far, e.g., something with lots of discontinuities, and performance will suffer, but it will still work.)
  • A Gaussian is fully determined by its mean (1st moment) and variance (2nd moment); all higher cumulants are 0, and I believe it is the only nontrivial distribution with this property. So at locations where the mean & variance are known, we tell MOE the appropriate values. (See the quick check after this list.)
  • Computing the mean/variance of a CTR using "normal" statistics is actually incorrect b/c the CTR distribution is not Gaussian.
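
A quick numeric illustration of that second bullet, using scipy: for any normal, the skewness and excess kurtosis (the normalized 3rd and 4th cumulants) are exactly zero, so the mean and variance carry all the information.

```python
from scipy import stats

# Frozen normal with mean 2 and std 3; 'mvsk' asks for mean, variance,
# skewness, and (excess) kurtosis. The last two are 0 for any Gaussian.
mean, var, skew, kurt = stats.norm(loc=2.0, scale=3.0).stats(moments='mvsk')
print(mean, var, skew, kurt)  # 2.0 9.0 0.0 0.0
```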

Also, I'm not sure what you mean by "update sigma_n". If you re-sample an old point and get a new noise value, you can just change it when you pass data to MOE. Technically, such a change would require re-tuning the hyperparameters.
If you mean what to do when the GP "collapses" and essentially yields 0 variance everywhere (so there's nothing more to optimize/test)... well, I don't have a great answer for that. I've really only seen this happen in very low dimensions, but things to try include:

  • randomly throw out some points
  • shorten or lengthen the hyperparameters (e.g., say we have samples at 0.1, 0.2, 0.7, 0.8 with length scale = 0.7 and the GP "collapses." You're probably thinking: how does MOE know nothing interesting happens btwn 0.2 and 0.7? It doesn't, but it assumes so b/c the length scale is relatively large; the sketch after this list reproduces the effect.)

This didn't come up in any experiments we ran on real data, so I haven't spent much time thinking about it.
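
Here is a sketch of that length-scale example (scikit-learn rather than MOE, so the API is illustrative): with samples at 0.1, 0.2, 0.7, 0.8, a fixed length scale of 0.7 leaves almost no posterior uncertainty at 0.45, i.e., the "collapse" described above, while a short length scale restores it.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.array([[0.1], [0.2], [0.7], [0.8]])
y = np.array([0.0, 0.1, 0.0, -0.1])  # made-up observations

for length_scale in (0.7, 0.1):
    # 'fixed' bounds keep the length scale at the value we set.
    kernel = RBF(length_scale=length_scale, length_scale_bounds='fixed')
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-8).fit(X, y)
    _, std = gp.predict(np.array([[0.45]]), return_std=True)
    # length scale 0.7 -> tiny std at 0.45 ("collapse"); 0.1 -> near-prior std
    print(length_scale, std[0])
```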
