Unknown Source of Variation in gp_next_points() #452

Open
dgreis opened this issue Jan 14, 2016 · 10 comments

@dgreis

dgreis commented Jan 14, 2016

I have been using MOE to tune the parameters of a black-box function. One thing I haven't been able to understand is why, holding the number of historical points in an experiment constant, consecutive calls to gp_next_points() return points that seem to vary randomly within the X domain.

This doesn't seem to make sense given my reading of the EGO paper. As I understand it, the suggested points at any given stage are supposed to be deterministic: EI returns the X point where the GP expects the greatest improvement in the underlying objective function.
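
For concreteness, this is the closed-form EI I have in mind (a minimal sketch of my reading of the EGO paper, not MOE's actual code; the helper name is just for illustration):

from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far):
    """EI of one candidate under a GP posterior N(mu, sigma^2), minimization convention."""
    # With fixed data and fixed hyperparameters this is a deterministic function
    # of the candidate (through mu and sigma), so its argmax should not move
    # between identical calls.
    if sigma == 0.0:
        return max(best_so_far - mu, 0.0)
    z = (best_so_far - mu) / sigma
    return (best_so_far - mu) * norm.cdf(z) + sigma * norm.pdf(z)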

To get to the bottom of this, I've tried running MOE on a known function (the Branin function), removing all the usual sources of variation. Yet I still see the randomness mentioned above. What's more, benchmarked against GPyOpt, MOE doesn't seem close to finding the optimum even after 50 iterations.

Here's what I did. First, the imports outside of MOE:

import numpy as np
import numpy.random as npr
import math
import pandas as pd
import time

Now, define the Branin Function:

def branin(x):
    x1 = x[0]
    x2 = x[1]
    a = 1.
    b = 5.1 / (4.*np.pi**2)
    c = 5. / np.pi
    r = 6.
    s = 10.
    t = 1. / (8.*np.pi)
    ret  = a*(x2-b*x1**2+c*x1-r)**2+s*(1-t)*np.cos(x1)+s
    #print ret
    return ret
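
As a quick sanity check, the three known global minima should all evaluate to roughly 0.398:

# Known global minima of the Branin function; each should print ~0.397887.
for opt in [[-np.pi, 12.275], [np.pi, 2.275], [9.42478, 2.475]]:
    print(str(opt) + ' -> ' + str(branin(opt)))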

Random starting points are generated in the domain given in the Branin description:
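
(If you want this starting design itself to be reproducible, you can seed numpy first; note this only fixes the five training points and has no effect on whatever randomness MOE has internally.)

npr.seed(0)  # purely for reproducibility of the starting design; seed value is arbitrary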

x1 = 15*npr.random_sample(5) - 5
x2 = 15*npr.random_sample(5)
data = np.vstack([x1,x2]).T
fval = [branin(x)  for x in data]

Next, I add these points to a moe.easy_interface.experiment.Experiment. I gave these points zero noise variance, since I wanted to see whether the problematic randomness persists with all input sources of variance removed. (Adding variance to these points gives the same end result.)

exp = Experiment([[-5, 10], [0, 15]])
# Append the five starting points one at a time, each with zero noise variance.
for point, value in zip(data, fval):
    exp.historical_data.append_sample_points([SamplePoint(point, value, 0)])

Next, I cycle MOE for 50 iterations using the function defined below, re-tuning the covariance every 5 iterations (as suggested in the MOE FAQ). The function returns, among other things, a dataframe rdf that summarizes the suggested point at each iteration along with the function value there (again with zero variance, to isolate the problematic randomness). It also returns the experiment itself**, the latest covariance_info, and some metadata on the time taken for each covariance tuning (tdf).

**Side Note: There is some weird bug where I wasn't able to add new points to an experiment after I had sent it to the REST server, so as a workaround I just add the points to a new experiment at each iteration.

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            covariance_info = gp_hyper_opt(points_sampled,rest_host='192.168.59.103')
            print (str(i) +', time taken: '+str(round(time.time() - t0,3)))
            tdf[str(0)] = [time.time() - t0]
        next_point = gp_next_points(exp,rest_host='192.168.59.103',**{'covariance_info':covariance_info})[0]
        f_next = branin(next_point) #+  npr.normal(scale=std)
        rdf = rdf.append({ 'x1': next_point[0], 'x2':next_point[1], 'f':f_next},ignore_index=True)

        points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        exp = Experiment(bounds)
        points.append(new_point)
        exp.historical_data.append_sample_points(points)
    return rdf,tdf,exp,covariance_info

This function uses two helper functions:

def get_bounds(exp):
    num_dims = exp.domain.dim
    x_intls = [exp.domain.get_bounding_box()[i] for i in range(num_dims)]
    intl_list = list()
    for intl in x_intls:
        intl_list.append([intl.min, intl.max])
    return intl_list

and

def grab_pts_sampled(exp):
    pts = exp.historical_data.points_sampled
    f_val = exp.historical_data.points_sampled_value
    f_var = exp.historical_data.points_sampled_noise_variance
    return [[list(pts[i,:]),f_val[i],f_var[i]] for i in range(len(pts))]
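
For completeness, the driver call is just:

rdf, tdf, exp, covariance_info = cycle_moe(exp, R=50)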

After running exp through cycle_moe() for 50 iterations, when I make consecutive calls with the following line:

gp_next_points(exp,rest_host='192.168.59.103',**{'covariance_info':covariance_info})[0]
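
To make the variation concrete, here is a minimal sketch of the comparison (same exp and covariance_info, nothing changed between the two calls):

p1 = gp_next_points(exp, rest_host='192.168.59.103', **{'covariance_info': covariance_info})[0]
p2 = gp_next_points(exp, rest_host='192.168.59.103', **{'covariance_info': covariance_info})[0]
print(str(p1) + ' vs ' + str(p2))  # these come back noticeably different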

I still see random variation in the suggested next points, which I don't understand. Additionally, when examining the results of each iteration in rdf, it doesn't look like MOE is getting close to any of the global optima given in the Branin description.

I ran the same number of iterations using GPyOpt on the same function, using the same method ('EI'), and it converged to one of the global optima.

Hopefully, with all my code and each step outlined above, somebody can help me figure out exactly what I'm doing wrong here. But up to now, this has been a serious head scratcher...

@suntzu86
Contributor

Sorry I didn't respond sooner... just started a new job so I've been pretty busy. Plus I don't have a working linux installation at the moment to test on, lol.

I believe Branin should work b/c we have tested with it before. Although it is a very commonly considered case in academic papers, it never seemed particularly similar to our present MOE applications, so we stopped focusing on it.

So this should work. That said, a few things stand out to me...

  1. All the HTTP endpoints return status messages (like indicating whether the solver converged/found an update). Are these coming back as true? (If not, we just return a random value or best guess, but it is not trustworthy.)
  2. You don't pass a hyperparameter_domain to gp_hyper_opt. I'm actually really surprised this doesn't fail. Could you tell me what the result of this line is in your run?
    https://github.com/Yelp/MOE/blob/master/moe/views/rest/gp_hyper_opt.py#L110
    I'm not entirely sure what domain it thinks it's getting. My only guess is that the domain is coming out as [1.0, 1.0] for all hyperparameters. (The C++ accepts the domain in log10, so after exponentiating, 10 ** 0 = 1.) But I really don't know... that should have failed validation, sorry :(
  3. Following 2, can you print out the covariance hyperparams you're getting? Are they all 1.0?

Also, I didn't build the easy_interface stuff and haven't interacted with it much at all, so please bear with me.

Lastly, you mean that doing something like this:
https://github.com/Yelp/MOE/blob/master/moe_examples/next_point_via_simple_endpoint.py
you were not able to update the Experiment object from easy_interface? That doesn't make any sense b/c that example is one of our integration tests, which are all passing in the most recent revision.

@dgreis
Author

dgreis commented Feb 3, 2016

Answering your q's:

  1. All the JSON responses in my docker console have messages like "status" : "optimizer_success", so it seems the solver is working...

  2. I'm running this from an IPython notebook. My kernel always dies when I run:
    from moe.views.rest import gp_hyper_opt
    So I'm using
    from moe.easy_interface.simple_endpoint import gp_hyper_opt
    instead.

I still tried adding a print statement to moe/views/rest/gp_hyper_opt.py to get the value of hyperparameter_domain, but the call from simple_endpoint doesn't seem to run through that file. Or maybe I'm doing something wrong...

Nonetheless, I refactored the cycle_moe() and grab_pts_sampled(exp) functions as follows, to include the hyperparameter_domain_info:

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled,rest_host='192.168.59.103',**{'hyperparameter_domain_info' : { \
                'dim': hyper_dim, \
                'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim  \
                }})
            print (str(i) +', time taken: '+str(round(time.time() - t0,3)))
            tdf[str(0)] = [time.time() - t0]
        next_point = gp_next_points(exp,rest_host='192.168.59.103',**{'covariance_info':covariance_info})[0]
        f_next = branin(next_point) #+  npr.normal(scale=std)
        rdf = rdf.append({ 'x1': next_point[0], 'x2':next_point[1], 'f':f_next,
                         'cov_hyps': covariance_info['hyperparameters']},ignore_index=True)

        #points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        #exp = Experiment(bounds)
        #points.append(new_point)
        #exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1','x2','f','cov_hyps']],tdf,exp,covariance_info

def grab_pts_sampled(exp):
    pts = exp.historical_data.points_sampled
    f_val = exp.historical_data.points_sampled_value
    f_var = exp.historical_data.points_sampled_noise_variance
    return [SamplePoint(list(pts[i,:]),f_val[i],f_var[i]) for i in range(len(pts))] #Changed this line

As a side note, it seems like I can actually add to experiments from the easy_interface, so that doesn't seem to be an issue anymore.

  3. When I cycle this 50 times with 5 training points to start (just like I did in my first post), the hyperparameter values stick around [2.0, 2.0, 2.0] in rdf. When I ran this for 100 iterations, some of the values did depart from 2.0, but the overall values are still not converging to any of the global minima.

@suntzu86
Contributor

  1. optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.
  2. Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.

Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.

But that function is definitely being called. It's the only "view" that handles hyperparameter optimization.

  3. Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. max length scale of 2 is pretty small for a domain with length 15.
  4. You're running the C++ right? Not forcing the stuff in python_version to run somehow?
  5. What does the scatter plot of points explored look like?

@dgreis
Author

dgreis commented Feb 17, 2016

  1. optimizer_success: True is what you're looking for. That field is the name of a boolean. Both hyper_opt and next_points produce something along those lines.

I see things like:

'status': {'optimizer_success': {'log_marginal_likelihood_newton_found_update': True}

Your kernel dies...? That seems problematic. Like the linux kernel dies? Any error msgs or logging info or whatnot before death? I've never seen that happen before.

Not the Linux kernel, the IPython kernel. I don't think this is important, because this isn't even where the code is executing; it's executing in Docker, as you pointed out.

Your code edit needs to happen on the server (like if you're running in docker or something, it needs to be in the code running there; so relaunch after edits or log in to docker). easy_interface is just a wrapper around urllib2 to send POST and GET requests; it doesn't do any compute.

I spent hours trying to print the value of hyperparameter_domain in views/rest/gp_hyper_opt.py (print to stdout, pass to the JSON output, write to a file), but all of them caused the build to fail tests and resulted in a bad request (500 error) response from Docker. I'm out of ideas to debug this. Is knowing this absolutely necessary?

(as a side note, #401 turned out to be a blocker to do code edits easily. I commented there too)

Just to be clear, the hyperparameter values won't be related to the minima of the branin fcn. Unless you have a reference for like, what the ideal hyperparams for branin should be. I'd set the hyperparam domain to be something like [[0.01, 3], [0.03, 15], [0.03, 15]]. max length scale of 2 is pretty small for a domain with length 15.

When I said, "the overall values are still not converging to any of the global minima," I meant the overall values of the branin function.

I changed the line in my cycle_moe() from

'domain_bounds': [{'min': 0.1, 'max': 2.0}] * hyper_dim

to:

'domain_bounds': [{'min': 0.01, 'max': 3.0},{'min': 0.03, 'max': 15},{'min': 0.03, 'max': 15}]

This didn't seem to have any impact.

You're running the C++ right? Not forcing the stuff in python_version to run somehow?

I'm not sure how to verify this, but I don't think I've explicitly or intentionally done that...

what does the scatter plot of points explored look like?

[Scatter plot of points explored: moe_points_explored_branin]

This is 50 iterations. For reference, the global optima of the Branin function are at (-pi, 12.275), (pi, 2.275) and (9.42478, 2.475). It doesn't look like MOE is getting close to finding any of them. It's hard to follow the sequence of points explored in a scatter plot, so I colored the points so that the earlier ones are more red and the later ones more blue. If you run cycle_moe(), it's much easier to see the progression just by looking at the X1 and X2 columns in rdf.

@suntzu86
Contributor

  • Ok. There should be a msg like that returned by the next_points call too.
  • Still weird. I don't know why it would cause IPython to freeze. I don't use docker (I compile, build, and run locally), so I have no guesses there.
  • The hyperparam domain should be passed correctly, especially now that you're specifying it explicitly. Don't worry about this anymore; I just wasn't sure what the default value (when you didn't specify it) was. Sorry that has been such a pain for you :( I didn't really connect the dots there... been on antibiotics for the last week and my brain is wonky. Dev'ing on docker is a huge pain; I avoid it. You would probably need to turn off line 35 of the dockerfile to prevent test failure, and I'm not sure where print statements to stdout go inside of docker, probably some configurable log.
  • Re #401 ("Docker image build process should be more explicit about basing off a fresh repository"): sorry again about the lack of docker guidance and it being painful. I know very little about it. I thought that was already handled previously. The original intent was for only maintainers to mess with it and to have people build locally if they were doing dev stuff, I think?
  • No impact... ok. I haven't run locally yet. I got my stuff set up this weekend but I upgraded ubuntu too far and now all kinds of tests are failing on gcc 5. Working on it...
  • Again, brainfart. If you're in docker then the C++ is running.
  • Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.

Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var:
https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py
The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.

Anyway, sorry I don't have a working install yet but I'll get there. Should have looked into gcc default version before upgrading to ubuntu 15.10.
Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)

@suntzu86
Contributor

Also does this guy in moe_examples folder work for you? Basically does the same thing as your script, I think...

https://github.com/Yelp/MOE/blob/master/moe_examples/combined_example.py

You may want to disable the noise in function_to_minimize. Looks like the tests for the files in moe_examples don't verify correctness (just that they run).

@dgreis
Author

dgreis commented Feb 24, 2016

Scatter plot: :( :( Blah. Could you give me one more piece of debug info? After each result for next_points, can you call gp_ei for the point that MOE suggests as well as the true minima? (e.g., to check if EI optimization even has a hope of finding it.) If it's too much trouble no worries, I'll look into it anyway.

There are multiple minima (all equal to about 0.398) in the Branin function (see the description), so this is not entirely straightforward.

Steps if interested: Copy the gp_mean_var function in easy_interface to like gp_ei. Change GP_MEAN_VAR_ROUTE_NAME to GP_EI_ROUTE_NAME. Change the return ... line to return output.get('expected_improvement'). And points_to_evaluate is a list of lists, e.g., [[0.3, 0.2], [0.5, 0.7], [-3.14, 12.25]]. The schema for gp_ei is here: https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_ei.py compared to mean_var:
https://github.com/Yelp/MOE/blob/master/moe/views/schemas/rest/gp_mean_var.py
The points_sampled input is just like in hyper_opt. You need to specify covariance_info as a kwarg just like in next_points.

These are "steps"... to what? I didn't understand what this was for, but I did it anyway. I couldn't add gp_mean_var to views.rest.gp_ei.py because of the same iPython-kernel-dying issue as before (let's leave that issue aside), but I was able to get it to work by keeping it in simple_endpoint.py. I had to make a few more changes to the file, and I also changed the name so it wouldn't clash with the existing gp_mean_var. Renamed to my_grab_ei, it looks like this:

from moe.views.schemas.rest.gp_ei import GpEiResponse #Added
from moe.views.constant import GP_EI_ROUTE_NAME, GP_EI_PRETTY_ROUTE_NAME #Added

def my_grab_ei(
        points_sampled,
        points_to_evaluate,
        rest_host=DEFAULT_HOST,
        rest_port=DEFAULT_PORT,
        testapp=None,
        **kwargs
):
    """Hit the rest endpoint for calculating the posterior mean and variance of a gaussian process, given points already sampled."""
    endpoint = ALL_REST_ROUTES_ROUTE_NAME_TO_ENDPOINT[GP_EI_ROUTE_NAME] ##Changed
    raw_payload = kwargs.copy()  # Any options can be set via the kwargs ('covariance_info' etc.)

    raw_payload['points_to_evaluate'] = points_to_evaluate

    # Sanitize input points
    points_sampled_clean = [SamplePoint._make(point) for point in points_sampled]
    historical_data = HistoricalData(
            len(points_to_evaluate[0]),  # The dim of the space
            sample_points=points_sampled_clean,
            )

    if 'gp_historical_info' not in raw_payload:
        raw_payload['gp_historical_info'] = historical_data.json_payload()

    if 'domain_info' not in raw_payload:
        raw_payload['domain_info'] = {'dim': len(points_to_evaluate[0])}

    json_payload = json.dumps(raw_payload)

    json_response = call_endpoint_with_payload(rest_host, rest_port, endpoint, json_payload, testapp)

    output = GpEiResponse().deserialize(json_response) ##Changed

    return output.get('expected_improvement') ##Changed
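
With this in place, the comparison @suntzu86 asked for (EI at MOE's suggestion versus the three known optima) would look roughly like this; just a sketch, assuming a current next_point and covariance_info are in scope:

# Sketch: query EI at MOE's suggested point and at the three known Branin optima
# in one call, before the new point is appended to historical_data.
points_sampled = grab_pts_sampled(exp)
candidates = [list(next_point), [-np.pi, 12.275], [np.pi, 2.275], [9.42478, 2.475]]
ei_vals = my_grab_ei(points_sampled, candidates, rest_host='192.168.99.100',
                     **{'covariance_info': covariance_info})
for c, e in zip(candidates, ei_vals):
    print(str(c) + ' -> EI ' + str(e))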

Then I added this to my cycle_moe function, which now looks like this:

def cycle_moe(exp, R=50):
    rdf = pd.DataFrame({})
    tdf = pd.DataFrame({})
    for i in range(R):
        bounds = get_bounds(exp)
        if (i % 5 == 0):
            points_sampled = grab_pts_sampled(exp)
            t0 = time.time()
            gp_dim = len(points_sampled[0][0])
            hyper_dim = gp_dim + 1
            covariance_info = gp_hyper_opt(points_sampled,rest_host='192.168.99.100'
                ,**{'hyperparameter_domain_info' : { \
                'dim': hyper_dim, \
                'domain_bounds': [{'min': 0.01, 'max': 3.0},{'min': 0.03, 'max': 15},{'min': 0.03, 'max': 15}]  \
                #'domain_bounds': [{'min': 0.01, 'max': 2.0}] * hyper_dim                                                                                                      
                }}
                                          )
            print (str(i) +', time taken: '+str(round(time.time() - t0,3)))
            tdf[str(0)] = [time.time() - t0]
        next_point = gp_next_points(exp,rest_host='192.168.99.100',**{'covariance_info':covariance_info})[0]
        f_next = branin(next_point) # +  npr.normal(scale=std)
        points_sampled = grab_pts_sampled(exp)  ##Added
        ei = my_grab_ei(points_sampled, [next_point],rest_host='192.168.99.100',**{'covariance_info':covariance_info})[0] ##Added
        rdf = rdf.append({ 'x1': next_point[0], 'x2':next_point[1], 'f':f_next, 'ei': ei,   ##Modified
                         'cov_hyps': covariance_info['hyperparameters']},ignore_index=True)
        #points = exp.historical_data.to_list_of_sample_points()
        new_point = SamplePoint(next_point, f_next, 0)
        exp = Experiment(bounds)
        #points.append(new_point)
        #exp.historical_data.append_sample_points(points)
        exp.historical_data.append_sample_points([new_point])
    return rdf[['x1','x2','f','ei','cov_hyps']],tdf,exp,covariance_info ##Modified

I don't know how best to summarize this. It will probably be best to run it yourself and look at the contents of rdf, but here is the output from rdf.ei.describe()

count     50.000000
mean      47.673862
std       48.591592
min        2.522790
25%       10.211796
50%       22.794516
75%       85.673810
max      188.925347

Speaking of which, can you tell me which version of gcc and which distro/version of linux you're on? (edit: although probably irrelevant b/c the docker image should be independent of those things)

I'm on OS X Yosemite (10.10.5) and gcc --version returns 4.9.2. But I think you're right re: Docker handling this stuff...

Also, I got moe_examples to work. The output is below. I guess it is converging to [1,2.6]? But still, I need to make sure it converges in all cases, not just this one.

If I can't turn the corner on this problem soon, I think I'm gonna have to abandon MOE :(

Sampled f([1.88891236067, 4.0]) = 3.024281515239389817E-01
Sampled f([0.0, 2.9618376054]) = -9.838875168897800449E-01
Sampled f([0.0, 3.82570246393]) = -7.749819391510606170E-01
Sampled f([0.786838585499, 2.62777516463]) = -1.579648346558538918E+00
Sampled f([1.20312582365, 2.08982756487]) = -1.451454321864257269E+00
Updated covariance_info with {'hyperparameters': [0.823360900509, 1.01663242597, 1.28659618664], 'covariance_type': u'square_exponential'}
Sampled f([0.758268089974, 2.2331219232]) = -1.411622052413112893E+00
Sampled f([1.19599259406, 2.57708248725]) = -1.593354050232974606E+00
Sampled f([1.03968887999, 2.56816139896]) = -1.617582421292888650E+00
Sampled f([2.0, 2.24190928674]) = -1.018767759754114710E+00
Sampled f([1.05579477819, 2.56319110027]) = -1.616923815055614444E+00
Updated covariance_info with {'hyperparameters': [0.789039406073, 1.23435905217, 1.30371483948], 'covariance_type': u'square_exponential'}
Sampled f([1.0082757694, 2.62978522016]) = -1.616789052180044095E+00
Sampled f([2.0, 0.639342249258]) = -1.468008111448600994E-01
Sampled f([1.07167325368, 2.5252282002]) = -1.614562666452685313E+00
Sampled f([0.863802755471, 2.92560868142]) = -1.540054847282929629E+00
Sampled f([1.03653075905, 2.56537496547]) = -1.617587842392238073E+00
Updated covariance_info with {'hyperparameters': [0.737969968335, 1.24276395575, 1.37894109572], 'covariance_type': u'square_exponential'}
Sampled f([1.01903375152, 2.59415144623]) = -1.617994034119857982E+00
Sampled f([1.03422357833, 2.56392785481]) = -1.617583817670746216E+00
Sampled f([1.03150566466, 2.56489031089]) = -1.617640394284993066E+00
Sampled f([1.02916662686, 2.56574681541]) = -1.617681716433355454E+00
Sampled f([1.02711746293, 2.56651917111]) = -1.617712335919329059E+00
GP mean at (0, 0), (0.1, 0.1), ...: [0.98974570661, 0.963411209612, 0.911656230847, 0.835017255388, 0.735036207222, 0.614126994898, 0.475415337389, 0.322579331855, 0.159711048117, -0.00879206630745]

@suntzu86
Contributor

I can understand having to abandon MOE... I'd be annoyed too if I were stuck for so long :( I was hoping to spend some time on this over the weekend but I ended up having to spend the weekend working on things for my job.

So I have a couple more suggestions for you; I'll try to debug it myself this week, now that work has hopefully settled down. Anyway...

  • The point of the EI thing was to try to see whether MOE's internal optimizer was failing to find good "next_points" choices OR whether MOE's internal model was wrong. EI (expected improvement) is MOE's proxy for the real world: it optimizes that internally, and "next_points" is the point of best EI. You can pass a list of lists into points_to_evaluate, so before updating historical_data, query EI with the coordinate list [next_point, optima_1, optima_2, optima_3] (each entry an [x1, x2] pair). Then we could see if EI is better at next_point or at any of the true optima. This isn't critical though, it's more something for me to look at. Sorry about the confusion, but that's what I was thinking.

Quick things to try:

  • It's possible that the default optimizer settings for EI are not appropriate here. Try adding the following to your POST request (so pass it in as a kwarg to the gp_next_points function, just like you pass in covariance_info now; see the sketch after this list): {"optimizer_info": {"optimizer_type": "gradient_descent_optimizer", "num_multistarts": 2000, "optimizer_parameters": {"gamma": 0.7, "pre_mult": 1.2, "max_relative_change": 0.5, "tolerance": 1.0e-7, "max_num_steps": 1200}}}. Try "gamma": 0.5 too perhaps. (This may be a lot slower... if it's too slow, try decreasing num_multistarts. The value you're currently using is 400.)
  • It's possible that gradient descent is just sucking for this task (newton would be vastly better; working on getting that together). So let's try dumb search: {"optimizer_info": {"optimizer_type": "null_optimizer", "num_random_samples": 10000000}}. (You may have to put "optimizer_parameters": {} in there too but I don't think so.) 10 million is a bit of a guess; if that's too slow, decrease it. If it's still super fast, try increasing it.

Hopefully I spelled all those fields correctly, lol.
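
Concretely, the idea is that these ride along as kwargs exactly the way covariance_info does now; something like this (untested sketch, field names as in the bullets above):

# Gradient-descent settings from the first bullet.
gd_optimizer_info = {
    'optimizer_type': 'gradient_descent_optimizer',
    'num_multistarts': 2000,
    'optimizer_parameters': {
        'gamma': 0.7,
        'pre_mult': 1.2,
        'max_relative_change': 0.5,
        'tolerance': 1.0e-7,
        'max_num_steps': 1200,
    },
}

# Dumb-search alternative from the second bullet.
null_optimizer_info = {
    'optimizer_type': 'null_optimizer',
    'num_random_samples': 10000000,
}

next_point = gp_next_points(exp, rest_host='192.168.99.100',
                            **{'covariance_info': covariance_info,
                               'optimizer_info': gd_optimizer_info})[0]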

Edit:

  • if you give up on MOE, I'll still eventually fix this b/c this is a pretty common case. I'll keep you posted.
  • if you got moe_examples working (that does look like it is converging), can you try putting branin into that routine you ran? I don't think there's anything wrong w/your script but maybe let's verify.

@johnroach

Thank you for helping out @suntzu86 . Have you been able to work on this? I am having a similar problem.

@suntzu86
Contributor

Nope, I haven't had a chance to dive deeper into this.

A team at Cornell where some of the original research for MOE was done put a paper up on arxiv recently:
http://arxiv.org/abs/1602.05149

They run a branin case successfully, code here:
http://pastebin.com/2syieQKU

It doesn't go through the REST API, but you can hopefully see how the Python objects map onto the API's JSON fields (like TensorProductDomain -> domain).
