next points hangs on precisely defined dimension #444

Open
orasis opened this issue Oct 14, 2015 · 6 comments

@orasis

orasis commented Oct 14, 2015

I'm attempting to constrain one of the dimensions to a single value, in this case 3752, as evidenced by the min and max both being 3752. This query never seems to return.

Why would one want to do this? I don't want the GP to search those dimensions that are invariant.

The workaround is to set the max to min+1. This is not ideal, however, because my domain_bounds change dynamically and I don't want to have to keep this edge case in mind.

INFO [moe.views.pretty_view][worker 1] {'content': {u'domain_info': {u'dim': 2, u'domain_bounds': [{u'max': 3752, u'min': 3752}, {u'max': 3600, u'min': 60}]}, u'gp_historical_info': {u'points_sampled': [{u'value_var': 0.01, u'value': 0.25, u'point': [3669, 1000]}, {u'value_var': 0.01, u'value': 0.25, u'point': [3669, 1000]}, {u'value_var': 0.01, u'value': 0.16666666666666666, u'point': [3752, 1000]}]}, u'num_to_sample': 1}, 'endpoint': 'gp_next_points_epi', 'type': 'request'}
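For reference, a minimal repro sketch of that request (assuming a local MOE server on its default port and the gp_next_points_epi endpoint at /gp/next_points/epi; adjust the URL for your deployment):

    # Repro sketch: post the payload above to MOE's REST interface.
    import json
    import requests

    payload = {
        "domain_info": {
            "dim": 2,
            # min == max on the first dimension is what triggers the hang
            "domain_bounds": [
                {"min": 3752, "max": 3752},
                {"min": 60, "max": 3600},
            ],
        },
        "gp_historical_info": {
            "points_sampled": [
                {"point": [3669, 1000], "value": 0.25, "value_var": 0.01},
                {"point": [3669, 1000], "value": 0.25, "value_var": 0.01},
                {"point": [3752, 1000], "value": 0.16666666666666666, "value_var": 0.01},
            ],
        },
        "num_to_sample": 1,
    }

    # With the degenerate bounds above, this call never returns.
    resp = requests.post("http://localhost:6543/gp/next_points/epi", data=json.dumps(payload))
    print(resp.json())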

@jialeiwang
Contributor

If you only search along one of the dimensions while the other is fixed, you should consider using a 1-dimensional GP instead of a 2-d one, because the fixed dimension is useless to the search problem while adding a lot of "distraction" to it.

@orasis
Author

orasis commented Oct 22, 2015

Is that true? Suppose we had 2 dimensions: day of week and hour of day. Let's say on most days 8am is optimal, but on Saturday and Sunday 10am is optimal. What is the best way to search for this? I think you'd want the extra information from the day-of-week correlations even if you are just searching for the optimal time on Sunday. Maybe I misunderstood how the GP works internally.

Thanks!


@jialeiwang
Contributor

Interesting, you are right. In this case you should use a 2-d GP for better modeling. The current implementation of MOE assumes the search dimension is the same as the problem dimension, and the goal of optimization is to find the maxima over this whole space. In your example, the current implementation of MOE finds which day AND which time is optimal, while your goal is to find which time on Sunday is optimal. You would need to make two tweaks to fit your problem:

  1. the best value sampled so far, which is used to compute ExpectedImprovement, should in your case be the best value sampled on Sunday, because your optimization objective is the value on Sunday;
  2. modify the optimization module to search along the time dimension while the day is fixed to Sunday.

I am not sure when this new feature will be added, although it seems an important feature to pursue. If you would like to play with the code instead of waiting, I can point out the starting places for you. Note that the Python interface gives you greater flexibility to work on your problem, so I assume you will use the Python interface, in particular python_version instead of cpp_wrappers.

For point 1, modify how ExpectedImprovement._best_so_far is computed in moe/optimal_learning/python/python_version/expected_improvement.py.
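For illustration only (this is a sketch, not the actual MOE code), the tweak amounts to something like the following, where fixed_dim and fixed_value are hypothetical names for the pinned coordinate (here, day = Sunday). Best is taken as the minimum, matching a minimization convention; flip to max if your objective is maximized:

    # Sketch: restrict best-so-far to points whose fixed coordinate matches
    # the target value (e.g. day == Sunday).
    import numpy as np

    def best_so_far_in_slice(points_sampled, values_sampled, fixed_dim, fixed_value, tol=1e-8):
        """Best (minimum) observed value among points with point[fixed_dim] == fixed_value."""
        points = np.asarray(points_sampled, dtype=float)
        values = np.asarray(values_sampled, dtype=float)
        mask = np.abs(points[:, fixed_dim] - fixed_value) < tol
        if not np.any(mask):
            # No samples in the slice yet; fall back to the global best.
            return values.min()
        return values[mask].min()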

Point 2 is trickier. GradientDescentOptimizer in moe/optimal_learning/python/python_version/optimization.py is a good starting point if you want to use gradient descent to optimize, or LBFGSOptimizer if you want to use BFGS. If you sample one point at a time, I recommend gradient descent, and BFGS if you sample multiple points at a time.
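As a sketch of the idea only (a naive grid search stands in for gradient descent/BFGS to keep it short, and `acquisition` is a stand-in for whatever you evaluate, e.g. expected improvement at a candidate point): pin the fixed coordinate and hand the optimizer a 1-d problem over the free coordinate.

    # Sketch: optimize an acquisition function over the free dimension only,
    # holding the other coordinate fixed (2-d problem assumed).
    import numpy as np

    def optimize_free_dim(acquisition, fixed_dim, fixed_value, free_bounds, num_candidates=1000):
        """Return the best point found along the free dimension."""
        free_dim = 1 - fixed_dim
        lo, hi = free_bounds
        best_point, best_acq = None, -np.inf
        for x in np.linspace(lo, hi, num_candidates):
            point = np.empty(2)
            point[fixed_dim] = fixed_value
            point[free_dim] = x
            acq = acquisition(point)
            if acq > best_acq:
                best_point, best_acq = point.copy(), acq
        return best_point

In MOE itself you would instead fix the coordinate inside the optimizer's search domain, but the shape of the change is the same.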

If you have difficulty building the source on your local machine due to the C++ component, use this branch: https://github.com/Yelp/MOE/tree/jwang_445_install_pure_python_modules_only
You need to export the environment variable MOE_NO_BUILD_CPP=True; for this build you cannot use the web interface for now. Hope it helps!

@orasis
Author

orasis commented Nov 3, 2015

Unfortunately my Python is weak, so it may be some time before I can be motivated.

I think this raises two separate issues. The original bug is still a bug because it is a trivial denial-of-service attack on the MOE REST interface.

The second issue is this contextual search optimization. The concrete target I would give for contextual search would be Q-Learning using a Gaussian Process where the GP maps (state, action) pairs to the reward value (Q). Typically in this case, the action space is being searched while the state space is fixed. So your points would look like (stateX, stateY, stateZ, actionX, actionY) -> Q
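To make the layout concrete, a hypothetical example reusing the points_sampled format from the request above (the state/action split is just a labeling convention over a flat 5-d point; all values here are made up):

    # Hypothetical (state, action) -> Q samples in MOE's points_sampled format.
    # The first three coordinates are the fixed state, the last two are the
    # actions being searched; Q is the observed reward with its variance.
    state = [0.3, -1.2, 4.0]          # stateX, stateY, stateZ (fixed context)
    samples = [
        {"point": state + [0.5, 10.0], "value": 1.7, "value_var": 0.01},  # Q for action (0.5, 10.0)
        {"point": state + [0.9, 12.0], "value": 2.3, "value_var": 0.01},
    ]
    gp_historical_info = {"points_sampled": samples}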

@suntzu86
Contributor

suntzu86 commented Nov 4, 2015

Sorry I've been gone for so long! See: #448 for details.

Whoa whoa. First, @jialeiwang, I don't think any of those modifications are necessary right now. This is definitely a bug. Keep reading for more details, a workaround, and a fix.

This is the notion of an "untunable parameter." MOE wasn't built with these in mind, but the basic premise is: if I'm searching over x,y, I want to search in a subspace where we already know y=2.5. For now, MOE would only support this by setting a 0-length domain boundary as @orasis did (keep reading for how). In the future, it'd be preferable to carry the notion of untunable-ness into the internals so we do not attempt to optimize untunables (not high priority b/c the cost is low).

Time is a great example of an untunable parameter. Material and environment properties are often untunable as well (like an airplane cannot pick the temperature at which it flies; I may want to design a structure with a new type of steel given some existing designs; etc).

Note that you still need to tune hyperparameters for your untunable dimensions. Additionally, say I have some untunables in [8, 12] and MOE estimates a length scale of 2.0. If I then set an untunable parameter with value 30, the minimum scaled distance is 9! This is huge and will cause MOE to revert to the prior (b/c the covariance will be nearly 0). So be careful when estimating very far away from previously sampled values; you might be getting random noise back.
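To see why, here are the numbers with a squared-exponential kernel (assuming that is the covariance in use; the exact kernel in your setup may differ):

    # Squared-exponential covariance between the farthest previously sampled
    # untunable (12) and the new value (30), with length scale 2.0.
    import math

    length_scale = 2.0
    scaled_distance = (30 - 12) / length_scale          # = 9, as noted above
    covariance = math.exp(-0.5 * scaled_distance ** 2)  # ~2.6e-18, effectively zero
    print(scaled_distance, covariance)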

@orasis, you have two options.
Easy:
Similar to what you tried, set your domain to [a, b] where b = a*(1.0 + 1.0e-14). So the domain width is TINY. This will do a lot better than setting b = a + 1, as for all intents and purposes, you're forcing MOE to always use a for that dimension.

Small caveat: You need to ensure that the hyperparameter length scale for this dimension is say > a*1.0e-11 (1000x the edge length). Then MOE knows there is no variation over the (tiny) domain edge width.
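A quick sketch of the workaround using the 3752 value from the original payload (the 1.0e-14 width and the a*1.0e-11 length-scale floor are the numbers suggested above):

    # Tiny-width workaround for a dimension pinned to a = 3752.
    a = 3752
    b = a * (1.0 + 1.0e-14)          # domain width is a * 1e-14, effectively zero
    domain_bounds = [
        {"min": a, "max": b},        # "untunable" dimension
        {"min": 60, "max": 3600},    # dimension actually being searched
    ]
    # Caveat from above: keep this dimension's length-scale hyperparameter
    # well above the edge width, e.g. > a * 1.0e-11 (1000x the edge length).
    min_length_scale = a * 1.0e-11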

Harder: (I'll be doing this hopefully soon)
The reason this happens is that MOE's internal optimizer multistarts on a Latin hypercube. At its core, this means drawing random points on [min, max) (for each pair of domain bounds). So your query would draw from [a,a) (a degenerate range). B/c we didn't start with C++11, this uses boost::uniform_real (in boost::random), which says [a,a) is undefined behavior, and it loops forever :(

C++11's equivalent, std::uniform_real_distribution (in <random>), does not do this: a random value drawn from [a,a) returns a.

This happens in moe/optimal_learning/cpp/gpp_random.cpp. So add
#include <random>
at the top of that file (right above #include <vector>), and change line 184 to:
std::shuffle(index_array.begin(), index_array.end(), uniform_generator->engine);
and change line 186 to:
std::uniform_real_distribution<double> uniform_double_distribution(0.0, subcube_edge_length);

This will cause some unit tests to fail b/c testing with randomness is hard, and the stdlib version and the boost version don't generate identical output (due to how uniform_real_distribution and uniform_real are implemented).

@orasis
Author

orasis commented Nov 4, 2015

Excellent suggestions Eric. Thank you.
