Batch Optimization #71
We have been talking about batch optimization internally and think it would certainly be interesting to have (although not for the first version), and of course you are welcome to participate. Did you have any specific acquisition function in mind? |
Hi @nrontsis, thank you for your interest in GPflowOpt! Batch optimization is definitely on our list. In fact, I have some local code and some preliminary tests, but it is not suited for a PR at all; furthermore, I wasn't entirely sure that all the choices I made were the right ones, so I would be very interested to get a discussion going and work towards extending the framework. If you are willing to contribute, that's great. Depending on the outcome of the discussion, the code could be used as a starting point if we decide that is the right way to go. If it is OK for you, I'll initiate the discussion, but feel free to steer it in the right direction. A fundamental philosophy of GPflowOpt is that we want to provide a framework that allows easy implementation of BO strategies, rather than implementing a broad range of algorithms. When it comes to batch optimization, there seems to be a fundamental split between algorithms that select a batch of N points sequentially (e.g., LP) and those that optimize the entire batch at once (e.g., q-KG, batch PES, etc.). Both approaches have quite different requirements in terms of implementation, so I have been in doubt about the best road for GPflowOpt. The latter seems closer to optimal, but I don't know how feasible the optimization is, as the acquisition function can become high-dimensional. I would be very interested to hear your thoughts on this matter. |
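For concreteness, here is a rough sketch of the two strategies (not GPflowOpt code; the acquisition callables, box bounds, and use of SciPy here are illustrative assumptions). The joint approach optimizes over batch_size * dim variables at once, while the sequential approach runs batch_size optimizations over dim variables and transforms the acquisition in between:

import numpy as np
from scipy.optimize import minimize

# Hypothetical helpers for the sketch: `joint_acquisition` scores a whole
# (batch_size, dim) batch with a single number, `acquisition` scores one point,
# and `penalize` transforms the acquisition after each selected point.
dim, batch_size = 2, 4
lb, ub = np.zeros(dim), np.ones(dim)

def optimize_joint(joint_acquisition):
    # Joint approach: a single optimization over batch_size * dim variables.
    x0 = np.random.uniform(np.tile(lb, batch_size), np.tile(ub, batch_size))
    bounds = list(zip(np.tile(lb, batch_size), np.tile(ub, batch_size)))
    res = minimize(lambda x: -joint_acquisition(x.reshape(batch_size, dim)),
                   x0, bounds=bounds)
    return res.x.reshape(batch_size, dim)

def optimize_sequential(acquisition, penalize):
    # Sequential approach: batch_size optimizations over dim variables,
    # transforming the acquisition (e.g. penalizing) between steps.
    batch = []
    for _ in range(batch_size):
        x0 = np.random.uniform(lb, ub)
        res = minimize(lambda x: -acquisition(x), x0, bounds=list(zip(lb, ub)))
        batch.append(res.x)
        acquisition = penalize(acquisition, res.x)
    return np.array(batch)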
Thanks @nknudde @javdrher for the quick replies! I believe that both approaches can be easily incorporated in the current framework.
Sequential optimisation is generally easier, and that would be the vanilla choice for the practitioner. But either way, the first step would be to extend the framework so that the acquisition function can take multiple points (a batch) as input. What do you think? :) |
I agree that supporting both is possible and should fit within the design. Regarding your last remark about acquisition objects being multi-point: I believe this is actually a requirement for batch optimization rather than for sequential optimization? I think I already covered that extension a while ago locally; are the changes in batch_support what you were thinking of? It replicates the domain for the optimizer, such that the candidate dimension becomes batch_size * dimension, and it is already split afterwards so it's easier to use in Acquisition. If so, we could proceed by deciding on an acquisition to add to the framework itself and start with batch optimization. For sequential optimization, I don't think this belongs in the optimizer, for a couple of reasons. Right now an optimizer is simply something which optimizes some function over a domain and nothing more. Making it accept only acquisition objects would change that completely and make optimizers acquisition specific. This would make them less broadly applicable (e.g., in LP you can currently use an optimizer to obtain the Lipschitz estimate by optimizing the gradient norm over the domain). Additionally, what about the other optimizers (MC, candidates, BayesianOptimizer itself)? Finally, together with some of the changes we are planning on the domain to support discrete parameters, I believe this would mix up several requirements in a single class. The way I see it (though I might be missing something), in sequential mode the acquisition is optimized batch_size times over the same domain, but between each step something changes in the acquisition (a penalizer, updating the model with pseudo points, etc.). So the optimizer and its domain stay the same, but the acquisition is transformed. Furthermore, only the acquisition knows exactly how to transform itself for a specific algorithm. We could add this logic partially in BayesianOptimizer by issuing a batch_update in an extra batch loop, but this morning I thought of something different:

with acquisition.batch(steps=batch_size) as batch:
    result = batch.optimize(self.optimizer)

The batch object yielded by the context can be generic and simply alternate between optimizing and calling a batch_update method on the acquisition function (which defaults to raising NotImplementedError and should be implemented in order to support sequential batch optimization). For generic approaches like LP, we have them wrap another acquisition and implement batch_update in the LPAcquisition. The return value of batch.optimize is the batch of candidates for evaluation. Finally, when the context ends, the batch object should revert the acquisition to its original state so it can be updated the normal way with the obtained observations, ready for another round. This snippet would then go in BayesianOptimizer. Let me know what you think of this; I'd be happy to help out with its implementation if needed. Last but not least, while browsing through the papers on the sequential batch algorithms, there is often some more GP-specific machinery going on to cope with the fact that no observations are available yet. This would imply properly checking whether the model is actually a GPR, but it also means that the scaling which happens between the acquisition and the model will pop up there. We have been talking about doing the scaling in Acquisition instead, so that code in build_acquisition always operates on the same scale as the models, but we have been in doubt about it. This might be an extra argument to move the scaling. Don't worry about this for now. |
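To make that idea a bit more tangible, here is a minimal sketch of what the yielded batch object could look like; the hook names (save_state, restore_state, batch_update, evaluate) and the objective passed to the optimizer are assumptions, not existing GPflowOpt API. A factory method acquisition.batch(steps=...) would simply return such an object.

import numpy as np

class BatchContext(object):
    """Sketch: alternate optimizing the acquisition and calling its
    batch_update hook, then revert the acquisition on exit."""

    def __init__(self, acquisition, steps):
        self.acquisition = acquisition
        self.steps = steps

    def __enter__(self):
        self._state = self.acquisition.save_state()    # hypothetical
        return self

    def __exit__(self, *exc):
        # Revert the acquisition to its original state so it can be updated
        # the normal way with the obtained observations afterwards.
        self.acquisition.restore_state(self._state)    # hypothetical

    def optimize(self, optimizer):
        candidates = []
        for _ in range(self.steps):
            # Optimize the (negated) acquisition over the domain; the exact
            # objective passed here is an assumption.
            result = optimizer.optimize(self.acquisition.evaluate)
            candidates.append(result.x)
            # Transform the acquisition for the next point (penalizer,
            # pseudo-point model update, ...); defaults to NotImplementedError.
            self.acquisition.batch_update(np.vstack(candidates))
        return np.vstack(candidates)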
@javdrher Thanks, I think I agree with all of your points. Sorry, I hadn't noticed the batch_support branch. Will come back for more discussion. |
I believe that this line should change: it seems to be missing an axis=1 argument.
But, more generally, why do we require the input to be 2D?
I pushed that branch this morning; it was a prototype on my laptop. You are right about the missing axis=1 parameter. I just found there are a few more required changes in my stash; I'll update the code. The reason we currently take 2D input is to allow evaluating many candidates at the same time in parallel (e.g., evaluate 500 MC points on EI at once, then optimize starting from the best candidate so far). Evaluating a large starting grid like this makes sense as the dimensionality grows, and because the acquisition is coded in TensorFlow the computation is multi-core and can run on a GPU. |
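As an illustration of that design choice, a small sketch (assuming a generic vectorized acquisition callable and box bounds lb/ub, not the actual GPflowOpt classes):

import numpy as np
from scipy.optimize import minimize

def pick_and_refine(acquisition, lb, ub, n_mc=500):
    # Why acquisitions accept a 2D (n_candidates, dim) array: score many
    # Monte-Carlo points in one vectorized call, then only run a
    # gradient-based optimizer from the most promising one.
    dim = lb.shape[0]
    candidates = np.random.uniform(lb, ub, size=(n_mc, dim))   # e.g. 500 x dim
    scores = acquisition(candidates)                           # one call scores them all
    best = candidates[np.argmax(scores)]
    res = minimize(lambda x: -float(acquisition(x[None, :])), best,
                   bounds=list(zip(lb, ub)))
    return res.x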
Hey, here are some thoughts after working a bit with the code: I don't see a benefit of using a context manager for optimising a batch. It would require additional code, and I cannot see any use for
while batch acquisition functions would overwrite this. Moreover, I think it would be good to make the input tensor
What do you think? Should I go on implementing it as I suggest? Thanks! |
I discussed this today with @icouckuy, and I'd like to make sure we are all on the same page. I'm sorry if this seems a bit slow, but as the changes affect the framework quite a bit, we want to proceed carefully and have the requirements clear for everyone. So here is a rewind, with some remarks on your suggestions. Two general remarks:
Taking a small step away from the code, let's decompose both problems and map them to the architecture, following the same principles we applied when designing the other parts of the framework. For the sequential case, the loop could then look like this:
batch = np.empty((0, domain.size))
for i in range(number_of_batch_points):
    acquisition.set_batch(batch)
    result = self.optimizer.optimize(inverse_acquisition)
    batch = np.vstack((batch, result.x))
acquisition.set_batch(np.empty((0, domain.size)))  # clears all modifications for the batch

I think this plan clearly reflects that both approaches are distinct: they can even be developed independently, as there are no feature dependencies. Let us know what you think; we highly appreciate your input in this discussion, as in only a couple of posts the contours of how this should look have become much sharper. If this seems like a good plan, I can probably write the tests for the |
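For concreteness, one way a wrapping acquisition might implement the set_batch hook; this uses a simplified Gaussian exclusion penalty purely for illustration (not the actual local-penalization formula), and all names other than set_batch are assumptions:

import numpy as np

class PenalizingAcquisition(object):
    """Sketch: wraps a base acquisition; set_batch installs a penalty around
    each pending batch point, and an empty batch clears all modifications."""

    def __init__(self, base_acquisition, width=0.1):
        self.base = base_acquisition      # callable scoring an (n, dim) array
        self.width = width
        self.pending = np.empty((0, 0))

    def set_batch(self, batch_points):
        # Called with np.empty((0, domain.size)) to reset, as in the loop above.
        self.pending = batch_points

    def evaluate(self, X):
        scores = self.base(X)
        for p in self.pending:
            d = np.linalg.norm(X - p, axis=1)
            # Suppress the acquisition near already-selected points.
            scores = scores * (1.0 - np.exp(-0.5 * (d / self.width) ** 2))
        return scores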
Okay, that sounds like a solid plan, and I definitely understand you wanting to keep the framework clean. So, if I understand you correctly:
Although it makes sense to see the batch and sequential cases as separate, I believe that we should think about how they will merge on the same branch [1]. This could be implemented as follows. When
For example, this would allow us to set If all that sounds good, I could work on the Sequential case on a new branch forked from
Should I proceed with implementing the above and come back with a first PR for further discussion? As you suggested, you could work in
[1]: The truth is that I'm biased on this: my work is on "true" batch optimisation, but I need to compare against the "vanilla" choice (i.e. "sequential" batch), so I prefer a framework that does both at the same time. |
Let's continue the design in the PR, thanks a lot. I agree on [1]; right now we are in the process of releasing GPflowOpt 0.1, so we cannot merge into master yet. It does make sense to merge our two branches together first and come up with a single batch branch, which is then prepared for merging into master.
Hi,
I am really interested in the project and, since I am working on Batch Bayesian Optimisation, I would be keen on extending GPflowOpt for it.
Would you be interested in discussion and pull requests for this topic?
Thanks!
Nikitas