Surrogate scaling #315

AdrianSosic · 2024-07-17T08:00:55Z

This PR refactors our scaling logic, introducing a mechanism that gives fine control over the applied scaling approach.

AVHopp

Just two minor comments

PR #315 introduced the new configurable class-based scaling approach, but:
* did so using a rather heavy hierarchy of methods
* introduced a conceptual bug in that the scaling logic was not consistently applied in the correct way (e.g. `best_f` should not have been scaled)
* resulted in a layout that implied a rather unclear interface for a `SurrogateProtocol`, where users providing such protocols would also need to expose transform-related methods that should actually stay surrogate-internal.

This PR refactors the scaling mechanism with the final interfaces in mind, in particular the newly introduced `SurrogateProtocol` class.

### Problems solved
* Much of the transform-related machinery was removed. Only two scaler attributes remain (one for input, one for output), and both are purely class-internal.
* The result is a clean `SurrogateProtocol` interface, which imposes the two required mechanisms on the user:
  * `fit` (i.e. how the custom-defined surrogate is to be trained) and
  * `to_botorch` (i.e. how the trained surrogate is converted to be compatible with `botorch`'s machinery)
* The `Surrogate` base class now clearly offers the three layers of connection that are dictated by its surroundings:
  * a posterior method intended for the user, interfacing experimental representations
  * a posterior method for the computational layer operating with tensors in computational representation, for interplay with `botorch`
  * a posterior method that focusses only on the surrogate architecture where transformation and scaling are abstracted away, intended for overriding in subclasses
* As a result, scaling is now completely encapsulated inside the surrogate so that objects outside do not need to bother about surrogate internals. That means questions like "do we need to scale certain quantities before passing them to the surrogate" (such as `best_f` or `X_pending`) are trivially answered with "No", since scaling is not visible outside the surrogate.
* Because scaling happens inside the torch layer, it is now part of the computational torch graph, meaning that backpropagation through the entire surrogate model is supported.
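
The two-method interface described above can be pictured with Python's `typing.Protocol`. The following is only a minimal sketch: the names `SurrogateLike`, `MeanSurrogate`, and `NotASurrogate`, as well as the simplified signatures, are hypothetical and are not taken from the BayBE code base.

```python
from typing import Any, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class SurrogateLike(Protocol):
    """Hypothetical stand-in for the protocol described in the PR text."""

    def fit(self, measurements: Any) -> None:
        """Train the surrogate on the given measurements."""
        ...

    def to_botorch(self) -> Any:
        """Return an object that botorch's machinery can consume."""
        ...


class MeanSurrogate:
    """Toy surrogate that satisfies the protocol without inheriting from it."""

    def __init__(self) -> None:
        self.mean: Optional[float] = None

    def fit(self, measurements: List[float]) -> None:
        # "Training" here just stores the average of the observations.
        self.mean = sum(measurements) / len(measurements)

    def to_botorch(self) -> Any:
        # Placeholder: a real implementation would return a botorch model.
        return {"mean": self.mean}


class NotASurrogate:
    """Lacks `to_botorch`, so it does not satisfy the protocol."""

    def fit(self, measurements: List[float]) -> None:
        pass
```

Because the check is structural, `isinstance(MeanSurrogate(), SurrogateLike)` holds without any registration step, while `NotASurrogate` is rejected for missing `to_botorch`. Note that `runtime_checkable` only verifies that the methods exist, not their signatures.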

Completes the surrogate refactoring, which extended over #278, #309, #315, #325, #337.

### Most important changes
* The transition point from experimental to computational representation has been moved from the recommender to the surrogate. From an architecture/responsibility perspective, this is reasonable since the recommender should not have to bother about algorithmic/computational details.
* The desired consequence is that public `Surrogate` methods like `posterior` and `fit` can now operate on dataframes in experimental representation, meaning they can also be exposed directly to the user.
* The new posterior methods now all return a general `Posterior` object instead of implicitly assuming Gaussian distributions. This paves the way for arbitrary surrogate extensions, such as Bernoulli/Categorical surrogates, etc. At the moment, this introduces an explicit coupling to botorch, which is fine because botorch remains a core dependency and the only backend used for complex surrogate modeling. In the future, this can be further abstracted by introducing our own `Posterior` class.
* The `Surrogate` layout has been refined such that the extracted `SurrogateProtocol`, which now defines the formal interface for all surrogates, imposes minimal requirements on the user.
* Scaling has been completely redesigned, offering the possibility to configure input/output scaling down to the level of individual parameters and targets. The configuration is currently class-specific, but can be extended to allow surrogate-instance-specific rules in the future.
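
The idea of moving the representation transition into the surrogate can be illustrated with a small template-method sketch. Everything here is an assumption made for illustration: the class names `SurrogateSketch` and `LinearSurrogate`, the single-parameter bounds, and the use of plain lists instead of dataframes, tensors, or `Posterior` objects. Output scaling is left out entirely.

```python
from typing import List


class SurrogateSketch:
    """Hypothetical base class: scaling lives entirely inside the surrogate."""

    def __init__(self, lower: float, upper: float) -> None:
        # Bounds of a single numerical parameter in experimental units.
        self._lower = lower
        self._upper = upper

    def _scale(self, x: float) -> float:
        # Internal detail: map experimental values to the unit interval.
        return (x - self._lower) / (self._upper - self._lower)

    def posterior(self, candidates: List[float]) -> List[float]:
        """Public entry point: accepts values in experimental units."""
        scaled = [self._scale(x) for x in candidates]
        return self._posterior(scaled)

    def _posterior(self, scaled_candidates: List[float]) -> List[float]:
        """Model-specific part, meant to be overridden in subclasses."""
        raise NotImplementedError


class LinearSurrogate(SurrogateSketch):
    """Toy model that only ever sees already-scaled inputs."""

    def _posterior(self, scaled_candidates: List[float]) -> List[float]:
        return [2.0 * x for x in scaled_candidates]
```

A caller writes `LinearSurrogate(0.0, 10.0).posterior([5.0, 10.0])` with values in experimental units and never touches `_scale`, which is the encapsulation the description argues for.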

AdrianSosic added 2 commits July 15, 2024 21:54

Remove current scaling functionality

d9aefe5

Make to_tensor also handle numpy arrays

369da45

Preparation for use with sklearn's ColumnTransformer, which spits out arrays

AdrianSosic added the enhancement and refactor labels Jul 17, 2024

AdrianSosic self-assigned this Jul 17, 2024

AdrianSosic changed the base branch from main to dev/surrogates July 17, 2024 08:01

AdrianSosic force-pushed the refactor/surrogates/scaling branch 2 times, most recently from c3ade11 to 59eed75 July 17, 2024 08:06

Scienfitz reviewed Jul 18, 2024

Three review threads on baybe/utils/scaling.py (outdated, resolved)

AdrianSosic force-pushed the refactor/surrogates/scaling branch from 2f5f851 to e44c145 July 19, 2024 07:32

AdrianSosic commented Jul 19, 2024

Review thread on baybe/surrogates/base.py (resolved)

Replace param_bounds_comp with comp_rep_bounds

0ede1cc

AVHopp reviewed Jul 21, 2024

AdrianSosic force-pushed the refactor/surrogates/scaling branch from e44c145 to 53ccd4c July 22, 2024 08:13

Draft input scaling mechanism

00c40ae

AdrianSosic force-pushed the refactor/surrogates/scaling branch from 53ccd4c to b6c56e9 July 22, 2024 08:21

AdrianSosic added 2 commits July 22, 2024 10:28

Introduce ScalerProtocol class

79f8f44

Make transformation return a dataframe

24f2c49

AdrianSosic force-pushed the refactor/surrogates/scaling branch from b6c56e9 to 1a39f62 July 22, 2024 08:28

AdrianSosic marked this pull request as ready for review July 22, 2024 09:21

Scienfitz reviewed Jul 22, 2024

Review thread on streamlit/surrogate_scaling.py (outdated, resolved)

Review thread on baybe/surrogates/gaussian_process/core.py (outdated, resolved)

Review thread on baybe/surrogates/base.py (outdated, resolved)

AdrianSosic added 9 commits July 22, 2024 21:36

Update streamlit dev script

2938c48

Fix handling of dropped columns in ColumnTransformer

ae1a366

Remove obsolete TODO note

5068148

Make surrogate scaling work with continuous parameters

fb14927

Rename _get_parameter_scaler to _make_parameter_scaler

c3a4cc6

Draft output scaling mechanism

64b5450

Silence warning by allowing extra columns

6dad04a

Improve signatures

25e356a

Harmonize terminology

2a2849b

AdrianSosic added 2 commits July 22, 2024 21:36

Update test for empty bounds

920b079

Fix import order

cdf6688

Became necessary due to the known-third-party ruff flag

AdrianSosic force-pushed the refactor/surrogates/scaling branch from 8508b23 to cdf6688 July 22, 2024 19:36

AdrianSosic added 5 commits July 22, 2024 21:55

Decide for transformation approach

6e052f7

For this PR, we leave the mechanism untouched:
* Parameters are normalized based on search space bounds
* Targets are standardized based on observed measurements
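
The two conventions named in this commit message can be sketched in plain Python. This is an illustrative approximation only: the helper names `normalize` and `standardize` are hypothetical, and the use of the population standard deviation is an assumption, since the commit message does not say which estimator is used.

```python
from statistics import mean, pstdev
from typing import List


def normalize(values: List[float], lower: float, upper: float) -> List[float]:
    """Map values to [0, 1] using fixed bounds (e.g. search space bounds)."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return [(v - lower) / (upper - lower) for v in values]


def standardize(values: List[float]) -> List[float]:
    """Shift and scale observed values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize constant values")
    return [(v - mu) / sigma for v in values]
```

The difference that matters is the source of the statistics: `normalize` depends only on bounds known in advance, whereas `standardize` depends on the measurements collected so far and therefore changes as new data arrive.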

Update docstrings

ef84a35

Remove separate scaling logic from GPs

2b3dcab

Rename ScalerProtocol to ParameterScalerProtocol

161bddb

In order to differentiate from target scaling

Update CHANGELOG.md

e7f3f67

Scienfitz reviewed Jul 23, 2024

Review thread on baybe/surrogates/base.py (outdated, resolved)

Review thread on baybe/surrogates/base.py (resolved)

Scienfitz approved these changes Jul 23, 2024

Replace literal return type with None

21953d4

AdrianSosic force-pushed the refactor/surrogates/scaling branch from 8360c67 to 21953d4 July 23, 2024 19:01

AVHopp reviewed Jul 23, 2024

Review thread on baybe/surrogates/base.py (resolved)

Review thread on baybe/surrogates/base.py (outdated, resolved)

AdrianSosic added 3 commits July 24, 2024 10:52

Implement workaround to circumvent ColumnTransformer limitations

536a3a8

Improve code grouping

b88b3ba

Remove register_custom_architecture decorator

1619bd7

The decorator is no longer compatible with the generalized surrogate layout. Instead of upgrading the decorator, a replacement mechanism using Python's built-in typing.Protocol will be introduced.

AVHopp reviewed Jul 24, 2024

Review thread on baybe/surrogates/base.py (resolved)

AdrianSosic added this to the Surrogate refactoring milestone Jul 24, 2024

AdrianSosic merged commit 2f5fa21 into dev/surrogates Jul 24, 2024
10 checks passed

AdrianSosic deleted the refactor/surrogates/scaling branch July 24, 2024 13:58

AdrianSosic mentioned this pull request Jul 26, 2024

Surrogate interface #325

Merged

AdrianSosic mentioned this pull request Aug 9, 2024

Refactor Surrogates #338

Merged
