Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session configuration #241

Open
pmelchior opened this issue Apr 7, 2021 · 6 comments
Open

Session configuration #241

pmelchior opened this issue Apr 7, 2021 · 6 comments

Comments

@pmelchior
Copy link
Owner

Having a ton of arguments for the init method of various function makes the code really ugly and requires a fair bit of work when we want to expose an aspect of some piece of code to the user. On the other hand, a global config module becomes problematic when multiple instances of, say, sources and blend should run with different settings.

I propose we use a mechanism similar to Cache, which is globally defined by empty, and dynamically filled by every instance. This way we could create a dictionary like

config['ExtendedSource.resizing'] = True

which can be set up with defaults programmatically or even hard-coded with defaults. In the ExtendedSource constructor, we can then do one of these:

1. Session configs only

from .config import Config

class SingleExtendedSource(FactorizedComponent):
    def __init__(self, model_frame):

        morph, bbox = self.init_morph(
            model_frame,
            sky_coord,
            image,
            std,
            thresh=thresh,
            symmetric=True,
            monotonic="flat",
            min_grad=0,
            boxsize=Config['SingleExtendedSource.boxsize']
        )

        morphology = ExtendedSourceMorphology(
            model_frame,
            center,
            morph,
            bbox=bbox,
        )

That's the version of ultimate simplicity because every config setting we make is available everywhere simply by importing Config. No need to pass anything around. I think this would serve most of our use cases, make the code API much lighter, and prevent drilling holes into the code to expose yet another config parameter to the user.

A downside is for cases in which, say, sources are with different configs are run in the same blend. But I don't think the construction would be to bad:

from scarlet.config import Config

source = ExtendedSource(model_frame, center, observations)
Config['SingleExtendedSource.boxsize'] = 41
source = ExtendedSource(model_frame, center, observations)
Config.reset()

2. Session configs with override

from .config import Config

class SingleExtendedSource(FactorizedComponent):
    def __init__(self, model_frame, config=None):

        if config is None:
            config = Config

        morph, bbox = self.init_morph(
            model_frame,
            sky_coord,
            image,
            std,
            thresh=thresh,
            symmetric=True,
            monotonic="flat",
            min_grad=0,
            boxsize=config['SingleExtendedSource.boxsize']
        )

        morphology = ExtendedSourceMorphology(
            model_frame,
            center,
            morph,
            bbox=bbox,
            config=config
        )

Has a cleaner API than we currently have and can be overridden by the user, e.g. for construction (in the same blend) sources with different config parameters:

from scarlet.config import Config

source = ExtendedSource(model_frame, center, observations)
config = Config.copy()
config['SingleExtendedSource.boxsize'] = 41
source = ExtendedSource(model_frame, center, observations, config=config)

But it still requires drilling holes to pass the config argument to wherever it is needed. The alternative is that some config parameters are settable by the user at init time and some others aren't. As compromises go, I'm not a fan of this option.

Recommendation

We opt for option 1 with a twist:

from .config import Config

class SingleExtendedSource(FactorizedComponent):
    def __init__(self, model_frame):

        self.config = Config.copy()

This way we avoid the drilling holes thing and can persist any config setting in all classes that need that. I'm guessing only source classes will need that anyway.

@fred3m
Copy link
Collaborator

fred3m commented Apr 9, 2021

I'm more in favor of option 2, global variables are discouraged for a reason. While I know that you are in favor of just setting things once and not having to pass them around, I've been bitten too many times by having unexpected things happen and even in an option 2 scenario where I don't expect things to change, I will likely always pass around a config. Right now we're passing a chain of half a dozen arguments (that change from time to time) from initialization to source creation. Passing one variable that likely won't need to change seems like a pretty good compromise.

@pmelchior
Copy link
Owner Author

I expected you would think that way. So let me ask, what were these cases where you got bitten?

The problem I see is that we either have to pass configs around everywhere, or we have two classes of config parameters: those that a user can set and then the other ones. Not appealing to me.

@fred3m
Copy link
Collaborator

fred3m commented Apr 9, 2021

There have been a lot of instances over time. I can't think of anything recently because it's such an uncommon practice these days. Matplotlib does what I suggest, namely having a global set of "rc" parameters that can be updated by any given function on the fly. So I think that global constants are a good idea and should be used, but my implementation would be something like the following:

config.py:

default_config = {}

class Config:
    def __init__(self, config=None, key=None, **kwargs):
        if config is None:
            # Use the default config as a backup
            config = default_config.copy()
        elif isinstance(config, Config):
            config = config.config.copy()
        # Allow kwargs to update the current config, or default config
        config.update(kwargs)
        if key is not None:
            config = Config.get_config(config, key)
        self.config = config

    def __getitem__(self, key):
        """Retrieve a config parameter by key
        """
        return self.config[key]
    
    @staticmethod
    def get_config(config, key):
        """Retrieve a subset of config parameters
        """
        _config = {}
        for k,v in config.items():
            if k.startswith(key):
                if k[len(key)] == ".":
                    _config[k[len(key)+1:]] = v
        return _config

    def __repr__(self):
        return f"Config<{self.config}>"

Scarlet code would look like the following (note: I don't think that we should parameterize init_source and init_all_sources this way with max_components, I'm just showing how it could be done if we did want to add some custom config options to nested functions as well as classes):

from .config import default_config, Config

def init_all_sources(frame, centers, observations, config):
    # ...
    source = init_source(frame, center, observations, config)
    # ...

def init_source(frame, center, observations, config):
    _config = Config(config, "initialization.init_source")
    # ...
    components = _config["max_components"]
    while components >= 0:
        # ..
        src_config = Config(config, **{"ExtendedSource.K": components})
        source = ExtendedSource(frame, center, observations, src_config)
    # ...

default_config.update({
    "ExtendedSource.compact": False,
    "ExtendedSource.K",
})
def ExtendedSource(model_frame, sky_coord, observations, config):
    # ...
    _config = Config(config, "ExtendedSource")
    if _config["compact"]:
        return CompactExtendedSource(model_frame, sky_coord, observation, config)
    if _config["K"] == 1:
        return SingleExtendedSource(model_frame, sky_coord, observations, config)
    else:
        return MultiExtendedSource(model_frame, sky_coord, observations, config)
    # ...

default_config.update({
    "SingleExtendedSource.init.thresh": 1,
    "SingleExtendedSource.init_morph.symmetric": True,
    "SingleExtendedSource.init_morph.monotonic": "flat",
    "SingleExtendedSource.init_morph.min_grad": 0,
})
class SingleExtendedSource(FactorizedComponent):
    def __init__(self, model_frame, sky_coord, observations, config=None):
        # ...
        morph, bbox = self.init_morph(sky_coord, model_frame, image, std, config)
        center = model_frame.get_pixel(sky_coord)
        morphology = ExtendedSourceMorphology(model_frame, center, morph, bbox=bbox, config=config)
        # ...
    
    def init_morph(self, sky_coord, model_frame, image, std, config):
        _config = Config(config, "SingleExtendedSource.init_morph")
        # ...
        if _config["symmetric"]:
            # ...

default_config.update({
    "ExtendedSourceMorphology.monotonic": "angle",
    "ExtendedSourceMorphology.symmetric": False,
    "ExtendedSourceMorphology.min_grad": 0,
    "ExtendedSourceMorphology.shifting": False,
    "ExtendedSourceMorphology.resize": True,
})
        
class ExtendedSourceMorphology(ImageMorphology):
    def __init__(self, frame, center, image, bbox=None, config=None):
        _config = Config(config, "ExtendedSourceMorphology")

        if _config["monotonic"] is True:
            # ...

Then in code (like the quickstart guide) we would use

from scarlet.config import Config

my_config = Config(**{
    "initialization.init_source.max_components": 2,
    "ExtendedSourceMorphology.resize": True,
})

sources, skipped = scarlet.initialization.init_all_sources(
    model_frame,
    centers,
    observation,
    my_config,
)

This leaves minimal boilerplate in classes and functions with config options, just a single line _config = Config(config, "ClassName.methodname") gives access to the config parameters for a given method. If you want to change the global scarlet.config.default_config dictionary than you can, or if you want to pass it through to have more control that is possible too.

@fred3m fred3m closed this as completed Apr 9, 2021
@fred3m fred3m reopened this Apr 9, 2021
@taranu
Copy link

taranu commented Apr 13, 2021

I'm following along because I have essentially the same problem and would appreciate someone else coming up with a solution before I have to.

I would strongly suggest at least considering making the Config class a dataclass, as it's much easier to debug, document and introspect than a dynamic dict. You can always give it a dict attribute for dynamic configuration if you really need to, and also easily make it hierarchical and freeze parameters you don't want users to change. I think this would be preferable to the single global config dict. Consider how much nicer matplotlib's rcParams documentation would be to read and maintain if it was a single class with a type and docstring next to each parameter (though admittedly I'm not sure how you'd get the docstring to work like in pex.Config).

@fred3m
Copy link
Collaborator

fred3m commented Apr 13, 2021

@pmelchior feels strongly about having a global config, which is why I used the global dict. If not, I would make a config class that is the same as the one above without the default_config dict. Each method/class would then use keyword arguments as usual and would unpack the config for any classes or methods it calls, for example

from scarlet.config import Config

my_config = Config(
    "ExtendedSourceMorphology.resize": True,
)

sources, skipped = scarlet.initialization.init_all_sources(
    model_frame,
    centers,
    observation,
    max_components=2,
    min_components=1,
    config=my_config,
)


from .config import Config

def init_all_sources(frame, centers, observations, max_components, min_components, config):
    # ...
    source = init_source(frame, center, observations, config)
    # ...

def init_source(frame, center, observations, max_components, min_components, config):
    # ...
    components = max_components
    while components >= 0:
        # ..
        _config = Config(config, "ExtendedSource")
        source = ExtendedSource(frame, center, observations, **_config, config=config)
    # ...

def ExtendedSource(model_frame, sky_coord, observations, compact=False, K=1, config):
    # ...
    if compact:
        return CompactExtendedSource(model_frame, sky_coord, observation, config=config)
    if K == 1:
        return SingleExtendedSource(model_frame, sky_coord, observations, config=config)
    else:
        _config = Config(config, "MultiExtendedSource")
        return MultiExtendedSource(model_frame, sky_coord, observations, **_config, config=config)
    # ...

class SingleExtendedSource(FactorizedComponent):
    def __init__(self, model_frame, sky_coord, observations, config=None):
        # ...
        morph, bbox = self.init_morph(sky_coord, model_frame, image, std, config)
        center = model_frame.get_pixel(sky_coord)
        morph_config = Config(config, "ExtendedSourceMorphology")
        morphology = ExtendedSourceMorphology(model_frame, center, morph, bbox=bbox, **morph_config)
        # ...
    
    def init_morph(self, sky_coord, model_frame, detect, detect_std,
                   thresh=1, symmetric=True, monotonic=False):
        # ...
        if symmetric:
            # ...

@taranu
Copy link

taranu commented Apr 16, 2021

My comment wasn't so much about how you use the config classes and where you put them but how you define them. A Config dataclass could look like:

@dataclass
class Config:
    dynamic: Dict = field(default_factory=dict)
    immutable: ImmutableConfig = ImmutableConfig()
    initialization: InitializationConfig = InitializationConfig()
    opt1: bool = False

... etc, and where you can make ImmutableConfig @dataclass(frozen=True), for example.

Alternatively, you could use pex_config since it's a standalone pip-installable package designed for exactly this purpose (and probably what I'll use in non-Rubin projects now that I've thought about it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants