When creating configs using hydra-zen, the following pattern is quite common:
```python
from hydra_zen import builds

# import some module
import some_module as md

# convert all of its objects that you want to use to config classes
ConfA = builds(md.A, populate_full_signature=True)
ConfB = builds(md.B, populate_full_signature=True)
# etc..

# create instances of configs with particular details
conf_a = ConfA(x=2, y=3)
conf_b = ConfB("hello")
# etc.
```
Oftentimes steps 2 and 3 (creating the config class and filling in its details) are combined: `ConfA = builds(md.A, x=2, y=3, populate_full_signature=True)`. Regardless, it would be nice to minimize this boilerplate code.
Potential solution
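There have been discussions about making this process more ergonomic. One idea is to create a `hydrate` function that auto-applies `builds` (or variations of `builds`) to all public members of a module. E.g., running `hydrate` on a module `dummy_module.py` would create a structured config, `dummy_module_configs`, containing a config for each of that module's public members.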
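A minimal sketch of the core idea, using only the standard library: `hydrate` walks a module's public names and applies a config-creating function to each one. The two-argument signature, the `stand_in_builds` helper, and the contents of `dummy_module` are all hypothetical, made up for illustration. In particular, hydra-zen's `builds` would return structured configs rather than the deferred calls used here.

```python
import types
from functools import partial
from types import SimpleNamespace


def hydrate(module, config_fn):
    """Apply `config_fn` to every public member of `module`.

    Public members come from `__all__` when it is defined; otherwise every
    name that does not start with an underscore is used.
    """
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]
    return SimpleNamespace(
        **{name: config_fn(getattr(module, name)) for name in names}
    )


def stand_in_builds(target):
    # Stand-in for `builds(target, populate_full_signature=True)`.
    # Calling the result with arguments yields a deferred call to `target`.
    return partial(partial, target)


# Hypothetical stand-in for the contents of `dummy_module.py`.
class A:
    def __init__(self, x, y):
        self.x, self.y = x, y


def func(z=1):
    return z + 1


dummy_module = types.ModuleType("dummy_module")
dummy_module.A = A
dummy_module.func = func

dummy_module_configs = hydrate(dummy_module, stand_in_builds)

conf_a = dummy_module_configs.A(x=2, y=3)  # plays the role of a config
obj = conf_a()  # plays the role of instantiating that config
```

With hydra-zen installed, one would presumably pass something like `make_custom_builds_fn(populate_full_signature=True)` as `config_fn` instead of the stand-in.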
dummy_module_configs can then be registered as a node in Hydra's config store, for easy/intuitive access to all of these configs from the overrides API / CLI.
Furthermore, this leads to clean, readable config-creation code. For example, configuring a pipeline of torchvision transformations through the hydrated namespace would have complete parity with how one would actually create this augmentation pipeline in their ML code.
Some Problems...
Here are some issues with `hydrate` that I can anticipate:
Getting type checkers to understand what the heck is going on
Presently, I do not think that there is any way to tell static tooling that `hydrate(module)` returns a dataclass whose attribute names match those of `module`, but whose values are all `Type[Builds[...]]`.
Ultimately, we want users of hydrate(module) to benefit from auto-complete on both attribute names and object signatures. The only way I can think of delivering these things is by lying and annotating hydrate(module) as simply returning module... Thus hydrate(transforms) will look identical to transforms to static tools.
This will produce false positives for users: type-checkers will mark some code patterns as invalid because they do not realize that they are dealing with dataclasses.
This is a big blocker for hydrate. I will need to create discussions on the Python typing mailing list to see if maintainers of the various type checkers have any recommendations here. I do not want hydra-zen users to have static tooling marking a bunch of false positives throughout their code. I am willing to release this as an experimental feature, and only recommend its use in places where static tooling will not mark many false positives.
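The annotation trick and the kind of false positive it invites can be sketched as follows. This is an illustration only: `stand_in_config` is a hypothetical placeholder for a `builds`-style config, and the standard-library `math` module stands in for a module such as `transforms`.

```python
import math
from types import ModuleType, SimpleNamespace
from typing import TypeVar, cast

T = TypeVar("T")


def stand_in_config(target):
    # Hypothetical stand-in for a `builds`-style config class: it only
    # records the import path of its target.
    path = f"{target.__module__}.{target.__name__}"
    return type(f"Builds_{target.__name__}", (), {"_target_": path})


def hydrate(module: T) -> T:
    # The annotation claims that the module itself is returned...
    members = {
        name: stand_in_config(getattr(module, name))
        for name in dir(module)
        if not name.startswith("_") and callable(getattr(module, name))
    }
    # ...but the runtime value is a namespace of config classes.
    return cast(T, SimpleNamespace(**members))


configs = hydrate(math)

# Static tools see `configs` as the `math` module, so `configs.sqrt`
# auto-completes with the signature of `math.sqrt`.
print(isinstance(configs, ModuleType))  # False: it is not actually a module

# Valid at runtime, but a type checker would likely flag this line, because
# it believes `configs.sqrt` is a plain function with no `_target_` attribute.
print(configs.sqrt._target_)  # math.sqrt
```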
Not all configs should be produced by builds(<target>, populate_full_signature=True)
Although it is a sensible default to apply builds(<target>, populate_full_signature=True) to all objects in a module, this behavior is not always desirable. For instance, optimizers in torch.optim almost always need to be partial-configs, because the model parameters that they will optimize are never available at config/instantiation time. Thus hydrate needs to provide some control over how it creates configs.
As a potential solution, we might design hydrate as follows:
```python
from hydra_zen import make_custom_builds_fn


def hydrate(
    module,
    default_config_creation_fn=make_custom_builds_fn(populate_full_signature=True),
    class_specific_config_fns=None,
    excluded_names=None,  # exclude particular items from `__all__`
    target_names=None,  # if provided, takes precedence over `__all__`
):
    ...
```
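One way the parameters in the signature above might interact is sketched below, using only the standard library. Several choices here are assumptions rather than part of the proposal: `class_specific_config_fns` is treated as a mapping from a base class to a config-creation function, the first matching base wins, and `excluded_names` is applied even when `target_names` is given. The `fake_optim` module and the tagging functions are hypothetical stand-ins.

```python
import types
from types import SimpleNamespace


def hydrate(
    module,
    default_config_creation_fn,
    class_specific_config_fns=None,
    excluded_names=None,
    target_names=None,
):
    class_specific_config_fns = class_specific_config_fns or {}
    excluded = set(excluded_names or ())

    if target_names is not None:
        names = list(target_names)  # takes precedence over `__all__`
    else:
        names = getattr(module, "__all__", None)
        if names is None:
            names = [n for n in vars(module) if not n.startswith("_")]

    configs = {}
    for name in names:
        if name in excluded:
            continue
        target = getattr(module, name)
        config_fn = default_config_creation_fn
        if isinstance(target, type):
            # Assumption: the first registered base that `target` is, or
            # inherits from, decides how its config gets created.
            for base, fn in class_specific_config_fns.items():
                if issubclass(target, base):
                    config_fn = fn
                    break
        configs[name] = config_fn(target)
    return SimpleNamespace(**configs)


# Hypothetical stand-ins for a module of optimizers and one other class.
class Optimizer: ...
class SGD(Optimizer): ...
class Scheduler: ...


fake_optim = types.ModuleType("fake_optim")
fake_optim.Optimizer, fake_optim.SGD, fake_optim.Scheduler = Optimizer, SGD, Scheduler


def full_config(target):
    # Stand-in for a `populate_full_signature=True` config.
    return ("full", target.__name__)


def partial_config(target):
    # Stand-in for a `zen_partial=True` config.
    return ("partial", target.__name__)


configs = hydrate(
    fake_optim,
    default_config_creation_fn=full_config,
    class_specific_config_fns={Optimizer: partial_config},
)
print(configs.SGD)        # ('partial', 'SGD')
print(configs.Scheduler)  # ('full', 'Scheduler')
```

With hydra-zen installed, the two stand-ins would presumably be replaced by `make_custom_builds_fn(populate_full_signature=True)` and `make_custom_builds_fn(zen_partial=True, populate_full_signature=True)`.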
With this signature, a user could use `class_specific_config_fns` to tell `hydrate` to apply `zen_partial=True` when creating a config for `Optimizer` and for all subclasses of `Optimizer`.
Summary
`hydra_zen.hydrate` can be applied to a namespace/module and returns a corresponding Hydra node of configs that describe that namespace/module.
In effect, the entire `hydra-torch` project could be replaced with calls to `hydrate` (well... not exactly, but that is the rough idea).
Feedback
Does `hydrate` seem useful? How would you use it? Did I fail to cover specific use cases here? Are there other pitfalls that I am missing?