Parsing a List of dataclasses #271

JonasFrey96 · 2023-07-05T07:50:38Z

Hello,

it would be great if we could add support to pass a List of dataclasses:

@dataclass
class ExperimentParams(Serializable):
        @dataclass
        class LayerParams:
            name: str
            loss_scale: float = 1.0

        target_layers: List[any] = field(
            default_factory=lambda LayerParams=LayerParams: [
                LayerParams( name="loss_a", loss_scale=1.0),
                LayerParams(name="loss_b", loss_scale=2.0),
            ]
        )

Is there any way to elegantly parse such a configuration file:

python train.py --target_layers_0.loss_scale 10

This relates to:

                raise NotImplementedError(
                    f"Field {field.name} is of type {field_type}, which isn't "
                    f"supported yet. (container of a dataclass type)"
                )

If someone can point me toward what to change, I am happy to contribute.

The text was updated successfully, but these errors were encountered:

lebrice · 2023-07-05T14:46:39Z

Hello again @JonasFrey96 !

There is limited support for something similar, but not exactly the same:

https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_example.py

In particular, this example here might be of interest to you: https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py

Let me know what you think!

JonasFrey96 · 2023-07-05T16:01:37Z

In my specific use case, it would be great to allow for parsing a list of data classes.
This would help a lot. I tried to look into the code, but it is quite overwhelming and seems to be none trivial.

lebrice · 2023-07-09T19:53:21Z

Hey @JonasFrey96 , did you look at this example? https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py

MiguelMonteiro · 2023-07-11T10:49:52Z

Hi, I think this feature would be very useful for things like lists of data augmentations in deep learning code. Is it planned to be implemented?

hugobb · 2023-07-31T04:24:39Z

Hi, I'm also interested in this feature. From my understanding the idea of the feature, would be to enable this type of hierarchical dataclass definition with List.

@dataclass
class CNNStack:
    name: str = "stack"
    num_layers: int = 3
    kernel_sizes: Tuple[int, int, int] = (7, 5, 5)
    num_filters: List[int] = field(default_factory=[32, 64, 64].copy)


@dataclass
class Config:
    stack_list: List[CNNStack] = field(default_factory=[CNNStack(), CNNStack(), CNNStack()].copy)

parser = ArgumentParser(conflict_resolution=ConflictResolution.ALWAYS_MERGE)
parser.add_arguments(Config, dest="config", default=Config())
args = parser.parse_args()
print(args)

However this is not supported yet, this only works if we manually construct the parser which in my use cases is not very convenient.
I would also be interested in helping with this, but also not sure where to start.

hugobb · 2023-07-31T15:42:18Z

I think the challenging part and the main difference with the multiple_lists_example.py is that in the proposed feature the number of CNNStack would not be fixed but could vary depending on the number of arguments provided by user.

lebrice · 2023-08-01T02:45:44Z

Hey there @hugobb , @MiguelMonteiro , thanks for commenting.

I agree with you, this feature would be great to have.

@hugobb you're right: this would be challenging to implement, especially if we don't know how many dataclasses to parse in advance!

I guess a first step could be to make this work when we know the number of classes in advance, for instance something like cnn_blocks: tuple[CnnBlock, CnnBlock]
. (As far as I recall, this isnt already supported, but I might be wrong, it's been a while since I've worked on this)

Then, as a next step, we could probably figure out the number of values by inspecting the parsed values during postprocessing, or by adding an argument that can be used explicitly, like --cnn_blocks=2 or similar.

This feature will probably require a substantial refactoring of the current "reuse" feature, which is quite messy:

the conflict resolution mode might need to become local on a per-dataclass basis instead of global
the parsing functions would need to share the information of how many values are expected (IIRC this is already partly the case, but I'd like to make this more explicit)

I could potentially invest some time in order to try to get this to work, but I'd like to get a bit more info if possible before that.

Here are some design-y questions to start us off:

what differences would there be (if any) with the current reuse feature with respect to how the values for each dataclass are passed from the command-line?
- How would you expect to specify a different value for one of the fields of, say, the last dataclass in the list?
how should the number of dataclasses to parse be determined? Should it be explicitly passed via an argument, or should it be inferred from the passed values? Will this always be possible? What should we do if/when it isn't?

Thanks again @hugobb @MiguelMonteiro @JonasFrey96 for posting. Lets keep this ball rolling.

MiguelMonteiro · 2023-08-09T13:04:42Z

Hi @lebrice, thanks for your answer.

what differences would there be (if any) with the current reuse feature with respect to how the values for each dataclass are passed from the command-line?

I am not sure how this feature works, sorry

How would you expect to specify a different value for one of the fields of, say, the last dataclass in the list?

Perhaps the most intuitive way is for the parser to automatically append ".1", ".2", etc... to the end of the name.
For example:
--cnn_stack.1.num_layers=2 --cnn_stack.2.num_layers=5

how should the number of dataclasses to parse be determined? Should it be explicitly passed via an argument, or should it be inferred from the passed values? Will this always be possible? What should we do if/when it isn't?

Ideally, it would be infered from the arguemnts passed from the command line.

What I am still unsure is if it would ever be possible to handle lists with different dataclasses in them. This would be very useful to configure data-augmentations for example.

JonasFrey96 changed the title ~~Parsing a List od dataclasses~~ Parsing a List of dataclasses Jul 5, 2023

ejnnr mentioned this issue Oct 20, 2023

Fix validation set configuration via CLI ejnnr/cupbearer#12

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing a List of dataclasses #271

Parsing a List of dataclasses #271

JonasFrey96 commented Jul 5, 2023 •

edited

Loading

lebrice commented Jul 5, 2023

JonasFrey96 commented Jul 5, 2023

lebrice commented Jul 9, 2023

MiguelMonteiro commented Jul 11, 2023

hugobb commented Jul 31, 2023 •

edited

Loading

hugobb commented Jul 31, 2023

lebrice commented Aug 1, 2023 •

edited

Loading

MiguelMonteiro commented Aug 9, 2023

Parsing a List of dataclasses #271

Parsing a List of dataclasses #271

Comments

JonasFrey96 commented Jul 5, 2023 • edited Loading

lebrice commented Jul 5, 2023

JonasFrey96 commented Jul 5, 2023

lebrice commented Jul 9, 2023

MiguelMonteiro commented Jul 11, 2023

hugobb commented Jul 31, 2023 • edited Loading

hugobb commented Jul 31, 2023

lebrice commented Aug 1, 2023 • edited Loading

MiguelMonteiro commented Aug 9, 2023

JonasFrey96 commented Jul 5, 2023 •

edited

Loading

hugobb commented Jul 31, 2023 •

edited

Loading

lebrice commented Aug 1, 2023 •

edited

Loading