SimpleParsing icon indicating copy to clipboard operation
SimpleParsing copied to clipboard

Parsing a List of dataclasses

Open JonasFrey96 opened this issue 2 years ago • 8 comments

Hello,

it would be great if we could add support to pass a List of dataclasses:

@dataclass
class ExperimentParams(Serializable):
        @dataclass
        class LayerParams:
            name: str
            loss_scale: float = 1.0

        target_layers: List[any] = field(
            default_factory=lambda LayerParams=LayerParams: [
                LayerParams( name="loss_a", loss_scale=1.0),
                LayerParams(name="loss_b", loss_scale=2.0),
            ]
        )

Is there any way to elegantly parse such a configuration file:

python train.py --target_layers_0.loss_scale 10

This relates to:

                raise NotImplementedError(
                    f"Field {field.name} is of type {field_type}, which isn't "
                    f"supported yet. (container of a dataclass type)"
                )

If someone can point me toward what to change, I am happy to contribute.

JonasFrey96 avatar Jul 05 '23 07:07 JonasFrey96

Hello again @JonasFrey96 !

There is limited support for something similar, but not exactly the same:

https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_example.py

In particular, this example here might be of interest to you: https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py

Let me know what you think!

lebrice avatar Jul 05 '23 14:07 lebrice

In my specific use case, it would be great to allow for parsing a list of data classes. This would help a lot. I tried to look into the code, but it is quite overwhelming and seems to be none trivial.

JonasFrey96 avatar Jul 05 '23 16:07 JonasFrey96

Hey @JonasFrey96 , did you look at this example? https://github.com/lebrice/SimpleParsing/blob/master/examples/merging/multiple_lists_example.py

lebrice avatar Jul 09 '23 19:07 lebrice

Hi, I think this feature would be very useful for things like lists of data augmentations in deep learning code. Is it planned to be implemented?

MiguelMonteiro avatar Jul 11 '23 10:07 MiguelMonteiro

Hi, I'm also interested in this feature. From my understanding the idea of the feature, would be to enable this type of hierarchical dataclass definition with List.

@dataclass
class CNNStack:
    name: str = "stack"
    num_layers: int = 3
    kernel_sizes: Tuple[int, int, int] = (7, 5, 5)
    num_filters: List[int] = field(default_factory=[32, 64, 64].copy)


@dataclass
class Config:
    stack_list: List[CNNStack] = field(default_factory=[CNNStack(), CNNStack(), CNNStack()].copy)

parser = ArgumentParser(conflict_resolution=ConflictResolution.ALWAYS_MERGE)
parser.add_arguments(Config, dest="config", default=Config())
args = parser.parse_args()
print(args)

However this is not supported yet, this only works if we manually construct the parser which in my use cases is not very convenient. I would also be interested in helping with this, but also not sure where to start.

hugobb avatar Jul 31 '23 04:07 hugobb

I think the challenging part and the main difference with the multiple_lists_example.py is that in the proposed feature the number of CNNStack would not be fixed but could vary depending on the number of arguments provided by user.

hugobb avatar Jul 31 '23 15:07 hugobb

Hey there @hugobb , @MiguelMonteiro , thanks for commenting.

I agree with you, this feature would be great to have.

@hugobb you're right: this would be challenging to implement, especially if we don't know how many dataclasses to parse in advance!

I guess a first step could be to make this work when we know the number of classes in advance, for instance something like cnn_blocks: tuple[CnnBlock, CnnBlock] . (As far as I recall, this isnt already supported, but I might be wrong, it's been a while since I've worked on this)

Then, as a next step, we could probably figure out the number of values by inspecting the parsed values during postprocessing, or by adding an argument that can be used explicitly, like --cnn_blocks=2 or similar.

This feature will probably require a substantial refactoring of the current "reuse" feature, which is quite messy:

  • the conflict resolution mode might need to become local on a per-dataclass basis instead of global
  • the parsing functions would need to share the information of how many values are expected (IIRC this is already partly the case, but I'd like to make this more explicit)

I could potentially invest some time in order to try to get this to work, but I'd like to get a bit more info if possible before that.

Here are some design-y questions to start us off:

  • what differences would there be (if any) with the current reuse feature with respect to how the values for each dataclass are passed from the command-line?
    • How would you expect to specify a different value for one of the fields of, say, the last dataclass in the list?
  • how should the number of dataclasses to parse be determined? Should it be explicitly passed via an argument, or should it be inferred from the passed values? Will this always be possible? What should we do if/when it isn't?

Thanks again @hugobb @MiguelMonteiro @JonasFrey96 for posting. Lets keep this ball rolling.

lebrice avatar Aug 01 '23 02:08 lebrice

Hi @lebrice, thanks for your answer.

what differences would there be (if any) with the current reuse feature with respect to how the values for each dataclass are passed from the command-line?

I am not sure how this feature works, sorry

How would you expect to specify a different value for one of the fields of, say, the last dataclass in the list?

Perhaps the most intuitive way is for the parser to automatically append ".1", ".2", etc... to the end of the name. For example: --cnn_stack.1.num_layers=2 --cnn_stack.2.num_layers=5

how should the number of dataclasses to parse be determined? Should it be explicitly passed via an argument, or should it be inferred from the passed values? Will this always be possible? What should we do if/when it isn't?

Ideally, it would be infered from the arguemnts passed from the command line.

What I am still unsure is if it would ever be possible to handle lists with different dataclasses in them. This would be very useful to configure data-augmentations for example.

MiguelMonteiro avatar Aug 09 '23 13:08 MiguelMonteiro