hydra_zen.hydrated_dataclass unconditionally sets the docstring on the wrapped class using the target's __doc__ #750

Open
lebrice opened this issue Nov 12, 2024 · 0 comments

lebrice commented Nov 12, 2024

Hi there!

When you use hydrated_dataclass, the __doc__ on the wrapped dataclass is unconditionally set to "A structured config designed to ..." followed by the docstring from the _target_.

This causes issues with doctest, since doctest then collects the doctest examples from the target callable's docstring as if they belonged to the config class!
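
For context, doctest collects examples from an object's `__doc__`, so copying the target's docstring (and its `>>>` examples) onto the config class also attributes the target's doctests to the config class. Here is a minimal standalone sketch with hypothetical names (`target_fn`, `Config`) that simulates what the decorator does:

```python
# Standalone sketch (hypothetical names) of why doctest picks these up.
import doctest


def target_fn(x):
    """The "target" callable, with its own doctest.

    >>> target_fn(1)
    1
    """
    return x


class Config:
    """Configuration options."""


# Simulate what hydrated_dataclass does: replace the config's docstring with a
# generated one that embeds the target's docstring (including its >>> examples).
Config.__doc__ = "A structured config designed to ...\n\n" + target_fn.__doc__

# doctest now attributes the target's example to Config.
tests = doctest.DocTestFinder().find(Config, name="Config")
print(tests[0].examples[0].source)  # -> "target_fn(1)"
```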

An example of the issue with hydrated_dataclass:

```python
import hydra_zen
from pathlib import Path

from transformers import AutoTokenizer, PretrainedConfig


@hydra_zen.hydrated_dataclass(
    target=AutoTokenizer.from_pretrained,
    frozen=True,
    unsafe_hash=True,
    populate_full_signature=True,
)
class TokenizerConfig:
    """Configuration options for the tokenizer."""

    pretrained_model_name_or_path: str
    cache_dir: Path | None = None  # use standard cache by default.
    force_download: bool = False
    local_files_only: bool = False
    token: str | bool | None = None
    revision: str = "main"
    use_fast: bool = True
    config: PretrainedConfig | None = None
    # proxies: dict[str, str] = dataclasses.field(default_factory=dict, hash=False)
    subfolder: str = ""
    tokenizer_type: str | None = None
    trust_remote_code: bool = False


# fails! __doc__ was replaced by a generated docstring that embeds the target's docstring.
assert TokenizerConfig.__doc__ == """Configuration options for the tokenizer."""
```
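
One possible workaround (plain attribute assignment, not a hydra-zen feature) is to overwrite the generated docstring right after the class definition; the snippet below reuses the `TokenizerConfig` from the example above:

```python
# Workaround sketch: restore the intended docstring after hydrated_dataclass has
# replaced it. This is plain Python, not a hydra-zen API.
TokenizerConfig.__doc__ = """Configuration options for the tokenizer."""

# doctest now only sees the config's own docstring, not the examples from
# AutoTokenizer.from_pretrained's docstring, and the assertion above passes.
assert TokenizerConfig.__doc__ == """Configuration options for the tokenizer."""
```
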
lebrice added a commit to mila-iqia/ResearchTemplate that referenced this issue Nov 12, 2024
lebrice added a commit to mila-iqia/ResearchTemplate that referenced this issue Nov 15, 2024
* WIP: Add an LLM finetuning example
* WIP: add / rename more configs
* Finetuning example seems to be working
* Making progress, more self-contained example
* Works! (need to fix the hash used for path though)
* Improve hashing, reduce default block size
* Fix val_loss logging and add docstring
* Increase the number of dataloader workers
* Use smaller model for now
* Use FSDP in the example
* Fix bug in id generation from config classes
* Tweak config, try to setup mid-epoch checkpointing
* Rename `HFExample` -> `TextClassificationExample`
* Fix broken links in nav
* Remove "huggingface" datamodule config
* Fix issues in config/tests for text_classification
* Add an entry to test the llm_finetuning_example
* Fix issues in the text classification example
* Fix weird docstring issues with hydra-zen
  - mit-ll-responsible-ai/hydra-zen#750
* Fix test and config of text_classification_example
* Move test from main_test.py to example_test.py
* forward_pass is a method of LearningAlgorithmTests
* Various type hint fixes and tweaks
* WIP: Adding some tests for LLM finetuning example
* Fix issue in `jax.md`
* Add link to the example page in index.md
* Fix tests for the llm finetuning example
* Fix issue with tuples in regression files
* Fix test for `get_hash_of`
* Remove unused _field function
* Fix issue with built-in modules in autoref plugin
* Add a bit of info in the example doc
* Add more links in the doc of the module
* Fix issue with the text classification example
* Add skipif mark for LLM finetuning test
* Fix data_dir of text_classification_example
* Use the "auto" strategy for LLM Finetuning tests
* Fix error in fork_rng of LLM finetuning example
* Try a hacky fix for failing test
* Don't run llm finetuning tests on github Cloud CI
* Add missing regression files
* Rename llm_finetuning_example -> llm_finetuning
* Fix import error