Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

forcing not written in directory specified by dir argument #423

Open
RolfHut opened this issue Jun 4, 2024 · 6 comments
Open

forcing not written in directory specified by dir argument #423

RolfHut opened this issue Jun 4, 2024 · 6 comments
Assignees
Labels
bug Something isn't working forcing Issue related to forcing (generation, loading, etc.)

Comments

@RolfHut
Copy link
Contributor

RolfHut commented Jun 4, 2024

When I run the makkink forcing and specify a directory that directory is used as the starting point for ESMValTool, but the final result is not written in that directory. This means that the result (including the yaml file with the forcing info) is written a few directories deeper. This means that

forcing.sources["LumpedMakkingForcing"].generate(dir = "myDir")

followed by

forcing = forcing.sources["LumpedMakkinkForcing"].load("myDir")

does not work, but should be

forcing = forcing.sources["LumpedMakkinkForcing"].load("myDir" + "work/diagnostic/script")

to make it work. This is not consistent between forcing sources.

@RolfHut RolfHut added bug Something isn't working forcing Issue related to forcing (generation, loading, etc.) labels Jun 4, 2024
@BSchilperoort BSchilperoort self-assigned this Jun 4, 2024
@BSchilperoort
Copy link
Member

BSchilperoort commented Jun 6, 2024

I ran the MakkinkForcing and MarrmotForcing to see for myself, and indeed the directory you specify will be the work directory, but not the directory where eventually the resulting data is stored in.

This is not consistent between forcing sources.

I believe this is consistent between most forcing sources (all the ones that use the DefaultForcing's .generate method).
I believe the only forcing module that deviates from this is CaravanForcing, as it does not use ESMValTool.

A workaround can be:

forcing = forcing.sources["LumpedMakkingForcing"].generate(dir="my_work_dir")
forcing_dir = forcing.directory
...
forcing = forcing.sources["LumpedMakkinkForcing"].load(forcing_dir)

But I agree that this behavior is not clearly described at the moment. Perhaps for users defining a "work directory" and an "output directory" separately might be better.

@RolfHut
Copy link
Contributor Author

RolfHut commented Jun 6, 2024

For my current use case the loading of the forcing is done in another notebook than the creation of the forcing, so that workaround would not work, unless that forcing_dir is stored somewhere as meta-data in a txt file or some such, but that would honestly defeat the point I think?

@BSchilperoort
Copy link
Member

BSchilperoort commented Jun 6, 2024

I guess as a workaround you can try to see if the "work/diagnostic/script" path exists.

from pathlib import Path

if Path("myDir" / "work/diagnostic/script").exists():
    forcing_dir = "myDir" + "/" + "work/diagnostic/script"
else:
    forcing_dir = "myDir"

Ideally we would change this in ewatercycle though, but that will be too late for you at the moment.

We can change the generate method in the following backwards compatible way:

class DefaultForcing:

    @classmethod
    def generate(
        ...
        directory: str | None = None,
        work_dir: str | None = None,
        output_dir: str | None = None,
    ):
        if directory is not None:
            warnings.warn(
                "directory kwargs is deprecated, please use work_dir and output_dir"
            )
            work_dir = directory
        
        ...
        if output_dir is not None:
            move(work_dir / "work/diagnostic/script")
            delete(work_dir)  # clean up temp files
        ...

This keeps the (deprecated) directory kwarg valid, but warns the user that they should not use it anymore. By splitting the work dir and output dir we can also do an (optional) clean up of the temporary files.

If a user only specifies the output_dir, the working directory will be the default one.

@RolfHut
Copy link
Contributor Author

RolfHut commented Jun 6, 2024

that would totally work and I can incorporate that into my workflow. Do you want to implement that?

@BSchilperoort
Copy link
Member

I can tackle it at the same time as #419, but perhaps it's best if someone else also has a look at this before I work on it @Peter9192 @sverhoeven

@Peter9192
Copy link
Collaborator

Few remarks

  • Perhaps output is not always in "work/diagnostic/script, e.g. when you only use preprocessors and no script
  • Not sure if I like two directories. We should not bother the user with thinking about work dirs. I guess the main reason we stick with the ESMValTool structure is to preserve the provenance info. But we could restructure it after ESMValTool finishes. Could we place the forcing files with the ewatercycle-forcing.yaml in the top level forcing_dir and any other files that we deem important (like provenance) in a subdir called metadata or so?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working forcing Issue related to forcing (generation, loading, etc.)
Projects
None yet
Development

No branches or pull requests

3 participants