Implement config, recipe loader & recipe runner. #18

Merged
merged 16 commits into from
Aug 8, 2023
Changes from 7 commits
10 changes: 10 additions & 0 deletions docs/index.md
@@ -0,0 +1,10 @@
# Zampy

A tool for downloading Land Surface Model input data.

### Name origin

Named after *Zam*, [the Avestan language term for the Zoroastrian concept of "earth"](https://en.wikipedia.org/wiki/Zam).

## How to use Zampy
See the section ["using Zampy"](using_zampy.md).
54 changes: 54 additions & 0 deletions docs/using_zampy.md
@@ -0,0 +1,54 @@
# Using Zampy

## Installing Zampy
Zampy can be installed from GitHub using pip:
```bash
pip install git+https://github.com/EcoExtreML/zampy
```

## Configuration
Zampy needs a small configuration file. Create it in the `.config` directory under your home directory:

`~/.config/zampy/zampy_config.yml`

The file must set the working directory, for example:

```yaml
working_directory: /home/bart/Zampy
```
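
As a minimal sketch, the configuration file could also be created from Python. The helper name `write_config` and the use of a temporary directory are assumptions made for illustration; Zampy itself only reads the file from `~/.config/zampy/zampy_config.yml`.

```python
import tempfile
from pathlib import Path


def write_config(config_dir: Path, working_directory: str) -> Path:
    """Write a one-line Zampy config file and return its path (hypothetical helper)."""
    config_path = config_dir / "zampy" / "zampy_config.yml"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # The config only needs the `working_directory` key.
    config_path.write_text(f"working_directory: {working_directory}\n")
    return config_path


# A temporary directory stands in for ~/.config so that no real config is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    path = write_config(Path(tmp), "/home/bart/Zampy")
    config_text = path.read_text()
    print(config_text)
```

To write the real file, `Path.home() / ".config"` could be passed as `config_dir` instead.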

## Formulating a recipe
Recipes have the following structure:

```yaml
name: "test_recipe"

download:
  years: [2020, 2020]
  bbox: [54, 6, 50, 3] # NESW

  datasets:
    era5:
      variables:
        - 10m_v_component_of_wind
        - surface_pressure
    eth_canopy_height:
      variables:
        - height_of_vegetation

convert:
  convention: ALMA
  frequency: 1H  # outputs at 1 hour frequency. Pandas-like freq-keyword.
  resolution: 0.5  # output resolution in degrees.
```

You can specify multiple datasets and multiple variables per dataset.
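
To make the structure concrete, the sketch below shows the recipe as the nested dictionary a YAML parser would produce, together with a simple check for the required entries. The function `check_recipe` is a hypothetical name used for illustration and is not part of Zampy's API.

```python
# The recipe above, written as the dictionary a YAML parser would return.
recipe = {
    "name": "test_recipe",
    "download": {
        "years": [2020, 2020],
        "bbox": [54, 6, 50, 3],  # north, east, south, west
        "datasets": {
            "era5": {"variables": ["10m_v_component_of_wind", "surface_pressure"]},
            "eth_canopy_height": {"variables": ["height_of_vegetation"]},
        },
    },
    "convert": {"convention": "ALMA", "frequency": "1H", "resolution": 0.5},
}


def check_recipe(recipe: dict) -> list[str]:
    """Return the list of missing required entries (hypothetical helper)."""
    missing = [key for key in ("name", "download", "convert") if key not in recipe]
    if "datasets" not in recipe.get("download", {}):
        missing.append("download.datasets")
    for key in ("convention", "frequency", "resolution"):
        if key not in recipe.get("convert", {}):
            missing.append(f"convert.{key}")
    return missing


print(check_recipe(recipe))  # an empty list means nothing is missing

# Each dataset carries its own list of variables.
for dataset_name, settings in recipe["download"]["datasets"].items():
    print(dataset_name, settings["variables"])
```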

## Running a recipe
Save this recipe to disk and run the following command in your shell:

```bash
zampy --filename /home/username/path_to_file/simple_recipe.yml
```

This will execute the recipe (i.e. download, ingest, convert, resample and save the data).
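
Based on the recipe runner in this pull request, downloaded, ingested and output data each go into their own subdirectory of `working_directory`, and each dataset is written to one NetCDF file. The sketch below only reproduces that path logic for the output files; the function name `expected_outputs` is an assumption made for illustration.

```python
from pathlib import Path


def expected_outputs(
    working_directory: str, recipe_name: str, datasets: list[str], years: list[int]
) -> list[Path]:
    """List the output files a recipe run is expected to write (hypothetical helper)."""
    start_year, end_year = years
    output_dir = Path(working_directory) / "output" / recipe_name
    # One file per dataset, named "<dataset>_<start year>-<end year>.nc".
    return [
        output_dir / f"{name.lower()}_{start_year}-{end_year}.nc" for name in datasets
    ]


files = expected_outputs(
    "/home/bart/Zampy", "test_recipe", ["era5", "eth_canopy_height"], [2020, 2020]
)
for file in files:
    print(file.as_posix())
```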
57 changes: 57 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,57 @@
site_name: Zampy Documentation

theme:
  name: material
  highlightjs: true
  hljs_languages:
    - yaml
    - python
    - bash
  features:
    - navigation.instant
    - navigation.tabs
    - navigation.tabs.sticky
    - content.code.copy

  palette:
    # Palette toggle for light mode
    - scheme: default
      toggle:
        icon: material/weather-sunny
        name: Switch to dark mode
      primary: light green
      accent: green

    # Palette toggle for dark mode
    - scheme: slate
      toggle:
        icon: material/weather-night
        name: Switch to light mode
      primary: blue grey
      accent: teal

plugins:
  - mkdocs-jupyter:
      include_source: True
  - search
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: google
            docstring_options:
              ignore_init_summary: no
            merge_init_into_class: yes
            show_submodules: no

markdown_extensions:
  - pymdownx.highlight:
      anchor_linenums: true
      line_spans: __span
      pygments_lang_class: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.superfences

extra:
  generator: false
19 changes: 19 additions & 0 deletions pyproject.toml
@@ -49,6 +49,7 @@ classifiers = [
]
dependencies = [
    "requests",
    "pyyaml",
    "netcdf4",
    "numpy",
    "pandas",
@@ -66,6 +67,9 @@ dependencies = [
]
dynamic = ["version"]

[project.scripts]
zampy="zampy.cli:run_recipe"

[project.optional-dependencies]
dev = [
    "bump2version",
@@ -75,10 +79,18 @@ dev = [
    "mypy",
    "types-requests", # type stubs for request lib
    "types-urllib3", # type stubs for url lib
    "types-PyYAML",
    "pytest",
    "pytest-cov",
    "pre-commit",
]
docs = [
    "mkdocs",
    "mkdocs-material",
    "mkdocs-jupyter",
    "mkdocstrings[python]",
    "mkdocs-gen-files",
]

[tool.hatch.envs.default]
features = ["dev"]
@@ -99,6 +111,13 @@ coverage = [
    "pytest --cov --cov-report term --cov-report xml --junitxml=xunit-result.xml tests/",
]

[tool.hatch.envs.docs]
features = ["docs"]

[tool.hatch.envs.docs.scripts]
build = ["mkdocs build"]
serve = ["mkdocs serve"]

# [tool.hatch.envs.conda]
# type = "conda"
# python = "3.10"
20 changes: 20 additions & 0 deletions src/zampy/cli.py
@@ -0,0 +1,20 @@
"""Implements CLI interface for Zampy."""
import click
from zampy.recipe import RecipeManager


@click.command()
@click.option(
    "--filename",
    prompt="Path to the recipe filename",
    help="Path to the recipe filename.",
)
def run_recipe(filename: str) -> None:
    """Run the recipe using the CLI."""
    click.echo(f"Executing recipe: {filename}")
    rm = RecipeManager(filename)
    rm.run()


if __name__ == "__main__":
    run_recipe()
8 changes: 8 additions & 0 deletions src/zampy/datasets/__init__.py
@@ -6,3 +6,11 @@


__all__ = ["dataset_protocol", "validation", "EthCanopyHeight", "ERA5"]


# This object tracks which datasets are available.
DATASETS: dict[str, type[dataset_protocol.Dataset]] = {
    # All lowercase key.
    "era5": ERA5,
    "eth_canopy_height": EthCanopyHeight,
}
130 changes: 130 additions & 0 deletions src/zampy/recipe.py
@@ -0,0 +1,130 @@
"""All functionality to read and execute Zampy recipes."""
from pathlib import Path
from typing import Any
import numpy as np
import yaml
from zampy.datasets import DATASETS
from zampy.datasets import converter
from zampy.datasets.dataset_protocol import Dataset
from zampy.datasets.dataset_protocol import SpatialBounds
from zampy.datasets.dataset_protocol import TimeBounds


def recipe_loader(recipe_filename: str) -> dict:
    """Load the yaml recipe into a dictionary, and do some validation."""
    with open(recipe_filename) as f:
        recipe: dict = yaml.safe_load(f)

    # Check every required key individually.
    if not all(key in recipe for key in ("name", "download", "convert")):
        msg = (
            "One or more of the following items are missing from the recipe:\n"
            "name, download, convert."
        )
        raise ValueError(msg)

    if "datasets" not in recipe["download"]:
        msg = "No dataset entry found in the recipe."
        raise ValueError(msg)

    convert_keys = ("convention", "frequency", "resolution")
    if not all(key in recipe["convert"] for key in convert_keys):
        msg = (
            "One or more of the following items are missing from the recipe:\n"
            "convention, frequency, resolution."
        )
        raise ValueError(msg)

    return recipe


def config_loader() -> dict:
    """Load the Zampy config and validate the contents."""
    config_path = Path.home() / ".config" / "zampy" / "zampy_config.yml"

    if not config_path.exists():
        msg = f"No config file was found at '{config_path}'"
        raise FileNotFoundError(msg)

    with config_path.open() as f:
        config: dict = yaml.safe_load(f)

    if "working_directory" not in config:
        msg = "No `working_directory` key found in the config file."
        raise ValueError(msg)

    return config


class RecipeManager:
    """The recipe manager is used to get the required info, and then run the recipe."""

    def __init__(self, recipe_filename: str) -> None:
        """Instantiate the recipe manager, using a prepared recipe."""
        # Load & parse recipe
        recipe = recipe_loader(recipe_filename)

        self.start_year, self.end_year = recipe["download"]["years"]
        # The time bounds span from the first minute of the start year
        # to the last minute of the end year.
        self.timebounds = TimeBounds(
            np.datetime64(f"{self.start_year}-01-01T00:00"),
            np.datetime64(f"{self.end_year}-12-31T23:59"),
        )
        self.spatialbounds = SpatialBounds(*recipe["download"]["bbox"])

        self.datasets: dict[str, Any] = recipe["download"]["datasets"]

        self.convention = recipe["convert"]["convention"]
        self.frequency = recipe["convert"]["frequency"]
        self.resolution = recipe["convert"]["resolution"]

        # Load & parse config
        config = config_loader()
        self.download_dir = Path(config["working_directory"]) / "download"
        self.ingest_dir = Path(config["working_directory"]) / "ingest"
        self.data_dir = (
            Path(config["working_directory"]) / "output" / str(recipe["name"])
        )  # TODO: strip illegal chars from name.

        # Create required directories if they do not exist yet:
        for directory in [self.data_dir, self.download_dir, self.ingest_dir]:
            directory.mkdir(parents=True, exist_ok=True)

    def run(self) -> None:
        """Run the full recipe."""
        for dataset_name in self.datasets:
            _dataset = DATASETS[dataset_name.lower()]
            dataset: Dataset = _dataset()
            variables: list[str] = self.datasets[dataset_name]["variables"]

            # Download dataset
            dataset.download(
                download_dir=self.download_dir,
                time_bounds=self.timebounds,
                spatial_bounds=self.spatialbounds,
                variable_names=variables,
            )

            dataset.ingest(self.download_dir, self.ingest_dir)

            ds = dataset.load(
                ingest_dir=self.ingest_dir,
                time_bounds=self.timebounds,
                spatial_bounds=self.spatialbounds,
                variable_names=variables,
                resolution=self.resolution,
                regrid_method="flox",
            )

            ds = converter.convert(ds, dataset, convention=self.convention)

            ds = ds.resample(time=self.frequency).mean()

            # Compress every variable in the output file.
            comp = dict(zlib=True, complevel=5)
            encoding = {var: comp for var in ds.data_vars}
            # e.g. "era5_2010-2020.nc"
            fname = f"{dataset_name.lower()}_{self.start_year}-{self.end_year}.nc"
            ds.to_netcdf(path=self.data_dir / fname, encoding=encoding)

        print(
            "Finished running the recipe. Output data can be found at:\n"
            f"    {self.data_dir}"
        )