operations / transformations #79

Open
cholmes opened this issue Jun 28, 2024 · 1 comment

cholmes commented Jun 28, 2024

There's been a fair amount of interest in doing 'extra' things in the converters, and we had a good discussion in the last call with some ideas on how to approach that, so I wanted to open an issue.

The original request was #21 (add area and perimeter), but there are also things like adding statistics as new columns, or filtering columns out.

I was thinking it could be ideal to keep the 'converters' very 'clean', so that they just translate from the source data to fiboa. Then there could be a 'fiboa transform' command, or something like that, with a bunch of sub-commands. Ideally you could also use those sub-commands as part of the conversion process. Some of the initial ideas:

  • Add area and perimeter values (converter: Add option to calculate area and perimeter if missing #21)
  • Clean up geometries - automatically shift any overlapping ones. Or detect areas that are too big and remove pixels that are clearly wrong.
  • Generate stats on boundary quality - size, regularity, inscribed circles
  • Filter out columns
  • Subset to certain geographic areas / make test & train datasets.
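
A few of the ideas above can be sketched as small, independent functions. This is only an illustration, not the fiboa CLI's API: the function names and the row layout (a dict with a `geometry` list of rings, exterior first) are assumptions, and it assumes coordinates in a projected CRS so planar math applies. Units are whatever the input coordinates imply; converting to the units the fiboa spec expects is left out. "Regularity" is approximated here with the Polsby-Popper compactness score, which is just one possible choice.

```python
import math


def ring_area_perimeter(ring):
    """Planar area (shoelace formula) and perimeter of a closed ring.

    `ring` is a list of (x, y) tuples with the first point repeated at the end.
    """
    twice_area = 0.0
    perimeter = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def add_area_perimeter(row):
    """Return a copy of the row with area/perimeter filled in only if missing."""
    exterior, *holes = row["geometry"]
    area, perimeter = ring_area_perimeter(exterior)
    for hole in holes:
        hole_area, hole_perimeter = ring_area_perimeter(hole)
        area -= hole_area            # holes reduce the area
        perimeter += hole_perimeter  # but add to the boundary length
    out = dict(row)
    if out.get("area") is None:
        out["area"] = area
    if out.get("perimeter") is None:
        out["perimeter"] = perimeter
    return out


def compactness(row):
    """Polsby-Popper score: 4*pi*area / perimeter**2 (1.0 for a circle)."""
    return 4 * math.pi * row["area"] / row["perimeter"] ** 2


def drop_columns(row, columns):
    """Return a copy of the row without the named columns."""
    return {k: v for k, v in row.items() if k not in columns}


def in_bbox(row, bbox):
    """True if every exterior vertex lies inside bbox = (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    return all(minx <= x <= maxx and miny <= y <= maxy
               for x, y in row["geometry"][0])


# Hypothetical example row: a 10 x 10 square field.
field = {
    "id": "a",
    "crop": "wheat",
    "geometry": [[(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]],
}
field = add_area_perimeter(field)
```

Each function takes a row and returns a new value without touching the input, so any of them can be applied or skipped independently.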

I'm sure there's lots more, but the basic idea is to have a set of utilities that help clean up and format data better, and harmonize it for various use cases. But every transformation is an 'opinion', to use the Varda way of thinking about it, so those should stay as their own utilities that people can choose to apply as they want.
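
One way to picture the split between a 'clean' converter and opt-in transformations is a pipeline of steps. The sketch below is hypothetical throughout: `convert`, `run_pipeline`, the step factories, and the `field_id` source column are made-up names for illustration, not anything that exists in fiboa today.

```python
def rename_columns(mapping):
    """Step factory: rename source columns, leaving unmapped ones untouched."""
    def step(rows):
        for row in rows:
            yield {mapping.get(k, k): v for k, v in row.items()}
    return step


def drop_columns(columns):
    """Step factory: remove the named columns from every row."""
    def step(rows):
        for row in rows:
            yield {k: v for k, v in row.items() if k not in columns}
    return step


def keep_if(predicate):
    """Step factory: keep only rows for which predicate(row) is true."""
    def step(rows):
        return (row for row in rows if predicate(row))
    return step


def run_pipeline(rows, steps):
    """Apply each step in order; every step maps an iterable of rows to another."""
    for step in steps:
        rows = step(rows)
    return list(rows)


def convert(source_rows, extra_steps=()):
    """The 'clean' conversion only translates names; opinions are opt-in."""
    core = [rename_columns({"field_id": "id"})]  # hypothetical source column
    return run_pipeline(source_rows, core + list(extra_steps))


source = [
    {"field_id": "1", "owner": "x", "area": 2.5},
    {"field_id": "2", "owner": "y", "area": 0.1},
]

plain = convert(source)
opinionated = convert(
    source,
    [drop_columns({"owner"}), keep_if(lambda r: r["area"] >= 1.0)],
)
```

Because the same step functions work both standalone (via `run_pipeline`) and inside `convert`, the converter itself never has to know which 'opinions' a user picked.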

cholmes commented Jul 8, 2024

Other ideas:

  • harmonize data to EuroCrops HCAT - add extra attributes and do the mapping from source data to the EuroCrops names (seems like it would need another mini ecosystem of converters for each country, though they are likely simpler than the full-fledged fiboa converters).
  • reproject - like if the source data is in a country-specific projection.

I've also been thinking about a 'merge' command for a while and had been planning to make an issue for it. In a merge, collection-level metadata would shift to the row level. I was thinking that would be a full-featured command, where you could do things like reprojection and boundary cleanup. But it might make sense to push more of that to 'operations' and keep the merge pretty simple: it would just reject things that don't merge well, and the source data could be transformed beforehand to get it ready for the merge.
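
A 'simple' merge along those lines might look roughly like this. It is a sketch under assumptions: the collection layout (`name`, `metadata`, `rows`), the metadata keys, and the CRS-equality check as the only rejection rule are all invented for illustration and are not taken from the fiboa spec or CLI.

```python
def merge_collections(collections, expected_crs):
    """Merge rows from several collections, copying collection metadata onto rows.

    Collections whose CRS differs from `expected_crs` are rejected rather than
    fixed; reprojecting them beforehand would be a separate operation.
    """
    merged, rejected = [], []
    for coll in collections:
        meta = dict(coll["metadata"])
        crs = meta.pop("crs", None)
        if crs != expected_crs:
            reason = f"CRS {crs!r} does not match {expected_crs!r}"
            rejected.append((coll["name"], reason))
            continue
        for row in coll["rows"]:
            # Row-level values win over collection-level defaults.
            merged.append({**meta, **row})
    return merged, rejected


# Hypothetical input collections.
collections = [
    {
        "name": "country_a",
        "metadata": {"crs": "EPSG:4326", "license": "CC-BY-4.0"},
        "rows": [{"id": "a1"}, {"id": "a2", "license": "CC0-1.0"}],
    },
    {
        "name": "country_b",
        "metadata": {"crs": "EPSG:25832", "license": "CC-BY-4.0"},
        "rows": [{"id": "b1"}],
    },
]

merged, rejected = merge_collections(collections, expected_crs="EPSG:4326")
```

Here `country_b` is rejected instead of being reprojected on the fly, which keeps the merge itself small and leaves reprojection to a separate operation run first.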

m-mohr self-assigned this Aug 24, 2024
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants