Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geometric lookups and "vector data cubes" #748

Open
asinghvi17 opened this issue Sep 19, 2024 · 15 comments
Open

Geometric lookups and "vector data cubes" #748

asinghvi17 opened this issue Sep 19, 2024 · 15 comments
Labels
enhancement New feature or request

Comments

@asinghvi17
Copy link
Collaborator

asinghvi17 commented Sep 19, 2024

It's super easy to create dimension axes that have vector lookups:

julia> using CairoMakie, DimensionalData
julia> points = Point2f.(1, 1:10)
10-element Vector{Point{2, Float32}}:
 [1.0, 1.0]
 [1.0, 2.0]
 [1.0, 3.0]
 [1.0, 4.0]
 [1.0, 5.0]
 [1.0, 6.0]
 [1.0, 7.0]
 [1.0, 8.0]
 [1.0, 9.0]
 [1.0, 10.0]

julia> rand(X(DimensionalData.Categorical(points)))
╭────────────────────────────────╮
│ 10-element DimArray{Float64,1} │
├────────────────────────────────┴─────────────────────────────────────── dims ┐
  ↓ X Categorical{Point{2, Float32}} [Float32[1.0, 1.0], Float32[1.0, 2.0], …, Float32[1.0, 9.0], Float32[1.0, 10.0]] ForwardOrdered
└──────────────────────────────────────────────────────────────────────────────┘
 Float32[1.0, 1.0]   0.122667
 Float32[1.0, 2.0]   0.670468
 Float32[1.0, 3.0]   0.398075
 Float32[1.0, 4.0]   0.275494
 Float32[1.0, 5.0]   0.437735
 Float32[1.0, 6.0]   0.333305
 Float32[1.0, 7.0]   0.43346
 Float32[1.0, 8.0]   0.740607
 Float32[1.0, 9.0]   0.48268
 Float32[1.0, 10.0]  0.221809

but some questions arise here.

  1. How do we plot them? For scatter I could see just using specapi to forward color as the values of the dimvector. What about 2d arrays?
  2. For e.g. weather station data, you would have a time axis and a position or geometry axis. How would people use data from here? What are common access patterns?
  3. Can we make selectors like Contains, Touches, .., At, etc work, potentially using GeometryOps? Do we want an extension for this? I think this will involve geometric predicates at least.
  4. Do we want to construct spatial indices for such geometric lookups?
  5. Can we set GI.geometrycolumns of a raster to point to geometric lookups? Do we want to stash this in the metadata?
@rafaqz
Copy link
Owner

rafaqz commented Sep 19, 2024

Dimensional data.jl already handles point lookups a bit, see mergedims. But polygons etc need more work. Probably we make a GeometryLookups.jl package so everyone can use them?

@rafaqz
Copy link
Owner

rafaqz commented Sep 19, 2024

  1. scatter default is good for points with color as the values. if there is another dimension like time we plot points in 3d? and the same for lines/polygons etc
  2. This already works for points with MergedLookup, we can use that syntax for everything else too
  3. See merged.jl for the point implementation, for polygons we would need special lookups.
  4. That could be better than what MergedLookup does currently. Its slow.
  5. Yep we can do that. One hitch is that geometry can be either values in the array or in a lookup, but we can detect that. (a confusing part of the "vector data cube" talk is people use and discuss both but theyre totally different things)

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 20, 2024

I just realized that geometric lookups are a great way to represent discretized global grids, since each grid cell is just an integer - a DGGSLookup could be a subtype of some AbstractGeometricLookup, and with a nice API we could have super easy plotting / subsetting / rasterization to a Cartesian grid!

Screenshot 2024-09-20 at 4 09 04 PM (image from the DGGS talk at SDSL 2024)

@rafaqz
Copy link
Owner

rafaqz commented Sep 21, 2024

For sure. Except for the nesting, and there should be optimisations for finding polygons on these grids?

@rafaqz rafaqz added the enhancement New feature or request label Nov 5, 2024
@felixcremer
Copy link
Contributor

If you are interested in DGGS there is also https://github.com/danlooo/DGGS.jl

@asinghvi17
Copy link
Collaborator Author

Thanks Felix, that seems interesting! Are those actually DimArrays though, or just convenient wrapper structs? I went over the code briefly but am not able to see how this is working...

@asinghvi17
Copy link
Collaborator Author

I just wrote up a more interesting vector data cube example with polygons. See the following:

using DimensionalData
using DimensionalData.Lookups
using DimensionalData.Dimensions

import DimensionalData as DD

# load data
using NaturalEarth
import GeoInterface as GI, GeometryOps as GO

fc = naturalearth("admin_0_countries", 10)
geometries_data = fc.geometry
id_data = fc.ADM0_A3
pop_data = fc.POP_EST .|> Float64

# now for the fun part
# let's define a dimension named Geometry
# TODO: how can we make this be interpreted as (X, Y)?  can we somehow hook into the same things that MergeDims does?

@dim Geometry

# create a categorical lookup from the geometry
geometry_lookup = Categorical(geometries_data)

# create dimvectors which have geometry lookups - simple so far
id_dd = DimVector(id_data, Geometry(geometry_lookup))
pop_dd = DimVector(pop_data, Geometry(geometry_lookup))

# stack them so they share dims
fc_stack = DimStack((; id = id_dd, pop = pop_dd))



# Now for the _really_ fun part: mixing dimensions
# create a 2d dimarray, that hypothetically has height measurements over time for each geometry
height_dim_matrix = rand(Geometry(geometry_lookup), Ti(1:10))

# stack 'em!
full_stack = DimStack((; id = id_dd, pop = pop_dd, height = height_dim_matrix))

# select the USA
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))]
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))].id
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))].pop
full_stack[Geometry(Where(x -> GO.contains(x, (-103, 44))))].height

this is pretty sweet I think...now just have to figure out how to actually write this in a CF compliant way, and how to detect these lookups when plotting 😢

@rafaqz
Copy link
Owner

rafaqz commented Dec 5, 2024

Just using Where is funny, like that just works as is

I guess we can accelerate that for GO predicates with an Extent index, if we use a Fix2

@asinghvi17
Copy link
Collaborator Author

yep! I'm building out a PolygonLookup now that would support that, but we could change that to be a more generic GeometricLookup at some point that works for any geometries

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Dec 5, 2024

btw the big issue here is probably going to be file I/O. Any suggestions on that? NetCDF has some support for this but I have no clue how we would get this out from a nc file:

example.nc.txt
(remove the .txt after, that's for github sake). This file has polygons with holes, if we can handle that I figure we're pretty much done haha

@rafaqz
Copy link
Owner

rafaqz commented Dec 5, 2024

Have you looked at the python/R vector data cubes ? If we can use something already being standardised...

(The CF standard is likely pretty arcane)

@alex-s-gardner
Copy link
Contributor

alex-s-gardner commented Dec 5, 2024

This is really exciting!

And making this a DataFrame almost works, except height vectors are concatenated and scalars atrtibutes are duplicated :

DataFrame(full_stack)
2580×5 DataFrame
  Row │ Geometry         Ti     id      pop             height      
      │ Abstract…        Int64  String  Float64         Float64     
──────┼─────────────────────────────────────────────────────────────
    1 │ 2D MultiPolygon      1  IDN          2.70626e8  0.379892
    2 │ 2D MultiPolygon      1  MYS          3.19498e7  0.851437
    3 │ 2D MultiPolygon      1  CHL          1.8952e7   0.0670016
    4 │ 2D Polygon           1  BOL          1.15131e7  0.925698
    5 │ 2D MultiPolygon      1  PER          3.25105e7  0.699816

@asinghvi17
Copy link
Collaborator Author

Yeah...Ideally we would want a sliced dimtable from the stack, with a given index column

@asinghvi17
Copy link
Collaborator Author

Have you looked at the python/R vector data cubes ? If we can use something already being standardised...

(The CF standard is likely pretty arcane)

They also use CF :((

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2024

Ok guess we will have to implement it...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants