An open-source tool for reading OvertureMaps data with multiprocessing and additional Quality-of-Life features

kraina-ai/overturemaestro

Generated using DALL·E 3 model with this prompt: Cute stylized conducting virtuoso using a paper map as music sheet. White background, minimalistic, vector graphics, clean background, encased in a circle. In navy and gold colours. Logo for a python library, should work well as small icon.

OvertureMaestro

What is OvertureMaestro 🎼🌍?

  • Scalable reader for OvertureMaps data.
  • Built on top of PyArrow [1].
  • Saves files in the GeoParquet [2] file format for easier integration with modern cloud stacks.
  • Filters data based on geometry.
  • Can filter data using PyArrow expressions.
  • Uses multiprocessing for faster data download.
  • Uses a dedicated index of all features in the Overture Maps dataset to download only the parts that match the geometry filter.
  • Uses caching to avoid repeating computations.
  • Can be used as a Python module or through a beautiful CLI based on Typer [3].

Installing

As a pure Python module

pip install overturemaestro

With the beautiful CLI

pip install overturemaestro[cli]

Required Python version?

OvertureMaestro supports Python >= 3.9

Dependencies

Required:

  • overturemaps (>=0.8.0): Reusing the official CLI library with dedicated schema-related functions

  • pyarrow (>=16.0.0): For OvertureMaps GeoParquet dataset wrangling

  • geopandas (>=1.0): For returning GeoDataFrames and reading Geo files

  • shapely (>=2.0): For parsing WKT and GeoJSON strings and filtering data with STRtree

  • geoarrow-rust-core (>=0.3.0): For transforming Arrow data to Shapely objects

  • pooch (>=1.6.0): For downloading precalculated dataset indexes

  • rich (>=12.0.0): For showing progress bars

  • fsspec (>=2021.04.0) & aiohttp (>=3.8.0): For accessing AWS S3 datasets in PyArrow and GitHub files for precalculated datasets

  • geopy (>=2.0.0): For geocoding of strings

Optional:

  • typer[all] (>=0.9.0) (click, colorama, rich, shellingham): Required in CLI

  • h3 (>=4.0.0b1): For reading H3 strings. Required in CLI

  • s2 (>=0.1.9): For transforming S2 indexes into geometries. Required in CLI

  • python-geohash (>=0.8): For transforming GeoHash indexes into geometries. Required in CLI

  • scikit-learn (>=1.0): For clustering geometries. Required for generating a release index

  • polars (>=0.20.4): For calculating the total bounding box from many bounding boxes. Required for generating a release index
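
The shapely entry in the required list mentions parsing WKT strings and filtering with a spatial index. The sketch below shows that general technique with shapely 2.x alone. It is not OvertureMaestro's internal code, and the bounding boxes are invented values standing in for index entries.

```python
from shapely import STRtree, box, from_wkt

# Geometry filter parsed from a WKT string.
geometry_filter = from_wkt("POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))")

# Hypothetical bounding boxes, e.g. one per data chunk in an index.
candidate_bboxes = [
    box(-1, -1, 1, 1),     # overlaps the filter
    box(5, 5, 6, 6),       # far away from the filter
    box(1.5, 1.5, 3, 3),   # overlaps the filter
]

# Build the spatial index once, then query it with the filter geometry.
tree = STRtree(candidate_bboxes)
matching = sorted(tree.query(geometry_filter, predicate="intersects").tolist())
print(matching)
```

`query` returns positions in the input list, so the result can be used to select only the chunks worth reading.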

Usage

TODO

Footnotes

  1. PyArrow Website

  2. GeoParquet data format

  3. Typer docs