Kontur's open source geodata ETL/CI/CD pipeline designed for ease of maintenance and high single-node throughput.

konturio/geocint-runner

geocint-runner

geocint processing pipeline

Geocint is Kontur's open source geodata ETL/CI/CD pipeline designed for ease of maintenance and high single-node throughput. Writing code as a Geocint target ensures that it is fully recorded, can be run autonomously, can be inspected, reviewed and tested by other team members, and automatically produces new artifacts once new input data comes in.
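
The "new artifacts once new input data comes in" behaviour comes from Make's timestamp comparison between a target and the files it depends on. The sketch below is only a Python model of that rule, not part of Geocint itself; the function name `needs_rebuild` and the file names `input.csv` and `output.csv` are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Model of Make's rule: rebuild if the target is missing
    or older than any of its prerequisites."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Walk through three states of one hypothetical target."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "input.csv")     # hypothetical input file
        artifact = os.path.join(tmp, "output.csv")  # hypothetical artifact

        with open(source, "w") as f:
            f.write("raw data\n")
        os.utime(source, (1000, 1000))
        # 1. The artifact does not exist yet, so it must be built.
        results.append(needs_rebuild(artifact, [source]))

        with open(artifact, "w") as f:
            f.write("derived data\n")
        os.utime(artifact, (2000, 2000))
        # 2. The artifact is newer than its input, so nothing to do.
        results.append(needs_rebuild(artifact, [source]))

        # 3. New input data arrives: the input is now newer than the artifact.
        os.utime(source, (3000, 3000))
        results.append(needs_rebuild(artifact, [source]))
    return results


if __name__ == "__main__":
    print(demo())
```

In the real pipeline Make performs this check itself for every target, so no such code has to be written by hand.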

Geocint structure:

Geocint consists of 3 different parts:

  • geocint-runner - a core part of the pipeline, includes utilities and initial Makefile
  • geocint-openstreetmap - a chain of targets for downloading and updating the OpenStreetMap planet dump and loading it into the database
  • [geocint-private] - any repository that contains your additional functionality

Technology stack:

  • A high-performance computer. OS: the latest Ubuntu version (not necessarily LTS).
  • Bash (Linux shell) is used for scripting one-liners that get data into the database for further processing or get data out of the database for deployment. https://tldp.org/LDP/abs/html/
  • GNU Make is used as a job server. We do not use advanced features like variables and wildcards, and stick to the simple, explicit "file-depends-on-file" mode. Make takes care of running different jobs concurrently whenever possible. https://makefiletutorial.com/
  • make-profiler is used as a linter and preprocessor for Make that outputs a network diagram of what is getting built, when, and why. The output chart lets you see what went wrong and quickly get to the logs. https://github.com/konturio/make-profiler
  • PostgreSQL (latest stable version) is used for data manipulation. No replication, minimal WAL logging, disabled synchronous_commit (fsync enabled!), and parallel costs tuned to prefer parallel execution whenever possible. To facilitate debugging, auto_explain is enabled, so you can find slow query plans in the Postgres log files. When you need to make a query faster, follow https://postgrespro.ru/education/courses/QPT
  • GNU Parallel is used for parallelizing tasks that cannot be effectively parallelized by Postgres; it is essentially parallel-enabled Bash. https://www.gnu.org/software/parallel/parallel.html
  • PostGIS (the latest unreleased master version) for geodata manipulation. As members of the Kontur team are maintainers of PostGIS, you have the opportunity to develop or request new features directly. https://postgis.net/docs/manual-dev/reference.html
  • h3_pg for hexagon grid manipulation, https://github.com/bytesandbrains/h3-pg. When googling for manuals make sure you use this specific extension.
  • aws-cli is used to transfer data to and from Amazon S3 buckets. https://docs.aws.amazon.com/cli/index.html
  • Python is used for small tasks like unpivoting source data.
  • GDAL, OGR, osm-c-tools, osmium, and other tools are utilized in Bash CLI as needed.
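
As an illustration of the kind of small Python task mentioned in the list above, here is a minimal sketch of unpivoting a wide CSV into long form using only the standard library. It is not taken from the Geocint code base; the sample data and the names `unpivot`, `unpivot_csv`, `indicator` and `value` are assumptions made for the example.

```python
import csv
import io
import sys


def unpivot(rows, id_column, var_name="indicator", value_name="value"):
    """Turn each wide row into one long row per non-id column."""
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            yield {id_column: row[id_column], var_name: column, value_name: value}


def unpivot_csv(text, id_column):
    """Parse CSV text and return the unpivoted rows as a list of dicts."""
    reader = csv.DictReader(io.StringIO(text))
    return list(unpivot(reader, id_column))


# Hypothetical wide-format source: one column per year.
WIDE_CSV = """region,2020,2021
A,10,12
B,7,9
"""

if __name__ == "__main__":
    long_rows = unpivot_csv(WIDE_CSV, "region")
    writer = csv.DictWriter(sys.stdout, fieldnames=["region", "indicator", "value"])
    writer.writeheader()
    writer.writerows(long_rows)
```

The long form (one row per region and year) is usually easier to load into a single database table than a file whose set of columns changes with every new year of data.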

Install, first run guides and best practices
