GeoSpark@Twitter | GeoSpark Discussion Board
GeoSpark is a cluster computing system for processing large-scale spatial data. It extends Apache Spark and SparkSQL with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) and SpatialSQL that efficiently load, process, and analyze large-scale spatial data across machines.
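As a rough illustration of what "processing spatial data across machines" means, the sketch below runs a rectangular range query over data split into partitions. It is a conceptual toy in plain Python, not the GeoSpark API: the names `contains`, `range_query`, `Point`, and `Box` are hypothetical, and ordinary lists stand in for the partitions that Spark would distribute across a cluster.

```python
# Conceptual sketch only: plain Python lists stand in for partitions of a
# distributed dataset. None of these names are part of the GeoSpark API.
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside the box (edges included)."""
    min_x, min_y, max_x, max_y = box
    x, y = point
    return min_x <= x <= max_x and min_y <= y <= max_y


def range_query(partitions: List[List[Point]], window: Box) -> List[Point]:
    """Filter every partition independently, then concatenate the results."""
    matches: List[Point] = []
    for partition in partitions:  # on a cluster, partitions are processed in parallel
        matches.extend(p for p in partition if contains(window, p))
    return matches


# Two hypothetical partitions of point data and one query window.
partitions = [
    [(0.5, 0.5), (3.0, 4.0)],
    [(1.0, 2.0), (9.0, 9.0)],
]
result = range_query(partitions, (0.0, 0.0, 2.0, 2.0))
```

Because each partition is filtered without reference to the others, this kind of query parallelizes naturally, which is the property a distributed spatial dataset relies on.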
Name | API | Spark compatibility | Introduction |
---|---|---|---|
Core | RDD | Spark 2.X/1.X | SpatialRDDs and Query Operators. |
SQL | SQL/DataFrame | SparkSQL 2.1+ | SQL interfaces for GeoSpark core. |
Viz | RDD, SQL/DataFrame | RDD - Spark 2.X/1.X, SQL - Spark 2.1+ | Visualization for Spatial RDD and DataFrame. |
Zeppelin | Apache Zeppelin | Spark 2.1+, Zeppelin 0.8.1+ | GeoSpark plugin for Apache Zeppelin. |
Please visit the GeoSpark website for detailed documentation.
- GeoSpark 1.3.1 is released. This version provides a complete Python wrapper for the GeoSpark RDD and SQL APIs. It also contains a number of bug fixes and new functions from 12 contributors. See the Python RDD tutorial, the Python SQL tutorial, and the release notes.
The GeoSpark ecosystem has around 10K downloads per month.