[SEDONA-664] Add native GeoPackage reader (#1603)
Imbruced authored Oct 15, 2024
1 parent ec1e26c commit 4c7da62
Showing 83 changed files with 6,303 additions and 2 deletions.
92 changes: 92 additions & 0 deletions docs/tutorial/sql.md
@@ -705,6 +705,98 @@ For Postgis there is no need to add a query to convert geometry types since it's
.withColumn("geom", f.expr("ST_GeomFromWKB(geom)")))
```

## Load from GeoPackage

Since v1.7.0, Sedona supports loading GeoPackage files as a DataFrame.

=== "Scala"

```scala
val df = sedona.read.format("geopackage").option("tableName", "tab").load("/path/to/geopackage")
```

=== "Java"

```java
Dataset<Row> df = sedona.read().format("geopackage").option("tableName", "tab").load("/path/to/geopackage");
```

=== "Python"

```python
df = sedona.read.format("geopackage").option("tableName", "tab").load("/path/to/geopackage")
```

GeoPackage files can contain both vector and raster data. To list the tables available in a file,
read its metadata table by setting the `showMetadata` option to `true`.

=== "Scala"

```scala
val df = sedona.read.format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
```

=== "Java"

```java
Dataset<Row> df = sedona.read().format("geopackage").option("showMetadata", "true").load("/path/to/geopackage");
```

=== "Python"

```python
df = sedona.read.format("geopackage").option("showMetadata", "true").load("/path/to/geopackage")
```

The resulting DataFrame lists the tables stored in the GeoPackage file:

```
+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
| table_name|data_type| identifier|description| last_change| min_x| min_y| max_x| max_y|srs_id|
+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
|gis_osm_water_a_f...| features|gis_osm_water_a_f...| |2024-09-30 23:07:...|-9.0257084|57.96814069999999|33.4866675|80.4291867| 4326|
+--------------------+---------+--------------------+-----------+--------------------+----------+-----------------+----------+----------+------+
```
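
A GeoPackage file is a SQLite database, and the columns in the output above match those of its `gpkg_contents` table, which the reader in this commit also queries to tell feature tables from tile tables. As a minimal sketch outside Sedona, assuming only Python's standard `sqlite3` module, the same metadata could be inspected directly. The helper name `list_contents` and the sample rows are hypothetical, and the reduced `gpkg_contents` table is created by hand only so that the example runs on its own.

```python
import sqlite3


def list_contents(conn):
    """Return (table_name, data_type, srs_id) for every row in gpkg_contents."""
    cur = conn.execute(
        "SELECT table_name, data_type, srs_id FROM gpkg_contents ORDER BY table_name"
    )
    return cur.fetchall()


# Stand-in for a real file: an in-memory database with a hand-made,
# reduced gpkg_contents table (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gpkg_contents ("
    "table_name TEXT PRIMARY KEY, data_type TEXT, identifier TEXT, "
    "min_x REAL, min_y REAL, max_x REAL, max_y REAL, srs_id INTEGER)"
)
conn.executemany(
    "INSERT INTO gpkg_contents VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("water", "features", "water", -9.0, 57.9, 33.4, 80.4, 4326),
        ("raster_table", "tiles", "raster_table", -9.0, 57.9, 33.4, 80.4, 4326),
    ],
)

contents = list_contents(conn)
for row in contents:
    print(row)
```

With a real file, `sqlite3.connect("/path/to/geopackage")` would replace the in-memory connection and the `CREATE TABLE` and `INSERT` steps would be dropped.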

You can also load raster tables from a GeoPackage file by passing the name of the raster table as `tableName`.

=== "Scala"

```scala
val df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
```

=== "Java"

```java
Dataset<Row> df = sedona.read().format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage");
```

=== "Python"

```python
df = sedona.read.format("geopackage").option("tableName", "raster_table").load("/path/to/geopackage")
```

```
+---+----------+-----------+--------+--------------------+
| id|zoom_level|tile_column|tile_row| tile_data|
+---+----------+-----------+--------+--------------------+
| 1| 11| 428| 778|GridCoverage2D["c...|
| 2| 11| 429| 778|GridCoverage2D["c...|
| 3| 11| 428| 779|GridCoverage2D["c...|
| 4| 11| 429| 779|GridCoverage2D["c...|
| 5| 11| 427| 777|GridCoverage2D["c...|
+---+----------+-----------+--------+--------------------+
```
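
Each row above is one tile, addressed by `zoom_level`, `tile_column`, and `tile_row`. The reader in this commit loads the per-zoom values from `gpkg_tile_matrix` (tile size in pixels, pixel size in map units) and the overall extent from `gpkg_tile_matrix_set`. The following is a rough sketch of how those values can be combined into a tile's bounding box, assuming the GeoPackage convention that column 0 and row 0 address the upper-left tile. The function `tile_envelope` and the sample numbers are hypothetical, and this is not necessarily how Sedona computes the extent.

```python
from dataclasses import dataclass


@dataclass
class TileMatrix:
    tile_width: int       # tile width in pixels
    tile_height: int      # tile height in pixels
    pixel_x_size: float   # map units per pixel along x
    pixel_y_size: float   # map units per pixel along y


def tile_envelope(min_x, max_y, matrix, tile_column, tile_row):
    """Return (min_x, min_y, max_x, max_y) of one tile.

    min_x and max_y are the upper-left corner of the whole tile matrix set.
    """
    span_x = matrix.tile_width * matrix.pixel_x_size
    span_y = matrix.tile_height * matrix.pixel_y_size
    left = min_x + tile_column * span_x
    top = max_y - tile_row * span_y  # rows are counted downwards from the top
    return (left, top - span_y, left + span_x, top)


matrix = TileMatrix(tile_width=256, tile_height=256, pixel_x_size=0.5, pixel_y_size=0.5)
envelope = tile_envelope(0.0, 1024.0, matrix, tile_column=1, tile_row=2)
print(envelope)
```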

Known limitations (v1.7.0):

- WebP rasters are not supported
- EWKB geometries are not supported
- filtering based on geometry envelopes is not supported

These limitations are expected to be addressed in upcoming releases.

## Transform the Coordinate Reference System

Sedona doesn't control the coordinate unit (degree-based or meter-based) of all geometries in a Geometry column. The unit of all related distances in SedonaSQL is the same as the unit of all geometries in a Geometry column.
5 changes: 5 additions & 0 deletions spark/common/pom.xml
@@ -95,6 +95,11 @@
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
</dependency>
<dependency>
<groupId>org.xerial</groupId>
<artifactId>sqlite-jdbc</artifactId>
<version>3.36.0.3</version>
</dependency>
<dependency>
<groupId>edu.ucar</groupId>
<artifactId>cdm-core</artifactId>
@@ -0,0 +1,45 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.sedona.sql.datasources.geopackage.connection

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

import java.io.File

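object FileSystemUtils {

  /**
   * Returns a local file for the given path, together with a flag that is true when the
   * file is a temporary local copy and false when it is the original file.
   */
  def copyToLocal(options: Configuration, file: Path): (File, Boolean) = {
    if (isLocalFileSystem(options, file)) {
      // Already on the local file system: use the file in place.
      (new File(file.toUri.getPath), false)
    } else {
      val fs = file.getFileSystem(options)
      val tempFile = File.createTempFile(java.util.UUID.randomUUID.toString, ".gpkg")

      // Other file systems: copy the file to a local temporary location first.
      fs.copyToLocalFile(file, new Path(tempFile.getAbsolutePath))

      (tempFile, true)
    }
  }

  private def isLocalFileSystem(conf: Configuration, path: Path): Boolean = {
    FileSystem.get(path.toUri, conf).isInstanceOf[org.apache.hadoop.fs.LocalFileSystem]
  }

}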
@@ -0,0 +1,146 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.sedona.sql.datasources.geopackage.connection

import org.apache.sedona.sql.datasources.geopackage.model.{GeoPackageField, TableType, TileMatrix, TileMetadata}
import org.apache.sedona.sql.datasources.geopackage.model.TableType.TableType

import java.sql.{Connection, DriverManager, ResultSet, Statement}
import scala.collection.mutable
import scala.collection.mutable.ArrayBuffer

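object GeoPackageConnectionManager {

  def createStatement(file: String): Statement = {
    val conn = DriverManager.getConnection("jdbc:sqlite:" + file)
    conn.createStatement()
  }

  // Closes the statement together with the connection that createStatement opened for it;
  // closing only the statement would leave that connection open.
  def closeStatement(st: Statement): Unit = {
    val conn = st.getConnection
    try st.close()
    finally conn.close()
  }

  // The caller owns the returned cursor and must close it, along with its statement and
  // connection, once all rows have been read.
  def getTableCursor(path: String, tableName: String): ResultSet = {
    val conn: Connection =
      DriverManager.getConnection("jdbc:sqlite:" + path)
    val stmt: Statement = conn.createStatement()
    stmt.executeQuery(s"SELECT * FROM ${tableName}")
  }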

  def getSchema(file: String, tableName: String): Seq[GeoPackageField] = {
    val statement = createStatement(file)

    try {
      val rs = statement.executeQuery(s"PRAGMA table_info($tableName)")
      val fields = ArrayBuffer.empty[GeoPackageField]

      while (rs.next) {
        val columnName = rs.getString("name")
        val columnType = rs.getString("type")

        fields += GeoPackageField(columnName, columnType, true)
      }

      fields.toSeq
    } finally {
      closeStatement(statement)
    }
  }

  def findFeatureMetadata(file: String, tableName: String): TableType = {
    val statement = createStatement(file)

    val rs =
      statement.executeQuery(s"select * from gpkg_contents where table_name = '$tableName'")

    try {
      // Move the cursor to the first row before reading from it. A table that has no
      // entry in gpkg_contents is reported as UNKNOWN.
      if (!rs.next()) {
        TableType.UNKNOWN
      } else {
        rs.getString("data_type") match {
          case "features" => TableType.FEATURES
          case "tiles" => TableType.TILES
          case _ => TableType.UNKNOWN
        }
      }
    } finally {
      rs.close()
      closeStatement(statement)
    }
  }

  def getZoomLevelData(file: String, tableName: String): mutable.HashMap[Int, TileMatrix] = {
    val stmt = createStatement(file)
    val rs =
      stmt.executeQuery(s"select * from gpkg_tile_matrix where table_name = '${tableName}'")
    val result: mutable.HashMap[Int, TileMatrix] = mutable.HashMap()

    try {
      while (rs.next()) {
        val zoom_level = rs.getInt("zoom_level")
        val matrix_width = rs.getInt("matrix_width")
        val matrix_height = rs.getInt("matrix_height")
        val tile_width = rs.getInt("tile_width")
        val tile_height = rs.getInt("tile_height")
        val pixel_x_size = rs.getDouble("pixel_x_size")
        val pixel_y_size = rs.getDouble("pixel_y_size")

        result(zoom_level) = TileMatrix(
          zoom_level,
          matrix_width,
          matrix_height,
          tile_width,
          tile_height,
          pixel_x_size,
          pixel_y_size)
      }
    } finally {
      rs.close()
      closeStatement(stmt)
    }

    result
  }

  def findTilesMetadata(file: String, tableName: String): TileMetadata = {
    val statement = createStatement(file)

    val rs = statement.executeQuery(
      s"select * from gpkg_tile_matrix_set where table_name = '$tableName'")

    try {
      // Move the cursor to the first row before reading from it, and fail with a clear
      // message when the table has no entry in gpkg_tile_matrix_set.
      if (!rs.next()) {
        throw new IllegalArgumentException(
          s"Table $tableName is not registered in gpkg_tile_matrix_set")
      }

      val minX = rs.getDouble("min_x")
      val minY = rs.getDouble("min_y")
      val maxX = rs.getDouble("max_x")
      val maxY = rs.getDouble("max_y")
      val srsID = rs.getInt("srs_id")

      // Named so that it no longer shadows the getZoomLevelData method.
      val zoomLevelData = GeoPackageConnectionManager.getZoomLevelData(file, tableName)

      TileMetadata(
        tableName = tableName,
        minX = minX,
        minY = minY,
        maxX = maxX,
        maxY = maxY,
        srsID = srsID,
        zoomLevelMetadata = zoomLevelData,
        tileRowMetadata = null)
    } finally {
      rs.close()
      closeStatement(statement)
    }
  }
}
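
`getSchema` above relies on SQLite's `PRAGMA table_info`, which returns one row per column with `name` and `type` fields among others. The sketch below performs the same lookup with Python's standard `sqlite3` module, so it stays runnable without a Spark session. The `table_schema` helper and the sample `points` table are hypothetical. The table name is spliced into the `PRAGMA` statement as an identifier, so the sketch first checks it against `sqlite_master` with a bound parameter; that check is one possible guard for string-interpolated queries and is an assumption, not part of this commit.

```python
import sqlite3


def table_schema(conn, table_name):
    """Return [(column_name, declared_type), ...] for an existing table."""
    # Verify the name with a bound parameter before splicing it into SQL.
    known = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if known is None:
        raise ValueError(f"unknown table: {table_name!r}")

    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + table_name.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()

    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


# Hypothetical sample table standing in for a GeoPackage feature table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (fid INTEGER PRIMARY KEY, geom BLOB, name TEXT)")

schema = table_schema(conn, "points")
print(schema)
```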
@@ -0,0 +1,25 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.sedona.sql.datasources.geopackage.errors

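// Pass the message on to Exception so that getMessage returns it; the previous
// auxiliary constructor accepted a message but discarded it.
class GeopackageException(message: String) extends Exception(message) {
  def this() = this(null)
}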
@@ -0,0 +1,21 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
package org.apache.sedona.sql.datasources.geopackage.model

case class Envelope(minX: Double, minY: Double, maxX: Double, maxY: Double)