
Commit

Merge pull request #57 from jiayuasu/GeoSpark-for-Spark-1.X
Push GeoSpark 0.5.1
jiayuasu authored Feb 15, 2017
2 parents 73becac + 194fd15 commit bae6012
Showing 12 changed files with 157 additions and 159 deletions.
150 changes: 36 additions & 114 deletions README.md
@@ -1,177 +1,99 @@
![GeoSpark Logo](http://www.public.asu.edu/~jiayu2/geospark/logo.png)

[![Build Status](https://travis-ci.org/jiayuasu/GeoSpark.svg?branch=master)](https://travis-ci.org/jiayuasu/GeoSpark) [![Maven Central](https://maven-badges.herokuapp.com/maven-central/org.datasyslab/geospark/badge.svg)](https://maven-badges.herokuapp.com/maven-central/org.datasyslab/geospark)
[![Join the chat at https://gitter.im/geospark-datasys/Lobby](https://badges.gitter.im/geospark-datasys/Lobby.svg)](https://gitter.im/geospark-datasys/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)


GeoSpark is listed as **Infrastructure Project** on [**Apache Spark Official Third Party Project Page**](http://spark.apache.org/third-party-projects.html)

GeoSpark is a cluster computing system for processing large-scale spatial data. GeoSpark extends Apache Spark with a set of out-of-the-box Spatial Resilient Distributed Datasets (SRDDs) that efficiently load, process, and analyze large-scale spatial data across machines. GeoSpark provides APIs for Apache Spark programmers to easily develop spatial analysis programs with SRDDs, which have in-house support for geometrical operations and spatial queries (range, k nearest neighbors, join).

GeoSpark artifacts are hosted in Maven Central. You can add a Maven dependency with the following coordinates:

The following version supports Apache Spark 2.X versions:

```
groupId: org.datasyslab
artifactId: geospark
version: 0.5.0
```
GeoSpark artifacts are hosted in Maven Central: [**Maven Central Coordinates**](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Maven-Central-Coordinates)

The following version supports Apache Spark 1.X versions:

```
groupId: org.datasyslab
artifactId: geospark
version: 0.5.0-spark-1.x
```
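Expressed as a Maven dependency, these coordinates look like the following sketch (use version `0.5.0` for the Spark 2.X line, `0.5.0-spark-1.x` for Spark 1.X):

```xml
<dependency>
    <groupId>org.datasyslab</groupId>
    <artifactId>geospark</artifactId>
    <version>0.5.0-spark-1.x</version>
</dependency>
```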

# Version information ([more](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Full-Version-Release-notes))


| Version | Summary |
|:----------------: |----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 0.5.1| **Bug fix:** (1) GeoSpark: fix inaccurate KNN results when K is large; (2) GeoSpark: replace an incompatible Spark API call [Issue #55](https://github.com/DataSystemsLab/GeoSpark/issues/55); (3) Babylon: temporarily remove the JPG output format due to the lack of OpenJDK support|
| 0.5.0| **Major updates:** We are pleased to announce the initial version of [Babylon](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon), a large-scale in-memory geospatial visualization system extending GeoSpark. Babylon and GeoSpark are integrated together; you can just import GeoSpark and enjoy! More details are available here: [Babylon GeoSpatial Visualization](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon)|
| 0.4.0| **Major updates:** ([Example](https://github.com/DataSystemsLab/GeoSpark/blob/master/src/main/java/org/datasyslab/geospark/showcase/Example.java)) 1. Refactor constructor API usage. 2. Simplify the Spatial Join Query API. 3. Add native support for LineStringRDD. **Functionality enhancement:** 1. Release the persist function back to users. 2. Add more exception explanations.|


## How to get started (For Scala and Java developers)



### Prerequisites

1. Apache Spark 2.X releases (Apache Spark 1.X support is available on the GeoSpark for Spark 1.X branch)
2. JDK 1.7
3. You might need to modify the dependencies in `pom.xml` to match your environment.

Note: GeoSpark Master branch supports Apache Spark 2.X releases and GeoSpark for Spark 1.X branch supports Apache Spark 1.X releases. Please refer to the proper branch you need.

### How to use GeoSpark APIs in an interactive Spark shell (Scala)

1. Have your Spark cluster ready.
2. Download [pre-compiled GeoSpark jar](https://github.com/DataSystemsLab/GeoSpark/releases) under "Release" tag.
3. Run Spark shell with GeoSpark as a dependency.

```
./bin/spark-shell --jars GeoSpark_COMPILED.jar
```

4. You can now call GeoSpark APIs directly in your Spark shell!

### How to use GeoSpark APIs in a self-contained Spark application (Scala and Java)

1. Create your own Apache Spark project in Scala or Java
2. Add GeoSpark Maven coordinates into your project dependencies.
3. You can now use GeoSpark APIs in your Spark program!
4. Use spark-submit to submit your compiled self-contained Spark program.

### GeoSpark Programming Examples (Scala)

[GeoSpark Scala Example](https://gist.github.com/jiayuasu/bcecaa2e9e6f280a0f9a72bb7549ffaa)

[Test Data](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/test/resources)

### GeoSpark Programming Examples (Java)

[GeoSpark Java Example](https://github.com/DataSystemsLab/GeoSpark/blob/master/src/main/java/org/datasyslab/geospark/showcase/Example.java)

[Test Data](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/test/resources)

## Scala and Java API usage

Please refer to [GeoSpark Scala and Java API Usage](http://www.public.asu.edu/~jiayu2/geospark/javadoc/)


# Important features ([more](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Important-Features))
## Spatial Resilient Distributed Datasets (SRDDs)

GeoSpark extends RDDs to form Spatial RDDs (SRDDs), efficiently partitions SRDD data elements across machines, and introduces novel parallelized spatial transformations and actions (geometrical operations that follow the Open Geospatial Consortium (OGC) standard), giving users a more intuitive interface for writing spatial data analytics programs. Moreover, GeoSpark extends the SRDD layer to execute spatial queries (e.g., range query, KNN query, and join query) on large-scale spatial datasets. After geometrical objects are retrieved in the Spatial RDD layer, users can invoke the spatial query processing operations provided in GeoSpark's Spatial Query Processing Layer, which runs over the in-memory cluster, decides how spatial object-relational tuples are stored, indexed, and accessed using SRDDs, and returns the spatial query results required by the user.
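As a rough single-machine illustration of the range query mentioned above (this is not GeoSpark's actual API — the real query runs distributed over an SRDD, optionally with a spatial index; `RangeQuerySketch` is a hypothetical name and the sketch only shows the containment predicate):

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of a spatial range query: keep only the points that fall
// inside a query rectangle. GeoSpark applies the same predicate in parallel
// over the partitions of a Spatial RDD.
public class RangeQuerySketch {
    // Each point is a {x, y} pair; the query window is [minX, maxX] x [minY, maxY].
    public static List<double[]> rangeQuery(List<double[]> points,
                                            double minX, double minY,
                                            double maxX, double maxY) {
        List<double[]> result = new ArrayList<>();
        for (double[] p : points) {
            if (p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY) {
                result.add(p);
            }
        }
        return result;
    }
}
```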

**Supported Spatial RDDs: PointRDD, RectangleRDD, PolygonRDD, LineStringRDD**

## Supported data format
GeoSpark supports Comma-Separated Values (**CSV**), Tab-Separated Values (**TSV**), Well-Known Text (**WKT**), and **GeoJSON** as input formats. Users only need to specify the input format as the Splitter and, if necessary, the start and end offsets of the spatial fields in a row when calling constructors. GeoSpark also accepts **any user-supplied format mapper function** to support other desired formats.
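The offset-based parsing described above can be sketched in plain Java. This is illustrative only — GeoSpark's real constructors take a Spark function producing JTS geometries, and `CsvPointMapper`/`parsePoint` are hypothetical names:

```java
// Minimal sketch of a "format mapper" in the spirit described above: split a
// CSV/TSV row on the given splitter and read the spatial fields that start at
// startOffset (here: x, then y).
public class CsvPointMapper {
    public static double[] parsePoint(String line, String splitter, int startOffset) {
        String[] fields = line.split(splitter);
        double x = Double.parseDouble(fields[startOffset]);
        double y = Double.parseDouble(fields[startOffset + 1]);
        return new double[]{x, y};
    }
}
```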

**Native input format support**: CSV, TSV, WKT, GeoJSON

**User-supplied input format mapper**: any input format

## Spatial Partitioning
Supported spatial partitioning techniques: R-Tree (**RTREE**) and Voronoi diagram (**VORONOI**). Spatial partitioning repartitions an RDD according to objects' spatial locations; a spatial join on a spatially partitioned RDD is much faster.

## Spatial Index
GeoSpark supports two spatial indexes, Quad-Tree (**QUADTREE**) and R-Tree (**RTREE**). Note that the Quad-Tree does not support the Spatial K Nearest Neighbors query.
## Geometrical operation
GeoSpark currently provides native support for Inside, Overlap, DatasetBoundary, Minimum Bounding Rectangle, and Polygon Union in SRDDs, following the [Open Geospatial Consortium (OGC) standard](http://www.opengeospatial.org/standards).

## Spatial Operation
GeoSpark so far provides **Spatial Range Query**, **Spatial Join Query**, and **Spatial K Nearest Neighbors Query**.
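A minimal single-machine sketch of the K nearest neighbors query listed above (hypothetical `KnnSketch` class — GeoSpark evaluates KNN distributed over an SRDD, optionally via an R-Tree index; here we simply sort by distance):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual sketch of a spatial KNN query: return the k points closest to
// the query point (qx, qy). Squared Euclidean distance is enough for ranking.
public class KnnSketch {
    public static List<double[]> knn(List<double[]> points, double qx, double qy, int k) {
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble((double[] p) ->
                (p[0] - qx) * (p[0] - qx) + (p[1] - qy) * (p[1] - qy)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```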
# GeoSpark Tutorial ([more](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Tutorial))

# Babylon Visualization Framework on GeoSpark
Babylon is a large-scale in-memory geospatial visualization system.

Babylon provides native support for general cartographic design by extending GeoSpark to process large-scale spatial data. It can visualize Spatial RDDs and spatial queries, and renders super-high-resolution images in parallel.

Babylon and GeoSpark are integrated together. You just need to import GeoSpark and enjoy! More details are available here: [Babylon GeoSpatial Visualization](https://github.com/DataSystemsLab/GeoSpark/tree/master/src/main/java/org/datasyslab/babylon)

## Babylon Gallery
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/usrail.png" width="250">
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/nycheatmap.png" width="250">
<img src="http://www.public.asu.edu/~jiayu2/geospark/picture/ustweet.png" width="250">

# Publication

Jia Yu, Jinxuan Wu, Mohamed Sarwat. "A Demonstration of GeoSpark: A Cluster Computing Framework for Processing Big Spatial Data". (demo paper) In Proceedings of the IEEE International Conference on Data Engineering (ICDE) 2016, Helsinki, Finland, May 2016

Jia Yu, Jinxuan Wu, Mohamed Sarwat. ["GeoSpark: A Cluster Computing Framework for Processing Large-Scale Spatial Data"](http://www.public.asu.edu/~jiayu2/geospark/publication/GeoSpark_ShortPaper.pdf). (short paper) In Proceedings of the ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS) 2015, Seattle, WA, USA, November 2015


# Acknowledgement

GeoSpark makes use of JTS Plus (An extended JTS Topology Suite Version 1.14) for some geometrical computations.

Please refer to [JTS Topology Suite website](http://tsusiatsoftware.net/jts/main.html) and [JTS Plus](https://github.com/jiayuasu/JTSplus) for more details.


# Contact

## Questions

* Please join [![Join the chat at https://gitter.im/geospark-datasys/Lobby](https://badges.gitter.im/geospark-datasys/Lobby.svg)](https://gitter.im/geospark-datasys/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

* Email us!

## Contributors
* [Jia Yu](http://www.public.asu.edu/~jiayu2/) (Email: [email protected])

* [Jinxuan Wu](http://www.public.asu.edu/~jinxuanw/) (Email: [email protected])

* [Mohamed Sarwat](http://faculty.engineering.asu.edu/sarwat/) (Email: [email protected])

## Project website
Please visit the [GeoSpark project website](http://geospark.datasyslab.org) for the latest news and releases.

## Data Systems Lab
GeoSpark is one of the projects under [Data Systems Lab](http://www.datasyslab.org/) at Arizona State University. The mission of Data Systems Lab is designing and developing experimental data management systems (e.g., database systems).

# Thanks for the help from GeoSpark community
We appreciate the help and suggestions from GeoSpark users: [**Thanks List**](https://github.com/DataSystemsLab/GeoSpark/wiki/GeoSpark-Community-Thanks-List)

4 changes: 2 additions & 2 deletions pom.xml
@@ -3,7 +3,7 @@
<modelVersion>4.0.0</modelVersion>
<groupId>org.datasyslab</groupId>
<artifactId>geospark</artifactId>
-<version>0.5.0-spark-1.x</version>
+<version>0.5.1-spark-1.x</version>

<name>${project.groupId}:${project.artifactId}</name>
<description>Geospatial extension for Apache Spark</description>
@@ -58,7 +58,7 @@
<dependency>
<groupId>org.datasyslab</groupId>
<artifactId>JTSplus</artifactId>
-<version>0.1.0</version>
+<version>0.1.1</version>
</dependency>

<dependency>
11 changes: 7 additions & 4 deletions src/main/java/org/datasyslab/babylon/core/ImageGenerator.java
@@ -11,6 +11,7 @@
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.datasyslab.babylon.utils.ImageType;

import scala.Tuple2;

@@ -24,15 +25,16 @@ public abstract class ImageGenerator implements Serializable{
*
* @param distributedPixelImage the distributed pixel image
* @param outputPath the output path
* @param imageType the image type
* @return true, if successful
* @throws Exception the exception
*/
-public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distributedPixelImage, String outputPath) throws Exception
+public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distributedPixelImage, String outputPath, ImageType imageType) throws Exception
{
List<Tuple2<Integer,ImageSerializableWrapper>> imagePartitions = distributedPixelImage.collect();
for(Tuple2<Integer,ImageSerializableWrapper> imagePartition:imagePartitions)
{
-this.SaveAsFile(imagePartition._2.image, outputPath+"-"+imagePartition._1);
+this.SaveAsFile(imagePartition._2.image, outputPath+"-"+imagePartition._1, imageType);
}
return true;
}
@@ -42,8 +44,9 @@ public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distribu
*
* @param pixelImage the pixel image
* @param outputPath the output path
* @param imageType the image type
* @return true, if successful
* @throws Exception the exception
*/
-public abstract boolean SaveAsFile(BufferedImage pixelImage, String outputPath) throws Exception;
+public abstract boolean SaveAsFile(BufferedImage pixelImage, String outputPath, ImageType imageType) throws Exception;
}
NativeJavaImageGenerator.java
Expand Up @@ -13,6 +13,7 @@
import javax.imageio.ImageIO;

import org.datasyslab.babylon.core.ImageGenerator;
import org.datasyslab.babylon.utils.ImageType;

/**
* The Class NativeJavaImageGenerator.
@@ -23,16 +24,16 @@ public class NativeJavaImageGenerator extends ImageGenerator{
* @see org.datasyslab.babylon.core.ImageGenerator#SaveAsFile(java.awt.image.BufferedImage, java.lang.String)
*/
@Override
-public boolean SaveAsFile(BufferedImage pixelImage, String outputPath) {
-File outputImage = new File(outputPath+".png");
+public boolean SaveAsFile(BufferedImage pixelImage, String outputPath, ImageType imageType) {
+File outputImage = new File(outputPath+"."+imageType.getTypeName());
outputImage.getParentFile().mkdirs();
try {
-ImageIO.write(pixelImage,"png",outputImage);
+ImageIO.write(pixelImage,imageType.getTypeName(),outputImage);
} catch (IOException e) {
e.printStackTrace();
}
return true;
}


}
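The signature change above threads an image type down to `ImageIO`. The underlying pattern — deriving both the file extension and the `ImageIO` format name from a single type name — can be sketched standalone (hypothetical `ImageSaveSketch` helper, not Babylon's actual class):

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Sketch of the pattern introduced in this commit: one code path saves an
// image in any supported format by using the type name for both the file
// extension and ImageIO's informal format name ("png", "gif", ...).
public class ImageSaveSketch {
    public static File saveAs(BufferedImage image, String outputPath, String typeName) {
        File outputImage = new File(outputPath + "." + typeName);
        if (outputImage.getParentFile() != null) {
            outputImage.getParentFile().mkdirs(); // ensure the output directory exists
        }
        try {
            ImageIO.write(image, typeName, outputImage);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return outputImage;
    }
}
```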
SparkImageGenerator.java
@@ -11,6 +11,7 @@
import org.apache.spark.api.java.JavaPairRDD;
import org.datasyslab.babylon.core.ImageGenerator;
import org.datasyslab.babylon.core.ImageSerializableWrapper;
import org.datasyslab.babylon.utils.ImageType;

/**
* The Class SparkImageGenerator.
@@ -21,7 +22,7 @@ public class SparkImageGenerator extends ImageGenerator{
* @see org.datasyslab.babylon.core.ImageGenerator#SaveAsFile(org.apache.spark.api.java.JavaPairRDD, java.lang.String)
*/
@Override
-public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distributedPixelImage, String outputPath)
+public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distributedPixelImage, String outputPath, ImageType imageType)
{
distributedPixelImage.saveAsObjectFile(outputPath);
return true;
@@ -31,9 +32,9 @@ public boolean SaveAsFile(JavaPairRDD<Integer,ImageSerializableWrapper> distribu
* @see org.datasyslab.babylon.core.ImageGenerator#SaveAsFile(java.awt.image.BufferedImage, java.lang.String)
*/
@Override
-public boolean SaveAsFile(BufferedImage pixelImage, String outputPath) throws Exception {
+public boolean SaveAsFile(BufferedImage pixelImage, String outputPath, ImageType imageType) throws Exception {
throw new Exception("[SparkImageGenerator][SaveAsFile] This method hasn't been implemented yet.");
}


}
