[SEDONA-668] Drop the support of Spark 3.0, 3.1, 3.2 (#1653)
* Push the change

* Fix import orders

* Revert "Fix import orders"

This reverts commit 12443f0.

* Fix lint
jiayuasu authored Oct 27, 2024
1 parent 4841279 commit 3d0d54d
Showing 151 changed files with 65 additions and 26,188 deletions.
12 changes: 0 additions & 12 deletions .github/workflows/java.yml
@@ -68,18 +68,6 @@ jobs:
scala: 2.12.15
jdk: '8'
skipTests: ''
- - spark: 3.2.3
-   scala: 2.12.15
-   jdk: '8'
-   skipTests: ''
- - spark: 3.1.2
-   scala: 2.12.15
-   jdk: '8'
-   skipTests: ''
- - spark: 3.0.3
-   scala: 2.12.15
-   jdk: '8'
-   skipTests: ''
steps:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
9 changes: 0 additions & 9 deletions .github/workflows/python.yml
@@ -70,15 +70,6 @@ jobs:
- spark: '3.3.0'
scala: '2.12.8'
python: '3.8'
- - spark: '3.2.0'
-   scala: '2.12.8'
-   python: '3.7'
- - spark: '3.1.2'
-   scala: '2.12.8'
-   python: '3.7'
- - spark: '3.0.3'
-   scala: '2.12.8'
-   python: '3.7'
env:
VENV_PATH: /home/runner/.local/share/virtualenvs/python-${{ matrix.python }}
steps:
2 changes: 1 addition & 1 deletion .github/workflows/r.yml
@@ -32,7 +32,7 @@ jobs:
strategy:
fail-fast: true
matrix:
- spark: [3.0.3, 3.1.2, 3.2.1, 3.3.0, 3.4.0, 3.5.0]
+ spark: [3.3.0, 3.4.0, 3.5.0]
hadoop: [3]
scala: [2.12.15]
r: [oldrel, release]
2 changes: 1 addition & 1 deletion docs/community/develop.md
@@ -51,7 +51,7 @@ Make sure you reload the `pom.xml` or reload the maven project. The IDE will ask

In a terminal, go to the Sedona root folder. Run `mvn clean install`. All tests will take more than 15 minutes. To only build the project jars, run `mvn clean install -DskipTests`.
!!!Note
- `mvn clean install` will compile Sedona with Spark 3.0 and Scala 2.12. If you have a different version of Spark in $SPARK_HOME, make sure to specify that using -Dspark command line arg.
+ `mvn clean install` will compile Sedona with Spark 3.3 and Scala 2.12. If you have a different version of Spark in $SPARK_HOME, make sure to specify that using -Dspark command line arg.
For example, to compile sedona with Spark 3.4 and Scala 2.12, use: `mvn clean install -Dspark=3.4 -Dscala=2.12`

More details can be found on [Compile Sedona](../setup/compile.md)
8 changes: 4 additions & 4 deletions docs/community/snapshot.md
@@ -40,11 +40,11 @@ rm -f pom.xml.*
mvn -q -B clean release:prepare -Dtag={{ sedona_create_release.current_git_tag }} -DreleaseVersion={{ sedona_create_release.current_version }} -DdevelopmentVersion={{ sedona_create_release.current_snapshot }} -Dresume=false -DdryRun=true -Penable-all-submodules -Darguments="-DskipTests"
mvn -q -B release:clean -Penable-all-submodules

- # Spark 3.0 and Scala 2.12
- mvn -q deploy -DskipTests -Dspark=3.0 -Dscala=2.12
+ # Spark 3.3 and Scala 2.12
+ mvn -q deploy -DskipTests -Dspark=3.3 -Dscala=2.12

- # Spark 3.0 and Scala 2.13
- mvn -q deploy -DskipTests -Dspark=3.0 -Dscala=2.13
+ # Spark 3.3 and Scala 2.13
+ mvn -q deploy -DskipTests -Dspark=3.3 -Dscala=2.13

# Spark 3.4 and Scala 2.12
mvn -q deploy -DskipTests -Dspark=3.4 -Dscala=2.12
22 changes: 7 additions & 15 deletions docs/setup/compile.md
@@ -29,33 +29,25 @@ To compile all modules, please make sure you are in the root folder of all modul
Geotools jars will be packaged into the produced fat jars.

!!!note
- By default, this command will compile Sedona with Spark 3.0 and Scala 2.12
+ By default, this command will compile Sedona with Spark 3.3 and Scala 2.12

### Compile with different targets

User can specify `-Dspark` and `-Dscala` command line options to compile with different targets. Available targets are:

- * `-Dspark`: `3.0` for Spark 3.0 to 3.3; `{major}.{minor}` for Spark 3.4 or later. For example, specify `-Dspark=3.4` to build for Spark 3.4.
+ * `-Dspark`: `{major}.{minor}`: For example, specify `-Dspark=3.4` to build for Spark 3.4.
* `-Dscala`: `2.12` or `2.13`

- === "Spark 3.0 to 3.3 Scala 2.12"
+ === "Spark 3.3+ Scala 2.12"
```
- mvn clean install -DskipTests -Dspark=3.0 -Dscala=2.12
+ mvn clean install -DskipTests -Dspark=3.3 -Dscala=2.12
```
- === "Spark 3.4+ Scala 2.12"
- ```
- mvn clean install -DskipTests -Dspark=3.4 -Dscala=2.12
- ```
- Please replace `3.4` with Spark major.minor version when building for higher Spark versions.
- === "Spark 3.0 to 3.3 Scala 2.13"
- ```
- mvn clean install -DskipTests -Dspark=3.0 -Dscala=2.13
- ```
- === "Spark 3.4+ Scala 2.13"
+ Please replace `3.3` with Spark major.minor version when building for higher Spark versions.
+ === "Spark 3.3+ Scala 2.13"
```
mvn clean install -DskipTests -Dspark=3.4 -Dscala=2.13
```
- Please replace `3.4` with Spark major.minor version when building for higher Spark versions.
+ Please replace `3.3` with Spark major.minor version when building for higher Spark versions.

!!!tip
To get the Sedona Spark Shaded jar with all GeoTools jars included, simply append `-Dgeotools` option. The command is like this:`mvn clean install -DskipTests -Dscala=2.12 -Dspark=3.0 -Dgeotools`
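The `-Dspark`/`-Dscala` target selection that compile.md describes is simple string assembly over the two version flags; a hedged sketch of that convention (the helper name `mvn_command` is hypothetical, not part of Sedona or this commit):

```python
def mvn_command(spark: str, scala: str, geotools: bool = False) -> str:
    """Assemble the Maven build command for a Spark major.minor / Scala binary target."""
    cmd = f"mvn clean install -DskipTests -Dspark={spark} -Dscala={scala}"
    if geotools:
        # -Dgeotools bundles the GeoTools jars into the shaded jar
        cmd += " -Dgeotools"
    return cmd

print(mvn_command("3.4", "2.12"))
# mvn clean install -DskipTests -Dspark=3.4 -Dscala=2.12
```

Replace `3.4` with whatever Spark major.minor version you are building for, as the tabs above instruct.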
2 changes: 1 addition & 1 deletion docs/setup/docker.md
@@ -107,7 +107,7 @@ Example:

### Notes

- This docker image can only be built against Sedona 1.4.1+ and Spark 3.0+
+ This docker image can only be built against Sedona 1.7.0+ and Spark 3.3+

## Cluster Configuration

7 changes: 2 additions & 5 deletions docs/setup/emr.md
@@ -16,7 +16,7 @@ In your S3 bucket, add a script that has the following content:
sudo mkdir /jars

# Download Sedona jar
- sudo curl -o /jars/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar"
+ sudo curl -o /jars/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar "https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar"

# Download GeoTools jar
sudo curl -o /jars/geotools-wrapper-{{ sedona.current_geotools }}.jar "https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar"
@@ -41,7 +41,7 @@ When you create an EMR cluster, in the software configuration, add the following
{
"Classification":"spark-defaults",
"Properties":{
- "spark.yarn.dist.jars": "/jars/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar,/jars/geotools-wrapper-{{ sedona.current_geotools }}.jar",
+ "spark.yarn.dist.jars": "/jars/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar,/jars/geotools-wrapper-{{ sedona.current_geotools }}.jar",
"spark.serializer": "org.apache.spark.serializer.KryoSerializer",
"spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
"spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
@@ -50,9 +50,6 @@
]
```

- !!!note
- If you use Sedona 1.3.1-incubating, please use `sedona-python-adapter-3.0_2.12` jar in the content above, instead of `sedona-spark-shaded-3.0_2.12`.

## Verify installation

After the cluster is created, you can verify the installation by running the following code in a Jupyter notebook:
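The `spark-defaults` classification block above can also be assembled programmatically when scripting EMR cluster creation; a minimal sketch, assuming the version strings passed in are placeholders you replace with your own (the helper name `spark_defaults` is hypothetical):

```python
import json

def spark_defaults(sedona_version: str, geotools_version: str) -> dict:
    """Build the EMR 'spark-defaults' classification for a Sedona cluster."""
    jars = [
        f"/jars/sedona-spark-shaded-3.3_2.12-{sedona_version}.jar",
        f"/jars/geotools-wrapper-{geotools_version}.jar",
    ]
    return {
        "Classification": "spark-defaults",
        "Properties": {
            # distribute the Sedona and GeoTools jars to YARN containers
            "spark.yarn.dist.jars": ",".join(jars),
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
            "spark.sql.extensions": "org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions",
        },
    }

print(json.dumps(spark_defaults("1.7.0", "1.7.0-28.5"), indent=2))
```

The printed JSON matches the shape of the software-configuration snippet shown above.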
9 changes: 4 additions & 5 deletions docs/setup/glue.md
@@ -10,13 +10,12 @@ and Python 3.10. We recommend Sedona-1.3.1-incubating and above for Glue.

You will need to point your glue job to the Sedona and Geotools jars. We recommend using the jars available from maven. The links below are those intended for Glue 4.0

- Sedona Jar: [Maven Central](https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar)
+ Sedona Jar: [Maven Central](https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar)

Geotools Jar: [Maven Central](https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar)

!!!note
- If you use Sedona 1.3.1-incubating, please use `sedona-python-adapter-3.0_2.12` jar in the content above, instead
- of `sedona-spark-shaded-3.0_2.12`. Ensure you pick a version for Scala 2.12 and Spark 3.0. The Spark 3.4 and Scala
+ Ensure you pick a version for Scala 2.12 and Spark 3.3. The Spark 3.4 and Scala
2.13 jars are not compatible with Glue 4.0.

## Configure Glue Job
@@ -34,7 +33,7 @@ and the second installs the Sedona Python package directly from pip.

```python
# Sedona Config
- %extra_jars https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar, https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
+ %extra_jars https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar, https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
%additional_python_modules apache-sedona=={{ sedona.current_version }}
```

@@ -47,7 +46,7 @@ If you are using the example notebook from glue, the first cell should now look
%number_of_workers 5

# Sedona Config
- %extra_jars https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.0_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar, https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
+ %extra_jars https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/{{ sedona.current_version }}/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar, https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/{{ sedona.current_geotools }}/geotools-wrapper-{{ sedona.current_geotools }}.jar
%additional_python_modules apache-sedona=={{ sedona.current_version }}


7 changes: 3 additions & 4 deletions docs/setup/install-python.md
@@ -35,8 +35,7 @@ python3 setup.py install

Sedona Python needs one additional jar file called `sedona-spark-shaded` or `sedona-spark` to work properly. Please make sure you use the correct version for Spark and Scala.

- * For Spark 3.0 to 3.3 and Scala 2.12, it is called `sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar` or `sedona-spark-3.0_2.12-{{ sedona.current_version }}.jar`
- * For Spark 3.4+ and Scala 2.12, it is called `sedona-spark-shaded-3.4_2.12-{{ sedona.current_version }}.jar` or `sedona-spark-3.4_2.12-{{ sedona.current_version }}.jar`. If you are using Spark versions higher than 3.4, please replace the `3.4` in artifact names with the corresponding major.minor version numbers.
+ Please use Spark major.minor version number in artifact names.

You can get it using one of the following methods:

@@ -48,7 +47,7 @@ You can get it using one of the following methods:
from sedona.spark import *
config = SedonaContext.builder(). \
config('spark.jars.packages',
- 'org.apache.sedona:sedona-spark-3.0_2.12:{{ sedona.current_version }},'
+ 'org.apache.sedona:sedona-spark-3.3_2.12:{{ sedona.current_version }},'
'org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}'). \
config('spark.jars.repositories', 'https://artifacts.unidata.ucar.edu/repository/unidata-all'). \
getOrCreate()
@@ -69,7 +68,7 @@ spark = SparkSession. \
config("spark.serializer", KryoSerializer.getName). \
config("spark.kryo.registrator", SedonaKryoRegistrator.getName). \
config('spark.jars.packages',
- 'org.apache.sedona:sedona-spark-shaded-3.0_2.12:{{ sedona.current_version }},'
+ 'org.apache.sedona:sedona-spark-shaded-3.3_2.12:{{ sedona.current_version }},'
'org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}'). \
getOrCreate()
SedonaRegistrator.registerAll(spark)
10 changes: 5 additions & 5 deletions docs/setup/install-scala.md
@@ -21,12 +21,12 @@ Please refer to [Sedona Maven Central coordinates](maven-coordinates.md) to sele

* Local mode: test Sedona without setting up a cluster
```
- ./bin/spark-shell --packages org.apache.sedona:sedona-spark-shaded-3.0_2.12:{{ sedona.current_version }},org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}
+ ./bin/spark-shell --packages org.apache.sedona:sedona-spark-shaded-3.3_2.12:{{ sedona.current_version }},org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}
```

* Cluster mode: you need to specify Spark Master IP
```
- ./bin/spark-shell --master spark://localhost:7077 --packages org.apache.sedona:sedona-spark-shaded-3.0_2.12:{{ sedona.current_version }},org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}
+ ./bin/spark-shell --master spark://localhost:7077 --packages org.apache.sedona:sedona-spark-shaded-3.3_2.12:{{ sedona.current_version }},org.datasyslab:geotools-wrapper:{{ sedona.current_geotools }}
```

### Download Sedona jar manually
@@ -42,16 +42,16 @@ Please refer to [Sedona Maven Central coordinates](maven-coordinates.md) to sele
./bin/spark-shell --jars /Path/To/SedonaJars.jar
```

- If you are using Spark 3.0 to 3.3, please use jars with filenames containing `3.0`, such as `sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}`; If you are using Spark 3.4 or higher versions, please use jars with Spark major.minor versions in the filename, such as `sedona-spark-shaded-3.4_2.12-{{ sedona.current_version }}`.
+ Please use jars with Spark major.minor versions in the filename, such as `sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}`.

* Local mode: test Sedona without setting up a cluster
```
- ./bin/spark-shell --jars /path/to/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar,/path/to/geotools-wrapper-{{ sedona.current_geotools }}.jar
+ ./bin/spark-shell --jars /path/to/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar,/path/to/geotools-wrapper-{{ sedona.current_geotools }}.jar
```

* Cluster mode: you need to specify Spark Master IP
```
- ./bin/spark-shell --master spark://localhost:7077 --jars /path/to/sedona-spark-shaded-3.0_2.12-{{ sedona.current_version }}.jar,/path/to/geotools-wrapper-{{ sedona.current_geotools }}.jar
+ ./bin/spark-shell --master spark://localhost:7077 --jars /path/to/sedona-spark-shaded-3.3_2.12-{{ sedona.current_version }}.jar,/path/to/geotools-wrapper-{{ sedona.current_geotools }}.jar
```

## Spark SQL shell
22 changes: 10 additions & 12 deletions docs/setup/maven-coordinates.md
@@ -10,21 +10,20 @@

Apache Sedona provides different packages for each supported version of Spark.

- * For Spark 3.0 to 3.3, the artifact to use should be `sedona-spark-shaded-3.0_2.12`.
- * For Spark 3.4 or higher versions, please use the artifact with Spark major.minor version in the artifact name. For example, for Spark 3.4, the artifacts to use should be `sedona-spark-shaded-3.4_2.12`.
+ Please use the artifact with Spark major.minor version in the artifact name. For example, for Spark 3.4, the artifacts to use should be `sedona-spark-shaded-3.4_2.12`.

If you are using the Scala 2.13 builds of Spark, please use the corresponding packages for Scala 2.13, which are suffixed by `_2.13`.

The optional GeoTools library is required if you want to use CRS transformation, ShapefileReader or GeoTiff reader. This wrapper library is a re-distribution of GeoTools official jars. The only purpose of this library is to bring GeoTools jars from OSGEO repository to Maven Central. This library is under GNU Lesser General Public License (LGPL) license so we cannot package it in Sedona official release.

!!! abstract "Sedona with Apache Spark and Scala 2.12"

- === "Spark 3.0 to 3.3 and Scala 2.12"
+ === "Spark 3.3 and Scala 2.12"

```xml
<dependency>
<groupId>org.apache.sedona</groupId>
- <artifactId>sedona-spark-shaded-3.0_2.12</artifactId>
+ <artifactId>sedona-spark-shaded-3.3_2.12</artifactId>
<version>{{ sedona.current_version }}</version>
</dependency>
<!-- Optional: https://mvnrepository.com/artifact/org.datasyslab/geotools-wrapper -->
@@ -68,12 +67,12 @@ The optional GeoTools library is required if you want to use CRS transformation,

!!! abstract "Sedona with Apache Spark and Scala 2.13"

- === "Spark 3.0 to 3.3 and Scala 2.13"
+ === "Spark 3.3 and Scala 2.13"

```xml
<dependency>
<groupId>org.apache.sedona</groupId>
- <artifactId>sedona-spark-shaded-3.0_2.13</artifactId>
+ <artifactId>sedona-spark-shaded-3.3_2.13</artifactId>
<version>{{ sedona.current_version }}</version>
</dependency>
<!-- Optional: https://mvnrepository.com/artifact/org.datasyslab/geotools-wrapper -->
@@ -204,20 +203,19 @@ Under BSD 3-clause (compatible with Apache 2.0 license)

Apache Sedona provides different packages for each supported version of Spark.

- * For Spark 3.0 to 3.3, the artifacts to use should be `sedona-spark-3.0_2.12`.
- * For Spark 3.4 or higher versions, please use the artifacts with Spark major.minor version in the artifact name. For example, for Spark 3.4, the artifacts to use should be `sedona-spark-3.4_2.12`.
+ Please use the artifacts with Spark major.minor version in the artifact name. For example, for Spark 3.4, the artifacts to use should be `sedona-spark-3.4_2.12`.

If you are using the Scala 2.13 builds of Spark, please use the corresponding packages for Scala 2.13, which are suffixed by `_2.13`.

The optional GeoTools library is required if you want to use CRS transformation, ShapefileReader or GeoTiff reader. This wrapper library is a re-distribution of GeoTools official jars. The only purpose of this library is to bring GeoTools jars from OSGEO repository to Maven Central. This library is under GNU Lesser General Public License (LGPL) license, so we cannot package it in Sedona official release.

!!! abstract "Sedona with Apache Spark and Scala 2.12"

- === "Spark 3.0 to 3.3 and Scala 2.12"
+ === "Spark 3.3 and Scala 2.12"
```xml
<dependency>
<groupId>org.apache.sedona</groupId>
- <artifactId>sedona-spark-3.0_2.12</artifactId>
+ <artifactId>sedona-spark-3.3_2.12</artifactId>
<version>{{ sedona.current_version }}</version>
</dependency>
<dependency>
@@ -255,11 +253,11 @@

!!! abstract "Sedona with Apache Spark and Scala 2.13"

- === "Spark 3.0+ and Scala 2.13"
+ === "Spark 3.3 and Scala 2.13"
```xml
<dependency>
<groupId>org.apache.sedona</groupId>
- <artifactId>sedona-spark-3.0_2.13</artifactId>
+ <artifactId>sedona-spark-3.3_2.13</artifactId>
<version>{{ sedona.current_version }}</version>
</dependency>
<dependency>
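The artifact-naming convention the maven-coordinates.md changes describe — Spark major.minor plus Scala binary version, with or without the `shaded` classifier — can be expressed as a one-line formatter; a sketch for illustration only (the helper name `sedona_artifact` is hypothetical):

```python
def sedona_artifact(spark: str, scala: str, shaded: bool = True) -> str:
    """Derive the Sedona artifactId from Spark and Scala versions."""
    base = "sedona-spark-shaded" if shaded else "sedona-spark"
    return f"{base}-{spark}_{scala}"

print(sedona_artifact("3.4", "2.12"))                # sedona-spark-shaded-3.4_2.12
print(sedona_artifact("3.5", "2.13", shaded=False))  # sedona-spark-3.5_2.13
```

After this commit, only `3.3` and later are valid values for the Spark component of the name.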
