+ + + +
+ +
+
+ +
+
+ +
+ +
+ +
+ + +
+ +
+ +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ +
+
+ + + + + + + + +
+ +
+

Tutorial 2 - scientific Python ecosystem: pandas and GeoPandas#

+

In this tutorial we will learn using

+
    +
  1. pandas: tabular data stored in pandas.DataFrames

  2. +
  3. GeoPandas: aspatial data stored in geopandas.GeoDataFrames

  4. +
+

within PyGMT to create histograms and different maps.

+
+

This tutorial is part of the AGU2024 annual meeting GMT/PyGMT pre-conference workshop (PREWS9) Mastering Geospatial Visualizations with GMT/PyGMT

+ +
+

0. General stuff#

+

Import the required packages, besides PyGMT itself, we use pandas and GeoPandas:

+
+
+
import pygmt
+
+import pandas as pd
+import geopandas as gpd
+
+# Use a resolution of only 150 dpi for the images within the Jupyter notebook, to keep the file small
+img_dpi = 150
+
+
+
+
+
+
+

1. pandas#

+
+

1.1 Tabular data - pandas.DataFrame#

+

Use an example dataset with tabular data provided by PyGMT and load it into a pandas.DataFrame. This dataset contains earthquakes in the area of Japan.

+
+
+
df_jp_eqs = pygmt.datasets.load_sample_data(name="japan_quakes")
+df_jp_eqs.head()
+
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
yearmonthdaylatitudelongitudedepth_kmmagnitude
019871449.77149.294894.1
119871939.90141.68676.8
219871939.82141.64844.0
3198711442.56142.851026.5
4198711642.79145.10545.1
+
+
+
+
+

1.2 Create a Cartesian histogram#

+

First we create a Cartesian histogram for the moment magnitude distribution of all earthquakes in the dataset.

+
+
+
mag_min = df_jp_eqs["magnitude"].min(); print(mag_min)
+mag_max = df_jp_eqs["magnitude"].max(); print(mag_max)
+
+
+
+
+
4.0
+6.8
+
+
+
+
+
+
+
# Choose the bin width of the bars; feel free to play around with this value
+step_histo = 0.2
+
+fig = pygmt.Figure()
+
+fig.histogram(
+    data=df_jp_eqs["magnitude"],
+    projection="X10c",
+    # Determine y range automatically
+    region=[mag_min - step_histo, mag_max + step_histo * 2, 0, 0],
+    # Define the frame, add labels to the x-axis and y-axis
+    frame=["WSne", "x+lmoment magnitude", "y+lcounts"],
+    # Generate evenly spaced bins
+    series=step_histo,
+    # Fill bars with color "orange"
+    fill="orange",
+    # Draw gray outlines with a width of 1 point
+    pen="1p,gray",
+    # Choose histogram type 0, i.e., counts [Default]
+    histtype=0,
+)
+
+fig.show(dpi=img_dpi)
+
+
+
+
+_images/874f28600e9bef5e0c2434c4512f149282202993569ce8dbee97fc509c790c92.png +
+
+

Use code example above as orientation, and create a similar histogram showing the hypocentral depth distribution for all earthquakes in the dataset.

+
+
+
# Your code (:
+
+
+
+
+

For details on creating Cartesian histograms you may find the tutorial Cartesian histograms helpful.

+
+
+

1.3 Create a geographical map of the earthquakes#

+

Now it’s time to create a geographical map showing the earthquakes. You can start with using pygmt.Figure.coast and pygmt.Figure.plot. For plotting the earthquakes as size (moment magnitude) and color (hypocentral depth) coded circles on top of the map please follow the tutorial Plotting data points.

+
+
+
# Your code (:
+
+
+
+
+
+
+
+

2. GeoPandas#

+
+

2.1 Line geometry#

+

Features which can be represented as a line geometry are for example rivers, roads, national boundaries, shorlines, and any kind of profiles.

+
+

2.1.1 Aspatial Data - geopandas.GeoDataFrame with line geometry#

+

First we download some data into in a geopandas.GeoDataFrame. This dataset contains European rivers with its length and name.

+
+
+
gpd_rivers_org = gpd.read_file(
+    "https://www.eea.europa.eu/data-and-maps/data/wise-large-rivers-and-large-lakes/" + \
+    "zipped-shapefile-with-wise-large-rivers-vector-line/zipped-shapefile-with-wise-large-rivers-vector-line/" + \
+    "at_download/file/wise_large_rivers.zip"
+)
+gpd_rivers_org.head()
+
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NAMEShape_Lenggeometry
0Danube2.770357e+06MULTILINESTRING ((4185683.29 2775788.04, 41861...
1Douro8.162452e+05MULTILINESTRING ((2764963.81 2199037.624, 2766...
2Ebro8.269909e+05MULTILINESTRING ((3178371.814 2315100.781, 317...
3Elbe1.087288e+06MULTILINESTRING ((4235352.373 3422319.986, 423...
4Guadalquivir5.997583e+05MULTILINESTRING ((2859329.283 1682737.074, 286...
+
+
+

Have a look at the values in the geometry column. The coordinates are currently not given in the geographic coordinate reference system (longitude/latitude) and have to be converted. This can be done directly with GeoPandas.

+
+
+
gpd_rivers_org.crs
+gpd_rivers = gpd_rivers_org.to_crs('EPSG:4326')
+gpd_rivers.head()
+
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NAMEShape_Lenggeometry
0Danube2.770357e+06MULTILINESTRING ((8.1846 48.0807, 8.19049 48.0...
1Douro8.162452e+05MULTILINESTRING ((-8.67141 41.14934, -8.64362 ...
2Ebro8.269909e+05MULTILINESTRING ((-4.05971 42.97715, -4.06841 ...
3Elbe1.087288e+06MULTILINESTRING ((8.69715 53.90109, 8.72716 53...
4Guadalquivir5.997583e+05MULTILINESTRING ((-6.37899 36.80363, -6.34806 ...
+
+
+
+
+

2.1.2 Create a geographical map of the rivers#

+

Let’s plot the rivers represented on top of a map. The geopandas.DataFrame can be directly passed to the data parameter of pygmt.Figure.plot. Use the pen parameter to adjust the lines. The string argument has the form “width,color,style”. Different line styles are explained in the Gallery example Line styles.

+
+
+
fig = pygmt.Figure()
+
+fig.coast(
+    projection="M10c", 
+    region=[-10, 30, 35, 57],
+    land="gray99",
+    shorelines="1/0.1p,gray50",
+    frame=True,
+)
+
+fig.plot(data=gpd_rivers, pen="0.5p,steelblue,solid")
+
+fig.show(dpi=img_dpi)
+
+
+
+
+_images/cf588072815c329d79c2052d16069ba2989dfee2076edded906502f6a9816ee2.png +
+
+
+
+

2.1.3 Plot subsets of the rivers differently#

+

Now we want to filter the dataset based on the river length and plot the subsets differently.

+

To indicate the different subsets adding a legend to the plot is helpful. Therefore, we specify the text for the legend entries via the label parameter of pygmt.Figure.plot. Note, that for the first plotted element, additionally the heading (+H) and the font size of the heading (+f) are specified. Placing the legend on the figure is done via the position parameter of pygmt.Figure.legend. A box can be added via the box parameter. For further explanation feel free to look at our gallery example Legend.

+
+
+
# Split the dataset into two subsets of shorter and longer rivers
+# Feel free to play around with the limit
+len_limit = 700000  # in meters
+gpd_rivers_short = gpd_rivers[gpd_rivers["Shape_Leng"] < len_limit]
+gpd_rivers_long = gpd_rivers[gpd_rivers["Shape_Leng"] > len_limit]
+
+
+fig = pygmt.Figure()
+
+fig.coast(
+    projection="M10c", 
+    region=[-10, 35, 35, 58],
+    land="gray99",
+    shorelines="1/0.1p,gray50",
+    frame=True,
+)
+
+# Plot the subsets differently and specify the text for the legend entries
+fig.plot(data=gpd_rivers_short, pen="0.5p,orange", label=f"shorter {len_limit} m+Hriver length+f9p")
+fig.plot(data=gpd_rivers_long, pen="0.5p,darkred", label=f"longer {len_limit} m")
+
+# Place the legend at the Top Left corner with an offset of 0.1 centimeters from the map frame
+# Add a box with semi-transparent (@30) white fill (+g) and a 0.1-points thick gray outline (+p)
+fig.legend(position="jTL+o0.1c", box="+gwhite@30+p0.1p,gray")
+
+fig.show(dpi=img_dpi)
+
+
+
+
+_images/3844bc04880f1c0b3a90179cd789cde13ad0d25aff650d43fd9d840114d063fe.png +
+
+
+
+

2.1.4 Plot the rivers with color-coding for the river length#

+

Use the gallery example Line colors with a custom CPT to plot the rivers with color-coding for the river length.

+
+
+
# Your code (:
+
+
+
+
+
+
+
+

2.2 Polygon geometry#

+

Any kind of areas, like continents, countries, and states can be stored as polygon geometry.

+
+

2.2.1 Aspatial Data - geopandas.GeoDataFrame with polygon geometry#

+

Again we download some data into in a geopandas.GeoDataFrame. This dataset contains information regarding airbnb rentals, socioeconomics, and crime in Chicagos. +This time we are lucky and the data is directly provided in the geographic coordinate reference system (longitude/latitude) and no further coordinate transformation is needed.

+
+
+
gdf_airbnb = gpd.read_file("https://geodacenter.github.io/data-and-lab/data/airbnb.zip")
+gdf_airbnb.head()
+
+
+
+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
communityshape_areashape_lenAREAIDresponse_raccept_rrev_ratingprice_pproom_typenum_spots...crowdeddependencywithout_hsunemployedincome_pcharship_innum_crimesnum_theftpopulationgeometry
0DOUGLAS46004621.158131027.05450983598.77142994.51428687.77777878.1578951.78947438...1.830.714.318.223791475013124118238POLYGON ((-87.60914 41.84469, -87.60915 41.844...
1OAKLAND16913961.040819565.50615333699.20000090.10526388.81250053.7750001.85000020...1.340.418.428.7192527813063115918POLYGON ((-87.59215 41.81693, -87.59231 41.816...
2FULLER PARK19916704.8692None3768.000000NaN91.75000084.0000001.8333336...3.244.926.633.9104329717643832876POLYGON ((-87.6288 41.80189, -87.62879 41.8017...
3GRAND BOULEVARD48492503.155428196.83715733894.03703783.61538592.750000119.5333331.53333330...3.339.515.924.323472576416142821929POLYGON ((-87.60671 41.81681, -87.6067 41.8165...
4KENWOOD29071741.928323325.16790623992.54285788.14285790.65625077.9914531.61538539...2.435.411.315.73591126271365417841POLYGON ((-87.59215 41.81693, -87.59215 41.816...
+

5 rows × 21 columns

+
+
+
+
+

2.2.2 Create a choropleth map#

+

Here we are going to create a so-called choropleth map. Such a visualization is helpful to show a quantity which vary between subregions of a study area, e.g., countries or states. The dataset contains several columns, here we will focus on the column "population", but feel free to modify the code below and use another quantity for the color-coding!

+
+
+
popul_min = gdf_airbnb["population"].min(); print(popul_min)
+popul_max = gdf_airbnb["population"].max(); print(popul_max)
+
+
+
+
+
2876
+98514
+
+
+
+
+
+
+
fig = pygmt.Figure()
+
+# frame=True adds a frame using the default settings
+# Use "rltb" (right, left, top, bottom) to plot no frame at all which
+# can be useful when inserting the figure within a document
+fig.basemap(region=[-88, -87.5, 41.62, 42.05], projection="M10c", frame=True)
+
+# Set up colormap
+pygmt.makecpt(cmap="SCM/bilbao", series=[popul_min, popul_max, 10])
+# Add vertical colorbar at left side
+fig.colorbar(frame="x+lpopulation", position="jLM+v")
+
+# Plot the polygons with color-coding for the population
+fig.plot(
+    data=gdf_airbnb, 
+    pen="0.2p,gray10", 
+    fill="+z", 
+    cmap=True,
+    aspatial="Z=population",
+)
+
+fig.show(dpi=img_dpi)
+
+
+
+
+_images/7f18327908c8dd210197cc51845e45a933f356b9fd12bd029a4a8cbda080eb2b.png +
+
+
+
+
+
+

3. Additional comments#

+

Some helpful and interesting aspects:

+
    +
  • Use suitable colormaps for your data: Scientific colourmaps by Fabio Crameri, see also the publications Crameri et al. 2024 and Crameri et al. 2020

  • +
  • Datasets provided by GeoPandas: Checkout the geodatasets libaray

  • +
  • Convert other objects to pandas or GeoPandas objects to make them usable in PyGMT: For example, convert OSMnx’s MultiDiGraph to a geopandas.DataFrame

  • +
  • Create more complex geometries: Combine GeoPandas with shapely (i.e., from shapely.geometry import Polygon)

  • +
  • Support of OGR formats: Use geopandas.read_file to load data provided as shapefile (.shp), GeoJSON (.geojson), geopackage (.gpkg), etc.

  • +
+
+
+

4. Orientation / suggestion for “2.1.4 Plot the rivers with color-coding for the river length”#

+

Below you find a rough code shipset for plotting the rives with color-coding in section 2.1.4.

+
+
+
fig = pygmt.Figure()
+
+fig.coast(
+    projection="M10c", 
+    region=[-10, 35, 35, 58],
+    land="gray99",
+    shorelines="1/0.1p,gray50",
+    frame=True,
+)
+
+pygmt.makecpt(cmap="SCM/oslo",series=[gpd_rivers.Shape_Leng.min(), 1500000], reverse=True)
+fig.colorbar(frame=["x+lriver length", "y+lm"], position="+ef0.2c")
+
+for i_river in range(len(gpd_rivers)):
+    fig.plot(
+        data=gpd_rivers[gpd_rivers.index==i_river],
+        zvalue=gpd_rivers.loc[i_river, "Shape_Leng"],
+        pen="0.5p",
+        cmap=True,
+    )
+
+fig.show(dpi=img_dpi)
+
+
+
+
+_images/fde4e75320e490f1f57c57564f4ac1cda80a392ed98715c482199e892dd16701.png +
+
+
+
+ + + + +
+ + + + + + + + +
+ + + + + + +
+ + + +