diff --git a/.nojekyll b/.nojekyll
index 3b43721..a71339f 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-2ebd7c3c
\ No newline at end of file
+9b9ba697
\ No newline at end of file
diff --git a/chapters/01_classification.html b/chapters/01_classification.html
index fa400cc..2d9e82c 100644
--- a/chapters/01_classification.html
+++ b/chapters/01_classification.html
@@ -300,7 +300,7 @@

Table of contents

1.1 Data Acquisition

In this chapter, we will employ machine learning techniques to classify a scene using satellite imagery. Specifically, we will utilize scikit-learn to implement two distinct classifiers and subsequently compare their results. To begin, we need to import the following modules.

-
+
Code
from datetime import datetime, timedelta
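The import cell is truncated by the diff. A plausible full set of imports for this chapter, inferred from the libraries used later (everything beyond the datetime line is an assumption), might look like:

Code
from datetime import datetime, timedelta

import numpy as np
import xarray as xr
import odc.stac
import pystac_client
import matplotlib.pyplot as plt
from matplotlib import colors
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier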
@@ -337,7 +337,7 @@ 

1.1.1 Searching in the Catalog

The odc-stac module provides access to free, open-source satellite data. To retrieve the data, we must define several parameters that specify the location and time period of interest. Additionally, we must specify the data collection we wish to access, as multiple collections are available. In this example, we will use multispectral imagery from the Sentinel-2 satellite.

-
+
Code
dx = 0.0006  # 60m resolution
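The search cell is likewise truncated. A minimal sketch of a STAC catalog search with pystac-client; the endpoint URL, bounding box, and date range below are placeholders, not values from the original notebook:

Code
import pystac_client

# Open a public STAC API endpoint (assumed; the original endpoint is not shown)
catalog = pystac_client.Client.open("https://earth-search.aws.element84.com/v1")

# Query Sentinel-2 L2A items for a placeholder bounding box and time window
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[16.3, 48.1, 16.5, 48.3],     # placeholder region (lon/lat)
    datetime="2023-05-01/2023-09-30",  # placeholder time period
)
items = search.item_collection()
print(f"Found {len(items)} scenes")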
@@ -390,7 +390,7 @@ 

1.1.2 Loading the Data

Now we will load the data directly into an xarray dataset, which we can use to perform computations on the data. xarray is a powerful library for working with multi-dimensional arrays, making it well-suited for handling satellite data.

Here’s how we can load the data using odc-stac and xarray:

-
+
Code
# define a geobox for my region
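The loading cell is cut off after the geobox comment. A sketch of how the load might continue with odc-stac; the band aliases, bounding box, and chunking are assumptions, while the 0.0006° resolution matches the dx defined earlier:

Code
import odc.stac
from odc.geo.geobox import GeoBox

# Build a geobox covering the region of interest at ~60 m (0.0006 deg) resolution
geobox = GeoBox.from_bbox(
    (16.3, 48.1, 16.5, 48.3),  # placeholder bbox matching the search above
    crs="EPSG:4326",
    resolution=0.0006,
)

# Lazily load the selected bands into an xarray.Dataset
ds = odc.stac.load(
    items,
    bands=["red", "green", "blue", "nir", "SCL"],  # assumed band aliases
    geobox=geobox,
    chunks={},
)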
@@ -417,7 +417,7 @@ 

1.2.1 RGB Image

With the image data loaded, we can proceed with computations and visualizations.

First, we define a mask to exclude cloud cover and areas with missing data. Subsequently, we create a composite median image, where each pixel value represents the median value across all the scenes we have identified. This approach helps to eliminate clouds and outliers present in some of the images, thereby providing a clearer and more representative visualization of the scene.

-
+
Code
# define a mask for valid pixels (non-cloud)
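The masking cell is truncated. One common approach, sketched here under the assumption that the Sentinel-2 scene classification layer (SCL) was loaded, keeps only clear-sky classes and then takes the per-pixel median over time:

Code
# Keep pixels classified as vegetation (4), bare soil (5), or water (6) in the
# SCL band; clouds, shadows, and no-data are masked out (class codes assumed)
valid = ds.SCL.isin([4, 5, 6])

# The median over time suppresses remaining clouds and outliers
median_composite = ds[["red", "green", "blue", "nir"]].where(valid).median(dim="time")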
@@ -460,7 +460,7 @@ 

1.2.2 False Color Image

In addition to the regular RGB image, we can swap any band from the visible spectrum for another band. In this case, the red band is replaced with the near-infrared band. This makes vegetated areas stand out clearly, as they now appear bright red: plants absorb red light while reflecting near-infrared light (NASA 2020).

-
+
Code
# compute the false color image
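A sketch of the false-color composite, with near-infrared substituted for red; the display scaling factor is an assumption:

Code
import numpy as np

# Stack NIR, green, blue so that vegetation appears bright red
false_color = (
    median_composite[["nir", "green", "blue"]]
    .to_array(dim="band")
    .transpose("latitude", "longitude", "band")
)

# Simple reflectance scaling to [0, 1] for display (assumed factor)
false_color_img = np.clip(false_color / 3000, 0, 1)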
@@ -502,7 +502,7 @@ 

  • 0.33 to 0.66 are moderately healthy plants
  • 0.66 to 1 are very healthy plants
  • -
    +
    Code
    # Normalized Difference Vegetation Index (NDVI)
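The NDVI cell is truncated; the index itself follows the standard formula, sketched here on the composite bands:

Code
# NDVI = (NIR - red) / (NIR + red), ranging from -1 to 1
ndvi = (median_composite.nir - median_composite.red) / (
    median_composite.nir + median_composite.red
)
ndvi.plot(cmap="RdYlGn", vmin=-1, vmax=1)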
    @@ -529,7 +529,7 @@ 

    1.3.1 Regions of Interest

Since this is a supervised classification, we need training data. Therefore we define regions that we are certain represent the feature we are classifying. In this case, we are interested in forested areas as well as regions that are definitely not forested. These regions will be used to train our classifiers.

    -
    +
    Code
    # Define Polygons
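A sketch of how such training regions could be defined with shapely and geopandas; the coordinates and labels are placeholders, not the polygons from the original notebook:

Code
import geopandas as gpd
from shapely.geometry import Polygon

# Placeholder training polygons: 1 = forest, 0 = non-forest
rois = gpd.GeoDataFrame(
    {"label": [1, 0]},
    geometry=[
        Polygon([(16.32, 48.12), (16.34, 48.12), (16.34, 48.14), (16.32, 48.14)]),
        Polygon([(16.40, 48.20), (16.42, 48.20), (16.42, 48.22), (16.40, 48.22)]),
    ],
    crs="EPSG:4326",
)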
    @@ -581,7 +581,7 @@ 

    1.3.2 Data Preparation

In addition to the regions of interest, we will extract from the loaded dataset the specific bands we intend to use for the classification: the red, green, blue, and near-infrared bands (although other bands could also be used). Using these bands, we will create both a training and a testing dataset. The training dataset will be used to train the classifier, while the testing dataset will be employed to evaluate its performance.

    -
    +
    Code
# Classifying dataset (only necessary bands)
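The preparation cell is truncated. A sketch of how the band extraction and train/test split could look; the `labels` raster (rasterized ROI polygons) and the split parameters are assumptions:

Code
import numpy as np
from sklearn.model_selection import train_test_split

bands = ["red", "green", "blue", "nir"]

# Pixel features: one row per pixel, one column per band
features = (
    median_composite[bands]
    .to_array(dim="band")
    .transpose("latitude", "longitude", "band")
    .values.reshape(-1, len(bands))
)

# Assumed: `labels` is a 2-D array aligned with the composite, 1 inside forest
# ROIs, 0 inside non-forest ROIs, NaN elsewhere (e.g. rasterized polygons)
flat_labels = labels.reshape(-1)
known = ~np.isnan(flat_labels)

X_train, X_test, y_train, y_test = train_test_split(
    features[known], flat_labels[known].astype(int), test_size=0.3, random_state=42
)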
    @@ -628,7 +628,7 @@ 

    Now that we have prepared the training and testing data, we will create an image array of the actual scene that we intend to classify. This array will serve as the input for our classification algorithms, allowing us to apply the trained classifiers to the entire scene and identify the forested and non-forested areas accurately.

    -
    +
    Code
    image_data = ds_class[bands].to_array(dim='band').transpose('latitude', 'longitude', 'band')
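For scikit-learn, this 3-D array is then typically flattened into a 2-D feature matrix with one row per pixel; a short sketch:

Code
# Flatten the (latitude, longitude, band) cube into an (n_pixels, n_bands)
# matrix, the input shape the classifiers expect
height, width, n_bands = image_data.shape
X_image = image_data.values.reshape(-1, n_bands)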
    @@ -644,7 +644,7 @@ 

1.3.3 Classifying with Naive Bayes

    Now that we have prepared all the needed data, we can begin the actual classification process.

    We will start with a Naive Bayes classifier. First, we will train the classifier using our training dataset. Once trained, we will apply the classifier to the actual image to identify the forested and non-forested areas.

    -
    +
    Code
    # Naive Bayes initialization and training
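The cell is truncated. A sketch of the step with scikit-learn's GaussianNB, using default hyperparameters and the flattened image matrix assumed above:

Code
from sklearn.naive_bayes import GaussianNB

# Train on the labelled pixels, then classify every pixel in the scene
nb = GaussianNB()
nb.fit(X_train, y_train)
nb_pred = nb.predict(X_image).reshape(height, width)
print(f"Naive Bayes test accuracy: {nb.score(X_test, y_test):.3f}")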
    @@ -661,7 +661,7 @@ 

    +
    Code
    # Plot Naive Bayes
    @@ -735,7 +735,7 @@ 

1.3.4 Classifying with Random Forest

    To ensure our results are robust, we will explore an additional classifier. In this section, we will use the Random Forest classifier. The procedure for using this classifier is the same as before: we will train the classifier using our training dataset and then apply it to the actual image to classify the scene.

    -
    +
    Code
    # Random Forest initialization and training
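The same procedure with scikit-learn's RandomForestClassifier, sketched under the same assumptions; n_estimators and random_state are not taken from the original notebook:

Code
from sklearn.ensemble import RandomForestClassifier

# Train and predict exactly as with Naive Bayes
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_image).reshape(height, width)
print(f"Random Forest test accuracy: {rf.score(X_test, y_test):.3f}")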
    @@ -777,7 +777,7 @@ 

Actual Negative
-6304 -314
+6294 +324
Actual Positive
-284 -5203
+287 +5200
@@ -818,7 +818,7 @@

1.3.5 Comparison of the Classifiers

    To gain a more in-depth understanding of the classifiers’ performance, we will compare their results. Specifically, we will identify the areas where both classifiers agree and the areas where they disagree. This comparison will provide valuable insights into the strengths and weaknesses of each classifier, allowing us to better assess their effectiveness in identifying forested and non-forested regions.

    -
    +
    Code
    cmap_trio = colors.ListedColormap(['whitesmoke' ,'indianred', 'goldenrod', 'darkgreen'])
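The four-color map suggests an agreement raster with the categories none / Naive Bayes only / Random Forest only / both. A sketch of one encoding consistent with the comment in the next cell (the encoding itself is an assumption):

Code
import matplotlib.pyplot as plt

# 0 = neither, 1 = Naive Bayes only, 2 = Random Forest only, 3 = both
agreement = nb_pred.astype(int) + 2 * rf_pred.astype(int)

plt.imshow(agreement, cmap=cmap_trio, vmin=0, vmax=3)
plt.title("Classifier agreement")
plt.show()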
    @@ -845,7 +845,7 @@ 

    +
    Code
    # Plot only one class, either None (0), Naive Bayes (1), Random Forest (2), or Both (3)
    @@ -872,7 +872,7 @@ 

    +
    Code
    counts = {}
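The counting cell is truncated; one way it might tally the agreement categories, assuming the `agreement` raster sketched above:

Code
import numpy as np

# Count pixels per agreement category
names = {0: "None", 1: "Naive Bayes", 2: "Random Forest", 3: "Both"}
values, pixel_counts = np.unique(agreement, return_counts=True)
counts = {names[int(v)]: int(c) for v, c in zip(values, pixel_counts)}
print(counts)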
    diff --git a/chapters/01_classification_files/figure-html/cell-13-output-1.png b/chapters/01_classification_files/figure-html/cell-13-output-1.png
    index 20f5171..b29e99f 100644
    Binary files a/chapters/01_classification_files/figure-html/cell-13-output-1.png and b/chapters/01_classification_files/figure-html/cell-13-output-1.png differ
    diff --git a/chapters/01_classification_files/figure-html/cell-14-output-1.png b/chapters/01_classification_files/figure-html/cell-14-output-1.png
    index 14f8dd2..479b634 100644
    Binary files a/chapters/01_classification_files/figure-html/cell-14-output-1.png and b/chapters/01_classification_files/figure-html/cell-14-output-1.png differ
    diff --git a/chapters/01_classification_files/figure-html/cell-15-output-1.png b/chapters/01_classification_files/figure-html/cell-15-output-1.png
    index dae0674..eab5b26 100644
    Binary files a/chapters/01_classification_files/figure-html/cell-15-output-1.png and b/chapters/01_classification_files/figure-html/cell-15-output-1.png differ
    diff --git a/chapters/01_classification_files/figure-html/cell-16-output-1.png b/chapters/01_classification_files/figure-html/cell-16-output-1.png
    index 39a46e8..d07cf77 100644
    Binary files a/chapters/01_classification_files/figure-html/cell-16-output-1.png and b/chapters/01_classification_files/figure-html/cell-16-output-1.png differ
    diff --git a/chapters/02_floodmapping.html b/chapters/02_floodmapping.html
    index 1c16e68..ab8b446 100644
    --- a/chapters/02_floodmapping.html
    +++ b/chapters/02_floodmapping.html
    @@ -289,7 +289,7 @@ 

    Table of contents

Image from Wikipedia
    -
    +
    %matplotlib widget
     
     import numpy as np
    @@ -301,7 +301,7 @@ 

    Table of contents

from scipy.stats import norm
from eomaps import Maps
    -
    +
    sig0_dc = xr.open_dataset('../data/s1_parameters/S1_CSAR_IWGRDH/SIG0/V1M1R1/EQUI7_EU020M/E054N006T3/SIG0_20180228T043908__VV_D080_E054N006T3_EU020M_V1M1R1_S1AIWGRDH_TUWIEN.nc')
    @@ -331,7 +331,7 @@

    @@ -501,7 +501,7 @@

    @@ -516,7 +516,7 @@

\(\sigma^0\). These so-called posteriors need one more piece of information, as can be seen in the equation above: the probability that a pixel is flooded, \(P(F)\), or not flooded, \(P(NF)\). Of course, these are exactly the quantities we have been trying to find all along. We don't have them yet, so what can we do? In Bayesian statistics, we can simply start with our best guess. These guesses are called our "priors", because they are the beliefs we hold prior to looking at the data. This subjective prior belief is the foundation of Bayesian statistics, and we use the likelihoods we just calculated to update our belief in this particular hypothesis. This updated belief is called the "posterior".

Let’s say that our best estimate for the chance of flooding versus non-flooding of a pixel is 50-50: a coin flip. We can now also calculate the probability of backscattering \(P(\sigma^0)\) as the weighted average of the water and land likelihoods, ensuring that our posteriors range between 0 and 1.

The following code block shows how we calculate the posteriors from these priors.
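Written out, Bayes' rule for the flood posterior is

\[ P(F \mid \sigma^0) = \frac{P(\sigma^0 \mid F)\,P(F)}{P(\sigma^0)}, \qquad P(\sigma^0) = P(\sigma^0 \mid F)\,P(F) + P(\sigma^0 \mid NF)\,P(NF), \]

and with the 50-50 priors \(P(F) = P(NF) = 0.5\) this is exactly what the code below computes.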

    -
    +
def calc_posteriors(water_likelihood, land_likelihood):
    # Evidence P(sigma0): weighted average of the likelihoods with 50-50 priors,
    # which normalizes the posteriors to the range 0 to 1
    evidence = (water_likelihood * 0.5) + (land_likelihood * 0.5)
    return (water_likelihood * 0.5) / evidence, (land_likelihood * 0.5) / evidence
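The `calc_likelihoods` function called later is not shown in this diff; its real signature is `calc_likelihoods(id, sig0_dc)`. Given the `from scipy.stats import norm` import earlier, a plausible sketch of its statistical core fits Gaussian densities to water and land backscatter samples (the inputs and parameter estimation here are assumptions):

Code
import numpy as np
from scipy.stats import norm

def calc_likelihoods(sig0, water_samples, land_samples):
    # Assumed sketch: Gaussian likelihoods P(sigma0 | F) and P(sigma0 | NF),
    # with means and standard deviations estimated from reference samples
    water_likelihood = norm.pdf(sig0, np.nanmean(water_samples), np.nanstd(water_samples))
    land_likelihood = norm.pdf(sig0, np.nanmean(land_samples), np.nanstd(land_samples))
    return water_likelihood, land_likelihood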
    @@ -528,7 +528,7 @@

    @@ -541,7 +541,7 @@

    2.5 Flood Classification

We are now ready to combine all this information and classify each pixel according to the probability of flooding given its backscatter value. Here we simply check whether the probability of flooding is higher than that of non-flooding:

    -
    +
def bayesian_flood_decision(id, sig0_dc):
    # A pixel counts as flooded when its flood posterior exceeds
    # its non-flood posterior
    nf_post_prob, f_post_prob = calc_posteriors(*calc_likelihoods(id, sig0_dc))
    return np.greater(f_post_prob, nf_post_prob)
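An assumed usage sketch: derive the boolean flood mask for one scene and display it (the scene identifier and plotting details are placeholders):

Code
import numpy as np
import matplotlib.pyplot as plt

# Apply the decision rule to one scene of the datacube (scene_id assumed)
flood_map = bayesian_flood_decision(scene_id, sig0_dc)

plt.imshow(np.asarray(flood_map), cmap="Blues")
plt.title("Flood classification (True = flooded)")
plt.show()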
    @@ -553,7 +553,7 @@

    @@ -576,7 +576,7 @@