diff --git a/index.Rmd b/index.Rmd index 98eacb9..708fe1c 100644 --- a/index.Rmd +++ b/index.Rmd @@ -11,7 +11,12 @@ output: highlight.chooser: TRUE --- -# [WUR Geoscripting](https://geoscripting-wur.github.io/) + + +# [WUR Geoscripting](https://geoscripting-wur.github.io/) # Week 1, Lesson 4: Intro To Raster @@ -396,7 +401,7 @@ cloud <- tahiti[[7]] cloud[cloud == 0] <- NA ## Plot the stack and the cloud mask on top of each other -plotRGB(tahiti, 3,4,5) +plotRGB(tahiti, 3,4,5, stretch="lin") plot(cloud, add = TRUE, legend = FALSE) ``` @@ -437,7 +442,7 @@ tahiti6_2 <- dropLayer(tahiti, 7) tahitiCloudFree <- overlay(x = tahiti6_2, y = fmask, fun = cloud2NA) ## Visualize the output -plotRGB(tahitiCloudFree, 3,4,5) +plotRGB(tahitiCloudFree, 3,4,5, stretch="lin") ``` diff --git a/index.html b/index.html index bbb3e6e..c6b07b8 100644 --- a/index.html +++ b/index.html @@ -6,27 +6,166 @@
Raster data is like any image. Although it may portray various properties of objects in the real world, these objects don’t exist as separate objects; rather, they are represented using pixels of various values which are assigned a color.
-Today, it's about constructing a simple spatio-temporal analysis using raster data, R, and git.
+Today, it’s about constructing a simple spatio-temporal analysis using raster data, R, and git.
At the end of the lecture, you should be able to:
@@ -903,8 +1046,8 @@
In a previous lecture you got introduced to the overall system architecture of R programming (see figure above) and you saw how to read and write vector data from/to file into your R environment. These vector read/write operations were made possible thanks to the OGR library. The OGR library is interfaced with R thanks to the rgdal package/binding. By analogy, raster data can be read/written thanks to the GDAL library. The figure above provides an overview of the connections between these elements. GDAL stands for Geospatial Data Abstraction Library. You can check the project home page at http://www.gdal.org/ and you will be surprised to see that a lot of the software you have used in the past to read gridded geospatial data use GDAL (i.e.: ArcGIS, QGIS, GRASS, etc). In this lesson, we will use GDAL indirectly via the raster package, which uses rgdal extensively. However, it is possible to call GDAL functionalities directly through the command line from a terminal, which is equivalent to calling a system()
command directly from within R. In addition, if you are familiar with R and its string handling utilities, it may facilitate the building of the expressions that have to be passed to GDAL. (Note: This is also doable in Bash scripting, as learned in the previous lesson, and you can even combine the two.)
In a previous lecture you got introduced to the overall system architecture of R programming (see figure above) and you saw how to read and write vector data from/to file into your R environment. These vector read/write operations were made possible thanks to the OGR library. The OGR library is interfaced with R thanks to the rgdal package/binding. By analogy, raster data can be read/written thanks to the GDAL library. The figure above provides an overview of the connections between these elements. GDAL stands for Geospatial Data Abstraction Library. You can check the project home page at http://www.gdal.org/ and you will be surprised to see that a lot of the software you have used in the past to read gridded geospatial data use GDAL (i.e.: ArcGIS, QGIS, GRASS, etc). In this lesson, we will use GDAL indirectly via the raster package, which uses rgdal extensively. However, it is possible to call GDAL functionalities directly through the command line from a terminal, which is equivalent to calling a system()
command directly from within R. In addition, if you are familiar with R and its string handling utilities, it may facilitate the building of the expressions that have to be passed to GDAL. (Note: This is also doable in Bash scripting, as learned in the previous lesson, and you can even combine the two.)
The previous function should return the version number of the current version of GDAL installed on your machine. Starting with GDAL 2.0 vector processing becomes incorporated into GDAL. In case the function above returns an error, or if you cannot install rgdal at all, you should verify that all required software and libraries are properly installed. Please refer to the system setup page.
@@ -948,7 +1098,7 @@-Note: the raster package is now deprecated, as Robert Hijmans has developed a successor to it called terra which is simpler and much faster, as it's rewritten in C++. The downside is that it is not yet supported by the vast majority of other packages, that still assume raster package objects. In the future, this lesson will be rewritten with the use of terra instead of raster. +Note: the raster package is now deprecated, as Robert Hijmans has developed a successor to it called terra which is simpler and much faster, as it’s rewritten in C++. The downside is that it is not yet supported by the vast majority of other packages, that still assume raster package objects. In the future, this lesson will be rewritten with the use of terra instead of raster.
The raster package produces and uses R objects of three different classes. The RasterLayer, the RasterStack and the RasterBrick. A RasterLayer is the equivalent of a single-layer raster, as an R workspace variable. The data themselves, depending on the size of the grid can be loaded in memory or on disk. The same stands for RasterBrick and RasterStack objects, which are the equivalent of multi-layer RasterLayer objects. RasterStack and RasterBrick are very similar, the difference being in the virtual characteristic of the RasterStack. While a RasterBrick has to refer to one multi-layer file or is in itself a multi-layer object with data loaded in memory, a RasterStack may ''virtually'' connect several raster objects written to different files or in memory. Processing will be more efficient for a RasterBrick than for a RasterStack, but RasterStack has the advantage of facilitating pixel based calculations on separate raster layers.
-Let's take a look into the structure of these objects. +The raster package produces and uses R objects of three different classes. The RasterLayer, the RasterStack and the RasterBrick. A RasterLayer is the equivalent of a single-layer raster, as an R workspace variable. The data themselves, depending on the size of the grid can be loaded in memory or on disk. The same stands for RasterBrick and RasterStack objects, which are the equivalent of multi-layer RasterLayer objects. RasterStack and RasterBrick are very similar, the difference being in the virtual characteristic of the RasterStack. While a RasterBrick has to refer to one multi-layer file or is in itself a multi-layer object with data loaded in memory, a RasterStack may ‘’virtually’’ connect several raster objects written to different files or in memory. Processing will be more efficient for a RasterBrick than for a RasterStack, but RasterStack has the advantage of facilitating pixel based calculations on separate raster layers.
+Let’s take a look into the structure of these objects.From the metadata displayed above, we can see that the RasterLayer object contains all the properties that geo-data should have; that is to say a projection, an extent and a pixel resolution.
@@ -1022,11 +1172,11 @@The RasterBrick metadata displayed above are mostly similar to what we saw earlier for the RasterLayer object, with the exception that these are multi-layer objects.
@@ -1046,7 +1196,7 @@Gewata is the name of the data set added, it is a multi-layer GeoTIFF object, its file name is LE71700552001036SGS00_SR_Gewata_INT1U.tif, informing us that this is a subset from a scene acquired by the Landsat 7 sensor. Let's not worry about the region that the data covers for now, we will find a nice way to discover that later on in the tutorial. See the example below.
+Gewata is the name of the data set added, it is a multi-layer GeoTIFF object, its file name is LE71700552001036SGS00_SR_Gewata_INT1U.tif, informing us that this is a subset from a scene acquired by the Landsat 7 sensor. Let’s not worry about the region that the data covers for now, we will find a nice way to discover that later on in the tutorial. See the example below.
Now that we have downloaded and unpacked the GeoTIFF file, it should be present in our working directory. We can investigate the content of the working directory (or any directory) using the list.files()
function.
gewata <- brick('LE71700552001036SGS00_SR_Gewata_INT1U.tif')
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
+## prefer_proj): Discarded datum unknown in Proj4 definition
+
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
+## prefer_proj): Discarded datum unknown in Proj4 definition
+
Note that in addition to supporting most commonly used geodata formats, the raster package has its own format. Saving a file using the .grd
extension ('filename.grd') will automatically save the object to the raster package format. This format has some advantages when performing geo processing in R (one advantage for instance is that it conserves original filenames as layer names in multilayer objects), however, it also has disadvantages, since those files are not compressed and thus very large.
When looking at the documentation of most functions of the raster package, you will notice that the list of arguments is almost always ended by ...
. These 'three dots' are called an ellipsis; it means that extra arguments can be passed to the function. Often these arguments are those that can be passed to the writeRaster()
function; meaning that most geo-processing functions are able to write their output directly to file, on disk. This reduces the number of steps and is always a good consideration when working with big raster objects that tend to overload the memory if not written directly to file.
Note that in addition to supporting most commonly used geodata formats, the raster package has its own format. Saving a file using the .grd
extension (‘filename.grd’) will automatically save the object to the raster package format. This format has some advantages when performing geo processing in R (one advantage for instance is that it conserves original filenames as layer names in multilayer objects), however, it also has disadvantages, since those files are not compressed and thus very large.
When looking at the documentation of most functions of the raster package, you will notice that the list of arguments is almost always ended by ...
. These ‘three dots’ are called an ellipsis; it means that extra arguments can be passed to the function. Often these arguments are those that can be passed to the writeRaster()
function; meaning that most geo-processing functions are able to write their output directly to file, on disk. This reduces the number of steps and is always a good consideration when working with big raster objects that tend to overload the memory if not written directly to file.
When writing files to disk using writeRaster()
or the filename =
argument in most raster processing functions, you should set an appropriate data type. Use the datatype =
argument, it will save some precious disk space, and increase read and write speed.
See details in ?dataType
.
The object list
contains the file names of all the single layers we have to stack. Let's open the first one to visualize it.
The object list
contains the file names of all the single layers we have to stack. Let’s open the first one to visualize it.
plot(raster(list[1]))
We see an NDVI layer, with the clouds masked out. Now let's create the RasterStack, the function for doing that is called stack()
. Looking at the help page of the function , you can see that it can accept a list of file names as argument, which is what the object list
represents. So we can very simply create the layer stack by running the function.
We see an NDVI layer, with the clouds masked out. Now let’s create the RasterStack, the function for doing that is called stack()
. Looking at the help page of the function , you can see that it can accept a list of file names as argument, which is what the object list
represents. So we can very simply create the layer stack by running the function.
turaStack <- stack(list)
turaStack
Now that we have our 166 layers RasterStack in memory, let's write it to disk using the writeRaster()
function. Note that we decide here to save it as .grd file (the native format of the raster package); the reason for that is that this file format conserves original file names (in which information on dates is written) in the individual band names. The data range is comprised between -10000 and +10000, therefore such a file can be stored as signed 2 byte integer (INT2S).
Now that we have our 166 layers RasterStack in memory, let’s write it to disk using the writeRaster()
function. Note that we decide here to save it as .grd file (the native format of the raster package); the reason for that is that this file format conserves original file names (in which information on dates is written) in the individual band names. The data range is comprised between -10000 and +10000, therefore such a file can be stored as signed 2 byte integer (INT2S).
Performing simple raster operations with raster objects is fairly easy. For instance, if you want to subtract two RasterLayers of same extent, r1
and r2
; simply doing r1 - r2
will give the expected output, which is, every pixel value of r2
will be subtracted from the matching pixel value of r1
. These types of pixel-based operations almost always require a set of conditions to be met in order to be executed; the two RasterLayers need to be identical in term of extent, resolution, projection, etc.
Different spectral bands of a same satellite scene are often stored in multi-layer objects. This means that you will very likely import them in your R working environment as RasterBrick or RasterStack objects. As a consequence, to perform calculations between these bands, you will have to write an expression refering to individual layers of the object. Referring to individual layers in a RasterBrick or RasterStack object is done by using double square brackets [[]]
. Let's look for instance at how the famous NDVI index would have to be calculated from the gewata RasterBrick object read earlier, and that contains the spectral bands of the Landsat 7 sensor. And in case you have forgotten, the NDVI formula is as follows.
Different spectral bands of a same satellite scene are often stored in multi-layer objects. This means that you will very likely import them in your R working environment as RasterBrick or RasterStack objects. As a consequence, to perform calculations between these bands, you will have to write an expression refering to individual layers of the object. Referring to individual layers in a RasterBrick or RasterStack object is done by using double square brackets $$
NDVI=\frac{NIR-Red}{NIR+Red}
-$$
$$
+[[]]
. Let’s look for instance at how the famous NDVI index would have to be calculated from the gewata RasterBrick object read earlier, and that contains the spectral bands of the Landsat 7 sensor. And in case you have forgotten, the NDVI formula is as follows.
with NIR and Red being band 4 and 3 of Landsat 7 respectively.
The resulting NDVI can be viewed in the above figure. As expected the NDVI ranges from about 0.2, which corresponds to nearly bare soils, to 0.9 which means that there is some dense vegetation in the area.
-Although this is a quick way to perform the calculation, directly adding, subtracting, multiplying, etc, the layers of big raster objects is not recommended. When working with big objects, it is advisable to use the calc()
function to perform these types of calculaions. The reason is that R needs to load all the data first into its internal memory before performing the calculation and then runs everything in one block. It is really easy to run out of memory when doing that. A big advantage of the calc()
function is that it has a built-in block processing option for any vectorized function, allowing such calculations to be fully "RAM friendly". The example below illustrates how to calculate NDVI from the same date set using the calc()
function.
Although this is a quick way to perform the calculation, directly adding, subtracting, multiplying, etc, the layers of big raster objects is not recommended. When working with big objects, it is advisable to use the calc()
function to perform these types of calculaions. The reason is that R needs to load all the data first into its internal memory before performing the calculation and then runs everything in one block. It is really easy to run out of memory when doing that. A big advantage of the calc()
function is that it has a built-in block processing option for any vectorized function, allowing such calculations to be fully “RAM friendly”. The example below illustrates how to calculate NDVI from the same date set using the calc()
function.
In the simple case of calculating NDVI, we were easily able to produce the same result with calc()
and overlay()
, however, it is often the case that one function is preferable to the other. As a general rule, a calculation that needs to refer to multiple individual layers separately will be easier to set up in overlay()
than in calc()
.
By the way, we still don't know where this area is. In order to investigate that, we are going to try projecting it in Google Earth. As you know Google Earth is all in Lat/Long, so we have to get our data re-projected to Lat/Long first. The projectRaster()
function allows re-projection of raster objects to any projection one can think of. As the function uses the PROJ.4 library (the reference library, external to R, that handles cartographic projections and performs projections transformations; the rgdal package is the interface between that library and R) to perform that operation, the crs=
argument should receive a proj4 expression. proj expressions are strings that provide the projection parameters of cartographic projections. A central place to search for projections is the spatial reference website (http://spatialreference.org/), from this database you will be able to query almost any reference and retrieve it in any format, including its proj expression. Note that proj expressions are handy because they are short and readable, but the Well-Known Text (WKT) 2 expressions are preferred for scientific correctness and lack of ambiguity; hence why if you run the gdalinfo
command on a raster file, you will see the project as a WKT2 text block.
By the way, we still don’t know where this area is. In order to investigate that, we are going to try projecting it in Google Earth. As you know Google Earth is all in Lat/Long, so we have to get our data re-projected to Lat/Long first. The projectRaster()
function allows re-projection of raster objects to any projection one can think of. As the function uses the PROJ.4 library (the reference library, external to R, that handles cartographic projections and performs projections transformations; the rgdal package is the interface between that library and R) to perform that operation, the crs=
argument should receive a proj4 expression. proj expressions are strings that provide the projection parameters of cartographic projections. A central place to search for projections is the spatial reference website (http://spatialreference.org/), from this database you will be able to query almost any reference and retrieve it in any format, including its proj expression. Note that proj expressions are handy because they are short and readable, but the Well-Known Text (WKT) 2 expressions are preferred for scientific correctness and lack of ambiguity; hence why if you run the gdalinfo
command on a raster file, you will see the project as a WKT2 text block.
Note that if re-projecting and mosaicking is really a large part of your project, you may want to consider using the gdalwarp
command line utility (gdalwarp) directly. The gdalUtils
R package provides utilities to run GDAL commands from R, including gdalwarp
, for reprojection, resampling and mosaicking.
Now that we have our NDVI layer in Lat/Long, let's write it to a KML file, which is one of the two Google Earth formats.
+Now that we have our NDVI layer in Lat/Long, let’s write it to a KML file, which is one of the two Google Earth formats.
-Tip: if you do not have Google Earth Pro installed on your computer, you can install Google Chrome and use https://earth.google.com/web/ +Tip: if you do not have Google Earth Pro installed on your computer, you can install Google Chrome and use https://earth.google.com/web/
Now let's find that file that we have just written and double click it, and watch how Google Earth brings us all the way to ... Ethiopia. More information will come later in the course about that specific area.
-We are done with this data set for this lesson. So let's explore another data set, from the Landsat sensors. This dataset will allow us to find other interesting raster operations to perform.
+Now let’s find that file that we have just written and double click it, and watch how Google Earth brings us all the way to … Ethiopia. More information will come later in the course about that specific area.
+We are done with this data set for this lesson. So let’s explore another data set, from the Landsat sensors. This dataset will allow us to find other interesting raster operations to perform.
Since 2014, the USGS has started releasing Landsat data processed to surface reflectance. This means that they are taking care of important steps such as atmospheric correction and conversion from sensor radiance to reflectance factors. Additionally, they provide a cloud mask with this product. The cloud mask is an extra raster layer, at the same resolution as the surface reflectance bands, that contains information about the presence or absence of cloud as well as shadowing effects from the clouds. The cloud mask of Landsat surface reflectance product is named cfmask, after the name of the algorithm used to detect the clouds. For more information about cloud detection, see the algorithm page, and the publication by Zhu & Woodcock. In the following section we will use that cfmask layer to mask out remaining clouds in a Landsat scene.
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
+## = prefer_proj): Discarded datum Unknown based on WGS 72 ellipsoid in Proj4
+## definition
+
plot(tahiti, 7)
According to the algorithm description, water is coded as 1, cloud as 4 and cloud shadow as 2.
Does the cloud mask fit with the visual interpretation of the RGB image we plotted before?
-We can also plot the two on top of each other, but before that we need to assign no values (NA) to the 'clear land pixels' so that they appear transparent on the overlay plot.
+We can also plot the two on top of each other, but before that we need to assign no values (NA) to the ‘clear land pixels’ so that they appear transparent on the overlay plot.
We will first do the masking using simple vector arithmetic, as if tahiti6
and fmask
were simple vectors. We want to keep any value with a 'clean land pixel' flag in the cloud mask; or rather, since we are assigning NAs, we want to discard any value of the stack which has a corresponding cloud mask pixel different from 0. This can be done in one line of code.
We will first do the masking using simple vector arithmetic, as if tahiti6
and fmask
were simple vectors. We want to keep any value with a ‘clean land pixel’ flag in the cloud mask; or rather, since we are assigning NAs, we want to discard any value of the stack which has a corresponding cloud mask pixel different from 0. This can be done in one line of code.