diff --git a/index.Rmd b/index.Rmd index 98eacb9..708fe1c 100644 --- a/index.Rmd +++ b/index.Rmd @@ -11,7 +11,12 @@ output: highlight.chooser: TRUE --- -# [WUR Geoscripting](https://geoscripting-wur.github.io/) WUR logo + + +# [WUR Geoscripting](https://geoscripting-wur.github.io/) WUR logo # Week 1, Lesson 4: Intro To Raster @@ -396,7 +401,7 @@ cloud <- tahiti[[7]] cloud[cloud == 0] <- NA ## Plot the stack and the cloud mask on top of each other -plotRGB(tahiti, 3,4,5) +plotRGB(tahiti, 3,4,5, stretch="lin") plot(cloud, add = TRUE, legend = FALSE) ``` @@ -437,7 +442,7 @@ tahiti6_2 <- dropLayer(tahiti, 7) tahitiCloudFree <- overlay(x = tahiti6_2, y = fmask, fun = cloud2NA) ## Visualize the output -plotRGB(tahitiCloudFree, 3,4,5) +plotRGB(tahitiCloudFree, 3,4,5, stretch="lin") ``` diff --git a/index.html b/index.html index bbb3e6e..c6b07b8 100644 --- a/index.html +++ b/index.html @@ -6,27 +6,166 @@ Week 1, Lesson 4: Intro To Raster - + - + - + - + - + - + @@ -34,16 +173,16 @@ // Function to generate the dynamic table of contents jQuery.fn.generate_TOC = function () { var base = $(this[0]); - + var selectors = ['h1', 'h2', 'h3', 'h4']; - + var last_ptr = [{}, {}, {}, {}]; - + var anchors = {}; - + generate_anchor = function (text) { var test = text.replace(/\W/g, '_'); - + while(test in anchors){ //if no suffix, add one if(test.match(/_\d+$/) === null){ @@ -57,13 +196,13 @@ anchors[test]=1; return(test); } - + $(selectors.join(',')).filter(function(index) { return $(this).parent().attr("id") != 'header'; }).each(function () { - + var heading = $(this); var idx = selectors.indexOf(heading.prop('tagName').toLowerCase()); var itr = 0; - + while (itr <= idx) { if (jQuery.isEmptyObject(last_ptr[itr])) { last_ptr[itr] = $('
@@ -960,13 +1110,13 @@

Overview of the raster package<

-Note: the raster package is now deprecated, as Robert Hijmans has developed a successor to it called terra which is simpler and much faster, as it's rewritten in C++. The downside is that it is not yet supported by the vast majority of other packages, that still assume raster package objects. In the future, this lesson will be rewritten with the use of terra instead of raster. +Note: the raster package is now deprecated, as Robert Hijmans has developed a successor to it called terra which is simpler and much faster, as it’s rewritten in C++. The downside is that it is not yet supported by the vast majority of other packages, that still assume raster package objects. In the future, this lesson will be rewritten with the use of terra instead of raster.

Explore the raster objects

-

The raster package produces and uses R objects of three different classes. The RasterLayer, the RasterStack and the RasterBrick. A RasterLayer is the equivalent of a single-layer raster, as an R workspace variable. The data themselves, depending on the size of the grid can be loaded in memory or on disk. The same stands for RasterBrick and RasterStack objects, which are the equivalent of multi-layer RasterLayer objects. RasterStack and RasterBrick are very similar, the difference being in the virtual characteristic of the RasterStack. While a RasterBrick has to refer to one multi-layer file or is in itself a multi-layer object with data loaded in memory, a RasterStack may ''virtually'' connect several raster objects written to different files or in memory. Processing will be more efficient for a RasterBrick than for a RasterStack, but RasterStack has the advantage of facilitating pixel based calculations on separate raster layers.

-Let's take a look into the structure of these objects. +

The raster package produces and uses R objects of three different classes. The RasterLayer, the RasterStack and the RasterBrick. A RasterLayer is the equivalent of a single-layer raster, as an R workspace variable. The data themselves, depending on the size of the grid can be loaded in memory or on disk. The same stands for RasterBrick and RasterStack objects, which are the equivalent of multi-layer RasterLayer objects. RasterStack and RasterBrick are very similar, the difference being in the virtual characteristic of the RasterStack. While a RasterBrick has to refer to one multi-layer file or is in itself a multi-layer object with data loaded in memory, a RasterStack may ‘’virtually’’ connect several raster objects written to different files or in memory. Processing will be more efficient for a RasterBrick than for a RasterStack, but RasterStack has the advantage of facilitating pixel based calculations on separate raster layers.

+Let’s take a look into the structure of these objects.

From the metadata displayed above, we can see that the RasterLayer object contains all the properties that geo-data should have; that is to say a projection, an extent and a pixel resolution.
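As a minimal sketch of how those three properties can be queried one by one, the snippet below builds a toy `RasterLayer` with the same dimensions as the example metadata in this lesson. The object name `r` and the random values are assumptions for illustration only.

```r
library(raster)

## Create an empty RasterLayer with 20 rows and 40 columns; when no extent is
## given, raster() uses a global longitude/latitude extent
r <- raster(ncol = 40, nrow = 20)

## Fill the layer with random values so that it actually holds data
r[] <- rnorm(n = ncell(r))

## Query the three properties individually
crs(r)     # projection
extent(r)  # extent (xmin, xmax, ymin, ymax)
res(r)     # pixel resolution (x, y)
```

Printing `r` directly shows the same information in a single metadata block.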

@@ -1022,11 +1172,11 @@

Explore the raster objects

## dimensions : 20, 40, 800, 3 (nrow, ncol, ncell, nlayers) ## resolution : 9, 9 (x, y) ## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax) -## crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 +## crs : +proj=longlat +datum=WGS84 +no_defs ## source : memory ## names : layer.1, layer.2, layer.3 -## min values : -4.118481, -8.236963, -4.118481 -## max values : 3.928942, 7.857885, 3.928942 +## min values : -2.734823, -5.469645, -2.734823 +## max values : 3.600096, 7.200192, 3.600096

The RasterBrick metadata displayed above are mostly similar to what we saw earlier for the RasterLayer object, with the exception that these are multi-layer objects.

@@ -1046,7 +1196,7 @@

Reading and writing from/to file

## Unpack the archive unzip('gewata.zip')
-

Gewata is the name of the data set added, it is a multi-layer GeoTIFF object, its file name is LE71700552001036SGS00_SR_Gewata_INT1U.tif, informing us that this is a subset from a scene acquired by the Landsat 7 sensor. Let's not worry about the region that the data covers for now, we will find a nice way to discover that later on in the tutorial. See the example below.

+

Gewata is the name of the data set added, it is a multi-layer GeoTIFF object, its file name is LE71700552001036SGS00_SR_Gewata_INT1U.tif, informing us that this is a subset from a scene acquired by the Landsat 7 sensor. Let’s not worry about the region that the data covers for now, we will find a nice way to discover that later on in the tutorial. See the example below.

Now that we have downloaded and unpacked the GeoTIFF file, it should be present in our working directory. We can investigate the content of the working directory (or any directory) using the list.files() function.

- + +
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
+## prefer_proj): Discarded datum unknown in Proj4 definition
+
-Let's take a look at the structure of this object. +Let’s take a look at the structure of this object.
+ +
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
+## prefer_proj): Discarded datum unknown in Proj4 definition
+
@@ -1107,14 +1268,14 @@

Reading and writing from/to file

## resolution : 30, 30 (x, y) ## extent : 829455, 849045, 825405, 843195 (xmin, xmax, ymin, ymax) ## crs : +proj=utm +zone=36 +ellps=WGS84 +units=m +no_defs -## source : /home/WUR/masil001/IntroToRaster/data/LE71700552001036SGS00_SR_Gewata_INT1U.tif +## source : LE71700552001036SGS00_SR_Gewata_INT1U.tif ## names : LE71700552001036SGS00_SR_Gewata_INT1U ## values : 4, 39 (min, max)
-

Note that in addition to supporting most commonly used geodata formats, the raster package has its own format. Saving a file using the .grd extension ('filename.grd') will automatically save the object to the raster package format. This format has some advantages when performing geo processing in R (one advantage for instance is that it conserves original filenames as layer names in multilayer objects), however, it also has disadvantages, since those files are not compressed and thus very large.

-

Geo processing, in memory vs. on disk

-

When looking at the documentation of most functions of the raster package, you will notice that the list of arguments is almost always ended by .... These 'three dots' are called an ellipsis; it means that extra arguments can be passed to the function. Often these arguments are those that can be passed to the writeRaster() function; meaning that most geo-processing functions are able to write their output directly to file, on disk. This reduces the number of steps and is always a good consideration when working with big raster objects that tend to overload the memory if not written directly to file.

+

Note that in addition to supporting most commonly used geodata formats, the raster package has its own format. Saving a file using the .grd extension (‘filename.grd’) will automatically save the object to the raster package format. This format has some advantages when performing geo processing in R (one advantage for instance is that it conserves original filenames as layer names in multilayer objects), however, it also has disadvantages, since those files are not compressed and thus very large.

+

Geo processing, in memory vs. on disk

+

When looking at the documentation of most functions of the raster package, you will notice that the list of arguments is almost always ended by .... These ‘three dots’ are called an ellipsis; it means that extra arguments can be passed to the function. Often these arguments are those that can be passed to the writeRaster() function; meaning that most geo-processing functions are able to write their output directly to file, on disk. This reduces the number of steps and is always a good consideration when working with big raster objects that tend to overload the memory if not written directly to file.

Data type is (still) important

When writing files to disk using writeRaster() or the filename = argument in most raster processing functions, you should set an appropriate data type. Use the datatype = argument, it will save some precious disk space, and increase read and write speed.

See details in ?dataType.
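The following is a small sketch, not part of the original lesson data, of setting the data type both in `writeRaster()` and through the `...` arguments of a processing function. The toy layer, the object names and the file names in the temporary directory are all hypothetical.

```r
library(raster)

## Toy layer holding small non-negative integers (stand-in for real data)
r <- raster(ncol = 40, nrow = 20)
r[] <- sample(0:200, size = ncell(r), replace = TRUE)

## Hypothetical output files in the session's temporary directory
out1 <- file.path(tempdir(), "example_int1u.tif")
out2 <- file.path(tempdir(), "example_int2u.tif")

## Values between 0 and 200 fit in an unsigned 1 byte integer (INT1U)
w <- writeRaster(r, filename = out1, datatype = "INT1U", overwrite = TRUE)

## The same arguments can be passed to a processing function through '...';
## the doubled values (0 to 400) need an unsigned 2 byte integer (INT2U)
doubled <- calc(r, fun = function(x) x * 2,
                filename = out2, datatype = "INT2U", overwrite = TRUE)

## Inspect the data type of the file written by writeRaster()
dataType(w)
```

Choosing the smallest type that still covers the value range is what saves disk space; a type that is too small would corrupt values that fall outside its range.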

@@ -1156,14 +1317,14 @@

Creating layer stacks

## Retrieve the content of the tura sub-directory list <- list.files(path='tura/', full.names=TRUE) -

The object list contains the file names of all the single layers we have to stack. Let's open the first one to visualize it.

+

The object list contains the file names of all the single layers we have to stack. Let’s open the first one to visualize it.

plot(raster(list[1]))
-

We see an NDVI layer, with the clouds masked out. Now let's create the RasterStack, the function for doing that is called stack(). Looking at the help page of the function , you can see that it can accept a list of file names as argument, which is what the object list represents. So we can very simply create the layer stack by running the function.

+

We see an NDVI layer, with the clouds masked out. Now let’s create the RasterStack, the function for doing that is called stack(). Looking at the help page of the function , you can see that it can accept a list of file names as argument, which is what the object list represents. So we can very simply create the layer stack by running the function.

-

Now that we have our 166 layers RasterStack in memory, let's write it to disk using the writeRaster() function. Note that we decide here to save it as .grd file (the native format of the raster package); the reason for that is that this file format conserves original file names (in which information on dates is written) in the individual band names. The data range is comprised between -10000 and +10000, therefore such a file can be stored as signed 2 byte integer (INT2S).

+

Now that we have our 166 layers RasterStack in memory, let’s write it to disk using the writeRaster() function. Note that we decide here to save it as .grd file (the native format of the raster package); the reason for that is that this file format conserves original file names (in which information on dates is written) in the individual band names. The data range is comprised between -10000 and +10000, therefore such a file can be stored as signed 2 byte integer (INT2S).

The resulting NDVI can be viewed in the above figure. As expected the NDVI ranges from about 0.2, which corresponds to nearly bare soils, to 0.9 which means that there is some dense vegetation in the area.

-

Although this is a quick way to perform the calculation, directly adding, subtracting, multiplying, etc, the layers of big raster objects is not recommended. When working with big objects, it is advisable to use the calc() function to perform these types of calculaions. The reason is that R needs to load all the data first into its internal memory before performing the calculation and then runs everything in one block. It is really easy to run out of memory when doing that. A big advantage of the calc() function is that it has a built-in block processing option for any vectorized function, allowing such calculations to be fully "RAM friendly". The example below illustrates how to calculate NDVI from the same date set using the calc() function.

+

Although this is a quick way to perform the calculation, directly adding, subtracting, multiplying, etc., the layers of big raster objects is not recommended. When working with big objects, it is advisable to use the calc() function to perform these types of calculations. The reason is that R needs to load all the data first into its internal memory before performing the calculation and then runs everything in one block. It is really easy to run out of memory when doing that. A big advantage of the calc() function is that it has a built-in block processing option for any vectorized function, allowing such calculations to be fully “RAM friendly”. The example below illustrates how to calculate NDVI from the same data set using the calc() function.

In the simple case of calculating NDVI, we were easily able to produce the same result with calc() and overlay(), however, it is often the case that one function is preferable to the other. As a general rule, a calculation that needs to refer to multiple individual layers separately will be easier to set up in overlay() than in calc().
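To make the difference concrete, here is a hedged sketch on toy data: `calc()` receives all layers of one object, while `overlay()` receives one argument per layer. The layers `red` and `nir` and their value ranges are invented for illustration, and `forceapply = TRUE` is used here only to make explicit that the function is applied cell by cell.

```r
library(raster)

## Toy red and near-infrared layers (assumed reflectance ranges)
red <- raster(ncol = 10, nrow = 10)
red[] <- runif(ncell(red), min = 0.05, max = 0.20)
nir <- raster(red)
nir[] <- runif(ncell(nir), min = 0.30, max = 0.60)

## calc(): a single function that indexes the layers of one multi-layer object
b <- brick(red, nir)
ndviCalc <- calc(b, fun = function(x) (x[2] - x[1]) / (x[2] + x[1]),
                 forceapply = TRUE)

## overlay(): each layer is passed as its own argument
ndviOver <- overlay(x = nir, y = red, fun = function(x, y) (x - y) / (x + y))

## Both approaches should give the same values
all.equal(values(ndviCalc), values(ndviOver))
```

With `overlay()` the function body reads like the NDVI formula itself, which is why it tends to be easier to set up when layers need to be referred to separately.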

Re-projections

-

By the way, we still don't know where this area is. In order to investigate that, we are going to try projecting it in Google Earth. As you know Google Earth is all in Lat/Long, so we have to get our data re-projected to Lat/Long first. The projectRaster() function allows re-projection of raster objects to any projection one can think of. As the function uses the PROJ.4 library (the reference library, external to R, that handles cartographic projections and performs projections transformations; the rgdal package is the interface between that library and R) to perform that operation, the crs= argument should receive a proj4 expression. proj expressions are strings that provide the projection parameters of cartographic projections. A central place to search for projections is the spatial reference website (http://spatialreference.org/), from this database you will be able to query almost any reference and retrieve it in any format, including its proj expression. Note that proj expressions are handy because they are short and readable, but the Well-Known Text (WKT) 2 expressions are preferred for scientific correctness and lack of ambiguity; hence why if you run the gdalinfo command on a raster file, you will see the project as a WKT2 text block.

+

By the way, we still don’t know where this area is. In order to investigate that, we are going to try projecting it in Google Earth. As you know Google Earth is all in Lat/Long, so we have to get our data re-projected to Lat/Long first. The projectRaster() function allows re-projection of raster objects to any projection one can think of. As the function uses the PROJ.4 library (the reference library, external to R, that handles cartographic projections and performs projection transformations; the rgdal package is the interface between that library and R) to perform that operation, the crs= argument should receive a proj4 expression. proj expressions are strings that provide the projection parameters of cartographic projections. A central place to search for projections is the spatial reference website (http://spatialreference.org/), from this database you will be able to query almost any reference and retrieve it in any format, including its proj expression. Note that proj expressions are handy because they are short and readable, but the Well-Known Text (WKT) 2 expressions are preferred for scientific correctness and lack of ambiguity; hence why if you run the gdalinfo command on a raster file, you will see the projection as a WKT2 text block.

Note that if re-projecting and mosaicking is really a large part of your project, you may want to consider using the gdalwarp command line utility (gdalwarp) directly. The gdalUtils R package provides utilities to run GDAL commands from R, including gdalwarp, for reprojection, resampling and mosaicking.
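A minimal sketch of a re-projection with `projectRaster()` on a toy layer is given below. The extent and resolution echo the Gewata metadata shown earlier, but the random values and the `+datum=WGS84` form of the proj strings are assumptions made for this illustration.

```r
library(raster)

## Toy layer in UTM zone 36 with 30 m pixels (values are random placeholders)
r <- raster(xmn = 829455, xmx = 849045, ymn = 825405, ymx = 843195,
            resolution = 30,
            crs = "+proj=utm +zone=36 +datum=WGS84 +units=m +no_defs")
r[] <- runif(ncell(r))

## Re-project to Lat/Long by passing a proj expression to the crs argument
rLL <- projectRaster(r, crs = "+proj=longlat +datum=WGS84 +no_defs")

## Check the result: the layer should now be in longitude/latitude
isLonLat(rLL)
extent(rLL)
```

Because the pixels are resampled onto a new grid, the values of `rLL` are interpolated (bilinear by default) rather than copied one to one from `r`.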

-

Now that we have our NDVI layer in Lat/Long, let's write it to a KML file, which is one of the two Google Earth formats.

+

Now that we have our NDVI layer in Lat/Long, let’s write it to a KML file, which is one of the two Google Earth formats.

+ +
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj
+## = prefer_proj): Discarded datum Unknown based on WGS 72 ellipsoid in Proj4
+## definition
+
- +
@@ -1323,13 +1491,13 @@

About the area

plot(tahiti, 7)
- +

According to the algorithm description, water is coded as 1, cloud as 4 and cloud shadow as 2.

Does the cloud mask fit with the visual interpretation of the RGB image we plotted before?

-

We can also plot the two on top of each other, but before that we need to assign no values (NA) to the 'clear land pixels' so that they appear transparent on the overlay plot.

+

We can also plot the two on top of each other, but before that we need to assign no values (NA) to the ‘clear land pixels’ so that they appear transparent on the overlay plot.

@@ -1359,7 +1527,7 @@

About the area

## Remove fmask layer from the Landsat stack tahiti6 <- dropLayer(tahiti, 7) -

We will first do the masking using simple vector arithmetic, as if tahiti6 and fmask were simple vectors. We want to keep any value with a 'clean land pixel' flag in the cloud mask; or rather, since we are assigning NAs, we want to discard any value of the stack which has a corresponding cloud mask pixel different from 0. This can be done in one line of code.

+

We will first do the masking using simple vector arithmetic, as if tahiti6 and fmask were simple vectors. We want to keep any value with a ‘clean land pixel’ flag in the cloud mask; or rather, since we are assigning NAs, we want to discard any value of the stack which has a corresponding cloud mask pixel different from 0. This can be done in one line of code.