P8_rasterPackageAdvanced_II-v1.html

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">

<head>

<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />


<meta name="author" content="João Gonçalves" />


<title>P8 Advanced techniques with raster data (part-2) - Supervised Classification</title>

<script src="P8_rasterPackageAdvanced_II-v1_files/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="P8_rasterPackageAdvanced_II-v1_files/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="P8_rasterPackageAdvanced_II-v1_files/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="P8_rasterPackageAdvanced_II-v1_files/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="P8_rasterPackageAdvanced_II-v1_files/bootstrap-3.3.5/shim/respond.min.js"></script>
<script src="P8_rasterPackageAdvanced_II-v1_files/navigation-1.1/tabsets.js"></script>
<link href="P8_rasterPackageAdvanced_II-v1_files/highlightjs-9.12.0/default.css" rel="stylesheet" />
<script src="P8_rasterPackageAdvanced_II-v1_files/highlightjs-9.12.0/highlight.js"></script>

<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
  pre:not([class]) {
    background-color: white;
  }
</style>
<script type="text/javascript">
if (window.hljs) {
  hljs.configure({languages: []});
  hljs.initHighlightingOnLoad();
  if (document.readyState && document.readyState === "complete") {
    window.setTimeout(function() { hljs.initHighlighting(); }, 0);
  }
}
</script>


<style type="text/css">
h1 {
  font-size: 34px;
}
h1.title {
  font-size: 38px;
}
h2 {
  font-size: 30px;
}
h3 {
  font-size: 24px;
}
h4 {
  font-size: 18px;
}
h5 {
  font-size: 16px;
}
h6 {
  font-size: 12px;
}
.table th:not([align]) {
  text-align: left;
}
</style>


</head>

<body>

<style type = "text/css">
.main-container {
  max-width: 940px;
  margin-left: auto;
  margin-right: auto;
}
code {
  color: inherit;
  background-color: rgba(0, 0, 0, 0.04);
}
img {
  max-width:100%;
  height: auto;
}
.tabbed-pane {
  padding-top: 12px;
}
button.code-folding-btn:focus {
  outline: none;
}
</style>


<div class="container-fluid main-container">

<!-- tabsets -->
<script>
$(document).ready(function () {
  window.buildTabsets("TOC");
});
</script>

<!-- code folding -->


<div class="fluid-row" id="header">


<h1 class="title toc-ignore">P8 Advanced techniques with raster data (part-2) - Supervised Classification</h1>
<h4 class="author"><em>João Gonçalves</em></h4>
<h4 class="date"><em>28 de Novembro de 2017</em></h4>

</div>


<div id="background" class="section level3">
<h3>Background</h3>
<hr />
<p>In <strong>supervised classification</strong>, contrarily to the unsupervised version, <em>a priori</em> defined reference classes are used as additional information. This initial process, determines which classes are the result of the classification. Usually a statistical or machine-learning algorithm is used to obtain a classification function which maps every instance (pixel or object, depending on the approach used) to its corresponding class. The following workflow is commonly used for deploying a supervised classifier:</p>
<ul>
<li><p>Definition of the thematic classes of land cover/use (e.g., coniferous forest, deciduous forest, water, agriculture, urban);</p></li>
<li><p>Classification of suitable training areas (reference areas/pixels for each class);</p></li>
<li><p>Calibration of a classification algorithm (for the training set(s));</p></li>
<li><p>Classification of the entire image;</p></li>
<li><p>Classifier performance evaluation, verification and inspection of the results (for the testing set(s)).</p></li>
</ul>
<p>Usually, in supervised classification, spectral data from each of the sensor bands is used to obtain a statistical or rule-based <em>spectral signature</em> for each class. Besides “direct” spectral data, other kinds of information or features can be used for classifier training. These include, band ratios, spectral indices, texture features, temporal variation features (e.g., green-up and senescence changes) as well as ancillary data (e.g., elevation, slope, built-up masks, roads).</p>
<p>Combining the training data and the spectral (or other) features in a classifier algorithm allows to classify the entire image, outside the training space. Usually a form of train/test split set strategy (holdout cross-validation, k-fold CV, etc) is used. The training set is used for classifier calibration while the testing set is used for evaluating the classification performance. This process is usually repeated a few times and then an average value of validation indices is calculated.</p>
<p>Because R currently provides a very-large set of classification algorithms (a good package to access them is <code>caret</code>), it is particularly well-equipped to handle this kind of problem. For developing the examples, the Random Forest (RF) algorithm will be used. RF is implemented in the (conveniently named 😉) <code>randomForest</code> package. Although packages such as <code>caret</code> provide many useful functions to handle classification (training, tuning and evaluation processes), I will not use it them here. My objective in this post, is to explore and show the basic and <em>“under the wood”</em> workflow in <em>pixel-based</em> classification of raster data.</p>
<p>In a nutshell, RF is an ensemble learning method for classification, regression and other tasks, that operate by constructing multiple decision trees during the training stage (‘bagging’) and outputting the class that is the mode of the classes (classification) or the average prediction (regression) of the individual trees. This way, RF correct for decision trees’ habit of over-fitting to their training set. See more <a href="https://en.wikipedia.org/wiki/Random_forest">here</a> and <a href="https://www.kdnuggets.com/2017/10/random-forests-explained.html">here</a>.</p>
<p>Sample data from the optical <strong>Sentinel-2a</strong> (S2) satellite platform will be used in the examples below (see <a href="https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi">here</a> for more details). This data was made available in the 2017 IEEE GRSS Data Fusion Contest and provided by the <em>GRSS Data and Algorithm Standard Evaluation</em> (DASE) <a href="http://dase.ticinumaerospace._com/index.php">website</a> (you have to register to access the sample datasets currently available). More specifically, we will use one Sentinel-2 scene for Berlin containing <a href="https://earth.esa.int/web/sentinel/user-guides/sentinel-2-msi/resolutions/spatial">10 spectral bands</a>, originally at 10m and 20m of spatial resolution but, re-sampled to 100m in DASE.<br />
Along with S2 spectral data, DASE also provides training samples for calibrating classifiers. The legend encompasses a total of 12 land cover/use classes that is presented in the table below (NOTE: only 12 out of the 17 classes actually appear in the Berlin area).</p>
<pre class="r"><code>legBerlin &lt;- read.csv(url(&quot;https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/data/legend_berlin.csv&quot;))

knitr::kable(legBerlin)</code></pre>
<table>
<thead>
<tr class="header">
<th align="left">Type</th>
<th align="right">Code</th>
<th align="left">Land.cover.class.description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Artificial</td>
<td align="right">1</td>
<td align="left">Compact high-rise</td>
</tr>
<tr class="even">
<td align="left">Artificial</td>
<td align="right">2</td>
<td align="left">Compact midrise</td>
</tr>
<tr class="odd">
<td align="left">Artificial</td>
<td align="right">3</td>
<td align="left">Compact low-rise</td>
</tr>
<tr class="even">
<td align="left">Artificial</td>
<td align="right">4</td>
<td align="left">Open high-rise</td>
</tr>
<tr class="odd">
<td align="left">Artificial</td>
<td align="right">5</td>
<td align="left">Open midrise</td>
</tr>
<tr class="even">
<td align="left">Artificial</td>
<td align="right">6</td>
<td align="left">Open low-rise</td>
</tr>
<tr class="odd">
<td align="left">Artificial</td>
<td align="right">7</td>
<td align="left">Lightweight low-rise</td>
</tr>
<tr class="even">
<td align="left">Artificial</td>
<td align="right">8</td>
<td align="left">Large low-rise</td>
</tr>
<tr class="odd">
<td align="left">Artificial</td>
<td align="right">9</td>
<td align="left">Sparsely built</td>
</tr>
<tr class="even">
<td align="left">Artificial</td>
<td align="right">10</td>
<td align="left">Heavy industry</td>
</tr>
<tr class="odd">
<td align="left">Vegetated</td>
<td align="right">11</td>
<td align="left">Dense trees</td>
</tr>
<tr class="even">
<td align="left">Vegetated</td>
<td align="right">12</td>
<td align="left">Scattered trees</td>
</tr>
<tr class="odd">
<td align="left">Vegetated</td>
<td align="right">13</td>
<td align="left">Bush and scrub</td>
</tr>
<tr class="even">
<td align="left">Vegetated</td>
<td align="right">14</td>
<td align="left">Low plants</td>
</tr>
<tr class="odd">
<td align="left">Vegetated</td>
<td align="right">15</td>
<td align="left">Bare rock or paved</td>
</tr>
<tr class="even">
<td align="left">Vegetated</td>
<td align="right">16</td>
<td align="left">Bare soil or sand</td>
</tr>
<tr class="odd">
<td align="left">Vegetated</td>
<td align="right">17</td>
<td align="left">Water</td>
</tr>
</tbody>
</table>
<div id="data-loading-and-visualization" class="section level4">
<h4>Data loading and visualization</h4>
<hr />
<p>Now that we have defined some useful concepts, the workflow and, the data we can start coding! 👍 👍 The first step is to download and uncompress the spectral data for Sentinel-2. These will later be used as training input features for the classification algorithm is able to identify the spectral signatures for each class.</p>
<pre class="r"><code>## Create a folder named data-raw inside the working directory to place downloaded data
if(!dir.exists(&quot;./data-raw&quot;)) dir.create(&quot;./data-raw&quot;)

## If you run into download problems try changing: method = &quot;wget&quot;
download.file(&quot;https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/data/berlin.zip&quot;, &quot;./data-raw/berlin.zip&quot;, method = &quot;auto&quot;)

## Uncompress the zip file
unzip(&quot;./data-raw/berlin.zip&quot;, exdir = &quot;./data-raw&quot;)</code></pre>
<p>Load the required packages for the post:</p>
<pre class="r"><code>library(raster)
library(randomForest)</code></pre>
<p>Now that we have the data files available, let’s create a <code>RasterStack</code> object from it. We will also change layer names for more convenient ones.</p>
<pre class="r"><code>fl &lt;- list.files(&quot;./data-raw/berlin/S2&quot;, pattern = &quot;.tif$&quot;, full.names = TRUE)

rst &lt;- stack(fl)
names(rst) &lt;- c(paste(&quot;b&quot;,2:8,sep=&quot;&quot;),&quot;b8a&quot;,&quot;b11&quot;,&quot;b12&quot;)</code></pre>
<p>Now, let’s use the <code>plotRGB</code> function to visually explore the spectral data from Sentinel-2. RGB composites made from different band combinations allow us to highlight different aspects of land cover and see different layers of the Earth surface. Note: band numbers in the Sentinel-2 satellite differ from its position (integer index) in the <code>RasterStack</code> object.</p>
<p>Let’s start by making a Natural Color RGB composite from S2 bands: 4,3,2.</p>
<pre class="r"><code>plotRGB(rst, r=3, g=2, b=1, scale=1E5, stretch=&quot;lin&quot;)</code></pre>
<div class="figure">
<img src="https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/img/P8_plot_rgb_S2-1.png" />

</div>
<p>Next, let’s see an Healthy vegetation composite from S2 bands: 8,11,2.</p>
<pre class="r"><code>plotRGB(rst, r=7, g=9, b=1, scale=1E5, stretch=&quot;lin&quot;)</code></pre>
<div class="figure">
<img src="https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/img/P8_plot_b8_b11_b2_S2-1.png" />

</div>
<p>Finally a false color urban using S2 bands: 12,11,4.</p>
<pre class="r"><code>plotRGB(rst, r=10, g=9, b=3, scale=1E5, stretch=&quot;lin&quot;)</code></pre>
<div class="figure">
<img src="https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/img/P8_plot_b12_b11_b4_S2-1.png" />

</div>
<p>Now, let’s load the training samples used in calibration. These data serves as a reference or examples from which the classifier algorithm will “learn” the spectral signatures of each class.</p>
<pre class="r"><code>rstTrain &lt;- raster(&quot;./data-raw/berlin/train/berlin_lcz_GT.tif&quot;)

# Remove zeros from the train set (background NA)
rstTrain[rstTrain==0] &lt;- NA

# Convert the data to factor/discrete representation
rstTrain &lt;- ratify(rstTrain)

# Change the layer name
names(rstTrain) &lt;- &quot;trainClass&quot;

# Visualize the data
plot(rstTrain, main=&quot;Train areas by class for Berlin&quot;)</code></pre>
<div class="figure">
<img src="https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/img/P8_load_train_data-1.png" />

</div>
<p>Let’s see how many training pixels do we have for each of the 12 classes (<span class="math inline">\(N_{total} = 24537\)</span>):</p>
<pre class="r"><code>tab &lt;- table(values(rstTrain))
print(tab)</code></pre>
<pre><code>## 
##    2    4    5    6    8    9   11   12   13   14   16   17 
## 1534  577 2448 4010 1654  761 4960 1028 1050 4424  359 1732</code></pre>
<p>Although perhaps not the best approach in some cases, we will convert our raster dataset into a <code>data.frame</code> object so we can use the RF classifier. Take into consideration, that in some cases, depending on the size of your <code>RasterStack</code> and the available memory, using this approach will not be possible. One simple way to overcome this, would be to convert the training raster into a <code>SpatialPoints</code> object and then run the function <code>extract</code>. This way, only specific pixels from the stack are retrieved. In any case, let’s proceed to get pixel values into our calibration data frame:</p>
<pre class="r"><code>rstDF &lt;- na.omit(values(stack(rstTrain, rst)))
rstDF[,&quot;trainClass&quot;] &lt;- as.factor(as.character(rstDF[,&quot;trainClass&quot;]))</code></pre>
<p>As you probably noticed from the code above, <code>NA</code>’s were removed and the reference class column was converted to a categorical/factor variable. In practice, by “removing <code>NA</code>’s”, it means that we are restricting the data only to the set of training pixels in <code>rstTrain</code> (reducing from 428238 to 24537 rows 👍).</p>
<p>Next up, setting some parameters. In the example, we will use holdout cross-validation (HOCV) to evaluate the RF classifier performance. This means that we will use an iterative split set approach, with a training and a testing set. So, for this purpose, we need to define the proportion of instances for training (<code>pTrain</code>); the remaining will be set aside for evaluation. Here I took into consideration the fact that RF tends to take some time to calibrate with large numbers of observations (~ &gt;10000) hence the relatively ‘large’ train proportion. We also need to define the number of repetitions in HOCV (<code>nEvalRounds</code>).</p>
<pre class="r"><code># Number of holdout evaluation rounds
nEvalRounds &lt;- 20

# Proportion of the data used for training the classifier
pTrain &lt;- 0.5</code></pre>
<p>Now let’s initialize some objects that will allow us to store some info on the the classification performance and validation:</p>
<pre class="r"><code>n &lt;- nrow(rstDF)
nClasses &lt;- length(unique(rstDF[,&quot;trainClass&quot;]))


# Initialize objects

confMats &lt;- array(NA, dim = c(nClasses,nClasses,nEvalRounds))

evalMatrix&lt;-matrix(NA, nrow=nEvalRounds, ncol=3, 
                   dimnames=list(paste(&quot;R_&quot;,1:nEvalRounds,sep=&quot;&quot;), 
                                 c(&quot;Accuracy&quot;,&quot;Kappa&quot;,&quot;PSS&quot;)))

pb &lt;- txtProgressBar(1, nEvalRounds, style = 3)</code></pre>
<p>Now, with all set, let´s calibrate and evaluate our RF classifier (use comments to guide you through the code):</p>
<pre class="r"><code># Evaluation function
source(url(&quot;https://raw.githubusercontent.com/joaofgoncalves/Evaluation/master/eval.R&quot;))


# Run the classifier

for(i in 1:nEvalRounds){
  
  # Create the random index for row selection at each round
  sampIdx &lt;- sample(1:n, size = round(n*pTrain))

  # Calibrate the RF classifier
  rf &lt;- randomForest(y = rstDF[sampIdx, &quot;trainClass&quot;], 
                     x = rstDF[sampIdx, -1], 
                     ntree = 200)

  # Predict the class to the test set
  testSetPred &lt;- predict(rf, newdata = rstDF[-sampIdx,], type = &quot;response&quot;)

  # Get the observed class vector
  testSetObs &lt;- rstDF[-sampIdx,&quot;trainClass&quot;]
  
  # Evaluate 
  evalData &lt;- Evaluate(testSetObs, testSetPred)
  
  evalMatrix[i,] &lt;- c(evalData$Metrics[&quot;Accuracy&quot;,1],
                      evalData$Metrics[&quot;Kappa&quot;,1],
                      evalData$Metrics[&quot;PSS&quot;,1])
  
  # Store the confusion matrices by eval round
  confMats[,,i] &lt;- evalData$ConfusionMatrix
  
  # Classify the whole image with raster::predict function
  rstPredClassTMP &lt;- predict(rst, model = rf, 
                             factors = levels(rstDF[,&quot;trainClass&quot;]))
  
  if(i==1){
    # Initiate the predicted raster
    rstPredClass &lt;- rstPredClassTMP
    
    # Get precision and recall for each class
    Precision &lt;- evalData$Metrics[&quot;Precision&quot;,,drop=FALSE]
    Recall &lt;- evalData$Metrics[&quot;Recall&quot;,,drop=FALSE]
    
  }else{
    # Stack the predicted rasters
    rstPredClass &lt;- stack(rstPredClass, rstPredClassTMP)
    
    # Get precision and recall for each class
    Precision &lt;- rbind(Precision,evalData$Metrics[&quot;Precision&quot;,,drop=FALSE])
    Recall &lt;- rbind(Recall,evalData$Metrics[&quot;Recall&quot;,,drop=FALSE])
  }
  
  setTxtProgressBar(pb,i)
  
}

# save.image(file = &quot;./data-raw/P8-session.RData&quot;)</code></pre>
<p>Three classification evaluation measures will be used: (i) <strong>overall accuracy</strong>, (ii) <strong>Kappa</strong>, and, (iii) <strong>Peirce-skill score</strong> (PSS; aka true-skill statistic). Let’s print out the results by round:</p>
<pre class="r"><code>knitr::kable(evalMatrix, digits = 3)</code></pre>
<table>
<thead>
<tr class="header">
<th></th>
<th align="right">Accuracy</th>
<th align="right">Kappa</th>
<th align="right">PSS</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>R_1</td>
<td align="right">0.788</td>
<td align="right">0.755</td>
<td align="right">0.749</td>
</tr>
<tr class="even">
<td>R_2</td>
<td align="right">0.789</td>
<td align="right">0.756</td>
<td align="right">0.751</td>
</tr>
<tr class="odd">
<td>R_3</td>
<td align="right">0.782</td>
<td align="right">0.748</td>
<td align="right">0.741</td>
</tr>
<tr class="even">
<td>R_4</td>
<td align="right">0.795</td>
<td align="right">0.763</td>
<td align="right">0.757</td>
</tr>
<tr class="odd">
<td>R_5</td>
<td align="right">0.789</td>
<td align="right">0.755</td>
<td align="right">0.749</td>
</tr>
<tr class="even">
<td>R_6</td>
<td align="right">0.789</td>
<td align="right">0.755</td>
<td align="right">0.750</td>
</tr>
<tr class="odd">
<td>R_7</td>
<td align="right">0.784</td>
<td align="right">0.751</td>
<td align="right">0.744</td>
</tr>
<tr class="even">
<td>R_8</td>
<td align="right">0.785</td>
<td align="right">0.751</td>
<td align="right">0.745</td>
</tr>
<tr class="odd">
<td>R_9</td>
<td align="right">0.785</td>
<td align="right">0.751</td>
<td align="right">0.745</td>
</tr>
<tr class="even">
<td>R_10</td>
<td align="right">0.789</td>
<td align="right">0.756</td>
<td align="right">0.750</td>
</tr>
<tr class="odd">
<td>R_11</td>
<td align="right">0.785</td>
<td align="right">0.752</td>
<td align="right">0.746</td>
</tr>
<tr class="even">
<td>R_12</td>
<td align="right">0.786</td>
<td align="right">0.752</td>
<td align="right">0.746</td>
</tr>
<tr class="odd">
<td>R_13</td>
<td align="right">0.786</td>
<td align="right">0.753</td>
<td align="right">0.747</td>
</tr>
<tr class="even">
<td>R_14</td>
<td align="right">0.785</td>
<td align="right">0.751</td>
<td align="right">0.745</td>
</tr>
<tr class="odd">
<td>R_15</td>
<td align="right">0.794</td>
<td align="right">0.762</td>
<td align="right">0.756</td>
</tr>
<tr class="even">
<td>R_16</td>
<td align="right">0.785</td>
<td align="right">0.752</td>
<td align="right">0.745</td>
</tr>
<tr class="odd">
<td>R_17</td>
<td align="right">0.791</td>
<td align="right">0.759</td>
<td align="right">0.753</td>
</tr>
<tr class="even">
<td>R_18</td>
<td align="right">0.782</td>
<td align="right">0.748</td>
<td align="right">0.741</td>
</tr>
<tr class="odd">
<td>R_19</td>
<td align="right">0.781</td>
<td align="right">0.746</td>
<td align="right">0.739</td>
</tr>
<tr class="even">
<td>R_20</td>
<td align="right">0.785</td>
<td align="right">0.751</td>
<td align="right">0.744</td>
</tr>
</tbody>
</table>
<p>Next, calculate some average and sd measures across all rounds:</p>
<pre class="r"><code>round(apply(evalMatrix,2,FUN = function(x,...) c(mean(x,...), sd(x,...))), 3)</code></pre>
<pre><code>##      Accuracy Kappa   PSS
## [1,]    0.787 0.753 0.747
## [2,]    0.004 0.004 0.005</code></pre>
<p>Overall measures seem to indicate that results are acceptable with very low variation between calibration rounds. Let’s check out some average <strong>precision</strong> (aka positive predictive value, PPV), <strong>recall</strong> (aka true positive rate, TPR) and F1 measures by class:</p>
<pre class="r"><code>avgPrecision &lt;- apply(Precision,2,mean)
print(round(avgPrecision, 3))</code></pre>
<pre><code>##    11    12    13    14    16    17     2     4     5     6     8     9 
## 0.963 0.705 0.717 0.906 0.789 0.998 0.657 0.345 0.509 0.698 0.664 0.484</code></pre>
<pre class="r"><code>avgRecall &lt;- apply(Recall,2,mean)
print(round(avgRecall, 3))</code></pre>
<pre><code>##    11    12    13    14    16    17     2     4     5     6     8     9 
## 0.979 0.685 0.648 0.960 0.434 1.000 0.611 0.093 0.481 0.869 0.647 0.268</code></pre>
<pre class="r"><code>avgF1 &lt;- (2 * avgPrecision * avgRecall) / (avgPrecision+avgRecall)
print(round(avgF1, 3))</code></pre>
<pre><code>##    11    12    13    14    16    17     2     4     5     6     8     9 
## 0.971 0.695 0.681 0.932 0.560 0.999 0.633 0.147 0.495 0.774 0.655 0.345</code></pre>
<p>Well, things are not so great here… 🙍‍♂️ Some classes, such as 4, 9, 5 (different artificial/urban types) and 16 (bare soil/sand) are not great. This may be a consequence of the loss of information detail due to the 100m re-sampling or some class intermixing, and train data generalization… 🤔 … this requires more investigation 😉</p>
<p>Now let’s check the confusion matrix for the best round:</p>
<pre class="r"><code># Best round for Kappa
evalMatrix[which.max(evalMatrix[,&quot;Kappa&quot;]), , drop=FALSE]</code></pre>
<pre><code>##      Accuracy     Kappa       PSS
## R_4 0.7946858 0.7626567 0.7569426</code></pre>
<pre class="r"><code># Show confusion matrix for the best kappa
cm &lt;- as.data.frame(confMats[,,which.max(evalMatrix[,&quot;Kappa&quot;])])

# Change row/col names
colnames(cm) &lt;- rownames(cm) &lt;- paste(&quot;c&quot;,levels(rstDF[,&quot;trainClass&quot;]),sep=&quot;_&quot;)

knitr::kable(cm)</code></pre>
<table>
<thead>
<tr class="header">
<th></th>
<th align="right">c_11</th>
<th align="right">c_12</th>
<th align="right">c_13</th>
<th align="right">c_14</th>
<th align="right">c_16</th>
<th align="right">c_17</th>
<th align="right">c_2</th>
<th align="right">c_4</th>
<th align="right">c_5</th>
<th align="right">c_6</th>
<th align="right">c_8</th>
<th align="right">c_9</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>c_11</td>
<td align="right">2442</td>
<td align="right">30</td>
<td align="right">2</td>
<td align="right">2</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">1</td>
<td align="right">1</td>
<td align="right">8</td>
<td align="right">0</td>
<td align="right">3</td>
</tr>
<tr class="even">
<td>c_12</td>
<td align="right">48</td>
<td align="right">330</td>
<td align="right">57</td>
<td align="right">23</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">1</td>
<td align="right">5</td>
<td align="right">24</td>
<td align="right">1</td>
<td align="right">7</td>
</tr>
<tr class="odd">
<td>c_13</td>
<td align="right">7</td>
<td align="right">43</td>
<td align="right">358</td>
<td align="right">97</td>
<td align="right">1</td>
<td align="right">1</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">2</td>
<td align="right">6</td>
<td align="right">4</td>
<td align="right">3</td>
</tr>
<tr class="even">
<td>c_14</td>
<td align="right">0</td>
<td align="right">20</td>
<td align="right">48</td>
<td align="right">2096</td>
<td align="right">1</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">1</td>
<td align="right">1</td>
<td align="right">11</td>
<td align="right">0</td>
<td align="right">5</td>
</tr>
<tr class="odd">
<td>c_16</td>
<td align="right">0</td>
<td align="right">3</td>
<td align="right">4</td>
<td align="right">29</td>
<td align="right">80</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">4</td>
<td align="right">4</td>
<td align="right">51</td>
<td align="right">5</td>
</tr>
<tr class="even">
<td>c_17</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">914</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
</tr>
<tr class="odd">
<td>c_2</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">4</td>
<td align="right">1</td>
<td align="right">0</td>
<td align="right">507</td>
<td align="right">6</td>
<td align="right">199</td>
<td align="right">23</td>
<td align="right">73</td>
<td align="right">1</td>
</tr>
<tr class="even">
<td>c_4</td>
<td align="right">2</td>
<td align="right">4</td>
<td align="right">6</td>
<td align="right">5</td>
<td align="right">2</td>
<td align="right">0</td>
<td align="right">2</td>
<td align="right">31</td>
<td align="right">114</td>
<td align="right">105</td>
<td align="right">16</td>
<td align="right">9</td>
</tr>
<tr class="odd">
<td>c_5</td>
<td align="right">5</td>
<td align="right">3</td>
<td align="right">3</td>
<td align="right">10</td>
<td align="right">1</td>
<td align="right">0</td>
<td align="right">171</td>
<td align="right">29</td>
<td align="right">619</td>
<td align="right">262</td>
<td align="right">87</td>
<td align="right">14</td>
</tr>
<tr class="even">
<td>c_6</td>
<td align="right">14</td>
<td align="right">19</td>
<td align="right">5</td>
<td align="right">4</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">1</td>
<td align="right">6</td>
<td align="right">142</td>
<td align="right">1753</td>
<td align="right">13</td>
<td align="right">38</td>
</tr>
<tr class="odd">
<td>c_8</td>
<td align="right">2</td>
<td align="right">7</td>
<td align="right">6</td>
<td align="right">15</td>
<td align="right">7</td>
<td align="right">3</td>
<td align="right">78</td>
<td align="right">7</td>
<td align="right">136</td>
<td align="right">35</td>
<td align="right">516</td>
<td align="right">13</td>
</tr>
<tr class="even">
<td>c_9</td>
<td align="right">3</td>
<td align="right">9</td>
<td align="right">10</td>
<td align="right">17</td>
<td align="right">1</td>
<td align="right">0</td>
<td align="right">0</td>
<td align="right">2</td>
<td align="right">10</td>
<td align="right">190</td>
<td align="right">5</td>
<td align="right">104</td>
</tr>
</tbody>
</table>
<p>Since we obtained one classified map for each round we can pull all that information by ensembling it together through a majority vote (i.e., calculating the modal class). The <code>raster</code> package <code>modal</code> function makes it really easy to calculate this:</p>
<pre class="r"><code>rstModalClass &lt;- modal(rstPredClass)

rstModalClassFreq &lt;- modal(rstPredClass, freq=TRUE)

medFreq &lt;- zonal(rstModalClassFreq, rstTrain, fun=median)</code></pre>
<p>Using the modal frequency of the 20 classification rounds, let’s check out which classes obtained the highest ‘uncertainty’:</p>
<pre class="r"><code>colnames(medFreq) &lt;- c(&quot;ClassCode&quot;,&quot;MedianModalFrequency&quot;)

medFreq[order(medFreq[,2],decreasing = TRUE),]</code></pre>
<pre><code>##       ClassCode MedianModalFrequency
##  [1,]         6                   20
##  [2,]        11                   20
##  [3,]        12                   20
##  [4,]        14                   20
##  [5,]        17                   20
##  [6,]         2                   19
##  [7,]         8                   19
##  [8,]        13                   19
##  [9,]         5                   15
## [10,]        16                   14
## [11,]         9                   13
## [12,]         4                   11</code></pre>
<p>These results somewhat confirm those of the class-wise precision/recall. Classes most often shifting (lower frequency) are those with lower values for these performance measures.</p>
<p>Finally, let’s plot our results of the the final modal classification map (and modal frequency):</p>
<pre class="r"><code>par(mfrow=c(1,2), cex.main=0.8, cex.axis=0.8)

plot(rstModalClass, main = &quot;RF modal land cover class&quot;)
plot(rstModalClassFreq, main = &quot;Modal frequency&quot;)</code></pre>
<div class="figure">
<img src="https://raw.githubusercontent.com/joaofgoncalves/R_exercises_raster_tutorial/master/img/P8_plot_results-1.png" />

</div>
<p>The map on the right provides some insight to identify areas that are more problematic for the classification process. However, as you can see, the results are acceptable but improvable in many ways.</p>
<p>This concludes our exploration of the raster package and supervised classification for this post. Hope you find it useful! 😄 👍 👍</p>
</div>
</div>


</div>

<script>

// add bootstrap table styles to pandoc tables
function bootstrapStylePandocTables() {
  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
}
$(document).ready(function () {
  bootstrapStylePandocTables();
});


</script>

<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
  (function () {
    var script = document.createElement("script");
    script.type = "text/javascript";
    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
    document.getElementsByTagName("head")[0].appendChild(script);
  })();
</script>

</body>
</html>