<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<meta name="author" content="Luc Anselin and Grant Morrison" />
<meta name="date" content="2019-12-28" />
<title>Global Spatial Autocorrelation 1</title>
<script src="global_spatial_autocorrelation_files/jquery-1.11.3/jquery.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="global_spatial_autocorrelation_files/bootstrap-3.3.5/css/bootstrap.min.css" rel="stylesheet" />
<script src="global_spatial_autocorrelation_files/bootstrap-3.3.5/js/bootstrap.min.js"></script>
<script src="global_spatial_autocorrelation_files/bootstrap-3.3.5/shim/html5shiv.min.js"></script>
<script src="global_spatial_autocorrelation_files/bootstrap-3.3.5/shim/respond.min.js"></script>
<script src="global_spatial_autocorrelation_files/navigation-1.1/tabsets.js"></script>
<link href="global_spatial_autocorrelation_files/highlightjs-9.12.0/default.css" rel="stylesheet" />
<script src="global_spatial_autocorrelation_files/highlightjs-9.12.0/highlight.js"></script>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs) {
hljs.configure({languages: []});
hljs.initHighlightingOnLoad();
if (document.readyState && document.readyState === "complete") {
window.setTimeout(function() { hljs.initHighlighting(); }, 0);
}
}
</script>
<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>
<link rel="stylesheet" href="tutor.css" type="text/css" />
<style type = "text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
</style>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
background: white;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
background-color: transparent;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<!-- code folding -->
</head>
<body>
<div class="container-fluid main-container">
<div class="fluid-row" id="header">
<h1 class="title toc-ignore">Global Spatial Autocorrelation 1</h1>
<h3 class="subtitle">R Notes</h3>
<h4 class="author">Luc Anselin and Grant Morrison<a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a></h4>
<h4 class="date">12/28/2019</h4>
</div>
<div id="TOC">
<ul>
<li><a href="#introduction">Introduction</a><ul>
<li><a href="#objectives">Objectives</a><ul>
<li><a href="#r-packages-used">R Packages used</a></li>
<li><a href="#r-commands-used">R Commands used</a></li>
</ul></li>
</ul></li>
<li><a href="#preliminaries">Preliminaries</a><ul>
<li><a href="#load-packages">Load packages</a></li>
<li><a href="#geodadata-website">geodaData website</a></li>
<li><a href="#making-the-weights">Making the weights</a></li>
</ul></li>
<li><a href="#the-moran-scatter-plot">The Moran Scatter Plot</a><ul>
<li><a href="#concept">Concept</a><ul>
<li><a href="#morans-i">Moran’s I</a></li>
<li><a href="#permutation-inference">Permutation inference</a></li>
<li><a href="#moran-scatter-plot">Moran scatter plot</a></li>
</ul></li>
<li><a href="#creating-a-moran-scatter-plot">Creating a Moran scatter plot</a><ul>
<li><a href="#interpretation">Interpretation</a></li>
<li><a href="#assessing-significance">Assessing significance</a></li>
<li><a href="#reference-distribution">Reference distribution</a></li>
<li><a href="#chow-test-morans-i-scatterplot">Chow test Moran’s I scatterplot</a></li>
</ul></li>
</ul></li>
<li><a href="#spatial-correlogram">Spatial Correlogram</a><ul>
<li><a href="#concept-1">Concept</a></li>
<li><a href="#creating-a-spatial-correlogram">Creating a spatial correlogram</a></li>
<li><a href="#interpretation-1">Interpretation</a></li>
</ul></li>
</ul>
</div>
<p><br></p>
<div id="introduction" class="section level2 unnumbered">
<h2>Introduction</h2>
<p>This notebook covers the functionality of the <a href="https://geodacenter.github.io/workbook/5a_global_auto/lab5a.html">Global Spatial Autocorrelation 1</a> section of the GeoDa workbook. We refer to that document for details on the methodology, references, etc. The goal of these notes is to approximate as closely as possible the operations carried out using GeoDa by means of a range of R packages.</p>
<p>The notes are written with R beginners in mind; more seasoned R users can probably skip most of the comments on data structures and other R particulars. Also, as always in R, there are typically several ways to achieve a specific objective, so what is shown here is just one way that works, but there often are others (that may even be more elegant, work faster, or scale better).</p>
<p>For this notebook, we use Cleveland house price data. Our goal in this lab is to show how to assess global spatial autocorrelation with the Moran scatter plot and the spatial correlogram.</p>
<div id="objectives" class="section level3">
<h3>Objectives</h3>
<p>After completing the notebook, you should know how to carry out the following tasks:</p>
<ul>
<li><p>Visualize Moran’s I with a Moran scatterplot</p></li>
<li><p>Carry out inference using the permutation approach</p></li>
<li><p>Make analysis reproducible with the random seed</p></li>
<li><p>Create a LOWESS smooth of the Moran scatter plot</p></li>
<li><p>Conduct a Chow test with the Moran scatterplot</p></li>
<li><p>Analyze the range of spatial autocorrelation by means of a spatial correlogram</p></li>
</ul>
<div id="r-packages-used" class="section level4">
<h4>R Packages used</h4>
<ul>
<li><p><strong>sf</strong>: To read in the shapefile and make queen contiguity weights</p></li>
<li><p><strong>spdep</strong>: To create distance-band neighbors, convert neighbor structures to a weights structure, compute Moran’s I and spatial lag variables, and build the spatial correlogram.</p></li>
<li><p><strong>ggplot2</strong>: To make customized plots such as a Moran’s I scatter plot and spatial correlogram.</p></li>
<li><p><strong>Hmisc</strong>: To get LOWESS smoother functionality in ggplot2.</p></li>
<li><p><strong>robustHD</strong>: To compute standardized scores for variables and lag variables in the construction of a Moran’s I scatterplot.</p></li>
<li><p><strong>deldir</strong>: To create Voronoi polygons.</p></li>
<li><p><strong>tidyverse</strong>: For basic data frame manipulation.</p></li>
<li><p><strong>gap</strong>: To compute chow test statistics.</p></li>
<li><p><strong>gridExtra</strong>: To pack multiple plots into one, mainly used to construct the spatial correlogram.</p></li>
<li><p><strong>geodaData</strong>: To access the data for this tutorial.</p></li>
</ul>
</div>
<div id="r-commands-used" class="section level4">
<h4>R Commands used</h4>
<p>Below follows a list of the commands used in this notebook. For further details and a comprehensive list of options, please consult the <a href="https://www.rdocumentation.org">R documentation</a>.</p>
<ul>
<li><p><strong>Base R</strong>: <code>install.packages</code>, <code>library</code>, <code>setwd</code>, <code>rep</code>, <code>sd</code>, <code>mean</code>, <code>summary</code>, <code>attributes</code>, <code>lapply</code>, <code>class</code>, <code>length</code>, <code>which</code>, <code>data.frame</code>, <code>plot</code>, <code>sample</code>, <code>lm</code>, <code>set.seed</code>, <code>cbind</code></p></li>
<li><p><strong>sf</strong>: <code>st_read</code>, <code>st_relate</code>, <code>st_as_sf</code></p></li>
<li><p><strong>spdep</strong>: <code>dnearneigh</code>, <code>nb2listw</code>, <code>moran</code>, <code>lag.listw</code>, <code>card</code>, <code>sp.correlogram</code>, <code>Szero</code></p></li>
<li><p><strong>ggplot2</strong>: <code>ggplot</code>, <code>geom_smooth</code>, <code>geom_point</code>, <code>xlim</code>, <code>ylim</code>, <code>geom_hline</code>, <code>geom_vline</code>, <code>geom_line</code>, <code>ggtitle</code>, <code>scale_x_continuous</code></p></li>
<li><p><strong>Hmisc</strong>: <code>stat_plsmo</code></p></li>
<li><p><strong>robustHD</strong>: <code>standardize</code></p></li>
<li><p><strong>deldir</strong>: <code>deldir</code></p></li>
<li><p><strong>tidyverse</strong>: <code>filter</code></p></li>
<li><p><strong>gap</strong>: <code>chow.test</code></p></li>
<li><p><strong>gridExtra</strong>: <code>grid.arrange</code></p></li>
</ul>
</div>
</div>
</div>
<div id="preliminaries" class="section level2">
<h2>Preliminaries</h2>
<p>Before starting, make sure to have the latest version of R and of packages that are compiled for the matching version of R (this document was created using R 3.5.1 of 2018-07-02). Also, optionally, set a working directory, even though we will not actually be saving any files.<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a></p>
<div id="load-packages" class="section level3">
<h3>Load packages</h3>
<p>First, we load all the required packages using the <code>library</code> command. If you don’t have some of these in your system, make sure to install them first as well as their dependencies.<a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a> You will get an error message if something is missing. If needed, just install the missing piece and everything will work after that.</p>
<pre class="r"><code>library(sf)
library(spdep)
library(ggplot2)
library(deldir)
library(robustHD)
library(Hmisc)
library(tidyverse)
library(gap)
library(gridExtra)
library(geodaData)</code></pre>
</div>
<div id="geodadata-website" class="section level3 unnumbered">
<h3>geodaData website</h3>
<p>All of the data for the R notebooks is available in the <strong>geodaData</strong> package. We loaded the library earlier; now to access the individual data sets, we use the double colon notation. This works similarly to accessing a variable with <code>$</code>, in that a drop-down menu will appear with a list of the datasets included in the package. For this notebook, we use <code>clev_pts</code>.</p>
<p>Otherwise, to get the data for this notebook, you will need to go to <a href="https://geodacenter.github.io/data-and-lab//clev_sls_154_core/">Cleveland Home Sales</a>. The download format is a zipfile, so you will need to unzip it by double clicking on the file in your file finder. From there, move the resulting folder into your working directory to continue. Once that is done, you can use the <strong>sf</strong> function <code>st_read()</code> to read the shapefile into your R environment.</p>
<pre class="r"><code>clev.points <- geodaData::clev_pts</code></pre>
</div>
<div id="making-the-weights" class="section level3">
<h3>Making the weights</h3>
<p>The weights used for this notebook are queen contiguity, based on Voronoi polygons constructed from the point data for this notebook. In order to make the weights, we must first construct Voronoi polygons from the Cleveland point data. There are a number of ways to do this. We will be using the <strong>deldir</strong> package as a starting point. We will need to convert the result from the <strong>deldir</strong> package to class <strong>sf</strong>, which we have been working with throughout the notebooks.</p>
<p>The only function we need from <strong>deldir</strong> is <code>deldir</code>, which outputs a data structure with Voronoi polygons. The only inputs needed are a vector of the x coordinates and a vector of the y coordinates. The base R <code>plot</code> function can give us a preliminary look at the Voronoi polygons. We will need a few additional parameters other than <strong>vtess</strong> so that the plot is legible. Set <code>wlines = "tess"</code>, <code>wpoints = "none"</code> and <code>lty = 1</code>.</p>
<pre class="r"><code>vtess <- deldir(clev.points$x, clev.points$y)
plot(vtess, wlines="tess", wpoints="none",
lty=1)</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-4-1.png" width="672" /></p>
<p>This function will be used to convert the <strong>deldir</strong> Voronoi polygons to <strong>sp</strong>, where we can easily convert them to <strong>sf</strong>. We are not going to cover the individual steps of this function because it is outside the scope of these notebooks. The important thing to note here is that this function converts <strong>deldir</strong> Voronoi polygons to <strong>sp</strong>.</p>
<pre class="r"><code>voronoipolygons = function(thiess) {
w = tile.list(thiess)
polys = vector(mode='list', length=length(w))
for (i in seq(along=polys)) {
pcrds = cbind(w[[i]]$x, w[[i]]$y)
pcrds = rbind(pcrds, pcrds[1,])
polys[[i]] = Polygons(list(Polygon(pcrds)), ID=as.character(i))
}
SP = SpatialPolygons(polys)
voronoi = SpatialPolygonsDataFrame(SP, data=data.frame(dummy = seq(length(SP)), row.names=sapply(slot(SP, 'polygons'),
function(x) slot(x, 'ID'))))
}</code></pre>
<p>Again, we can use the base R <code>plot</code> function to take a look at the Voronoi polygons. Now that they are class <strong>sp</strong>, we don’t need the extra parameters in <code>plot</code>.</p>
<pre class="r"><code>v <- voronoipolygons(vtess)
plot(v)</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-6-1.png" width="672" /></p>
<p>With the Voronoi polygons in <strong>sp</strong> class, we can easily convert them to <strong>sf</strong> with <code>st_as_sf</code>. Again we use the base R <code>plot</code> function to view the polygons.</p>
<pre class="r"><code>vtess.sf <- st_as_sf(v)
plot(vtess.sf$geometry)</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-7-1.png" width="672" /></p>
<p>Now that we have the Voronoi polygons as an <strong>sf</strong> object, we can use the queen contiguity process outlined in the Contiguity Based Weights notebook. We will briefly cover each step of the process. For more in-depth information, please see the Contiguity Based Weights notebook.</p>
<p>To start, we create a function for queen contiguity, which is just <code>st_relate</code> with the specified pattern for queen contiguity, <code>F***T****</code>.</p>
<pre class="r"><code>st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")</code></pre>
<p>We apply the queen contiguity function to the Voronoi polygons and see that the class of the output is <strong>sgbp</strong>. This structure is close to the <strong>nb</strong> structure, but has a few differences that we will need to correct to use the rest of <strong>spdep</strong> functionality.</p>
<pre class="r"><code>queen.sgbp <- st_queen(vtess.sf)
class(queen.sgbp)</code></pre>
<pre><code>## [1] "sgbp"</code></pre>
<p>This function converts type <strong>sgbp</strong> to <strong>nb</strong>. It is covered in more depth in the Contiguity Based Weights notebook. In short, it explicitly changes the name of the class and deals with the observations that have no neighbors.</p>
<pre class="r"><code>as.nb.sgbp <- function(x, ...) {
attrs <- attributes(x)
x <- lapply(x, function(i) { if(length(i) == 0L) 0L else i } )
attributes(x) <- attrs
class(x) <- "nb"
x
}</code></pre>
<p>We use <code>as.nb.sgbp</code> to convert neighbor types and then check the class with <code>class</code>.</p>
<pre class="r"><code>queen.nb <- as.nb.sgbp(queen.sgbp)
class(queen.nb)</code></pre>
<pre><code>## [1] "nb"</code></pre>
<p>To go from a neighbors object to a weights object, we use <code>nb2listw</code>; with the default parameters, we get row-standardized weights.</p>
<pre class="r"><code>queen.weights <- nb2listw(queen.nb)</code></pre>
</div>
</div>
<div id="the-moran-scatter-plot" class="section level2">
<h2>The Moran Scatter Plot</h2>
<div id="concept" class="section level3">
<h3>Concept</h3>
<div id="morans-i" class="section level4">
<h4>Moran’s I</h4>
<p>Moran’s I statistic is arguably the most commonly used indicator of global spatial autocorrelation. It was initially suggested by Moran (1948), and popularized through the classic work on spatial autocorrelation by Cliff and Ord (1973). In essence, it is a cross-product statistic between a variable and its spatial lag, with the variable expressed in deviations from its mean. For an observation at location i, this is expressed as <span class="math inline">\(z_i = x_i - \bar{x}\)</span>, where <span class="math inline">\(\bar{x}\)</span> is the mean of variable x.</p>
<p>Moran’s I statistic is then:</p>
<p><span class="math display">\[I = \frac{\Sigma_i\Sigma_jw_{ij}z_iz_j/S_0}{\Sigma_iz_i^2/n}\]</span> with <span class="math inline">\(w_{ij}\)</span> as elements of the spatial weights matrix, <span class="math inline">\(S_0 = \Sigma_i\Sigma_jw_{ij}\)</span> as the sum of all of the weights and n as the number of observations.</p>
</div>
<div id="permutation-inference" class="section level4">
<h4>Permutation inference</h4>
<p>Inference for Moran’s I is based on a null hypothesis of spatial randomness. The distribution of the statistic under the null can be derived using either an assumption of normality (independent normal random variates), or so-called randomization (i.e., each value is equally likely to occur at any location).</p>
<p>An alternative to an analytical derivation is a computational approach based on permutation. This calculates a reference distribution for the statistic under the null hypothesis of spatial randomness by randomly permuting the observed values over the locations. The statistic is computed for each of these randomly reshuffled data sets, which yields a reference distribution.</p>
<p>This distribution is then used to calculate a so-called pseudo p-value. This is found as <span class="math display">\[p = \frac{R +1}{M+1}\]</span> where R is the number of times the computed Moran’s I from the spatial random data sets (the permuted data sets) is equal to or more extreme than the observed statistic. M equals the number of permutations. The latter is typically taken as 99, 999, etc., to yield nicely rounded pseudo p-values.</p>
<p>The pseudo p-value is only a summary of the results from the reference distribution and should not be interpreted as an analytical p-value. Most importantly, it should be kept in mind that the extent of significance is determined in part by the number of random permutations. More precisely, a result that has a p-value of 0.01 with 99 permutations is not necessarily more significant than a result with a p-value of 0.001 with 999 permutations.</p>
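<p>For example, if 7 out of 999 permuted statistics are equal to or more extreme than the observed statistic, the pseudo p-value is <span class="math inline">\((7 + 1)/(999 + 1) = 0.008\)</span>.</p>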
</div>
<div id="moran-scatter-plot" class="section level4">
<h4>Moran scatter plot</h4>
<p>The Moran scatter plot, first outlined in Anselin (1996), consists of a plot with the spatially lagged variable on the y-axis and the original variable on the x-axis. The slope of the linear fit to the scatter plot equals Moran’s I.</p>
<p>We consider a variable z, given in deviations from the mean. With row-standardized weights, the sum of all the weights (<span class="math inline">\(S_0\)</span>) equals the number of observations (n). As a result, the expression for Moran’s I simplifies to:</p>
<p><span class="math display">\[I= \frac{\Sigma_i\Sigma_jw_{ij}z_iz_j}{\Sigma_iz_i^2} = \frac{\Sigma_i(z_i*\Sigma_jw_{ij}z_j)}{\Sigma_iz_i^2}\]</span></p>
<p>Upon closer examination, this turns out to be the slope of a regression of <span class="math inline">\(\Sigma_jw_{ij}z_j\)</span> on <span class="math inline">\(z_i\)</span>. This is the principle underlying the Moran scatter plot.</p>
<p>An important aspect of the visualization in the Moran scatter plot is the classification of the nature of spatial autocorrelation into four categories. Since the plot is centered on the mean (of zero), all points to the right of the mean have <span class="math inline">\(z_i>0\)</span> and all points to the left have <span class="math inline">\(z_i<0\)</span>. We refer to these values respectively as high and low, in the limited sense of higher or lower than average. Similarly, we can classify the values for the spatial lag above and below the mean as high and low.</p>
<p>The scatter plot is then easily decomposed into four quadrants. The upper-right quadrant and the lower-left quadrant correspond with positive spatial autocorrelation (similar values at neighboring locations). We refer to them as respectively high-high and low-low spatial autocorrelation. In contrast, the lower-right and upper-left quadrant correspond to negative spatial autocorrelation (dissimilar values at neighboring locations). We refer to them as respectively high-low and low-high spatial autocorrelation.</p>
<p>The classification of the spatial autocorrelation into four types begins to make the connection between global and local spatial autocorrelation. However, it is important to keep in mind that the classification as such does not imply significance. This is further explored in our discussion of local indicators of spatial association (LISA).</p>
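<p>As a small illustration of the classification (not of significance), the quadrants can be assigned with nested conditionals. This is a minimal sketch, assuming <code>z</code> and <code>lag_z</code> are hypothetical vectors holding the standardized variable and its spatial lag.</p>
<pre class="r"><code># assign each observation to a Moran scatter plot quadrant
quadrant <- ifelse(z >= 0 & lag_z >= 0, "high-high",
            ifelse(z < 0 & lag_z < 0, "low-low",
            ifelse(z >= 0 & lag_z < 0, "high-low", "low-high")))
table(quadrant)</code></pre>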
</div>
</div>
<div id="creating-a-moran-scatter-plot" class="section level3">
<h3>Creating a Moran scatter plot</h3>
<p>Before we create the Moran’s I scatterplot, we will get the statistic using <code>moran</code> from <strong>spdep</strong>. For this function, we need a variable to compute the Moran’s I on, a weights structure, the length of the dataset, and then <code>Szero</code> of the queen weights, which calculates the constants needed for tests of spatial autocorrelation.</p>
<pre class="r"><code>moran <- moran(clev.points$sale_price, queen.weights, length(queen.nb), Szero(queen.weights))
moran$I</code></pre>
<pre><code>## [1] 0.2810649</code></pre>
<p>We get a value of .281 for the Moran’s I statistic, which also corresponds to the slope of the regression line in the Moran scatter plot.</p>
<p>In creating the Moran’s I scatterplot, we will need to create a lag variable of sale price from our queen weights. This is done with the function <code>lag.listw</code>, which takes a weights structure and a variable of equal length to create a lag variable from.</p>
<pre class="r"><code>clev.points$lagged_sale_price <- lag.listw(queen.weights,clev.points$sale_price)
clev.points$lagged_sale_price</code></pre>
<pre><code>## [1] 83287.500 112912.500 80178.571 108550.000 58375.000 96816.667
## [7] 115600.000 41503.571 59732.143 80025.000 48785.714 110750.000
## [13] 103000.000 72300.000 113050.000 141918.167 112900.000 55142.857
## [19] 140594.143 88871.429 213734.833 176844.143 201950.000 134800.000
## [25] 67350.000 46371.375 88125.000 137600.000 107625.000 28357.143
## [31] 26218.750 9025.000 18291.667 25950.000 7002.667 41863.300
## [37] 28187.500 17300.000 10250.000 25333.333 18418.750 21437.500
## [43] 26430.000 20771.000 60539.000 43412.500 102080.375 24936.667
## [49] 29619.143 28094.200 16657.000 18691.500 22193.625 21575.667
## [55] 17804.571 128700.000 35710.400 45935.250 36411.500 42500.000
## [61] 82133.333 71680.000 81100.000 55733.333 69066.667 66600.000
## [67] 39300.000 59750.000 58000.000 48625.000 45593.750 15441.667
## [73] 47174.750 33416.667 35150.000 34083.500 51762.500 36304.750
## [79] 31783.600 12908.000 14676.600 15166.667 22483.333 19541.667
## [85] 20426.600 21100.000 36212.000 20900.000 32775.000 29946.600
## [91] 44350.333 24431.286 21279.750 62380.000 21683.667 25914.900
## [97] 42369.667 8481.000 10904.800 13657.000 13830.000 14412.250
## [103] 24264.143 14487.333 32960.571 12226.800 8708.500 12655.667
## [109] 15714.286 20383.333 21151.667 10258.571 18451.667 131651.500
## [115] 20420.000 182072.714 140650.000 22034.000 43550.000 53100.000
## [121] 41488.889 26500.000 13095.714 8764.444 9208.667 6156.667
## [127] 11761.333 5529.500 7409.600 40510.000 34388.286 9250.000
## [133] 9128.000 15470.000 12699.714 18842.000 14440.000 10125.000
## [139] 17967.500 9500.000 30250.000 17400.000 11174.000 13248.833
## [145] 9068.750 25633.333 7248.500 66900.000 62812.500 67125.000
## [151] 88250.000 28300.000 78900.000 71093.750 9750.000 14085.714
## [157] 12350.000 15216.667 25670.000 26750.000 57414.286 21920.000
## [163] 43500.000 16912.500 19062.500 15666.667 65044.444 20583.333
## [169] 11358.333 34816.667 34750.000 39360.000 29092.857 24422.222
## [175] 23360.000 42685.714 29080.000 16983.333 21666.667 31960.000
## [181] 24400.000 7242.857 16180.000 28216.667 9633.333 32742.857
## [187] 28833.333 21360.000 29391.667 24414.286 34720.000 14083.333
## [193] 14860.000 17916.667 23133.333 27850.000 18370.000 16975.000
## [199] 45980.000 36700.000 33583.333 29750.000 35416.667 28614.286
## [205] 22000.000</code></pre>
<p>We need standardized values for both the lag variable and the sale price variable to build the Moran’s I scatterplot. Standardized values are just z-scores for each observation (<span class="math inline">\(z_i =\frac{ x_i -\mu}{\sqrt{Var(x)}}\)</span>). To get the standardized values, we will use <code>standardize</code> from the <strong>robustHD</strong> package. We could very easily calculate these with base R vectorized operations, but it is faster to just use a package function.</p>
<pre class="r"><code>clev.points$standardized_sale_price <- standardize(clev.points$sale_price)
clev.points$standardized_lag_sale_price <- standardize(clev.points$lagged_sale_price)</code></pre>
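<p>For reference, the same scores can be computed with base R vectorized operations; a minimal sketch for the sale price variable:</p>
<pre class="r"><code># base R equivalent of robustHD::standardize()
z <- (clev.points$sale_price - mean(clev.points$sale_price)) / sd(clev.points$sale_price)</code></pre>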
<p>To construct the Moran’s I scatterplot, we will use <strong>ggplot2</strong> for more aesthetically pleasing plots. We will not go into too much depth on the options available for these plots, but for more information, check out the <a href="https://ggplot2.tidyverse.org/reference/index.html">ggplot2 documentation</a>. In addition, the first Exploratory Data Analysis notebook is also a good resource to look at.</p>
<p>To make the Moran’s I scatterplot, we make a scatterplot in <strong>ggplot2</strong> with the standardized sale price values on the x axis and the standardized lag sale price variable on the y axis. We use <code>geom_point</code> to add the points to the plot. <code>geom_smooth</code> adds a regression line; the default option is a loess smooth line, so we specify <code>method = lm</code> to get a standard linear regression line. We add dotted lines at the x and y axes to separate the four types of spatial autocorrelation. We do this with <code>geom_hline</code> for the x axis and <code>geom_vline</code> for the y axis. To get the specifications of the scatterplot to better match up with GeoDa, we set the x and y scale limits to -10 and 10.</p>
<pre class="r"><code>ggplot(data = clev.points, aes(x=standardized_sale_price, y = standardized_lag_sale_price)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
geom_hline(yintercept = 0, lty = 2) +
geom_vline(xintercept = 0, lty = 2) +
xlim(-10,10) +
ylim(-10,10) +
ggtitle("Moran scatter plot sale price")</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-16-1.png" width="672" /></p>
<div id="interpretation" class="section level4">
<h4>Interpretation</h4>
<p>We can see that the shape of the point cloud is determined by the presence of several outliers on the high end (e.g., larger than three standard deviational units from the mean). One observation, with a sales price of 527,409 (compared to the median sales price of 20,000), is as large as 8 standard deviational units above the mean. On the lower end of the spectrum (to the left of the dashed line in the middle that represents the mean), there is much less spread in the house prices, and those points end up bunched together. By eliminating some of the outliers, one may be able to see more detail for the remaining observations, but we will not pursue that here.</p>
</div>
<div id="assessing-significance" class="section level4">
<h4>Assessing significance</h4>
<p>We have an estimate of the Moran’s I statistic, but no information on the significance of that statistic. This can be obtained by constructing a distribution by means of random assignment. In order to do this, we first choose the number of permutations desired, which will directly affect the minimum pseudo p-value we can obtain for the test statistic. In the case of 999 permutations, the minimum p-value would be .001, which would mean none of the sample distribution statistics are as extreme or more extreme than the test statistic.</p>
<div id="replicability---the-random-seed" class="section level5">
<h5>Replicability - the random seed</h5>
<p>To facilitate replication, it is best to set a seed for the random number generator. The one used in GeoDa is 123456789, so we will demonstrate how to set the seed here. It is just <code>set.seed</code> with the desired seed number as an input.</p>
<pre class="r"><code>set.seed(123456789)</code></pre>
</div>
</div>
<div id="reference-distribution" class="section level4">
<h4>Reference distribution</h4>
<p>To make the reference distribution, we will need to draw 999 randomized samples of the sale price data, each of the same size as the number of observations. Each random sample assigns the values to random locations, which gives us a spatially random distribution. To get to this point, we will build up in steps in order to better understand the process.</p>
<p>We start by taking one random sample of our points with the base R <code>sample</code> function. We choose the same size as the number of sale price data observations to make a spatially randomized vector of our sale price data. The point of this is to randomly assign the housing prices to the Voronoi polygons, then to compute the Moran’s I statistic for the spatially random assignment based on the original weights structure.</p>
<pre class="r"><code>draw <- sample(clev.points$sale_price, size = length(clev.points$sale_price))
draw</code></pre>
<pre><code>## [1] 5000 10000 8500 4775 4500 5200 28500 62000 12000 20000
## [11] 30000 17000 19000 167000 24900 41500 65000 131650 26400 12000
## [21] 172500 65000 2500 120000 305000 15500 5000 69000 6800 49000
## [31] 85000 15000 228000 1300 14500 14000 42500 4250 116250 29000
## [41] 11750 14500 42000 15000 73400 24800 10000 5000 84900 235500
## [51] 40000 28000 47000 7000 11000 275000 16500 29150 100000 155000
## [61] 8000 15000 63000 34750 7900 13000 18000 8000 48500 15000
## [71] 26000 38500 17500 8000 16900 23500 65000 21119 15000 75000
## [81] 76000 7000 48000 27750 15000 51000 9000 6300 4924 72000
## [91] 25750 33500 45900 13000 5500 125000 7500 10000 22000 10000
## [101] 73000 26000 20000 9000 11582 169500 20000 25000 8600 72900
## [111] 5000 230000 6375 2910 23000 3000 41000 16500 3000 32500
## [121] 5000 25250 4149 40000 25750 7000 25000 8000 68900 16000
## [131] 48500 15000 109900 11500 6483 49000 27600 122500 6370 9500
## [141] 32500 27000 47000 3500 45000 135000 11500 5968 13100 8000
## [151] 6500 25000 138500 5000 11750 10500 5000 38200 10000 25000
## [161] 7000 7000 29900 75000 34000 7000 527409 4000 15000 19000
## [171] 5500 76000 20000 21199 41069 78000 15000 26000 1049 65000
## [181] 103000 92000 8400 10000 82500 56900 20000 13000 2100 81500
## [191] 15834 20000 19971 12000 1448 70000 5800 64000 48900 32500
## [201] 165000 5000 16000 11750 89500</code></pre>
<p>Now we can begin to calculate the Moran’s I statistic by first calculating the spatial lag variable based on our queen weights and the spatially random sample.</p>
<pre class="r"><code>lag1 <- lag.listw(queen.weights,draw)</code></pre>
<p>We can get the Moran’s I statistic by regressing the standardized values of the spatial lag variable on the standardized values of the random draw. We can get the standardized value with the <code>standardize</code> function. The <code>summary</code> function allows us to see a summary of the regression statistics.</p>
<pre class="r"><code>lmfit <- lm(standardize(lag1) ~ standardize(draw))
summary(lmfit)</code></pre>
<pre><code>##
## Call:
## lm(formula = standardize(lag1) ~ standardize(draw))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1296 -0.6232 -0.2418 0.2690 5.5560
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.833e-17 6.997e-02 0.000 1.000
## standardize(draw) -3.771e-02 7.014e-02 -0.538 0.591
##
## Residual standard error: 1.002 on 203 degrees of freedom
## Multiple R-squared: 0.001422, Adjusted R-squared: -0.003497
## F-statistic: 0.2891 on 1 and 203 DF, p-value: 0.5914</code></pre>
<p>The slope here is the estimate for <strong>standardize(draw)</strong>. This value is fairly close to zero, as the randomization process makes the draw spatially random.</p>
<p>To build our distribution, we will need to repeat this process many times over. We can accomplish this by way of a <code>for</code> loop. We will need somewhere to store our Moran’s I result for each iteration. To do this we will make an empty vector of a length corresponding to our desired number of permutations.</p>
<pre class="r"><code>randomized_moran <- rep(NA, 999)</code></pre>
<p>The process here is the same as the one followed for one draw, but here we use the <code>for</code> loop to get 999 iterations and store the resulting Moran’s I values in the vector that we created above. First we do the random sample with the <code>sample</code> function. Then we make a lag variable based upon the random draw and our queen weights. Next we run the regression with the <code>lm</code> function between the standardized values of the lag variable and the random draw variable. Lastly, we extract the slope coefficient, which is our Moran’s I statistic, and store it at index i. Each iteration of the loop stores the value at the subsequent index, i.e. 1, then 2, then 3, and so on.</p>
<pre class="r"><code>for(i in 1:999){
draw <- sample(clev.points$sale_price, size = length(clev.points$sale_price))
lag <- lag.listw(queen.weights,draw)
lmfit <- lm(standardize(lag) ~ standardize(draw))
randomized_moran[i] <- lmfit$coefficients[2]
}</code></pre>
<p>We can obtain summary statistics of our distribution with <code>summary</code> and its standard deviation with <code>sd</code>.</p>
<pre class="r"><code>summary(randomized_moran)</code></pre>
<pre><code>## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.26215 -0.07946 -0.02163 -0.01469 0.04247 0.37970</code></pre>
<pre class="r"><code>sd(randomized_moran)</code></pre>
<pre><code>## [1] 0.09099489</code></pre>
<p>Now to get the pseudo p-value, we check the number of samples that had a higher Moran’s I statistic than the observed value. To do this, we use the base R <code>which</code> function to get a vector of the indices at which the conditional is TRUE. We then get the length of the vector with <code>length</code>.</p>
<pre class="r"><code>length(which(randomized_moran > .281))</code></pre>
<pre><code>## [1] 7</code></pre>
<p>Since the result is 7, there are only 7 values in all of the permutations that are higher than the test statistic. This means that the pseudo p-value is .008, <span class="math inline">\(\frac{1 + R}{1 + M}\)</span>, where R = 7 and M = 999.</p>
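<p>The same computation can be done directly in code with the objects created above:</p>
<pre class="r"><code># pseudo p-value: (R + 1) / (M + 1)
(length(which(randomized_moran > moran$I)) + 1) / (999 + 1)</code></pre>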
<p>There are a number of ways we can visualize the distribution that we just constructed in R. We will use <strong>ggplot2</strong> to do these visualizations because it looks much better than base R visualizations.</p>
<p>To start, we convert our vector with the randomized Moran’s I values into a data frame, so we can use <strong>ggplot2</strong> functions. For this, we just use the <code>data.frame</code> function with the vector of randomized Moran’s I values as an argument and then assign a name for the column, which is just <code>moran</code> in this case.</p>
<p>The first option is a density plot. This requires the standard <code>ggplot</code> function with <code>aes</code> containing the x axis. Additionally we need <code>geom_density</code>. We use <code>geom_vline</code> to plot the mean of the distribution and our observed statistic.</p>
<pre class="r"><code>df <- data.frame(moran = randomized_moran)
ggplot(data = df,aes(x=moran)) +
geom_density() +
geom_vline(xintercept = moran[[1]], col = "green") +
geom_vline(xintercept = mean(randomized_moran), col = "blue")</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-26-1.png" width="672" /></p>
<p>The next option is a histogram. The only difference here is that we use <code>geom_histogram</code> instead of <code>geom_density</code>.</p>
<pre class="r"><code>ggplot(data = df, aes(x=moran)) +
geom_histogram() +
geom_vline(xintercept = moran[[1]], col = "green") +
geom_vline(xintercept = mean(randomized_moran), col = "blue")</code></pre>
<pre><code>## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-27-1.png" width="672" /> #### LOWESS smoother</p>
<p>The LOWESS smoother is not implemented directly in <strong>ggplot2</strong>, but can be found in an add-on package. We use the <strong>Hmisc</strong> package to add this functionality to the <strong>ggplot2</strong> plots. To add the smoother to our Moran’s I scatter plot, we use the <code>stat_plsmo</code> function from the <strong>Hmisc</strong> package. The default span in GeoDa is .2, so we set the <code>span =</code> parameter to .2.</p>
<p>With the LOWESS smoother, we can see potential structural breaks in the pattern of spatial autocorrelation. For example, in some parts of the data the curve may be very steep and positive, indicating strong spatial autocorrelation, whereas in other parts it could be flat, indicating no spatial autocorrelation.</p>
<pre class="r"><code>ggplot(data = clev.points, aes(x=standardized_sale_price, y = standardized_lag_sale_price)) +
geom_point() +
stat_plsmo(span = .2, color = "blue") +
geom_hline(yintercept = 0, lty = 2) +
geom_vline(xintercept = 0, lty = 2) +
xlim(-10,10) +
ylim(-10,10) +
ggtitle("LOWESS smooth of Moran Scatterplot")</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-28-1.png" width="672" /></p>
</div>
<div id="chow-test-morans-i-scatterplot" class="section level4">
<h4>Chow test Moran’s I scatterplot</h4>
<p>The Chow test is a statistical test of whether or not the coefficients of two different linear regressions are equal. In the case of the Moran’s I scatterplot, these are just the slopes and intercepts of the regression lines, since it is a simple linear regression.</p>
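<p>The test compares the fit of a pooled regression to the combined fit of the two separate regressions. Below is a minimal sketch of the F statistic for a simple linear regression, assuming <code>y1</code>/<code>x1</code> and <code>y2</code>/<code>x2</code> are hypothetical vectors for the two subsets; the <code>chow.test</code> function from <strong>gap</strong>, used below, wraps this computation for us.</p>
<pre class="r"><code>chow_F <- function(y1, x1, y2, x2) {
  ssr <- function(y, x) sum(resid(lm(y ~ x))^2)  # sum of squared residuals
  k <- 2                                         # parameters: intercept, slope
  s_pooled <- ssr(c(y1, y2), c(x1, x2))          # restricted (pooled) model
  s_sep <- ssr(y1, x1) + ssr(y2, x2)             # unrestricted (separate) models
  ((s_pooled - s_sep) / k) / (s_sep / (length(y1) + length(y2) - 2 * k))
}</code></pre>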
<p>The brushing operation in GeoDa is fairly difficult to implement in R, but we can do a less interactive version. First we must consider which criteria we want to select points on. This could be anything from location to other characteristics in the data. In our case we will do it based on location. As an approximation for the midpoint of the set of points, we take the mean of the x and y coordinates. From there we assign “Select” to the points in the bottom left quadrant and “Rest” to the rest of the points by way of the <code>if_else</code> function. This function takes a conditional, a result to assign in the case where the conditional is TRUE, and a result to assign when the conditional is FALSE. In our case these are “Select” and “Rest”.</p>
<pre class="r"><code>mid_x <- mean(clev.points$x)
mid_y <- mean(clev.points$y)
clev.points<- clev.points %>% mutate(bottom_left = if_else((x < mid_x & y < mid_y),"Select", "Rest"))</code></pre>
<p>Before we run the Chow test, we will visualize the difference in slopes of the selected data, the non-selected data and the aggregate data. With <strong>ggplot2</strong>, we can accomplish this by setting categorical colors based on whether an observation is “Select” or “Rest”. To do this, we specify <code>aes(color = bottom_left)</code> in both <code>geom_point</code> and <code>geom_smooth</code>. This will give us colored points and regression lines for “Select” and “Rest”. Then to get blue and red colors, we use <code>scale_color_manual</code>. For this plot, we do not set x and y limits because the -10 to 10 specification makes it too difficult to see the differences in the regression lines.</p>
<pre class="r"><code>ggplot(clev.points, aes(x=standardized_sale_price,y=standardized_lag_sale_price)) +
geom_point(aes(color=bottom_left)) +
geom_smooth(aes(color=bottom_left), method = lm, se = FALSE) +
geom_smooth(method=lm,se = FALSE, color = "black") +
scale_color_manual(values=c("blue","red")) +
labs(color="Selection") +
geom_hline(yintercept = 0, lty = 2) +
geom_vline(xintercept = 0, lty = 2) +
ggtitle("Chow test Moran Scatterplot")</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-30-1.png" width="672" /></p>
<p>To perform the Chow test, we need two separate data frames as inputs for the function. To get the two data frames, we use the <strong>tidyverse</strong> <code>filter</code> function. This function filters out observations based on a conditional: TRUE values for the conditional remain in the data frame while FALSE values are filtered out.</p>
<pre class="r"><code>clev.select <- clev.points %>% filter(bottom_left == "Select")
clev.rest <- clev.points %>% filter(bottom_left == "Rest")</code></pre>
<p>Now we use the base R <code>lm</code> function to run separate regressions on the standardized lag variable and standardized sale price variable.</p>
<pre class="r"><code>reg.select <- lm(standardized_lag_sale_price~standardized_sale_price, data=clev.select)
reg.rest <- lm(standardized_lag_sale_price~standardized_sale_price, data=clev.rest)</code></pre>
<p>Now we use the <code>summary</code> function on each regression object to get summary statistics of the residuals, the regression coefficients and their respective standard errors, the R squared values, and the F statistic.</p>
<pre class="r"><code>summary(reg.select)</code></pre>
<pre><code>##
## Call:
## lm(formula = standardized_lag_sale_price ~ standardized_sale_price,
## data = clev.select)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.32083 -0.26382 -0.08486 0.25641 2.38954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.15667 0.06062 -2.584 0.0117 *
## standardized_sale_price 0.51228 0.11818 4.335 4.53e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.492 on 74 degrees of freedom
## Multiple R-squared: 0.2025, Adjusted R-squared: 0.1917
## F-statistic: 18.79 on 1 and 74 DF, p-value: 4.534e-05</code></pre>
<pre class="r"><code>summary(reg.rest)</code></pre>
<pre><code>##
## Call:
## lm(formula = standardized_lag_sale_price ~ standardized_sale_price,
## data = clev.rest)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2276 -0.6161 -0.3336 0.2146 4.5370
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.10241 0.09335 1.097 0.275
## standardized_sale_price 0.42072 0.07816 5.383 3.42e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.056 on 127 degrees of freedom
## Multiple R-squared: 0.1858, Adjusted R-squared: 0.1793
## F-statistic: 28.97 on 1 and 127 DF, p-value: 3.418e-07</code></pre>
<p>We see that the slopes differ by about .09 and the intercepts by about .26.</p>
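<p>We can confirm this directly by differencing the coefficient vectors of the two fitted models:</p>
<pre class="r"><code># difference in intercept and slope between the two regressions
coef(reg.select) - coef(reg.rest)</code></pre>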
<p>To run the Chow test, we need 4 inputs for <code>chow.test</code>: the two standardized variables from the “Select” data frame, <strong>clev.select</strong>, and the two standardized variables from the “Rest” data frame, <strong>clev.rest</strong>.</p>
<pre class="r"><code>chow <- chow.test(clev.select$standardized_lag_sale_price, clev.select$standardized_sale_price, clev.rest$standardized_lag_sale_price, clev.rest$standardized_sale_price)
chow</code></pre>
<pre><code>## F value d.f.1 d.f.2 P value
## 2.2974700 2.0000000 201.0000000 0.1031467</code></pre>
<p>With a p-value of .103, we do not have significant evidence to conclude that the coefficients of the two regressions are different under a standard alpha level of .05.</p>
</div>
</div>
</div>
<div id="spatial-correlogram" class="section level2">
<h2>Spatial Correlogram</h2>
<div id="concept-1" class="section level3">
<h3>Concept</h3>
<p>A non-parametric spatial correlogram is an alternative measure of global spatial autocorrelation that does not rely on the specification of a spatial weights matrix. Instead, a local regression is fit to the covariances or correlations computed for all pairs of observations as a function of the distance between them (for example, as outlined in Bjornstad and Falck 2001).</p>
<p>With standardized variables z, this boils down to a local regression:</p>
<p><span class="math display">\[z_iz_j = f(d_{ij}) + u\]</span></p>
<p>where <span class="math inline">\(d_{ij}\)</span> is the distance between a pair of locations i - j, u is an error term and f is a non-parametric function to be determined from the data. Typically, the latter is a LOWESS or kernel regression.</p>
</div>
<div id="creating-a-spatial-correlogram" class="section level3">
<h3>Creating a spatial correlogram</h3>
<p>In GeoDa, creating a spatial correlogram is much more straightforward than in R. The process in R requires us to start with the sale price points, then to create a neighbors structure based on the distance breaks desired for the correlogram.</p>
<p>To start, we use <code>cbind</code> to put the x and y coordinates together for use in the distance based neighbor functions of <strong>spdep</strong>.</p>
<pre class="r"><code>coords <- cbind(clev.points$x, clev.points$y)</code></pre>
<p>Now we create distance-band neighbors from the coordinate matrix, with a lower distance bound and an upper distance bound that together define the neighbors. We use <code>dnearneigh</code> to create the distance-band neighbors. For more in-depth information on distance-based neighbors, please see the Distance Based Weights notebook. We use a distance of 4823.27 to emulate the first example in the GeoDa workbook.</p>
<pre class="r"><code>dist.band.nb <- dnearneigh(coords,0,4823.27)</code></pre>
<p>Using the <strong>spdep</strong> function <code>sp.correlogram</code>, we can get measures of spatial autocorrelation for an input number of lag orders. We can then use the base R plotting function to get a look at the autocorrelation values for each lag order.</p>
<pre class="r"><code>sp <- sp.correlogram(dist.band.nb, clev.points$sale_price, order = 10, method = "I",style = "W", randomisation = TRUE, spChk = NULL, zero.policy = TRUE)
plot(sp)</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-38-1.png" width="672" /></p>
<p>To get a better looking plot, we can extract the Moran’s I values and put them into a data frame, so we can use <strong>ggplot2</strong> plotting functionality.</p>
<pre class="r"><code>morans <- sp$res[,1]
df <- data.frame(Morans_I = morans,lags = 1:10 )
ggplot(data = df, aes(x=lags,y=Morans_I)) +
geom_point() +
geom_smooth(col = "purple", se = FALSE) +
geom_hline(yintercept = 0) +
ylim(-.5,.5) </code></pre>
<pre><code>## `geom_smooth()` using method = 'loess' and formula 'y ~ x'</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-39-1.png" width="672" /></p>
<p>To get closer to the GeoDa correlogram plotting functionality, we can convert lags to Euclidean distance.</p>
<pre class="r"><code>df$euclidean_distance <- df$lags * 4823.3
ggplot(data = df, aes(x=euclidean_distance,y=Morans_I)) +
geom_point() +
geom_smooth(col = "purple", se = FALSE) +
geom_hline(yintercept = 0) +
ylim(-.5,.5) +
scale_x_continuous(breaks = df$euclidean_distance)</code></pre>
<pre><code>## `geom_smooth()` using method = 'loess' and formula 'y ~ x'</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-40-1.png" width="672" /></p>
<p>The spatial correlogram can be paired with a bar chart that shows the number of neighbor pairs for each lag order. To get this information, we will need to work outside the <strong>spdep</strong> package and compute them ourselves.</p>
<p>To begin, we set up an empty vector to store the pair numbers.</p>
<pre class="r"><code>pairs <- rep(NA, 10)</code></pre>
<p>Here we run <code>dnearneigh</code> on each interval of Euclidean distance that corresponds to a lag in 1 to 10. To get the number of pairs for each lag order, we sum up the cardinality of the neighbor structure for each lag order and then divide by two, because this sum counts each neighbor relation in both directions, so the number of pairs is half this total.</p>
<pre class="r"><code>for (i in 1:10){
nb <- dnearneigh(coords, (i - 1) * 4823.28, i * 4823.28)
pairs[i] <- sum(card(nb)) / 2
}</code></pre>
<p>Now we create a data frame from the vectors with the lag order values, the autocorrelation values, and the number of pairs, and add the associated Euclidean distance values.</p>
<pre class="r"><code>df <- data.frame(lag_order = 1:10, auto_corr = morans, num_pairs = pairs)
df$euclidean_distance <- df$lag_order * 4823</code></pre>
<p>Here we create two different plots: one is a bar chart with the number of pairs for each lag, the other is the spatial correlogram.</p>
<pre class="r"><code>p1 <- ggplot(data = df, aes(x = euclidean_distance,y = auto_corr)) +
geom_point() +
geom_smooth(col = "purple", se = FALSE) +
geom_hline(yintercept = 0) +
ylim(-1,1) +
scale_x_continuous(breaks = df$euclidean_distance)
p2 <- ggplot(data = df, aes(x=euclidean_distance,y = num_pairs, fill = as.factor(euclidean_distance))) +
geom_bar(stat = "identity") +
scale_fill_brewer(palette = "Paired") +
theme(legend.position = "none") +
geom_text(aes(label=num_pairs), position = position_dodge(width = .9), vjust=-.25) +
ylim(0, 1.2 * max(pairs)) +
scale_x_continuous(breaks = df$euclidean_distance)
p1</code></pre>
<pre><code>## `geom_smooth()` using method = 'loess' and formula 'y ~ x'</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-44-1.png" width="672" /></p>
<pre class="r"><code>p2</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-44-2.png" width="672" /></p>
<p>Using <code>grid.arrange</code> from the <strong>gridExtra</strong> package, we can combine the two plots into one image.</p>
<pre class="r"><code>grid.arrange(p1,p2,ncol = 1)</code></pre>
<pre><code>## `geom_smooth()` using method = 'loess' and formula 'y ~ x'</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-45-1.png" width="672" /></p>
<p>Following the same process outlined above, we can make a function that constructs the correlogram based on the desired lag order, distance band, variable, and coordinates.</p>
<pre class="r"><code>geoda_correlogram <- function(lag.order, distance, var,coords){
# Function that outputs a spatial correlogram with a bar plot of neighbor pairs
# Inputs:
# lag.order: The desired number of lag orders to be included in the plot
# distance: The desired distance band for the lags
# var: A variable to analyze for spatial autocorrelation
# coords: A matrix of coordinates of the same length as var
# vector to store the number of neighbor pairs for each lag order
pairs <- rep(NA, lag.order)
#loop to calculate number of pairs for each lag order
for(i in 1:lag.order) {
nb <- dnearneigh(coords, (i-1) * distance, i * distance)
pairs[i] <- sum(card(nb)) / 2
}
# Computing spatial autocorrelation
nb1 <- dnearneigh(coords, 0 , distance)
sp <- sp.correlogram(nb1, var, order = lag.order, method = "I", style = "W", randomisation = FALSE, spChk = NULL, zero.policy = TRUE)
# Putting the lag orders, autocorrelation, pairs and distance into a dataframe
df <- data.frame(lag = 1:lag.order, num_pairs = pairs, auto_corr = sp$res[,1])
df$euclidean_distance <- df$lag * round(distance, digits = 0)
# Making plots
p1 <- ggplot(data = df, aes(x = euclidean_distance,y = auto_corr)) +
geom_point() +
geom_smooth(col = "purple", se = FALSE) +
geom_hline(yintercept = 0) +
ylim(-1,1) +
scale_x_continuous(breaks = df$euclidean_distance)
p2 <- ggplot(data = df, aes(x=euclidean_distance,y=num_pairs, fill = as.factor(euclidean_distance))) +
geom_bar(stat= "identity") +
scale_fill_brewer(palette = "Paired") +
theme(legend.position = "none") +
geom_text(aes(label=num_pairs), position = position_dodge(width = .9), vjust=-.25) +
ylim(0, 1.2 * max(pairs)) +
scale_x_continuous(breaks = df$euclidean_distance)
grid.arrange(p1,p2,ncol =1)
}
geoda_correlogram(10, 4823.3, clev.points$sale_price, coords)</code></pre>
<pre><code>## `geom_smooth()` using method = 'loess' and formula 'y ~ x'</code></pre>
<p><img src="global_spatial_autocorrelation_files/figure-html/unnamed-chunk-46-1.png" width="672" /></p>
</div>
<div id="interpretation-1" class="section level3">
<h3>Interpretation</h3>
<p>The top of the above graph is the actual correlogram. This depicts how spatial autocorrelation changes with distance. The first dot corresponds with distances between 0 and 4823 feet. The horizontal line indicates a spatial autocorrelation of 0. The autocorrelation starts positive and then fluctuates above and below that line.</p>
</div>
</div>
<div class="footnotes">
<hr />
<ol>
<li id="fn1"><p>University of Chicago, Center for Spatial Data Science – <a href="mailto:[email protected]">[email protected]</a>,<a href="mailto:[email protected]">[email protected]</a><a href="#fnref1">↩</a></p></li>
<li id="fn2"><p>Use <code>setwd(directorypath)</code> to specify the working directory.<a href="#fnref2">↩</a></p></li>
<li id="fn3"><p>Use <code>install.packages(packagename)</code>.<a href="#fnref3">↩</a></p></li>
</ol>
</div>
</div>
<script>
// add bootstrap table styles to pandoc tables
function bootstrapStylePandocTables() {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
}
$(document).ready(function () {
bootstrapStylePandocTables();
});
</script>
<!-- tabsets -->
<script>
$(document).ready(function () {
window.buildTabsets("TOC");
});
$(document).ready(function () {
$('.tabset-dropdown > .nav-tabs > li').click(function () {
$(this).parent().toggleClass('nav-tabs-open')
});
});
</script>
<!-- code folding -->
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>