<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>Big Data for Regional Science - Sessions</title>
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/pygment_trac.css">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<h1> Special sessions </h1>
<h1> <i> Applications of new sources of (big) data in
Regional Science </i> </h1>
<h2> 61st NARSC Meetings </h2>
<h3><center> Washington, DC <br>
November 12-15, 2014</center></h3>
<ul>
<li><a href="index.html"><strong>Call</strong></a></li>
<li><a href="http://www.narsc.org/newsite/?page_id=62"><strong>Conference</strong></a></li>
<li><a href="sessions.html"><strong>Sessions</strong></a></li>
</ul>
</header>
<section >
<h1> </h1>
<b><u><a href="#s1">Session I</a> - <i> Time and location </i></u></b> </br>
- <i> Big Data, Smart Cities and Research Infrastructure Innovation
</i>. Robert J. Stimson <a href="#stimson"> [abstract]</a></br>
- <i>Residential Foreclosure and Non-Housing Wealth</i>. Sharon
O'Donnell <a
href="#odonnell">[abstract]</a> </br>
- <i>Forecasting Regional Health Crises Using Google Trends</i>. Jason
Parker <a href="#parker">[abstract]</a> </br>
- <i>Fast Food Data: The Usefulness of Social Media Byproducts.</i>
David Folch <a href="#folch"> [abstract] </a> </br>
<b><u><a href="#s2">Session II</a> - <i> Time and location </i></u></b> </br>
- <i>Neighborhood Effects in a Behavioral Randomized Controlled
Trial</i>. Tammy Leonard <a href="#leonard">[abstract]</a> </br>
- <i>The Spatial Pattern of Inequality within Cities and its Relation
with the Local Economy</i>. Norbert Schanne <a
href="#schanne">[abstract]</a> </br>
- <i>Spatial and Social Frictions in the City: Evidence From Yelp</i>.
Donald R. Davis
<a href="#davis">[abstract]</a> </br>
- <i>“The magic’s in the recipe” - Urban Diversity and Popular
Amenities</i>. Dani Arribas-Bel <a href="#arribas">[abstract]</a> </br>
<b><u><a href="#s3">Session III</a> - <i> Time and location </i></u></b> </br>
- <i>From 'Big Noise' to 'Big Data': a case study of cross-validation
between 3 large geographical datasets on visitor flows between
regional urban centres</i>. Robin Lovelace <a
href="#lovelace">[abstract]</a> </br>
- <i>Digital Neighborhoods</i>. Luc Anselin <a
href="#anselin">[abstract]</a> </br>
- <i>Sensitivity of Location-Sharing Services Data: Evidence from
American Travel Pattern</i>. Zhenhua Chen <a
href="#chen">[abstract]</a> </br>
- <i>A framework of Mapping Social
Connections in Space and Time</i>. Xinyue Ye <a href="#ye">[abstract]</a> </br>
<b><u><a href="#s4">Session IV</a> - <i> Time and location </i></u></b> </br>
- <i>New Data, New Applications: A Method for Transportation System
Performance Monitoring.</i> Mohja Rhoads <a href="#rhoads">[abstract]</a> </br>
- <i>Mobile phone data and motorway traffic: can the former
predict the latter?</i> Emmanouil Tranos <a href="#tranos">[abstract]</a> </br>
- <i>Freight Deliveries Directly Generated by Residential Units:
An Analysis with the 2009 NHTS Data.</i> Yiwei Zhou <a href="#zhou">[abstract]</a></br>
- <i> Baltimore’s Post-recession Socioeconomic Environment and Local Job
Access for Work-eligible Temporary Assistance for Needy Families
(TANF) Recipients: a Locational Approach for Welfare-to-work
Examination.</i> Ting Zhang <a href="#zhang">[abstract]</a></br>
<h2><a id="s1"></a> Session I </h2>
<h3> <a id="stimson"></a> <i>Big Data, Smart Cities and Research Infrastructure Innovation </i></br>
<u>Stimson, Robert J.</u> (University of Melbourne);
Pettit, Chris (University of Melbourne);
Sinnott, Richard (University of Melbourne) </h3>
<p>Advances in information technologies are opening new ways to approach research
and policy analysis for cities and regions. This is being driven in part by
what is now referred to as ‘big data’ and also by the emergence of policies
that are championing the ‘creative commons’ and ‘open data’. Harnessing the
opportunities presented by these innovations is being championed by what is
being referred to as ‘smart cities’. This paper overviews these developments
and then focuses on how innovations in building new research infrastructures
are starting to revolutionise the way urban and regional research might be
facilitated through a number of initiatives that are occurring around the
world. That includes the Australian Urban Research Infrastructure Network (AURIN)
project, which is taken as an example. AURIN is a A$24 million initiative by the
Australian Government - led by the University of Melbourne but involving
universities, research institutions and data agencies across the country -
which is developing and operating a new national research infrastructure that
facilitates access to a wide range of types of data at multiple levels of
scale sourced from multiple sources with the on-line capability to integrate
those data and interrogate data using open source spatial and statistical
analysis and modelling e-research tools with visualization. An important
feature of the AURIN e-research infrastructure is the capability to enable
merit-based, secure access to unit record data and its integration with
spatial objective data: the researcher cannot download the individual data
but can conduct interrogations on-line and receive the results, thus
ensuring protection of privacy. The paper presents
some of the applications that can be undertaken using the AURIN
e-infrastructure capabilities. These include: harvesting spatial data and
conducting econometric analysis and modelling; applying customised open source
tools such as a Walkability tool and a Planning What If? Tool; and
the integration of survey-based unit record data with spatial objective data.
The paper concludes by discussing some of the impediments faced in this
interface between ‘big data’, ‘smart cities’ and research infrastructure
initiatives which are challenges that need to be addressed. </p>
<h3> <a id="odonnell"></a> <i>Residential Foreclosure and Non-Housing Wealth</i> </br>
Coulson, Ed (Pennsylvania State University);
<u>O'Donnell, Sharon</u> (U.S. Census Bureau) </h3>
<p>The foreclosure literature describes a "double trigger"
mechanism that increases the probability that a homeowner will
face foreclosure. The double trigger is made up of a negative
economic shock (e.g., job loss, divorce) and the presence of an
underwater mortgage. In this paper, we examine the role of
non-housing wealth on foreclosure probabilities. Homeowners with
underwater mortgages have more housing choices if they have
sufficient non-housing assets that can be used to sell the home
and pay off the deficiency. For those with only housing assets, the
absence of non-housing assets intensifies the triggers.
The presence of non-housing wealth also contributes to foreclosure
probability if the homeowner uses the assets to purchase a second
home and defaults on the first. This form of strategic default
("ruthless" strategic default) exists if the homeowner has the
means to keep current on the mortgage of the first home and has
sufficient non-housing assets to pay off the deficiency but chooses
to apply them to the down payment of a second home.
The analysis is based on a commingled file of 60 months of panel
survey data (2008 Survey of Income and Program Participation
(SIPP)) record-linked at the address level with local government
data that document the stages of foreclosure for a portion of the SIPP
mortgaged homes. The data file contains monthly events of the
households synchronized with foreclosure events.
Initial analysis, based on 40 months of panel data, shows an
association between homeowners in financial distress and defaults.
Homeowners that feel the strongest effect of the local housing
bubble on the value of their homes are at greater risk of default.
Using two measures of non-housing assets, the analysis failed to find
any evidence of an association between wealth and default.
The paper does not determine who is motivated to engage in
"ruthless" strategic default but empirical evidence suggests that
fewer than 2.5% of homeowners have the means to attempt it.</p>
<h3> <a id="parker"></a> <i>Forecasting Regional Health Crises Using Google Trends</i> </br>
<u>Parker, Jason</u> (Michigan State University);
Loveridge, Scott (Michigan State University) </h3>
<p>Community behavioral health includes positive mental health and freedom from
substance abuse. The incidence of conditions such as depression and
alcoholism varies greatly from one place to the next and in a given location can
change over time; sometimes, such as in the case of crystal meth, the
incidence can grow rapidly. Community behavioral health outcomes are produced
by an array of resources that include formal and informal educational systems,
law enforcement, health care providers, and social service workers. Often
this large array of participants works in an uncoordinated fashion due to poor
information about emerging needs. Local decision-makers seem to ignore many
of the health data resources available to them, possibly because these
resources appear after the fact, and don’t function well in predicting
regional health problems. Before the advent of data science, the first sign
of a serious health crisis was the local news media. Today, Google Trends
provides new data that anyone can use to see patterns in search history at the
regional level, including searches for web sites with information about
illegal drugs and various mental health disorders. Because the information is
free of charge and offered in real time, rather than based on laborious
population sampling, multiple reminders to obtain a defensible survey response
rate, and data cleaning, using the new Google Trends data may also make it
possible to construct powerful localized predictors of health crises so that
regional planners, law enforcement officials, and social service agencies can
react much more quickly. The rich cyclicality of this data poses a serious
statistical problem and must be estimated using a new panel frequency domain
approach. The purpose of this paper is to demonstrate the use of Google Trends
data in conjunction with more traditional covariates to provide accurate
forecast equations for regional health problems so that local decision-makers
can adjust their programs more quickly in response to emerging needs. The
covariates include measurements of unemployment, household income, local
interest rates, demographics, regional government spending, and inequality.
In our model, we control for local fixed effects and national common time
effects so that our analysis is independent of any region-specific and
year-specific differences. Any remaining cross-sectional dependence between
regions in the model can be accounted for by factor augmentation.</p>
<h3> <a id="folch"></a> <i>Fast Food Data: The Usefulness of Social Media Byproducts</i> </br>
Spielman, Seth (University of Colorado - Boulder);
<u>Folch, David C.</u> (Florida State University);
Manduca, Robert (Harvard University)</h3>
<p>The promise of “big” sets of user generated geographic content is that it
provides a new way to understand cities. From a data collection perspective
user-generated big urban data offers clear advantages: 1) it is available in
real time and has very short update cycles and 2) it is inexpensive to collect
in large part because it is often a byproduct of modern urban life. However,
from a data use perspective the arguments for big urban data are much less
compelling. There are real questions about the fidelity of big urban data to
“real” on the ground conditions. Like “fast food”, big data is cheap and
seemingly everywhere, but its quality is often suspect. Is big data a
replacement for “traditional” data sources, a complement to traditional data
or simply a source of empty calories? Because big urban data sets are not
designed studies in the formal statistical sense it is difficult to measure
data quality. Thus much of the literature on big geographic data has focused
on epistemological questions about what constitutes “good” user-generated
content (Spielman 2014). These evaluations are further complicated when
spatial accuracy is not necessarily the primary goal of the data collection
process (Arribas-Bel, 2013). In this paper we examine the correspondence
between Yelp (www.yelp.com) data and an administrative dataset of restaurants
in the Phoenix, Arizona metro area produced by the Maricopa Association of
Governments (MAG). Our initial goal was to examine the fidelity of Yelp to the
“real world.” We assumed that the register of restaurants used for
metropolitan planning purposes would represent reality and that Yelp would
capture only some aspect of the “truth.” Instead we found little overlap
between the administrative and “big” urban data sets. Extensive direct
comparison revealed only about a third of restaurants in each dataset are
found in the other. This lack of correspondence could be caused by trivial
differences such as geocoding or spelling conventions, or it could indicate
substantive differences in the data. We therefore turned to an indirect
approach that compares the spatial distribution of the two datasets and found
that they differ in systematic and predictable ways. Ripley’s Kdiff function
(Bailey and Gatrell 1995), the difference between the expected number of
restaurants within each dataset as a function of distance, indicates that
across all spatial scales the Yelp data is significantly clustered relative to
the MAG data. Specifically, restaurants in the MAG database are spread fairly
evenly across the metro area while those in the Yelp data are more
concentrated in certain parts of metro Phoenix, most notably the downtowns and
upscale areas of Phoenix, Scottsdale, and Tempe. Further examination of the
locations with a high density of Yelp restaurants suggests that they share
some common traits. Areas with a high density of Yelp relative to MAG
restaurants are in more walkable areas as judged by their average WalkScore.
Additionally, logistic models using the US Census Longitudinal Employer
Household Dynamics Database (http://lehd.ces.census.gov) indicate that census
block groups with high probability of containing more Yelp than officially
listed restaurants tend to have larger numbers of low-wage workers and people
employed in the Arts, Entertainment, and Recreation sector and Accommodation
and Food Services sector. These characteristics, combined with qualitative
knowledge of the neighborhoods in question, begin to suggest that the Yelp
data does an especially good job of documenting dense, hip areas, frequented
by professionals, tourists, and the workers who serve them. MAG’s
systematically collected data, on the other hand, is underrepresented in these
areas, but is perhaps more complete in other parts of the metro area. Our
comparison of Yelp and MAG data highlights the strengths and weaknesses of
each. The two datasets had approximately the same number of restaurants, but
the Yelp data is far more detailed and comprehensive in certain areas of
Phoenix. It also may be better at documenting informal or family-run
businesses, and it certainly is updated more frequently than MAG. However,
Yelp does not represent the entire metro area equally, so analyses of Yelp
data alone would likely overstate restaurant concentration in downtowns, and
understate it in suburbs. MAG data may be less exhaustive in certain places,
but due to its systematic construction it is likely to be more consistent
across the entire region. Both datasets have a great deal of information to
contribute. When combined, administrative and user generated databases seem to
provide a more holistic and comprehensive picture of the world than either
would do by itself. Effective planning might use the more evenly spread MAG
data for metro-level research, supplementing it with Yelp data for detailed
analysis of the neighborhoods well served by Yelp. The different strengths of
the two datasets are just the most recent iteration of an established problem.
Datasets on businesses have almost always been generated as the byproduct of
some other commercial process. The nature of that process influences the form
of the final dataset. By combining features of multiple datasets generated
through different processes it may be possible to address the weaknesses of
each.</p>
<h2><a id="s2"></a> Session II </h2>
<h3> <a id="leonard"></a> <i>Neighborhood Effects in a Behavioral Randomized Controlled Trial</i> </br>
Pruitt, Sandi L.;
<u>Leonard, Tammy</u> (University of Dallas);
Murdoch, James;
Hughes, Amy;
McQueen, Amy;
Gupta, Samir </h3>
<p>Electronic Medical Records (EMR) data are a (relatively) newly
available source of space-time data. Particularly for
regionally based large urban safety-net health systems, EMR
data can allow for novel insights into the provision of health
services for low-income under- and uninsured populations.
However, the use and interpretation of EMR data has largely
been unexplored and is quite challenging. The data were
collected for administrative purposes and are large.
Additionally, there is a high level of sensitivity around data
security and privacy when using EMR data. Despite these
challenges, we demonstrate the utility of EMR data to explore
geographic and social influences on the outcomes in a
Randomized Controlled Trial (RCT) by examining an RCT designed to
increase colorectal cancer (CRC) screening. Cadastral
geocoding of EMR address records was used to validate patient
addresses and also to append housing data from the county
appraisal district to the EMR records. Additionally, street
network data was used to calculate travel times to the health
clinics where health services were received. We found
statistically significant neighborhood effects. Most notably,
average CRC test use among neighboring study participants was
significantly and positively associated with an individual
patient’s CRC test use. This potentially important
spatially-varying covariate has not previously been considered
in health-behavior RCTs. The implications are both empirical
and methodological. Empirically, we find that in the case of
the RCT examined, neighborhood effects, while significant, do
not modify the intervention effect size estimates.
Methodologically, our results contribute to the understanding
of neighborhood effects and RCTs. RCTs of interventions
intended to modify health behaviors may be influenced by
neighborhood effects, which can impede unbiased estimation of
intervention effects. Our results contribute to the growing
literature suggesting that RCTs focused on individual behavior
should assess potential social interactions between
participants, which may cause intervention arm contamination.
</p>
<h3> <a id="schanne"></a> <i>The Spatial Pattern of Inequality within Cities and its Relation with the Local Economy</i> </br>
vom Berge, Philipp (Institute for Employment Research);
<u>Schanne, Norbert</u> (Institute for Employment Research);
Schild, Christopher-Johannes (IAB Institute for Employment Research);
Wurdack, Anja (IAB Institute for Employment Research) </h3>
<p>This paper investigates the intra-urban spatial structure of labour market
inequality. Comparing the spatial pattern of intra-urban inequality across
cities has been difficult so far because of the requirement of standardized
data collection at a very detailed spatial scale. The Research Data Centre
(FDZ) of the Federal Employment Agency in the Institute for Employment
Research (IAB) has recently accessed geocoded register data on the German
labour market which cover the entire workforce liable to statutory social
security and all working-age social benefit recipients. We use the year 2009
wave of this data to construct measures for labour market inequality at the
level of regular 500 m x 500 m grid cells, for example the local
proportion of low-wage employees. These measures form the basis of our further
analysis. We start with a case study of the three largest German
cities: Berlin, Hamburg, and Munich. Visualisation of the social inequality
measure at the level of grid cells forms a context for interpreting commonly
employed metrics on intra-urban social segregation and other spatial
structures in inequality. The three cities show distinctly shaped spatial
structures in social inequality. Besides this, they differ with regard to the
local economic development, the progress of structural change, and the policy
of providing subsidised residences. In order to generalise this case-study
evidence, we extend the analysis to cover all cities in Germany with more than
100,000 inhabitants. We establish quantitative relations between measures for
the shape of urban labour market inequality and the city size and growth, the
industry structure, structural change, and social policy. We discuss
instrumental variables which potentially allow for interpreting the estimated
correlations in a causal fashion. Preliminary results (for the cities with
population size over 500,000 persons) suggest a positive relationship between
Duncan’s Segregation Index and the median wage within a city; with regard to
other variables, relationships are less clear.</p>
<h3> <a id="davis"></a> <i>Spatial and Social Frictions in the City: Evidence From Yelp</i> </br>
<u>Davis, Donald R</u> (Columbia University);
Dingel, Jonathan I (University of Chicago);
Monras, Joan (Sciences Po);
Morales, Eduardo (Princeton University) </h3>
<p>A city means much more to its residents than just home and work. We eat out.
We shop. We seek entertainment. We take advantage of the thousands of
opportunities the city provides. While these choices are fundamental to how we
use the city, they are also hard to observe. Surveys reveal what we say we do.
Time diaries record what we do for how long, but not where we go. Even new GPS
studies show where we go, but not why we go there or what other opportunities
were relevant. In order to address these issues, we need information about
individuals' residences and workplaces, places they go, and the alternatives
available but not chosen. We need to pay attention not only to spatial
frictions in the city but also to social frictions. And we need to be mindful
that the spatial and social frictions themselves may vary with the
characteristics of the residents. We construct the first data set that has all
the features required to examine this problem. The starting point is data from
the online user-generated review site, Yelp.com. In 2011, we downloaded all
reviews written by about 50,000 Yelp users who had reviewed a venue in New
York City. We randomly selected 25 percent of these users, jointly accounting
for about 645,000 reviews, for closer study. We used a combination of keyword
searches and close examination of review texts to identify approximate home
and work locations for a subset of these users. We combine these locations
with data on income levels and residential racial/ethnic composition from the
2000 Census of Population. To measure racial/ethnic demographic distances
between two census tracts, we calculate the Euclidean distance between the two
tracts' population shares for four racial and ethnic groups. To measure
segregation, we calculate the Echenique and Fryer (2007) spectral segregation
index for the modal race/ethnicity in each census tract. This particular index
has the property that a census tract is more segregated if it is surrounded by
more segregated tracts. We estimate travel times between home, work, and
venues as the public transit travel time between the centroids of New York
City census tracts from Google Maps. We restrict our estimation sample to
users with home and work locations in Manhattan in order to mitigate the issue
of transport-mode choice, since the large majority of Manhattan residents use
public transit. To describe crime rates, we use geographically precise NYPD
robbery statistics that we aggregate to the level of census tracts. We infer
users' genders from their profile photos on the Yelp web site. These data
allow us to estimate a discrete-choice model of restaurant visits with a rich
set of user and venue characteristics. They also allow us to examine the
separate spatial frictions owing to distance of venues from home and work. We
can relate characteristics of users to their willingness to enter areas of the
city with varying crime rates, incomes, and racial composition. This allows us
to understand how these social factors may act as barriers to movement and
commerce within the city. Using our estimated demand system, we construct
counterfactuals in which we examine how proposed transport infrastructure
additions or reversion to higher crime rates affect the degree of integration
(in many ways) of the city. Our preliminary results suggest influential roles
for travel times, demographic differences, crime rates, and user
characteristics. We find that the “demographic distance” between two locations
is as important as the travel time between them. Women in particular are
significantly less likely to visit venues in neighborhoods with high crime or
racial and ethnic demographics different from those of their own
neighborhood.</p>
<h3> <a id="arribas"></a> <i>“The magic’s in the recipe” - Urban Diversity and Popular Amenities</i> </br>
<u>Arribas-Bel, Dani</u> (University of Birmingham);
Bakens, Jessie (VU University, Amsterdam) </h3>
<p>This paper uses a novel source of (big) data to analyze the main factors
behind the popularity of urban amenities in The Netherlands. In particular, we
collect data from the location-based service Foursquare and employ it to
obtain a rich catalogue of restaurant locations, as well as a database of
other urban amenities. This, combined with traditional sources of
socio-economic data, allows us to estimate regressions at the area and venue
levels, uncovering the main determinants of the popularity of specific
restaurants as well as of entire areas or neighborhoods of a city. In doing
so, we contribute to the existing literature along three main dimensions: we
provide insight and new knowledge about urban systems, in particular about the
under-studied aspect of urban amenities; we demonstrate the use of a novel
source of data available to urban researchers as a byproduct (Arribas-Bel
2014) to improve the understanding of phenomena of interest not only to
researchers but to practitioners such as urban planners and business owners;
and we quantify, document and characterize some of the biases inherent to
these new sources of data in the context of urban applications. From an
economic point of view, cities have become not only agglomerations of
production, but also important consumption arenas (Glaeser et al. 2001).
Although this dual role is widely recognized by the literature, very little
research has been devoted to analyzing and identifying the mechanisms that lead to
attractive consumer cities. In other words, we know much more about the
ingredients (i.e. cultural amenities, the presence of green open space,
population composition) than about the recipe. How these elements are
internally combined to create a successful “consumer city” remains largely
uninvestigated. There are at least two possible reasons why this is the case:
first, previous studies usually consider cities in the aggregate and have thus
focused only on the elements, the ingredients, failing to recognize the
spatial arrangement within each urban area; second, but very much related, it
has been traditionally difficult to obtain spatially detailed data on revealed
preferences for urban amenities. During the last few decades, the world has
witnessed an explosion in computing power that has put a powerful computer in
the pocket of even inexperienced users. In parallel, location technology
such as the global positioning system (GPS) has also undergone dramatic
improvements and sharp drops in cost, enabling it to reach the mass consumer market.
The combination of these two trends is producing a vast amount of
geo-referenced data, presenting many opportunities for research in the urban
realm. A prime example of this is the phenomenon known as location-based
services (LBSs), of which Foursquare is one of the main industry players.
These are online applications that allow users to broadcast their location in
real-time in what has come to be known as a checkin. The accumulation of this
form of metadata is producing databases that effectively store a digital
representation of some aspects of the world, as well as many traces of human
behaviour. We believe this can help fill the need for quantifiable measures of
revealed preferences about urban amenities. </p>
<h2><a id="s3"></a> Session III </h2>
<h3> <a id="lovelace"></a> <i>From 'Big Noise' to 'Big Data': a case study of cross-validation between 3 large geographical datasets on visitor flows between regional urban centres</i> <br>
<u>Lovelace, Robin</u> (University of Leeds);
Malleson, Nicolas (University of Leeds);
Birkin, Mark (University of Leeds);
Cross, Philip (University of Leeds) </h3>
<p>Much has been written about 'Big Data': its definitions, its characteristics and the
methodological challenges it poses (Boyd and Crawford, 2012). There has also
been speculation about how it may or may not revolutionise Regional Science
and related fields (Arribas-Bel, 2014). Amongst the excitement, there has been
little time to pause and reflect on the kinds of application
to which Big Data is best suited. Indeed, Big Data also has its critics (e.g.
Taleb 2012) and their arguments should be heeded to avoid the field being
tainted, for example, with the type of controversy that has engulfed national
spying agencies since the Snowden leaks, or similar concerns of the ‘Big
Brother’ variety which have set the patient records agenda back in the UK
(Ganesh, 2014). What is needed in this context, we argue, is not wide-eyed
speculation about an ambiguous concept of 'Big Data', but an honest appraisal
of the applications for which different kinds of emerging data sources may be
most and least useful. Specifically, with the growing volume of data
available, there has been a tendency to uncritically proceed with the
analysis, resulting in beautiful visualisations and new insights. Yet in many
cases careful evaluation of the quality of Big Data sources is lacking. It is
the aim of this paper to discuss how quality can be evaluated in the realm of
big data sources, by cross-validation. The theoretical underpinning of this
paper goes back to the definition of Big Data as information that is high in
volume, velocity and variety (Laney, 2001). Although this definition is
frequently mentioned in talks on the subject, rarely are the criteria which
constitute whether a dataset is 'Big' or not explored in detail. Furthermore,
the consequences of each attribute for the types of application for which Big
datasets are suited are rarely discussed. We thus start from the premise that
each of the aforementioned attributes of Big Data can provide advantages and
disadvantages to the researcher, in equal measure. One Big dataset may be
completely different from the next in terms of its relative merits. We therefore
use three unrelated datasets for the empirical part of this study. The first
is geotagged Twitter data, collected with the Twitter Streaming
Application Programming Interface (API), which provides 'live' access to
public messages posted to Twitter. Data were collected during 445 days between
2011-06-22 and 2012-09-09 in West Yorkshire. The second is mobile phone mast
location data: a mobile telephone service provider provided aggregated data on home location
as well as frequency and number of trips between major urban and retail
centres across Yorkshire. The third is individual geolocated survey data on shopping
habits, provided by the consultancy Acxiom, which conducts surveys
across the UK, collecting ~1 million records yearly. These data have a fine
spatial resolution (full postcode) and many attributes of interest for market
research. The method was to test each dataset as an input into a spatial
interaction model of movement between urban centres in Yorkshire, UK. In their
raw form, it was found that each dataset is of little value to the majority of
researchers, hence the term 'Big Noise'. It is only through a process of
cleaning (to ensure consistency), filtering (to remove extraneous information)
and aggregation that the raw datasets are transformed into a state that allows
direct comparison between them and with the results of a spatial interaction
model. We conclude by advocating a greater emphasis on these techniques of
'data tidying' in Big Data research as this seems to be a major bottleneck in
the field and an area where researchers can add most value to noisy
information.</p>
<h3> <a id="anselin"></a> <i>Digital Neighborhoods</i> <br>
Anselin, Luc (Arizona State University);
Williams, Sarah (Massachusetts Institute of Technology) </h3>
<p>This paper investigates the spatial footprint of “digital neighborhoods,”
i.e., a concept of neighborhood derived from the content of geo-located and
time-stamped social media messages, which greatly extend the usual range of
local data available to urban and regional scientists. The messages pertain to
different types of content and activities that tend to cluster in space and
over time. We are interested in using different spatial clustering techniques
to detect significant groupings and how these can be explained by underlying
socio-economic characteristics. In addition to the spatial dimension, we
examine the space-time distribution of messages during the day and over the
course of a week to assess the extent to which the digital neighborhoods are
dynamic across time and over space, and how this varies by type of message. We
base our analysis on two sources of social media data for a period in early
2014 in New York City. One is a sample of over 5 million Twitter messages
collected through February and March, of which close to 600,000 have
geographic coordinates that correspond to over 450,000 venues. The second is a
comprehensive set of Foursquare check-ins for the first week of February,
which similarly contains close to 600,000 observations, but for a much smaller
set of venues (65,000). In addition to the locations of the Twitter messages
and the Foursquare check-ins, we consider more than 300,000 business locations
from the comprehensive ESRI business database. In our analysis, we take two
different perspectives. In one, we take the geography of N.Y.C. block groups
as the point of departure (n = 6454) and investigate the spatial and
space-time density of messages within this framework. Using a variety of
clustering methods (including measures of local spatial autocorrelation), we
identify block groups that form “digital hot spots” and “digital deserts.” The
former show much more digital activity than would be expected, given their
population share or share of the business locations. The hot spots are
dominated by Manhattan, but also include new up-and-coming areas, such as Long
Island City in Queens, and Williamsburg and Smith and Court Streets in Brooklyn. Digital
deserts are the opposite, block groups that are severely under-represented in
the digital world. We relate these patterns to socio-economic characteristics
of the block-groups in a series of spatial regressions. In the second
perspective, we take the location of the venues as the point of departure and
address clustering by means of association matrices, i.e., a type of distance
measure based on the similarity of check-ins among individuals, by type of
venue. This replicates the approach taken by the “Livehoods” project (Cranshaw
et al., 2012), but we also focus on the sensitivity of the obtained clusters to
the parameters chosen in the clustering process. In addition,
we investigate the dynamics of these clusters over the course of the day and
across the days of the week. These digital neighborhoods help to highlight the
underlying economic dynamics of the matching geographic neighborhoods and tend
to have a higher diversity of businesses.</p>
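<p>As an illustrative aside, and not the authors' method, the idea of a block group showing more digital activity than expected given its population share can be sketched as a simple ratio of shares. The function name <code>activity_ratio</code>, the block-group identifiers, the counts and the cut-off of 2.0 are all hypothetical. The analysis described above relies on measures of local spatial autocorrelation, which this sketch does not implement.</p>
<pre>
```python
def activity_ratio(messages, population):
    """Ratio of each area's share of messages to its share of population.

    A ratio well above 1 suggests more digital activity than the
    population share alone would predict; well below 1 suggests less.
    Assumes every area has a non-zero population.
    """
    total_messages = sum(messages.values())
    total_population = sum(population.values())
    ratios = {}
    for area, count in messages.items():
        message_share = count / total_messages
        population_share = population[area] / total_population
        ratios[area] = message_share / population_share
    return ratios


# Hypothetical counts for three made-up block groups.
messages = {"bg_a": 600, "bg_b": 300, "bg_c": 100}
population = {"bg_a": 1000, "bg_b": 1000, "bg_c": 2000}

ratios = activity_ratio(messages, population)
# Illustrative cut-off only: flag areas with at least twice the expected share.
hot_spots = sorted(area for area, r in ratios.items() if r >= 2.0)
```
</pre>
<p>A ratio of this kind ignores spatial neighbours entirely, which is one reason a local autocorrelation statistic would normally be preferred for identifying clusters.</p>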
<h3> <a id="chen"></a> <i>Sensitivity of Location-Sharing Services Data: Evidence from American Travel Pattern</i> <br>
<u>Chen, Zhenhua</u> (George Mason University);
Schintler, Laurie </h3>
<p>Location sharing services (LSSs) enable individuals to “check-in”
to locations via GPS-equipped devices, and to share this information
with friends in real-time. These services, and other related
applications, are generating a huge amount of passively collected data
on social and spatio-temporal behavior. Unlike traditional sources of
data, the information produced by LSS users has broad geographic
coverage; it is also rich in spatial and temporal detail. In fact, a
number of studies have already exploited this type of data to
understand different aspects of human and societal behavior, including
patterns of travel behavior. However, one concern about location
sharing services data, as with other sources of Big Data, is that it
is potentially biased. Users of such services tend to correspond to a
particular demographic, namely low- to medium-income males between the
ages of 19 and 29. Moreover, users can vary in the frequency with which
they report their locations. For example, some individuals may only
“check-in” when travelling long-distance, whereas others may do so on
more of a regular basis. There may also be a bias in terms of the
types of locations, or activities, that users report to their friends.
To complicate matters, there may be differences in the demographics of
users and their behavior across different location-sharing services.
These differences could relate, for example, to the relative
popularity of the services or stages of deployment. Without an
understanding of these issues and sensitivities, any social or spatial
behavior inferred from this type of data may end up being ad hoc,
inaccurate, or ambiguous. Thus, it is critical to understand who and
what is being represented by the data, and how these characteristics
differ across different services. In this study, we begin to explore
these issues. Specifically, the purpose of our study is three-fold:
1) to assess how well LSS data captures daily travel behavior
patterns; 2) to examine how sensitive the estimates are across
different location-sharing services; and 3) to develop a methodology
for processing location-sharing services data to derive information on
average daily travel behavior. For the purpose of the study, we focus
on two aspects of daily travel behavior: person miles of travel (PMT)
and daily person trips (DPT). The location-sharing services we examine
include Brightkite, Gowalla and Foursquare. We use the National
Household Travel Survey (NHTS) estimates of PMT and DPT as benchmarks
for the study. The analysis is conducted at the national level
(contiguous US) and for the 51 most populated metropolitan areas
in the contiguous US. The study has five major findings. First,
estimates of travel behavior from LSS data are found to be more
accurate for more-populated than for less-populated areas. Second, some
variations in daily travel behavior are found in LSS data, although
there are some consistencies, especially between Gowalla and
Foursquare. Third, Brightkite is the least accurate in terms of
representing daily travel behavior. Fourth, LSS data provides a better
estimation of daily person miles of travel than of average daily person
trips. Lastly, discrepancies between the travel behavior inferred from
LSSs and that from the NHTS seem to correspond to the particular
demographics and travel characteristics of metropolitan areas.
Through the sensitivity analysis of three LSS datasets with a comparison
to the classical NHTS data, our results indicate that the accuracy of
estimation for PMT and DPT using LSS data is highly dependent on the
number of check-in records. Since metropolitan areas with high
population density tend to have a better representation of daily
travel patterns as compared to NHTS, the research findings suggest that
it would be more accurate and suitable to use LSS data for travel
behavior analysis with a focus on big metropolitan areas.</p>
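<p>To make the two travel measures concrete, the following minimal sketch derives miles travelled and a trip count from one person's check-ins on a single day. It is an assumption-laden illustration, not the processing methodology developed in the study: the function names, the tuple layout of a check-in and the rule that every move between two different consecutive locations counts as one trip are all hypothetical, and distances are straight-line (great-circle) rather than network distances.</p>
<pre>
```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def daily_travel(checkins):
    """Return (miles, trips) for one person's check-ins on one day.

    checkins: list of (timestamp, lat, lon) tuples in any order.
    Each move between two consecutive check-ins at different coordinates
    is counted as one trip; this is a simplifying assumption and not the
    NHTS definition of a trip.
    """
    ordered = sorted(checkins)  # sort by timestamp
    miles = 0.0
    trips = 0
    for (_, lat1, lon1), (_, lat2, lon2) in zip(ordered, ordered[1:]):
        if (lat1, lon1) != (lat2, lon2):
            miles += haversine_miles(lat1, lon1, lat2, lon2)
            trips += 1
    return miles, trips


# Hypothetical day: two check-ins at the second location, so only one move.
day = [(2, 0.0, 1.0), (1, 0.0, 0.0), (3, 0.0, 1.0)]
miles, trips = daily_travel(day)
```
</pre>
<p>Averaging such per-person daily values over users and days would give figures comparable in form to PMT and DPT, although check-in data only records the places a user chooses to report.</p>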
<h3> <a id="ye"></a> <i>A framework of Mapping Social Connections in Space and Time</i> <br>
<u>Ye, Xinyue</u> (Kent State University);
Lai, Chih-Hui (Kent State University) </h3>
<p>Emergency events such as natural disasters often precipitate the
(re)activation of organized efforts in ways different from normal times.
Not only individuals but also organizations, both relief and non-relief
related, often engage in intensive communicative action with individuals and
organizations to offer and acquire support of any kind. Since the 2010
Haiti Earthquake, Twitter has become an important emergency information and
communication backbone where individuals and organizations request and share
information for disaster relief within and outside the affected area. This
unique system of information and communication allows for the identification
of the dynamics of active and latent social connections as well as the
temporal shifts of resource allocation geospatially. These geospatial details
are either identified by the user in the text or automatically recorded by the
system. To advance societal understanding about the transferability of virtual
network systems into physical relief actions, this project aims to achieve
four goals. First, it will examine the patterns of the global network of
interorganizational communication on Twitter in two disaster contexts: 2012
New York/New Jersey Superstorm Sandy and 2013 Typhoon Haiyan in the
Philippines. These events are chosen because of their widespread impacts as
well as their geographical variations, which allow for the observation of
similar and divergent patterns of disaster relief. Analytically, this
longitudinal analysis is meant to generate a geospatial representation of the
temporal change of the global virtual interorganizational network for each of
these disasters. Findings will help illuminate the geo-economic disparities of
organizational resource mobilization across disasters. Second, the analysis
will identify the factors that differentiate the clusters of actors involved
in different types of disaster relief around the world. As a result,
volunteering coordination can be made more effective by collaborating with
relevant organizations falling within each cluster. Third, findings will
locate the latent network of organizational collaboration for emergency
response on a global scale. Predictions can be made about the timing and the
geography of such a network being activated before, during, and after a disaster.
These results will unveil the conditions and opportunities where the online
links can translate into the provision of physical resources. Fourth, this
research will reveal the broader patterns of the global emergency and humanitarian
aid network. In addition to disaster relief, most non-governmental and
volunteer organizations are dedicated to multiple types of humanitarian aid.
Using these disasters as the starting point will help identify the mechanisms
by which the virtual interorganizational network parallels or enhances
interorganizational collaboration for humanitarian efforts. Methodologically,
supplementing conventional survey and interview techniques, the use of Twitter
data allows for a more systematic way of obtaining data on different types of
organizations (intergovernmental organizations, international, national, and
local non-governmental organizations). It also enables longitudinal
observation of the dynamic interaction among organizations of different types.
In sum, this project presents significant scholarly, practical, and policy
implications for disaster relief and humanitarian aid.</p>
<h2> <a id="s4"></a> Session IV </h2>
<h3> <a id="rhoads"></a> <i>New Data, New Applications: A Method for Transportation System Performance Monitoring</i> <br>
Giuliano, Genevieve (University of Southern California);
<u>Rhoads, Mohja</u> (University of Southern California);
Chakrabarti, Sandip (University of Southern California) </h3>
<p>This paper is motivated by the availability of a new data source. We have
developed a data archive from the real-time data feed used for transportation
system monitoring in the Los Angeles region. This system, Regional
Integration of Intelligent Transportation Systems (RIITS), includes freeway,
arterial and public transit data produced by several state and local agencies.
The availability of detailed, historical data across modes and facilities has
obvious applications for transportation system modeling and simulation, but
also provides opportunities for developing new analytical tools for
transportation planning and management. This paper presents a method for
monitoring the regional transportation system.
Performance monitoring is an essential part of transportation planning and
system management, yet historically the cost and complexity of gathering
sufficient data and conducting performance analyses have limited regular
monitoring. Our data archive includes geo-spatial freeway, arterial and
transit operations data. The freeway and arterial data come from over 6,000
sensors across Los Angeles County. The transit information comes from a
combination of GPS devices and passenger counts from all Los Angeles Metro
transit bus and rail routes. The data are generated at intervals as short as
30 seconds, and all data are located by x-y coordinates. These data allow
us to sample across time, space and modes at almost any time-space interval.
In this paper we present our method for monitoring the highway system. The
transportation network is diverse. Within the highway system, highways range
from 2 lane rural roads to 12 lane urban freeways. We therefore use cluster
analysis based on functional attributes to group segments of the highway
system. Our cluster analysis yields three groups for the highway system.
Operational data are not uniformly available: some parts of the system are
more instrumented than others, and not all sensors report valid data. We
therefore develop a weighting scheme to generate representative performance
measures for each cluster group.
We illustrate our method using 30 days of data from the highway system. Our
results for highways using average speed, volumes, and variance as performance
measures show that performance varies significantly across clusters, time
periods and days of the week, but that different weighting schemes do not
significantly affect results. </p>
<h3> <a id="tranos"></a> <i>Mobile phone data and motorway traffic: can the former
predict the latter?</i> <br>
<u>Tranos, Emmanouil</u> (University of Birmingham) </h3>
<p>This paper aims to test the relationship between mobile
phone usage and motorway traffic. Can we use data from mobile
phone providers as a detector of motorway traffic? Such a
modelling exercise can provide a useful tool for transport
engineers as it will enable the (near) real-time estimation of
car traffic in specific segments of motorways using data from
mobile phone operators and avoiding the use of other more
expensive and less efficient surveying techniques. The case
study for our research is the city of Amsterdam. The data
utilized for this paper has been supplied by a major telecom
operator and provides aggregated information about mobile
phone usage at the level of the GSM cell for the year 2010.
The temporal dimension provides information on an hourly basis,
creating a very detailed pool of data. Such a rich dataset
appears to be a ‘luxury’ for spatial analysts, but at the same
time increases the complexity of the analytical approach. In
addition, extensive datasets for motorway traffic using
detection loops as well as weather data are also used for this
paper. The richness of the mobile phone dataset will be
utilised in two ways. In a first step, the effect of car
traffic on mobile phone usage will be tested. The result of
this exercise will provide the basis of the analysis as it
will establish the relation between mobile phone usage and
motorway traffic. In a second step, the mobile phone dataset
will be utilised in a more sophisticated way. Instead of using
only data regarding mobile phone usage (e.g. new phone calls
or erlangs), handovers will also be considered. The latter
contain information regarding the transfer of calls from one
GSM antenna to another. This usually happens when the mobile
phone user crosses the boundaries of a GSM cell and therefore
reflects movements in space. What is tested in the second step
is whether a low rate of handovers in relative terms can be
related to bottlenecks and traffic jams. Simply put,
handovers during a traffic jam are, in relative terms, fewer
than when roads are clear. The latter will provide the main
contribution of the paper as it will introduce a rather simple
methodology to capture traffic jams in (near) real time.</p>
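<p>The reasoning in the second step can be illustrated with a small sketch that flags hours in which handovers are few relative to calls. This is only one possible reading of "a low rate of handovers in relative terms" and is not the paper's methodology: the function name, the use of the median as the baseline and the <code>fraction</code> cut-off are hypothetical choices, and the hourly counts are made up.</p>
<pre>
```python
from statistics import median


def flag_low_handover_hours(calls, handovers, fraction=0.5):
    """Return indices of hours whose handover-per-call ratio is unusually low.

    calls, handovers: equal-length lists of hourly counts for one GSM cell.
    fraction: hypothetical cut-off; an hour is flagged when its ratio is
    no more than this fraction of the median ratio across all hours.
    Hours with no calls are skipped because the ratio is undefined.
    """
    ratios = {}
    for hour, (n_calls, n_handovers) in enumerate(zip(calls, handovers)):
        if n_calls > 0:
            ratios[hour] = n_handovers / n_calls
    if not ratios:
        return []
    typical = median(ratios.values())
    return [hour for hour, r in ratios.items() if typical * fraction >= r]


# Made-up hourly counts: in hour 3, calls stay high while handovers collapse.
calls = [100, 120, 0, 110, 130]
handovers = [50, 60, 0, 11, 65]
suspect_hours = flag_low_handover_hours(calls, handovers)
```
</pre>
<p>In practice any such flag would need to be validated against the detection-loop traffic data mentioned above before being read as evidence of congestion.</p>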
<h3> <a id="zhou"></a> <i>Freight Deliveries Directly Generated by Residential Units:
An Analysis with the 2009 NHTS Data</i> <br>
<u>Zhou, Yiwei</u> (Rensselaer Polytechnic Institute);
Wang, Xiaokun (Rensselaer Polytechnic Institute) </h3>
<p>As a result of the rapid growth of online shopping, more
goods and services are delivered directly to residential units.
These door-to-door deliveries improve residents’ access to
the retail sector, and at the same time create truck delivery
trips. However, partly due to data limitations, most
existing freight research focuses on freight trips generated by
multiple industrial sectors; little is known about freight
trips generated by residential units. As more and more urban
areas are pushing for dense and mixed development, it is
necessary to understand the pattern of truck freight trips
directly generated by residential units. For this paper, a
dataset from the NHTS is used to investigate the freight trips
generated by residential units. NHTS 2009 provides accurate,
comprehensive and timely information on trips, land use,
household characteristics and socioeconomic factors. This is
the first time NHTS data have been used to estimate freight trips. A
statistical model is established to explain freight trips
generated by residential units and to identify influential
factors. In addition, the model is expected to predict the freight
trips generated when applied to real residential units. A
negative binomial right-censored model is used to identify the
impacts of influential factors such as housing density, type of
house and house ownership. An application is made to simulate the
number of freight deliveries generated by residential units in
New York City. Results are compared with derived real business
freight trips data. To further validate simulation results, the
same model is applied to different education groups. The
simulated freight trips generated by residential units are
compared to those using the full dataset. A closer examination at
the state level further discloses the spatial variation in
their relationship. Such a study will supplement city logistics
studies that traditionally focus on business behaviors and help
reconstruct the complete picture of freight activities in urban
areas.</p>
<h3> <a id="zhang"></a> <i>Baltimore’s Post-recession Socioeconomic Environment and Local Job Access for Work-eligible Temporary Assistance for Needy Families (TANF) Recipients: a Locational Approach for Welfare-to-work Examination</i> <br>
<u>Zhang, Ting</u> (University of Baltimore) </h3>
<p>This study examines the impact of Baltimore local community
socioeconomic environment and local job access on work-eligible
Temporary Assistance for Needy Families (TANF) recipients’
welfare-to-work transfer propensity during the post-recession
period between July 2009 and December 2012. The data we use
include linked multi-agency micro level longitudinal
administrative record extracts from Maryland state government
and census block level public data from US Census Bureau,
American Community Survey, US Bureau of Labor Statistics, and
Baltimore City Police Department. We adopt a hierarchical
mixed-effect logistic regression, descriptive statistics, and
spatial econometrics to estimate the impacts. ArcGIS will also
be used to generate density maps, identify spatial hotspots and
compute job access. The local community socioeconomic
environment and the availability of the local jobs (defined by
the location weighted job-hotspot-to-home distance) are critical
to work-eligible TANF recipients’ employment outcome. This
evidence-based study will inform Baltimore City government and
local agencies, as well as Maryland State government agencies,
of further strategies to redesign the TANF-related social safety
net services and service delivery in the Central Baltimore area. The
findings will not only identify the importance of home location
and community environment to employment outcome and generate
implications to welfare, planning and transportation policies,
but also identify disparity of education, health, and family
responsibility across industries. The study will conclude with
policy implications and directions for future research.
Differences in local labor market opportunities and local
socioeconomic community environment are critical to
work-eligible welfare recipients. The February 2008
Reauthorization of the TANF Program Final Rule defined personal
responsibility and serious effort to work expectations for
work-eligible welfare recipients. The access to local labor
market opportunities and local socioeconomic community
environment plays important roles in work-eligible TANF
recipients’ job finding propensity. Previous literature has
indicated the importance of local community socioeconomic
environment for employment outcomes. Previous literature has
also indicated that long distances and commuting times to the
local labor market often affect individuals’ employment outcomes
for various reasons. Our study therefore hypothesizes that local
community socioeconomic environment and distance between home
and potential job opportunities matter to TANF recipients’
welfare-to-work transfer propensity. Local community demographic
composition, income and poverty level, local transit conditions,
and crime level are important factors affecting TANF recipients’
job opportunities. The longer the distance between home and job
opportunities, the lower their odds of finding a job, and
the impact of distance varies by industry. Considering the
demographics of the TANF recipients we observe, we also
hypothesize that child responsibility, lower educational attainment, and
poorer health are associated with lower odds of finding a job.</p>
</section>
<footer>
<p><small>Theme inspired by <a href="https://github.com/orderedlist">orderedlist</a></small></p>
</footer>
</div>
<script src="javascripts/scale.fix.js"></script>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-6032674-1");
pageTracker._trackPageview();
} catch(err) {}
</script>
</body>
</html>