Chapter 15 - Descriptive summary, design, and inference
15.1 More spatial analysis
Because I have stressed throughout that analysis is a core GIScience activity, we have already discussed many of the ideas introduced in this chapter; still, the organization here is clear and the examples and explanations are excellent. Again, this text need not be read word by word, but you should be familiar with its content so that you can refer to it as you work with GIS.
Every new release of ArcGIS adds new statistical tools, many of which are visible at
ArcMap > Tools > Graph > Create
as well as at
Geostatistical Analyst > Explore Data.
Although the text denies a spatial component to Figure 15.2, it's likely that the yellow lines are regions that have been selected (highlighted) on a map (an example of a "What's at x?" question). Refer back to Figure 14.8, which shows two maps and two statistical visualizations and could also show the underlying data matrix with selected objects highlighted in some way. These are methods that were only imagined a few decades ago.
15.2 Descriptive summaries
15.2.1 Centers
The technique of summarizing data with statistics is sometimes referred to as "data reduction" because a few numbers characterize many (see ArcMap's Attribute Table Statistics feature). Similarly, the centroid and standard deviation circle can be used to summarize point data, and the centroid to summarize the location of a polygon. (I'm not sure what corresponds to the standard deviation of a polygon, but it may be the moment of inertia.) Many of the tools presented in this chapter can be found at
ArcMap > Toolbox > Spatial Statistics > Measuring Geographic Distributions
You can access the Toolbox from its red icon on the Standard Menu.
Figure 15.4 is worth studying but may be confusing. Mark the points x = 1, 3, ..., 22 on the horizontal axis and draw a small triangle at x = 10.3 to see that the mean is where the sum of squared distances is minimized.
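If you want to convince yourself numerically rather than graphically, here is a minimal Python sketch of the same idea; the values are made up for illustration, not taken from Figure 15.4:

```python
# Minimal sketch: the mean minimizes the sum of squared distances.
# The values below are hypothetical, not the data behind Figure 15.4.
import numpy as np

x = np.array([1.0, 3.0, 6.0, 9.0, 14.0, 22.0])   # hypothetical 1-D point locations
mean_x = x.mean()

def sum_sq_dist(c):
    """Sum of squared distances from candidate location c to all points."""
    return float(np.sum((x - c) ** 2))

# Evaluate the criterion on a fine grid and compare with the mean itself.
grid = np.linspace(x.min(), x.max(), 2101)
best = grid[np.argmin([sum_sq_dist(c) for c in grid])]

print(f"mean = {mean_x:.3f}")
print(f"grid minimizer = {best:.3f}")            # agrees with the mean (to grid resolution)
print(f"SSD at mean = {sum_sq_dist(mean_x):.3f}")
```

Replace x with any numbers you like; the grid minimizer will always land on (or right next to) the mean.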
The discussion in Box 15.1 doesn't explain what would happen with weights of different sizes. But it is a stark example of how positivistic (as opposed to "normative") planning will ignore ethical, esthetic, and emotional dimensions.
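To see how weights of different sizes would shift the result, here is a minimal sketch of a weighted mean center; the coordinates and weights are invented, not the numbers behind Box 15.1:

```python
# Minimal sketch of a weighted mean center; coordinates and weights are
# hypothetical, not the values used in Box 15.1.
import numpy as np

xy = np.array([[2.0, 3.0], [5.0, 8.0], [9.0, 4.0], [4.0, 1.0]])  # point locations
w_equal = np.array([1.0, 1.0, 1.0, 1.0])
w_skewed = np.array([1.0, 10.0, 1.0, 1.0])       # one heavily weighted point

def weighted_mean_center(points, weights):
    """Weighted average of the coordinates: sum(w_i * p_i) / sum(w_i)."""
    return (weights[:, None] * points).sum(axis=0) / weights.sum()

print("equal weights: ", weighted_mean_center(xy, w_equal))
print("skewed weights:", weighted_mean_center(xy, w_skewed))  # pulled toward the heavy point
```

The heavily weighted point drags the center toward itself, which is exactly the behavior the box leaves unexplored.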
15.2.2 Dispersion
This course has a statistics prerequisite, so you have at least been introduced to the basic tools of data description. I've stressed that there is a strong resonance between analyzing the data of geography and studying the geography of data. Imagine an (x, y) data matrix such as any of the point data we've mapped so far. The center is just the point representing the means of the two coordinates, and the standard deviation ellipse is very much like the dispersion around a correlation line. To find spatial descriptors for ArcGIS point data, run
Measuring Geographic Distributions > Mean Center
A screenshot shows the data, the highlighted tool (and tooltip), and the result (including a label with coordinates), as well as a standard deviation circle. Note as well that this graphic will fit on a report page!
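As a rough idea of what Mean Center and its companion Standard Distance compute, here is a sketch of the textbook formulas with invented coordinates, not ArcGIS's own code:

```python
# Minimal sketch of a mean center and standard distance (the radius of the
# "standard deviation circle"); the coordinates are hypothetical.
import numpy as np

xy = np.array([[1.0, 2.0], [3.0, 7.0], [6.0, 4.0], [8.0, 9.0], [5.0, 5.0]])

center = xy.mean(axis=0)                          # mean of the x's and mean of the y's
# Standard distance: root mean squared distance of the points from the mean center.
std_distance = np.sqrt(np.mean(np.sum((xy - center) ** 2, axis=1)))

print("mean center:      ", center)
print("standard distance:", round(float(std_distance), 3))
```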
15.2.3 and 15.2.4 Measures of pattern: unlabeled and labeled points
This is the simplest kind of geographic data: points in 2-D space (points on a line are more complicated because they are constrained, and a single point is boring). Scattered throughout the book are examples of this model, in particular Figure 3.11, Frames 1 and 2, and Figure 4.4, Frames A, B, and E. The easiest way to think about point patterns is as a continuum from clustered (all points at one location) through random to dispersed (all points at the vertices of a hexagonal arrangement). I strongly urge you to draw these on a piece of paper to get a sense of what's going on!
There are many ways to statistically test the null hypothesis H0: randomness against the other two models. Among the easiest to implement is the nearest neighbor distance statistic, which in ArcMap also gives you a crude but useful picture of the continuum as well as the statistic testing for randomness. Figure 15.9 illustrates how my old friend Art Getis and Janet Franklin developed a way to compare patterns locally, illustrating that all statistics have local and global values: clustering is not only a property of all the points but also a field with values everywhere.
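Here is a minimal sketch of the nearest neighbor distance statistic itself, using the standard formula with made-up points rather than ArcMap's implementation:

```python
# Minimal sketch of the nearest neighbor index R = observed mean nearest
# neighbor distance / expected mean distance under complete spatial randomness.
# The points and study-area size are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(50, 2))        # 50 random points in a 10 x 10 square
area = 10.0 * 10.0
n = len(pts)

# Distance from each point to its nearest neighbor (excluding itself).
d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
np.fill_diagonal(d, np.inf)
observed = d.min(axis=1).mean()

expected = 0.5 / np.sqrt(n / area)                # expected mean NN distance under randomness
R = observed / expected
print(f"R = {R:.3f}  (R < 1 clustered, R ~ 1 random, R > 1 dispersed)")
```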
These techniques can be applied to polygon data as well. The Moran statistic example in this section is not easy to understand, but if you return to the "toy" example of Box 4.4 you'll see the idea of regions and their boundaries, which I represent as 8 vertices and (8 x 7)/2 = 28 possible edges connecting them (the numbers below the zero diagonal in the matrix), of which 15 (the number of 1's) exist. I suggest you draw these edges in your book (or on a copy of the page) and ponder this elegant model of network connectivity. To reinforce this idea, see Figure 8.13, which is a graph representing a TIN, and for a more realistic example of a network (or graph, as it is known mathematically) see Figure 4.11 B.
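Box 4.4's matrix isn't reproduced here, but a made-up symmetric 0/1 matrix shows how the possible and existing edge counts fall out of it:

```python
# Minimal sketch of counting edges in a binary contiguity (adjacency) matrix.
# The 5-region matrix below is invented; Box 4.4's 8-region matrix is not reproduced.
import numpy as np

W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

n = W.shape[0]
possible = n * (n - 1) // 2                    # with 8 regions this is (8 x 7)/2 = 28
existing = int(np.tril(W, k=-1).sum())         # count the 1's below the zero diagonal

print(f"{existing} of {possible} possible edges exist")
```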
15.2.5 Fragmentation and fractional dimension
Here we have yet another reference to spatial complexity and fractals, which are very common in nature and in all data derived from real (as opposed to simulated) processes. The two figures are nice examples of objects and fields derived from underlying complexity (soils and images). Did I mention that I was a geographic fractal pioneer, with one of the first books on the subject, still in (re)print since 1993?
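If you want to experiment with the idea of a fractional dimension, here is a minimal box-counting sketch; the jagged curve is simulated purely for illustration:

```python
# Minimal sketch of a box-counting estimate of fractional (fractal) dimension
# for points along a curve; the jagged random-walk curve is made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
steps = rng.normal(size=2000).cumsum()
x = np.linspace(0.0, 1.0, steps.size)
y = (steps - steps.min()) / (steps.max() - steps.min())   # rescale to [0, 1]
pts = np.column_stack([x, y])

sizes = [1/4, 1/8, 1/16, 1/32, 1/64]
counts = []
for s in sizes:
    # Count the boxes of side s that contain at least one point of the curve.
    boxes = set(map(tuple, np.floor(pts / s).astype(int)))
    counts.append(len(boxes))

# The slope of log(count) vs log(1/size) estimates the box-counting dimension.
slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
print(f"estimated dimension ~ {slope:.2f}")    # between 1 (smooth line) and 2 (plane-filling)
```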
15.3 Optimization
A long and sometimes heated debate rages about the application of technical models to planning, with one side arguing for efficiency and the other arguing against cold logic. This book is full of examples of models that illustrate this tension. Box 15.4 is an example of a network optimizer that no doubt improves the profitability of Schindler Elevator but may constrain the technician from taking a leisurely lunch at MacArthur Park. (Incidentally, I was born and grew up where Melrose Avenue intersects the western edge of the map.) As this is a program on science and management, I suggest you read this section as an essay arguing the technocratic case. Finally, this section could conclude the chapter.
15.4 Hypothesis testing
If you need a discursive review of hypothesis testing, look over this section. The "toy" example illustrates the problem of interpolation, which was introduced in Section 14.4.4 (see Figure 14.25). Spatial autocorrelation (SA) - look it up in the index and online - is a fundamental concept of geographic analysis. In Figure 15.21, if all the points in the sample had the same value, SA would be 1 and the value at B would be easy to interpolate. If, on the other hand, the points had values that didn't relate in any way to one another, SA = 0 and it would be impossible to predict the value at B.
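To make the interpolation point concrete, here is a minimal inverse-distance-weighting sketch with made-up sample points; it stands in for whatever interpolator Figure 15.21 assumes:

```python
# Minimal sketch of inverse distance weighting (IDW) interpolation at an
# unsampled location B; the sample points and values are hypothetical.
import numpy as np

samples = np.array([[1.0, 1.0], [4.0, 2.0], [2.0, 5.0], [5.0, 5.0]])  # known locations
values = np.array([10.0, 12.0, 11.0, 13.0])                           # known values
B = np.array([3.0, 3.0])                                              # location to predict

d = np.sqrt(((samples - B) ** 2).sum(axis=1))
weights = 1.0 / d ** 2                          # nearer samples count more
estimate = (weights * values).sum() / weights.sum()

print(f"predicted value at B: {estimate:.2f}")
# With high spatial autocorrelation the neighboring values agree and the estimate
# is trustworthy; with SA near 0 the neighbors carry no information about B.
```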
The general null hypothesis is H0: SA = 0, and there are many ways to test against it. Return to Figure 15.10 and ask whether nearby states have similar housing values. The Moran analysis illustrated in Figure 15.22 suggests that we can reject H0 at any significance level greater than 0.002. In other words, to use slightly colloquial language, there's about a 1/500 chance of getting a value of Moran's I = 0.4011 or greater from a random rearrangement of the values across the 49 contiguous US states.
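Here is a minimal sketch of Moran's I and the permutation logic behind that kind of p-value; the grid data and rook-contiguity weights are invented, not the states example of Figure 15.22:

```python
# Minimal sketch of Moran's I with a permutation test; the values and the
# 4-neighbor grid weights are made up, not the US states data of Figure 15.22.
import numpy as np

rng = np.random.default_rng(0)
side = 5
n = side * side
# Smoothly varying values on a 5 x 5 grid -> positive spatial autocorrelation.
gx, gy = np.meshgrid(np.arange(side), np.arange(side))
y = (gx + gy).ravel().astype(float) + rng.normal(scale=0.5, size=n)

# Binary contiguity weights: rook (edge-sharing) neighbors on the grid.
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if abs(gx.ravel()[i] - gx.ravel()[j]) + abs(gy.ravel()[i] - gy.ravel()[j]) == 1:
            W[i, j] = 1.0

def morans_i(values, W):
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

observed = morans_i(y, W)
# Permutation test: shuffle the values across locations and see how often a
# random rearrangement produces an I at least as large as the observed one.
perms = np.array([morans_i(rng.permutation(y), W) for _ in range(999)])
p = (1 + (perms >= observed).sum()) / (1 + len(perms))
print(f"Moran's I = {observed:.3f}, permutation p ~ {p:.3f}")
```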
15.5 Conclusion
This chapter and the one before it constitute the analytical heart of the textbook and should be grasped as an introduction to spatial analysis and statistics. If you're interested in this stuff (as I am), there are many places you can go for more information. If it doesn't interest you very much, then try at least to be familiar with some of the concepts, because they are essential to environmental science.