Chapter 15 - Descriptive summary, design, and inference

15.1 More spatial analysis

Because I have stressed throughout that analysis is a core GIScience activity, we have already discussed many of the ideas introduced in this chapter. Even so, the chapter's organization is clear and its examples and explanations are excellent. Again, this text need not be read word-by-word, but you should be familiar with its content so that you can refer to it as you work with GIS.

Every new release of ArcGIS adds new statistical tools, many of which are visible at
ArcMap > Tools > Graph > Create
as well as
Geostatistical Analyst > Explore Data.

Although the text denies a spatial component to Figure 15.2, it's likely that the yellow lines are regions that have been selected (highlighted) on a map (an example of a "What's at x?" question). Refer back to Figure 14.8, which shows two maps and two statistical visualizations, and could also show the underlying data matrix with selected objects highlighted in some way. Methods like these could only be imagined a few decades ago.

15.2 Descriptive summaries

15.2.1 Centers

The technique of summarizing data with statistics is sometimes referred to as "data reduction" because a few numbers characterize many (see ArcMap's Attribute Table Statistics feature). Similarly, the centroid and standard deviation circle can be used to summarize point data, and the centroid alone can summarize the location of a polygon. (I'm not sure what corresponds to the standard deviation of a polygon, but it may be the moment of inertia.) Many of the tools presented in this chapter can be found at
ArcMap > Toolbox > Spatial Statistics > Measuring Geographic Distributions
You can access the Toolbox from its red icon on the Standard toolbar.

Figure 15.4 is worth studying but may be confusing. Mark the points x = 1, 3, ..., 22 on the horizontal axis and draw a small triangle at x = 10.3 to see that the mean is where the sum of squared distances is minimized.
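
The claim about Figure 15.4 can also be checked numerically. The sketch below uses seven illustrative values (an assumption: they were chosen so that the mean is about 10.3 and are not copied from the figure). It scans candidate centers along the axis and keeps the one with the smallest sum of squared distances.

```python
# Illustrative values only (assumed, not taken from Figure 15.4);
# chosen so that the mean is about 10.3.
points = [1, 3, 5, 11, 12, 18, 22]


def sum_sq(center, xs):
    """Sum of squared distances from a candidate center to every x."""
    return sum((x - center) ** 2 for x in xs)


mean = sum(points) / len(points)

# Try every candidate center from 0.0 to 25.0 in steps of 0.1
# and keep the one with the smallest sum of squared distances.
candidates = [i / 10 for i in range(0, 251)]
best = min(candidates, key=lambda c: sum_sq(c, points))

print(f"mean = {mean:.2f}")
print(f"best candidate on the 0.1 grid = {best:.1f}")
```

The best grid candidate is the grid point closest to the arithmetic mean, which is what drawing the small triangle at the mean is meant to show.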

The discussion in Box 15.1 doesn't explain what would happen with different size weights. But it is a stark example of how positivistic (as opposed to "normative") planning will ignore ethical, esthetic, and emotional dimensions.

15.2.2 Dispersion

This course has a statistics prerequisite, so you have at least been introduced to the basic tools of data description. I've stressed that there is a strong resonance between analyzing the data of geography and studying the geography of data. Imagine an (x, y) data matrix such as any of the point data we've mapped so far. The center is just the point representing the means of the two coordinates, and the standard deviation ellipse is very much like the dispersion around a correlation line.

To find spatial descriptors for ArcGIS point data, run Measuring Geographic Distributions > Mean Center. A screenshot shows the data, the highlighted tool (and tooltip), and the result (including a label with coordinates), as well as a standard deviation circle. Note as well that this graphic will fit on a report page!
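
To see what these tools compute, here is a minimal sketch of a mean center and a standard distance (the radius of a standard deviation circle) for a handful of points. The coordinates are hypothetical, and the standard distance is computed in its population form (dividing by n). That choice is an assumption made for illustration, so check the ArcGIS documentation for the exact formula a given tool uses.

```python
import math

# Hypothetical (x, y) coordinates in map units; not from the course data.
pts = [(2.0, 3.0), (4.0, 7.0), (6.0, 5.0), (8.0, 9.0)]


def mean_center(points):
    """Return the point whose coordinates are the means of x and of y."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    cx, cy = mean_center(points)
    n = len(points)
    return math.sqrt(
        sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    )


center = mean_center(pts)
radius = standard_distance(pts)
print("mean center:", center)
print(f"standard distance: {radius:.3f}")
```

Two coordinates and one radius stand in for the whole point set, which is the "data reduction" idea from Section 15.2.1.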

15.2.3 and 15.2.4 Measures of pattern: unlabeled and labeled points

This is the simplest kind of geographic data: points in 2-D space (points on a line are more complicated because they are constrained, and a single point is boring). Scattered throughout the book are examples of this model, in particular Figure 3.11 Frames 1 and 2; and Figure 4.4 Frames A, B, and E. The easiest way to think about the process of point patterns is a continuum from clustered (all points at one location) through random to dispersed (all points at the vertices of a hexagonal arrangement). I strongly urge you to draw these on a piece of paper to get a sense of what's going on!

There are many ways to test the null hypothesis Ho: randomness against the other two models (clustered and dispersed). Among the easiest to implement is the nearest neighbor distance statistic, which in ArcMap also gives you a crude but useful picture of the continuum along with the test for randomness. Figure 15.9 illustrates how my old friend Art Getis and Janet Franklin developed a way to compare patterns locally, showing that all statistics have local and global values: clustering is not only a property of all the points taken together but also a field with values everywhere.
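
A minimal sketch of the nearest neighbor statistic may help. It uses the common Clark-Evans form, which is an assumption about which variant is meant: the ratio R of the observed mean nearest neighbor distance to the distance expected under randomness, with a z-score. The three point sets are synthetic, the study area is a hypothetical 10 x 10 square, and edge effects are ignored.

```python
import math
import random


def nearest_neighbor_ratio(points, area):
    """Return (R, z) for the Clark-Evans nearest neighbor statistic."""
    n = len(points)
    # Observed mean distance from each point to its nearest neighbor
    # (brute force, fine for small n).
    d_obs = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    d_exp = 0.5 / math.sqrt(n / area)        # expected under randomness
    se = 0.26136 / math.sqrt(n * n / area)   # standard error of d_obs
    return d_obs / d_exp, (d_obs - d_exp) / se


rng = random.Random(42)   # fixed seed so the synthetic data are repeatable
side = 10.0
area = side * side

# Three synthetic patterns of 100 points each (hypothetical data).
random_pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(100)]
clustered_pts = [(5 + rng.gauss(0, 0.3), 5 + rng.gauss(0, 0.3)) for _ in range(100)]
dispersed_pts = [(i + 0.5, j + 0.5) for i in range(10) for j in range(10)]

r_rand, z_rand = nearest_neighbor_ratio(random_pts, area)
r_clu, z_clu = nearest_neighbor_ratio(clustered_pts, area)
r_disp, z_disp = nearest_neighbor_ratio(dispersed_pts, area)

for name, r, z in [("random", r_rand, z_rand),
                   ("clustered", r_clu, z_clu),
                   ("dispersed", r_disp, z_disp)]:
    print(f"{name:10s} R = {r:.2f}  z = {z:.2f}")
```

R below 1 points toward clustering, R near 1 toward randomness, and R above 1 toward dispersion. The dispersed set here is a square grid, used for simplicity, rather than the hexagonal arrangement mentioned above.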

These techniques can be applied to polygon data as well. The Moran statistic example in this section is not easy to understand, but if you return to the "toy" example of Box 4.4 you'll see the idea of regions and their boundaries that I represent as 8 vertices and (8 x 7)/2 = 28 possible edges connecting them (the numbers below the zero diagonal in the matrix), of which 15 (the number of 1s) exist. I suggest you draw these edges in your book (or on a copy of the page) and ponder this elegant model of network connectivity. To reinforce this idea, see Figure 8.13, which is a graph representing a TIN, and for a more realistic example of a network (or graph, as it is known mathematically) see Figure 4.11 B.

15.2.5 Fragmentation and fractional dimension

Here we have yet another reference to spatial complexity and fractals, which are very common in nature and in all data derived from real (as opposed to simulated) processes. The two figures are nice examples of objects and fields derived from underlying complexity (soils and images). Did I mention that I was a geographic fractal pioneer, with one of the first books on the subject, still in (re)print since 1993?

15.3 Optimization

A long and sometimes heated debate rages about the application of technical models to planning, with one side arguing for efficiency and the other arguing against cold logic. This book is full of examples of models that illustrate this tension. Box 15.4 is an example of a network optimizer that no doubt improves the profitability of Schindler Elevator but may constrain the technician from taking a leisurely lunch at MacArthur Park. (Incidentally, I was born and grew up where Melrose Avenue intersects the western edge of the map.) As this is a program on science and management, I suggest you read this section as an essay arguing the technocratic case. Finally, this section could conclude the chapter.

15.4 Hypothesis testing

If you need a discursive review of hypothesis testing, look over this section. The "toy" example illustrates the problem of interpolation, which was introduced in Section 14.4.4 (see Figure 14.25). Spatial autocorrelation (SA) - look it up in the index and online - is a fundamental concept of geographic analysis. In Figure 15.21, if all the points in the sample had the same value, SA would be 1 and the value at B would be easy to interpolate. If, on the other hand, the points had values that didn't relate in any way to one another, SA would be 0 and it would be impossible to predict the value at B.

The general null hypothesis is Ho: SA = 0, and there are many ways to test against it. Return to Figure 15.10 and ask whether nearby states have similar housing values. The Moran analysis illustrated in Figure 15.22 suggests that we can reject Ho at any significance level greater than 0.002. In other words, to use slightly colloquial language, there's about a 1/500 chance of getting a value of Moran's I = 0.4011 or greater from a random rearrangement of the 49 contiguous US states.
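
The "random rearrangement" reading of the test can be sketched directly. The code below computes Moran's I with binary contiguity weights and estimates a one-sided p-value by shuffling the values among the regions. Everything in it is a toy assumption: a 4 x 4 grid of regions with rook (shared-edge) neighbors stands in for the map of states, and the values are made up. It is not the procedure ArcGIS or the textbook necessarily uses.

```python
import random


def rook_neighbors(rows, cols):
    """Map each cell index to the indices of cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            adj = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adj.append(rr * cols + cc)
            nbrs[r * cols + c] = adj
    return nbrs


def morans_i(values, nbrs):
    """Moran's I with binary weights: w_ij = 1 if i and j are neighbors."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(adj) for adj in nbrs.values())   # sum of all weights
    cross = sum(dev[i] * dev[j] for i, adj in nbrs.items() for j in adj)
    return (n / s0) * cross / sum(d * d for d in dev)


def permutation_p(values, nbrs, n_perm=999, seed=0):
    """Observed I and the share of random rearrangements with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, nbrs) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


nbrs = rook_neighbors(4, 4)
gradient = [c for r in range(4) for c in range(4)]           # values rise west to east
checker = [(r + c) % 2 for r in range(4) for c in range(4)]  # neighbors always differ

i_grad, p_grad = permutation_p(gradient, nbrs)
print(f"gradient:     I = {i_grad:.3f}, permutation p = {p_grad:.3f}")
print(f"checkerboard: I = {morans_i(checker, nbrs):.3f}")
```

A smooth gradient gives a clearly positive I that random rearrangements rarely match, while a checkerboard, where every neighbor differs, gives a strongly negative I.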

15.5 Conclusion

This chapter and the one before it constitute the analytical heart of the textbook and should be grasped as an introduction to spatial analysis and statistics. If you're interested in this stuff (as I am), there are many places you can go for more information. If it doesn't interest you very much, then try at least to be familiar with some of the concepts, because they are essential to environmental science.