Chapter 4 - The nature of geographic data

You've been using ArcGIS, and this chapter will deepen your insights by introducing some of the basic concepts of GIScience that have evolved from a century of geographic research and thought.
Fortunately the book has highlighted statements - and lots of informative graphics - to reinforce these concepts and to serve as a reference.

However, you should also be aware that a lot of potentially new ideas are illustrated in this chapter using many graphics and boxes (with their own graphics). I suggest circling or underlining text references to these to help the two sides of your brain cooperate.

4.1 Introduction

I began using GIS by examining raster data, which as we've seen measures continuous fields, and I was somewhat disdainful of GIS as a whole (and I'm still quite skeptical). Since that time I've become more impressed with the scientific contributions GIS research has made, and this chapter reviews some of those concepts.

4.2 The fundamental problem revisited

It is useful to contrast space and time:

                   TIME          SPACE
DIMENSIONS         1             1, 2 or 3
DIRECTIONS         "forward"     east/west, north/south, up/down
MOVEMENT           Inevitable    Voluntary
ANALYSIS, e.g.:    Forecast      Interpolate

The third row implies that we can only move in one direction temporally (though haven't you often wanted to move into the past?) but 3 directions spatially. The last row may be obscure. Frequently we are concerned about forecasting in time, the easiest approach to which is to fit a line through the data (say stock market closing prices) and extrapolate the line into the future. But in spatial analysis we are often interested in values near other values (what's the probability of finding a wetland near the coastline?) so we interpolate to estimate this.
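To make the Forecast/Interpolate contrast concrete, here is a minimal sketch in Python with NumPy. All of the numbers are made up for illustration (they are not from the textbook): the first half fits a line through a short price series and extends it past the last observation, and the second half estimates a value between two known locations on a transect.

```python
import numpy as np

# Hypothetical daily closing prices (invented numbers, for illustration only).
days = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
prices = np.array([10.0, 11.0, 12.0, 13.0, 14.0])

# TIME: fit a straight line, then extrapolate beyond the last observation.
slope, intercept = np.polyfit(days, prices, deg=1)
forecast_day_6 = slope * 6 + intercept

# SPACE: estimate a value between two known locations along a transect.
positions_km = np.array([0.0, 10.0, 20.0])
elevations_m = np.array([100.0, 140.0, 120.0])
elevation_at_5km = float(np.interp(5.0, positions_km, elevations_m))

print(round(float(forecast_day_6), 2), elevation_at_5km)
```

Notice that the forecast reaches outside the range of the data while the interpolation stays inside it, which is one reason interpolation is usually the safer of the two.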

Tobler's so-called Law (introduced in Section 3.1) is mentioned frequently throughout this book and by geographers in general. I know Waldo Tobler fairly well but I've never asked him how he feels about the notoriety that his law has brought. Though it isn't a scientific law (it's frequently violated!) it is a profound insight about the universe, and it makes interpolation possible.

In any case, spatial autocorrelation - which we shall meet often in this course - enables GIScientists to use simplification to represent the world, as modeled for example in Figures 3.15, 3.17, and many other places in both textbooks. We are making use of this fundamental principle in our ArcGIS analysis of the highly generalized 1937 Earhart/Noonan track. (When I was in the US Coast Guard we called this technique "dead reckoning.")

4.3 Spatial autocorrelation and scale

Box 4.1 needs an illustration (look at Figure 8.7) to show how 2-D (2-dimensional) polygons are built up of 1-D polylines, which are made by joining 0-D points. Of course the world isn't built of integer-valued objects; as any child knows (and Mandelbrot demonstrated) the universe is fractal at all scales. But ArcGIS can only handle integer-dimensional data for the moment.

Nevertheless, just a small amount of spatial data can represent a very wide range of patterns, as demonstrated in Figure 4.1. To get a feel for spatial autocorrelation (which I'll abbreviate as SA), imagine you are standing in any one of the cells in the figure and looking around you. In Frame A you're surrounded (in the cardinal NORTH/SOUTH/EAST/WEST directions) by cells of a different color. In most of the cells of Frame E your neighbors are like you. Frames B, C, and D are intermediate situations with intermediate levels of SA. The frames would be more appropriately arranged in a row according to the values of their Moran's I statistics:
    A  -1.0      B  -0.4      C  0.0      D  0.4      E  0.9
so you can see increasing "smoothness" to the pattern (or, as Thomas Schelling demonstrated in 1971, increasing segregation).
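If you want to see where numbers like these come from, the following is a rough sketch of Moran's I on a small grid using the same "look NORTH/SOUTH/EAST/WEST" idea of a neighbor. The two 8 x 8 test patterns (a checkerboard and a grid split into two solid halves) are my own constructions and are not claimed to reproduce the frames of Figure 4.1; the function name is hypothetical.

```python
import numpy as np

def morans_i_rook(grid):
    """Moran's I for a 2-D array with binary weights on N/S/E/W neighbors."""
    z = np.asarray(grid, dtype=float)
    dev = z - z.mean()
    # Sum of products over adjacent pairs; x2 because each pair counts both ways.
    cross = 2.0 * ((dev[:, :-1] * dev[:, 1:]).sum()
                   + (dev[:-1, :] * dev[1:, :]).sum())
    rows, cols = z.shape
    total_weight = 2.0 * (rows * (cols - 1) + cols * (rows - 1))
    return (z.size / total_weight) * cross / (dev ** 2).sum()

# Assumed test patterns (not taken from the textbook figure).
checkerboard = np.indices((8, 8)).sum(axis=0) % 2
two_halves = np.zeros((8, 8))
two_halves[:, 4:] = 1

i_checker = morans_i_rook(checkerboard)
i_halves = morans_i_rook(two_halves)
print(i_checker, i_halves)
```

The checkerboard, where every neighbor differs from you, lands at the negative extreme, and the two-halves pattern, where almost every neighbor matches you, is strongly positive.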

4.4 Spatial sampling

Because a course in statistics is a prerequisite to this program, you should be familiar with some of the concepts reviewed here, especially ways of sampling. In environmental science we are always trying to model phenomena that often change rapidly in time and have low values of spatial autocorrelation, so a lot of attention is given to sample designs, some of which are illustrated in Figure 4.4 (which echoes Figure 3.11 and can be used to understand a pattern like that in Figure 15.8). Personally I am working with a USGS team examining landscape change using a kind of stratified sampling (which frame is that?) to select 10-km blocks from land cover data.
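For those who like to experiment, here is a small sketch of three common spatial sample designs: simple random, systematic, and stratified random. The study area size, the 5 x 5 grid, and the variable names are all assumptions of mine, and I am not claiming these correspond one-to-one with particular frames of Figure 4.4.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
extent = 100.0   # hypothetical square study area, 100 x 100 units
cells = 5        # 5 x 5 grid, so 25 sample points per design

# Simple random: every location in the area is equally likely.
simple_random = rng.uniform(0, extent, size=(cells * cells, 2))

# Systematic: one point at the centre of each grid cell.
step = extent / cells
centres = (np.arange(cells) + 0.5) * step
gx, gy = np.meshgrid(centres, centres)
systematic = np.column_stack([gx.ravel(), gy.ravel()])

# Stratified random: one random point somewhere inside each grid cell.
origins = systematic - step / 2          # lower-left corner of each cell
stratified = origins + rng.uniform(0, step, size=origins.shape)

print(simple_random.shape, systematic.shape, stratified.shape)
```

Plotting the three point sets side by side is a quick way to see why stratification guarantees coverage of the whole area while simple random sampling can leave gaps.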

4.5 Distance decay

Like many concepts in geography, distance decay comes from physics, in this case the inverse-square law that describes how a phenomenon emitted at a point spreads out over space (look it up on Wikipedia).

Think of the origin in Figure 4.7 as a point at which something is measured and the x-axis as the distance d from the origin. Then the weight w represents how much importance to give to the data at the origin. Although we won't be doing this with ArcGIS, you could try it in a final project, and further examples are found in Section 14.4.4. (I'm not sure that the indices i and j add much to the explanation; they are probably the locations of the point from which distance is measured.)
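Here is a sketch of three weight functions that are commonly used for distance decay: linear, negative power, and negative exponential. The function names and the parameter values a and b are my own assumptions, and I have not checked them against the curves actually drawn in Figure 4.7. With b = 2 the power form is the inverse-square law mentioned above.

```python
import numpy as np

def linear_decay(d, a=1.0, b=0.1):
    """w = a - b*d, clipped at zero so weights never go negative."""
    return np.maximum(a - b * np.asarray(d, dtype=float), 0.0)

def power_decay(d, b=2.0):
    """w = d**(-b); b = 2 gives inverse-square. Not defined at d = 0."""
    return np.asarray(d, dtype=float) ** -b

def exponential_decay(d, b=0.5):
    """w = exp(-b*d); equals 1 at the origin and falls off smoothly."""
    return np.exp(-b * np.asarray(d, dtype=float))

distances = np.array([1.0, 2.0, 4.0])
w_linear = linear_decay(distances)
w_power = power_decay(distances)
w_exp = exponential_decay(distances)
print(w_linear, w_power, w_exp)
```

In every case the weight shrinks as d grows; the choice among them is about how quickly you believe influence fades.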

The use of distance decay is illustrated in Figure 4.9 in which
    A) a few dozen measurements (X, Y, Z) are used first
    B) to estimate the .70 contour line, then
    C) all contours from .30 to 1.00 in increments of .1, and finally
    D) an isopleth map in which saturation (not hue!) is used to symbolize trend.
I think you'll agree that the final result is much more informative than the original table of Z values. Discuss: could you put the text labels on the contours in a more attractive way?
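To show how distance decay can turn scattered (X, Y, Z) measurements into a surface that could then be contoured, here is a minimal sketch of inverse-distance-weighted (IDW) interpolation. I am substituting IDW as a familiar example; these notes do not say which method produced Figure 4.9, and the sample points below are invented.

```python
import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2.0):
    """Inverse-distance-weighted estimate of z at each query location."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_z = np.asarray(sample_z, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Distance from every query point to every sample: shape (n_query, n_sample).
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)     # avoid dividing by zero exactly at a sample
    w = d ** -power              # nearer samples get larger weights
    return (w * sample_z).sum(axis=1) / w.sum(axis=1)

# Hypothetical measurements (not the values tabulated in the textbook).
samples = [(0.0, 0.0), (10.0, 0.0)]
values = [0.3, 0.7]
estimates = idw(samples, values, [(5.0, 0.0), (0.0, 0.0)])
print(estimates)
```

Evaluating idw on a regular grid of query points gives a surface from which contour lines such as .30, .40, and so on could be drawn.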

Furthermore, the concept of "buffers" that is used in several of the ArcGIS exercises relates to this concept (look up the term in the index of the ArcGIS book).

The maps shown in Figure 4.10 illustrate choropleth mapping (see Wikipedia) and caution us against doing things like symbolizing (e.g. coloring) regions with counts. This problem will be touched upon in Assignment #2.

4.6 Measuring distance effects as spatial autocorrelation

The frames of Figure 4.11 bear study: look at each one and then read the sentences on the previous page that relate to them. You may not follow the arguments closely at this reading, but they are real-world illustrations of A) point, B) line, C) polygon, and D) volume data. Each of these data models must deal with spatial autocorrelation in order to transform measurements at one dimension to models at another. (Incidentally, my own current research looks at graphs (a branch of mathematics illustrated in Frame B) to understand land cover rasters.)

Box 4.4 is a nice little "toy" example of measuring spatial autocorrelation that also illustrates graph theory (see Wikipedia). If you're willing to write in your book (and yes, most bookstores will buy back marked-in books), make a graph of Figure 4.12. First, create the vertices of the graph by circling each of the region's labels (1, 2, ..., 8). Next, create edges by drawing lines between any two regions that share a boundary (so there would be a line between 1 and 2 but not between 1 and 5). The Table 4.1 matrix is a simple way to represent the graph, in which binary weights represent edges. (If you don't want to write in the book, make a copy of the page.)
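The pencil-and-paper exercise can also be sketched in code. The edge list here is hypothetical: apart from the two facts stated above (regions 1 and 2 share a boundary, regions 1 and 5 do not), I made it up, so it should not be read as the real adjacency of Figure 4.12 or the real contents of Table 4.1.

```python
import numpy as np

# Hypothetical shared-boundary pairs for 8 regions (NOT read from Figure 4.12).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5),
         (4, 6), (5, 6), (5, 7), (6, 8), (7, 8)]

n = 8
W = np.zeros((n, n), dtype=int)
for i, j in edges:
    W[i - 1, j - 1] = 1   # region labels start at 1, array indices at 0
    W[j - 1, i - 1] = 1   # sharing a boundary is mutual, so W is symmetric

# Because W is symmetric, the part above the diagonal holds every edge once.
upper = np.triu(W, k=1)
edge_count = int(upper.sum())
print(W)
print(edge_count)
```

Swap in the pairs you actually drew on Figure 4.12 and compare the printed matrix with Table 4.1.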

The point of the box is that SA is positive when neighboring regions have similar values and negative when they are dissimilar (look back at Figure 4.1). Positive SA makes the interpolation of Figure 4.9 possible.

Discuss how just the "upper triangle" of the matrix contains all of the information. How many edges are there?

4.7 Establishing dependence in space

Up to now we've been introducing spatial autocorrelation, and you may remember (I hope!) that (non-spatial) correlation r_{X,Y} is a number between -1 and +1 and can be analyzed using a regression model:
    Y = a + bX
that in two dimensions (X, Y) can be represented as a straight line, as in Figure 4.14.
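As a quick refresher, this sketch computes the correlation and the regression coefficients a and b for a handful of invented (X, Y) pairs; the data are illustrative only and have nothing to do with Figure 4.14.

```python
import numpy as np

# Invented observations that lie close to a straight line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Pearson correlation between X and Y, always between -1 and +1.
r = float(np.corrcoef(x, y)[0, 1])

# Least-squares fit of Y = a + bX (polyfit returns the slope first).
b, a = np.polyfit(x, y, deg=1)

print(round(r, 4), round(float(a), 4), round(float(b), 4))
```

The sign of r always matches the sign of the slope b, since both are built from the same covariance between X and Y.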

In some ways SA is a simpler concept because we're just relating a variable to itself (hence the "auto" prefix) at varying distances: is the value of something "here" close to its values "there", and if we compare two values at different distances does the distance between the measurements seem to affect the difference in the values? If so, we have positive or negative SA, if not, zero SA.
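One simple way to try the "here" versus "there" comparison is to correlate a series of values along a transect with a shifted copy of itself. This is only a sketch of the idea in one dimension: the smooth sine-wave transect is an assumed stand-in for real measurements, and lag_correlation is a hypothetical helper, not a standard GIS function.

```python
import numpy as np

def lag_correlation(values, lag):
    """Correlation between each value and the value `lag` steps further along."""
    v = np.asarray(values, dtype=float)
    if lag < 1 or lag >= v.size - 1:
        raise ValueError("lag must be at least 1 and leave two or more pairs")
    return float(np.corrcoef(v[:-lag], v[lag:])[0, 1])

# Assumed transect: one smooth cycle sampled at 50 evenly spaced points.
transect = np.sin(np.linspace(0.0, 2.0 * np.pi, 50))

near = lag_correlation(transect, 1)    # adjacent measurements
far = lag_correlation(transect, 25)    # measurements half a cycle apart
print(near, far)
```

Adjacent values are very alike (positive SA) while values half a cycle apart tend to be opposites (negative SA), so here distance clearly affects the difference in values.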

I'm sorry that we seem to be using letters somewhat sloppily here: X sometimes represents an "independent variable" and other times "space," and Y is sometimes a "dependent variable" and other times NORTH/SOUTH. Personally I try to stick to the S/T/P formulation in which X is SPACE, T is TIME and Z is the PHENOMENON being studied.

And so, again, the existence of SA allows us to interpolate, model, and make maps.

4.8 Taming geographic monsters

One of my favorite subjects (and a source of great beauty in modern science) is the fractal geometry of nature, which appears in several places in the textbook, including here in Box 4.6, which was foreshadowed in Figure 3.17.
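A tiny numerical illustration of why fractal lines are "monsters": the measured length keeps growing as the ruler shrinks. I am using the Koch curve, a standard textbook fractal that I chose for convenience (I am not claiming it is the example in Box 4.6). At each level the ruler is one third as long and four times as many ruler steps are needed.

```python
import math

levels = range(1, 6)
rulers = [3.0 ** -n for n in levels]    # ruler length at each level
counts = [4 ** n for n in levels]       # ruler steps needed to trace the curve
lengths = [c * r for c, r in zip(counts, rulers)]   # measured length = (4/3)**n

# Fractal dimension from how the step count grows as the ruler shrinks.
dimension = math.log(counts[-1] / counts[0]) / math.log(rulers[0] / rulers[-1])

print([round(length, 3) for length in lengths])
print(round(dimension, 4))
```

The dimension comes out between 1 and 2, which is the sense in which such a line is "more than a line but less than an area."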

If the topic interests you further, I'm ready with a lot of examples, some from my own research.

4.9 Induction and deduction and how it all comes together

As I hope you can see, I'm very impressed with the way that this chapter introduces and often elaborates many of the fundamental concepts of geography and GIScience, which not only borrows from mathematics, statistics, and physics but also is making original contributions to philosophy and natural science.