Chapter 4 - The nature of geographic data
You've been using ArcGIS, and this chapter will deepen your insights by introducing some of the basic
concepts of GIScience that have evolved from a century of geographic
research and thought:
- Scale
- Dimensionality
- Spatial autocorrelation and dependence
- Distance decay
- Statistical regression
- Fractals (again!)
Fortunately, the book has highlighted statements - and lots of
informative graphics - to reinforce these concepts and to serve as
a reference.
However, be aware that many potentially new ideas in this chapter
are illustrated through graphics and boxes (with their own graphics). I
suggest circling or underlining the text references to each of these so that the two sides of your brain can cooperate.
4.1 Introduction
I began using GIS by examining raster data, which, as we've seen,
represent continuous fields, and I was somewhat disdainful of GIS as a
whole (I'm still quite skeptical). Since then I've become more
impressed with the scientific contributions GIS research has made, and
this chapter reviews some of those concepts.
4.2 The fundamental problem revisited
It is useful to contrast space and time:
| | TIME | SPACE |
| DIMENSIONS | 1 | 1, 2 or 3 |
| DIRECTIONS | "forward" | east/west, north/south, up/down |
| MOVEMENT | Inevitable | Voluntary |
| ANALYSIS, e.g. | Forecast | Interpolate |
The DIRECTIONS row implies that we can move in only one direction
in time (though haven't you often wanted to move into the past?) but
along three axes in space. The last row may be obscure.
Frequently we are concerned with forecasting in time, and the easiest
approach is to fit a line through the data (say, stock market closing
prices) and extrapolate the line into the future. In spatial analysis,
by contrast, we are often interested in values near other values (what's
the probability of finding a wetland near the coastline?), so we
interpolate: we estimate values at unsampled locations from the measured
values around them.
Tobler's so-called First Law (introduced in Section 3.1), "everything is
related to everything else, but near things are more related than
distant things," is mentioned frequently throughout this book and by
geographers in general. I know Waldo Tobler fairly well but I've never
asked him how he feels about the fame that his law has brought. Though
it isn't a scientific law (it's frequently violated!), it is a profound
insight about the universe, and it is what makes interpolation possible.
In any case, spatial autocorrelation - the formal name for the tendency Tobler's law describes, and one we shall meet often in this course - enables GIScientists to represent the world in simplified form, as modeled for example in Figures 3.15 and 3.17 and in
many other places in both textbooks. We are making use of this
fundamental principle in our ArcGIS analysis of the highly generalized
1937 Earhart/Noonan track. (When I was in the US Coast Guard we called this technique "dead reckoning.")
4.3 Spatial autocorrelation and scale
Box 4.1 needs an illustration (look at Figure 8.7) to show how 2-D (two-dimensional) polygons are built up from 1-D polylines,
which are in turn made by joining 0-D points. Of course, the world isn't built
of integer-dimensional objects; as any child knows (and Mandelbrot demonstrated), much of nature is fractal across a wide range of scales. But for the moment ArcGIS can handle only integer-dimensional data.
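The 0-D, 1-D, 2-D hierarchy in Box 4.1 can be sketched in a few lines of Python. The function names and coordinates are hypothetical. The sketch shows only that each higher-dimensional object is built from, and measured through, the lower-dimensional ones. ArcGIS stores and measures geometry in its own way.

```python
import math

def polyline_length(points):
    """1-D: a polyline joins 0-D points; its length is the sum of its segment lengths."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def polygon_area(ring):
    """2-D: a polygon is a closed polyline; the shoelace formula gives its area."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):  # wrap around to close the ring
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

# 0-D: four hypothetical corner points of a 4 x 3 rectangle.
rect = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
print(polyline_length(rect))   # 11.0 (open polyline: only three sides)
print(polygon_area(rect))      # 12.0
```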
Nevertheless, even a small amount of spatial data can exhibit a very
wide range of patterns, as Figure 4.1 demonstrates.
To get a feel for spatial autocorrelation (which I'll abbreviate as
SA), imagine you are standing in any one of the cells in the figure and
looking around you. In Frame A you're surrounded (in the cardinal
NORTH/SOUTH/EAST/WEST directions) by cells of a different color. In
most of the cells of Frame E your neighbors are like you. Frames B, C,
and D are intermediate situations, with intermediate levels of SA. The
frames would be more appropriately arranged in a row according to the
values of their Moran's I statistics:
A (-1.0), B (-0.4), C (0.0), D (0.4), E (0.9)
so that you could see increasing "smoothness" in the pattern (or, as Thomas Schelling demonstrated in 1971, increasing segregation).
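This minimal Python sketch shows where numbers like these come from. It computes Moran's I for a grid, assuming binary "rook" weights: a cell's neighbors are the cells directly north, south, east, and west of it, matching the cardinal-direction description above. The two test patterns are hypothetical 8-by-8 grids with 32 cells of each color: a checkerboard like Frame A and a simple left/right split. They have not been checked cell by cell against Figure 4.1.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid, with binary rook weights (N/S/E/W neighbors)."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(map(sum, grid)) / n
    dev = [[v - mean for v in row] for row in grid]   # deviations from the mean
    numerator = 0.0
    weight_total = 0                                  # W: number of ordered neighbor pairs
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    numerator += dev[r][c] * dev[rr][cc]
                    weight_total += 1
    denominator = sum(d * d for row in dev for d in row)
    return (n / weight_total) * (numerator / denominator)

checkerboard = [[(r + c) % 2 for c in range(8)] for r in range(8)]       # every neighbor differs
half_split = [[1 if c < 4 else 0 for c in range(8)] for r in range(8)]   # two solid blocks
print(round(morans_i(checkerboard), 3))  # -1.0
print(round(morans_i(half_split), 3))    # 0.857
```

The half-and-half split gives 6/7 (about 0.857). That is close to the 0.9 listed for Frame E, but the split pattern is an assumption, not a copy of that frame.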
4.4 Spatial sampling
Because a course in statistics is a prerequisite to this program, you
should be familiar with some of the concepts reviewed here, especially
methods of sampling. In environmental science we are always trying to
model phenomena that often change rapidly in time and have low
spatial autocorrelation, so a lot of attention is given to sample
designs, some of which are illustrated in Figure 4.4 (which echoes
Figure 3.11 and can be used to understand a pattern like that in
Figure 15.8). Personally, I am working with a USGS team examining
landscape change that uses a kind of stratified sampling (which frame
is that?) to select 10-km blocks from land cover data.
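Here is a minimal Python sketch of three common sample designs for a rectangular study area. The function names, the 100 x 100 area, and the 20-unit cell size are hypothetical. These are generic textbook designs and may not match the frames of Figure 4.4 one for one.

```python
import random

def simple_random(n, width, height, rng):
    """n points placed anywhere in the study area with equal probability."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

def systematic(spacing, width, height):
    """One point at the center of every spacing-by-spacing cell (a regular grid)."""
    return [(x + spacing / 2, y + spacing / 2)
            for x in range(0, width, spacing)
            for y in range(0, height, spacing)]

def stratified_random(cell, width, height, rng):
    """Split the area into cell-by-cell strata and place one random point in each."""
    return [(x + rng.random() * cell, y + rng.random() * cell)
            for x in range(0, width, cell)
            for y in range(0, height, cell)]

rng = random.Random(42)            # fixed seed so the run is repeatable
area = (100, 100)                  # hypothetical 100 x 100 study area
print(len(simple_random(25, *area, rng)))      # 25
print(len(systematic(20, *area)))              # 25
print(len(stratified_random(20, *area, rng)))  # 25
```

All three designs yield 25 points here, but only the systematic and stratified designs guarantee that every part of the area is sampled. Simple random sampling can leave gaps and clusters by chance.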
4.5 Distance decay
Like many concepts in geography, distance decay comes from physics, in
this case from the inverse-square law, which describes how a phenomenon
emitted at a point spreads out over space (look it up on Wikipedia).
Think of the origin in Figure 4.7 as a point at which something is measured, and the x-axis as the distance d from the origin. Then the weight w represents how much importance to give the measurement at the origin when estimating a value at distance d. Although we won't be doing this with
ArcGIS, you could try it in a final project, and further examples are
found in Section 14.4.4. (I'm not sure that the subscripts i and j
add much to the explanation; they presumably identify the pair of locations between which the distance is measured.)
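Here is a minimal Python sketch of two common distance decay forms: an inverse power, of which the inverse-square law is the case b = 2, and a negative exponential. The exponent or rate b is a hypothetical parameter, and the curves in Figure 4.7 may use different forms or values.

```python
import math

def inverse_power(d, b=2.0):
    """w = d**-b; b = 2 is the physicist's inverse-square law (undefined at d = 0)."""
    return d ** -b

def negative_exponential(d, b=1.0):
    """w = exp(-b * d); equals 1 at the origin and falls off smoothly."""
    return math.exp(-b * d)

for d in (1, 2, 4):
    print(d, inverse_power(d), round(negative_exponential(d), 3))
# 1 1.0 0.368
# 2 0.25 0.135
# 4 0.0625 0.018
```

The exponential form has the practical advantage of being defined at d = 0, whereas the inverse power blows up there.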
The use of distance decay is illustrated in Figure 4.9, in which
A) a few dozen measurements (X, Y, Z) are used first
B) to estimate the .70 contour line, then
C) all contours from .30 to 1.00 in increments of .1, and finally
D) an isopleth map in which saturation (not hue!) is used to symbolize trend.
I think you'll agree that the final result is much more informative
than the original table of Z values. Discuss: how could you place the text labels on the contours more attractively?
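One widely used way to turn scattered (X, Y, Z) measurements into estimates at unsampled points is inverse distance weighting (IDW), which applies distance decay directly. The text here doesn't say which method produced Figure 4.9, so treat this Python sketch as an illustration of the idea only. The sample values and the default power of 2 are assumptions.

```python
import math

def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate of Z at (x, y) from (X, Y, Z) samples."""
    num = den = 0.0
    for sx, sy, sz in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return sz                  # exactly on a sample: return the measured value
        w = d ** -power                # distance decay: nearer samples get more weight
        num += w * sz
        den += w
    return num / den

# Hypothetical (X, Y, Z) measurements, not the values in Figure 4.9.
samples = [(0, 0, 0.3), (10, 0, 0.5), (0, 10, 0.7), (10, 10, 0.9)]
print(round(idw(samples, 5, 5), 3))   # 0.6 (equidistant from all four, so a plain average)
print(idw(samples, 10, 0))            # 0.5 (on a sample point)
```

To draw contours like those in Frames B and C, you would evaluate such an estimate over a fine grid and then trace lines where the surface crosses .30, .40, and so on.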
Furthermore, the concept of "buffers," used in several of the ArcGIS
exercises, relates to distance decay (look up the term in the index
of the ArcGIS book).
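One way to see the connection: a buffer behaves like the bluntest possible distance decay function, with weight 1 inside the buffer distance and 0 outside. ArcGIS can buffer points, lines, and polygons. This Python sketch handles only the simplest case, a circular buffer around a single point, and the function names and well locations are hypothetical.

```python
import math

def buffer_weight(d, radius):
    """A buffer as a step-function distance decay: weight 1 inside the radius, 0 outside."""
    return 1 if d <= radius else 0

def within_buffer(center, points, radius):
    """Return the points that fall inside a circular buffer around center."""
    return [p for p in points if buffer_weight(math.dist(center, p), radius)]

# Hypothetical well locations (km) and a 2-km buffer around a spill at the origin.
wells = [(0.5, 0.5), (1.5, 1.5), (3.0, 0.0)]
print(within_buffer((0.0, 0.0), wells, 2.0))   # [(0.5, 0.5)]
```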
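The maps shown in Figure 4.10
illustrate choropleth mapping (see Wikipedia) and caution us against
symbolizing (e.g., coloring) regions by raw counts: a large region tends
to have a large count simply because it is large, so such a map mostly
shows region size. Densities or rates (counts divided by area or
population) are the safer choice. This problem will be touched upon in
Assignment #2.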
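This tiny Python sketch shows how normalizing can reverse the picture. The regions, counts, and areas are invented for illustration, and `densities` is a hypothetical helper.

```python
def densities(regions):
    """Convert raw counts to counts per unit area, the kind of value a choropleth map should show."""
    return {name: count / area for name, count, area in regions}

# Hypothetical regions: (name, count, area in km^2).
regions = [("North", 5000, 2500.0), ("South", 1200, 300.0)]
print(densities(regions))   # {'North': 2.0, 'South': 4.0}
```

By raw count North looks like the "hot spot," but per square kilometer South has twice the density.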
4.6 Measuring distance effects as spatial autocorrelation
The frames of Figure 4.11 bear
study: look at each one and then read the sentences on the previous
page that relate to them. You may not follow the arguments closely on
this reading, but they are real-world illustrations of A) point, B)
line, C) polygon, and D) volume data. Each of these data models must
deal with spatial autocorrelation in order to transform measurements in
one dimension into models in another. (Incidentally, my own current research uses graph theory, the branch of mathematics illustrated in Frame B, to understand land cover rasters.)
Box 4.4 is a nice little "toy" example of measuring spatial autocorrelation that also illustrates graph theory (see Wikipedia). If you're willing to write in your book (and yes, most bookstores will buy back marked-in books), make a graph of Figure 4.12. First, create the vertices of the graph by circling each region's label (1, 2, ..., 8). Next, create edges
by drawing lines between any two regions that share a boundary (so
there would be a line between 1 and 2 but not between 1 and 5). The matrix
in Table 4.1 is a simple way to represent the graph: each binary weight
is 1 where an edge joins two regions and 0 where none does. (If you don't want to write in the book, make a copy
of the page.)
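The same bookkeeping can be done in a few lines of Python. The edge list below is hypothetical. It agrees with the two facts given above (1 touches 2, 1 does not touch 5), but it is not read from Figure 4.12, so substitute the real boundaries when you work the exercise.

```python
def adjacency_matrix(n, edges):
    """Binary weights: m[i][j] = 1 if regions i and j share a boundary, else 0."""
    m = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = a - 1, b - 1          # region labels 1..n become list indices 0..n-1
        m[i][j] = m[j][i] = 1        # sharing a boundary is symmetric
    return m

# HYPOTHETICAL boundaries for eight regions; check Figure 4.12 for the real ones.
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5), (4, 6), (5, 7), (6, 7), (6, 8), (7, 8)]
for row in adjacency_matrix(8, edges):
    print(*row)
```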
The point of the box is that SA is positive when neighboring regions
have similar values and negative when they are dissimilar (look back at
Figure 4.1). Positive SA is what makes the interpolation of Figure 4.9 possible.
Discuss: why does the "upper triangle" of the matrix alone
contain all of the information? How many edges are there?
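Here is a minimal Python sketch of both ideas. The matrix is symmetric, so counting only the upper triangle (column index greater than row index) counts each edge once. Tallying "like" and "unlike" joins over those edges is a simple join-count view of SA for two-color data. The graph (eight regions in a row) and the values are hypothetical. A formal join-count test would also compare the tallies with what a random arrangement would give, which this sketch omits.

```python
def edge_count(m):
    """Count edges from the upper triangle (j > i) only; the lower half is a mirror image."""
    return sum(m[i][j] for i in range(len(m)) for j in range(i + 1, len(m)))

def joins(values, m):
    """Join counts over the upper triangle: (like joins, unlike joins) for binary values."""
    like = unlike = 0
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            if m[i][j]:
                if values[i] == values[j]:
                    like += 1
                else:
                    unlike += 1
    return like, unlike

# HYPOTHETICAL graph: eight regions in a row, each touching only its neighbors.
n = 8
m = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
print(edge_count(m), sum(map(sum, m)) // 2)      # 7 7
print(joins([1, 1, 1, 1, 0, 0, 0, 0], m))        # (6, 1): mostly like joins, positive SA
print(joins([1, 0, 1, 0, 1, 0, 1, 0], m))        # (0, 7): all unlike joins, negative SA
```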
4.7 Establishing dependence in space
Up to now we've been introducing spatial autocorrelation, and you may remember (I hope!) that (non-spatial) correlation, $r_{X,Y}$, is a number between -1 and +1 and that the relationship can be analyzed using a regression model:
$Y = a + bX$
which in two dimensions (X, Y) can be represented as a straight line, as in Figure 4.14.
In some ways SA is a simpler concept because we're just relating a
variable to itself (hence the "auto" prefix) at varying distances: is
the value of something "here" close to its value "there," and when we
compare pairs of values at different distances, does the distance
between the measurements seem to affect the difference in the values?
If nearby values are more alike than distant ones, SA is positive; if
nearby values are more unlike, SA is negative; if distance makes no
difference, SA is zero.
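The question in that paragraph can be asked directly of data. This Python sketch computes half the mean squared difference between values a fixed number of steps apart along an evenly spaced transect, which is essentially an empirical semivariogram. That geostatistical tool is named here only for orientation; the text does not introduce it at this point. Both transects are hypothetical.

```python
def semivariance(z, lag):
    """Half the mean squared difference between values `lag` steps apart on an evenly spaced transect."""
    pairs = [(z[i], z[i + lag]) for i in range(len(z) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

trend = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]        # hypothetical smooth transect
alternating = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # hypothetical "checkerboard" transect
for lag in (1, 2, 3):
    print(lag, semivariance(trend, lag), semivariance(alternating, lag))
# 1 0.5 0.5
# 2 2.0 0.0
# 3 4.5 0.5
```

For the smooth transect, differences grow steadily with distance (positive SA). For the alternating one, values two steps apart match exactly while immediate neighbors differ (negative SA at the shortest distance).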
I'm sorry that we seem to be using letters somewhat sloppily here: X
sometimes represents an "independent variable" and other times "space,"
and Y is sometimes the "dependent variable" and other times NORTH/SOUTH.
Personally, I try to stick to the S/T/P (space/time/phenomenon)
formulation, in which X and Y are SPACE, T is TIME, and Z is the
PHENOMENON being studied.
And so, again, the existence of SA allows us to interpolate, model, and make maps.
4.8 Taming geographic monsters
One of my favorite subjects (and a source of great beauty in modern
science) is the fractal geometry of nature, which appears in several
places in the textbook: here in Box 4.6, foreshadowed in Figure 3.17.
If the topic interests you, I have plenty of examples, some from my own research.
4.9 Induction and deduction and how it all comes together
As I hope you can see, I'm very impressed with the way that this
chapter introduces, and often elaborates, many of the fundamental
concepts of geography and GIScience, a field that not only borrows from
mathematics, statistics, and physics but is also making original
contributions to philosophy and natural science.