Chapter 6 - Uncertainty

Because we haven't yet come to the ArcGIS exercises dealing with projections, we skip Longley et al. Chapter 5 on georeferencing. It is an important subject, though, so you should at least glance at its content.

This dense chapter has a lot of new concepts, and may be difficult to grasp, especially as you'll be preoccupied with Assignment #1. Use my highlights below as a guide to the ideas, and return to the discussion later in the course as you need.

6.1 Introduction

In all my classes I stress the SPACE/TIME/PHENOMENON paradigm, integrated by the concept of SCALE, which appears throughout this textbook (see the index!). This chapter adds the quality of uncertainty to the paradigm. Measurement can only be done within specified spatial, temporal, and phenomenological scale ranges; and it always involves uncertainty. 

6.2 U1: Uncertainty in the conception of geographic phenomena

Figure 6.1 suggests how uncertainty inevitably creeps into analysis. In my own research, the USGS is interested in understanding land cover (LC) change:

U1: How is LC defined?
U2: How accurately can LC be measured?
U3: How reliable are forecasts of future LC?

Any phenomenon of interest to environmental analysis will entail successive increments of uncertainty - and the more important and controversial the problem, the more uncertainty clouds the picture. The figure may help your understanding of the inductive process of modeling the real world; variations on this image appear again.

A clear example of how GIS can provide an apparently concrete representation of an inherently ambiguous concept is shown in Figure 16.7, in which spatially definite boundaries are shown at 7 values of a "Pollution Potential Index." Consider some of the many abstractions and operations that are used to build such a model:

  1. A finite set of measurements from 
  2. A necessarily incomplete sample
  3. Differentially weighted to produce 
  4. An equation that is 
  5. Interpolated using some spatial model run at 
  6. A specific scale then 
  7. Visualized at a given resolution.
Clearly there are many steps to be taken, each of which introduces uncertainty.

Imagine a simple example in which you are trying to predict how many people from a sample of 300 people are beer drinkers (assuming you know what that means in the first place!). The simplest null hypothesis (H0: 50%) predicts 150, but looking at only the TOTAL column of the study below you would say 120. Moreover, if you knew that 100 of the sample were men you could use the column proportions to predict 40 (try it). Every one of these models uses assumptions, definitions, classifications, and statistical models to predict numbers that will almost certainly be incorrect.

DRINK BEER?    MEN   WOMEN   TOTAL
YES             80      40     120
NO              20     160     180
TOTAL          100     200     300
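One way to "try it" is to compute the column proportions from the table above. A minimal sketch (the table values come from the text; the reading of "predict 40" as the expected number of drinkers among the 200 women is my interpretation of the exercise):

```python
# Beer-drinking table from the text, organized by column (group).
table = {"MEN": {"YES": 80, "NO": 20}, "WOMEN": {"YES": 40, "NO": 160}}

def p_yes(group):
    """Proportion of 'YES' responses within one column of the table."""
    col = table[group]
    return col["YES"] / (col["YES"] + col["NO"])

print(p_yes("MEN"))    # 0.8
print(p_yes("WOMEN"))  # 0.2

# Expected drinkers among 200 women, assuming the sample proportions hold:
print(round(p_yes("WOMEN") * 200))  # 40
```

Note that every number here rests on the assumption that the sample proportions generalize, which is itself one of the uncertain modeling steps the chapter describes.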

The problem with maps is that two of the visual dimensions are taken up by the spatial coordinates (EAST, NORTH) so that there is no way to represent uncertainty without a third dimension.

Finally, consider another problem in modeling. A plot of world population could be forecast linearly from the past few decades or exponentially over the 20th century; each model would give very different results (TASK: with a pencil, compare rough estimates). The y dimension of a plot can represent not only forecasts but also error bars; a 2D map cannot do this.

A simple example of fuzzy logic can be visualized in Figure 6.9. Note that this coastline model doesn't take time into account (even if you're not a kayaker like me, imagine how waves, tides, and sea-level change can alter a coastline). You can also see how important scale is in geographic representation: imagine the raster at a lower resolution (say, cells twice or four times the size shown). Uncertainty can itself be uncertain!
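To make the resolution thought-experiment concrete, here is a toy sketch that coarsens an invented 4×4 binary raster (1 = land, 0 = water) to 2×2 by majority vote within each block. All values are made up; even the tie-breaking rule is an arbitrary choice that injects its own uncertainty:

```python
# Hypothetical 4x4 binary raster (1 = land, 0 = water); values invented.
raster = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]

def coarsen(grid):
    """Aggregate each 2x2 block to a single cell by majority vote (ties -> 1)."""
    out = []
    for i in range(0, len(grid), 2):
        row = []
        for j in range(0, len(grid[0]), 2):
            block = [grid[i][j], grid[i][j + 1],
                     grid[i + 1][j], grid[i + 1][j + 1]]
            row.append(1 if sum(block) >= 2 else 0)
        out.append(row)
    return out

print(coarsen(raster))  # [[1, 0], [0, 1]]
```

The ragged coastline along the diagonal disappears entirely at the coarser resolution; a mixed cell must be forced into one class or the other.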

The effect of scale on uncertainty is illustrated in Table 6.1, which shows that aggregation (fewer areas) may obscure complexity. This is treated at length under the concept of spatial autocorrelation. NOTE as well that the table, though small, is easy to read.

Even if you're not a geographer, you may have taken an introductory (world, human, etc.) geography course that featured maps like those of Figure 6.7, which use vector lines or contours (objects) to delineate boundaries between regions that are in fact poorly understood and even unmeasurable. For purposes of discussion these conventions are tolerable, but if one is trying to make environmental decisions, the representations obviously need to be refined. Even Figure 6.6 looks rather vague and "fractal," but we know that with a georeferenced GIS the location of each pixel could be identified, even though the real-world condition at that location might be neither bare, meadow, nor forest.

6.3 U2: Uncertainty in measurement and representation

GIS can be applied to many situations and, if casually used, can give very definite answers to even the vaguest questions: where are the elephants, what is the value of the harvested timber, is the density of potential air pollution sites higher downtown...? So a representation like that of Figure 6.9 is a useful reminder that, for example, movement to the northeast represents travel through an uncertainty field rather than across a definite boundary. All maps should be treated this way, but to do so requires more care and insight than most users can afford.

This problem is confounded when the phenomenon itself is poorly defined, as with land use, vegetation, or other nominal values. The so-called "confusion matrix" in Table 6.2 should be read as the results of 304 field samples checked against a 5-class raster map. For example, 106 pixels (or small blocks of uniformly classified pixels) were selected from class A, of which 80 were actually found to be of class A and 26 of the other classes shown, and so forth. It is easy to calculate the percent of the samples that the map classified correctly (TASK: what is this percent?) and which classes are most frequently correctly classified, etc. But what if the classes themselves are ill-defined? For example, is there a universally accepted definition of what constitutes "Mojave Desert," as in the DMoj vegetation cover in Figure 6.10? Certainly there is no such class that applies uniformly to thousands of km².
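The accuracy arithmetic can be sketched in a few lines. Only the class-A row total (106 sampled, 80 correct) and the grand total of 304 come from the text; every other cell below is invented purely so the sums work out, so the numbers illustrate the calculation, not Table 6.2 itself:

```python
# Hypothetical confusion matrix: rows = map class, columns = field class.
# Row A (106 samples, 80 on the diagonal) follows the text; the rest are
# invented so that the grand total is 304.
classes = ["A", "B", "C", "D", "E"]
matrix = [
    [80, 10,  6,  5,  5],   # A: 106 samples, 80 correctly classified
    [ 8, 40,  4,  2,  1],
    [ 5,  3, 50,  3,  2],
    [ 2,  2,  4, 30,  2],
    [ 1,  1,  2,  2, 34],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))
print(total)                         # 304
print(round(correct / total, 3))     # overall proportion correct

# Per-class accuracy: diagonal cell divided by row total.
for i, name in enumerate(classes):
    print(name, round(matrix[i][i] / sum(matrix[i]), 2))
```

Whatever the true cell values, the procedure is the same: diagonal over total for overall accuracy, diagonal over row sum per class.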

The discussion of uncertainty in this section is worth recasting in a simple way. Consider various ways of representing the number Pi:

A    3.141592653589793238462643383279502   is both precise and accurate
B    3.14        is accurate but not precise
C    3.141593    is more precise
D    4.141593    is also precise but not accurate, and
E    314         is neither precise nor accurate.
The first number would work if you wanted to measure the circumference of the universe to the nearest millimeter, but for some purposes B or C might suffice. On the other hand, D is misleading and even dangerous, and E is useless (not even misleading).
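The distinction can be checked numerically. A hedged sketch (note that a Python float truncates the 33-digit version of A to about 16 significant digits, so A's error below is zero only at double precision):

```python
import math

# The five representations of pi from the list above, as floats.
candidates = {"A": 3.141592653589793, "B": 3.14, "C": 3.141593,
              "D": 4.141593, "E": 314.0}

# Accuracy = closeness to the true value; D has many digits (precision)
# but an error of about 1 (poor accuracy), and E is off by ~311.
errors = {k: abs(v - math.pi) for k, v in candidates.items()}
for label in "ABCDE":
    print(f"{label}: error from pi = {errors[label]:.3g}")

assert errors["D"] > errors["B"] > errors["C"] > errors["A"]
```

The assertion makes the point of the list explicit: adding digits (precision) does nothing for accuracy if the leading digit is wrong.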

This discussion should be borne in mind when reinterpreting Figure 6.11, in which Frame A is more reliable but Frame B more accurate. It also lies behind my objection to the way ArcGIS renders legend labels with far more precision than is usually warranted; you need to reformat the labels to correspond with defensible levels of precision.

Figure 6.13 is a variation on the simpler Figure 6.9, and is difficult to interpret without a scale or the source map. Imagine crossing the region on a transect: where the terrain is steep you will pass over the 350-m contour suddenly, so there is little uncertainty; but where the slope is gentle it may be difficult to determine where the 350-m elevation is found.

This section makes further references to spatial autocorrelation (SA, introduced in Section 4.6), the tendency of near measurements to be more alike than far measurements. High spatial autocorrelation (a relatively "smooth" variation in measurements) lessens error; if a phenomenon is varying wildly with distance then it will require more sampling to model. Look back at and think about Figure 4.9; if all the measurements were the same -  say around 60 - then a flat surface would fit the data, but as complexity increases the need for more data and careful modeling becomes more critical.

TASK: Order the four simulations from least to most spatially autocorrelated. Look at the answer when you're done.
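The "smooth vs. wildly varying" idea can be made concrete with a toy one-dimensional stand-in for SA: the lag-1 correlation of measurements along a transect. Both series below are invented for illustration:

```python
# Lag-1 autocorrelation along a 1-D transect as a toy stand-in for
# spatial autocorrelation. Both series are invented.
def lag1_corr(xs):
    """Pearson correlation between the series and itself shifted by one."""
    a, b = xs[:-1], xs[1:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

smooth = [60, 61, 62, 63, 64, 65, 66, 67]   # gentle trend around 60: high SA
rough = [60, 10, 75, 5, 90, 15, 70, 20]     # wild swings: low (negative) SA

print(round(lag1_corr(smooth), 2))   # near 1: neighbors resemble each other
print(round(lag1_corr(rough), 2))    # negative: neighbors are unlike
```

The smooth series could be modeled from a few samples; the rough one demands far denser sampling, which is exactly the point of the paragraph above.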

6.4 U3: Uncertainty in analysis

Errors compound as we operationalize and measure a phenomenon, and further errors arise as we analyze. To remind yourself of the simple notion of uncertainty, contemplate four histograms, each of which represents the same set of data: a variable with a mean of 6 and a standard deviation of 1. The differences in the plots are due to the breaks, not the data. This is perhaps the simplest example of how simulation can be used to explore uncertainty. (Figure 6.15 is a more complicated simulation that may be rather difficult to interpret because the differences are so subtle.)
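You can demonstrate the breaks effect yourself. In this sketch the data are invented (a dozen values with a mean near 6); the only thing that changes between the two printed histograms is the choice of break points:

```python
# The same data binned with different breaks can look quite different.
# Data are invented: 12 values with a mean near 6.
data = [4.2, 4.8, 5.1, 5.5, 5.9, 6.0, 6.1, 6.4, 6.8, 7.0, 7.3, 7.9]

def histogram(values, breaks):
    """Count values falling in [breaks[i], breaks[i+1]) for each bin."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        for i in range(len(breaks) - 1):
            if breaks[i] <= v < breaks[i + 1]:
                counts[i] += 1
                break
    return counts

print(histogram(data, [4, 5, 6, 7, 8]))    # [2, 3, 4, 3]
print(histogram(data, [4, 5.5, 6.5, 8]))   # [3, 5, 4] -- same data!
```

Neither picture is "wrong"; the uncertainty lies in the arbitrary choice of class breaks, exactly as with choropleth map classes.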

The problem of conflation (which we saw in the comparison of the Baltimore street vector vs. DEM data) is a fairly narrow and technical example of uncertainty, shown in Figure 6.18. In the 1990s, when the USGS and the US Census Bureau were experimenting with offering a single source of GIS data, we saw many examples of poor conflation: the USGS data were spatially quite accurate, and the Census data were temporally accurate up to the decade. In any case, the marriage was never consummated...

Figure 6.17 is a nice "toy" illustration of uncertainty in choropleth mapping, but not, I think, a very apt example of the ecological fallacy (look it up) unless it is more baldly interpreted. If you looked only at the last two maps you would see a correlation between ethnicity and unemployment, but you wouldn't be justified in saying that a given person of Chinese ethnicity would be unemployed. Even an inference that the incidence of unemployment is higher among the Chinese ethnic group isn't well founded. In any case, this kind of visual analysis is common in geography, and because maps are so accessible, they easily lead to the attribution of individual processes from regional patterns.

6.5 Consolidation

Those of us who used computers decades ago used to warn people that just because the result came from a computer didn't mean the "printout" was accurate; and now of course almost everyone has had enough experience with computational error not to trust machines as much as was once the case. But only a few of us even now appreciate how all maps must lie: some reflecting the intentions of dishonest cartographers, but most simply reflecting the accretion of error through the analytical process. Only the real world - whatever that may mean - is where the "truth" can be found.