Geographic Information Systems and Science

Chapter 10 - Creating and maintaining geographic databases

We are taking a few steps backwards in the textbook because this chapter relates (pun intended) more to GTKAGIS Chapter 9. This chapter also enlarges on a number of ideas introduced in Chapter 8, discussed in Unit 4.

10.1 Introduction

To clear up what may be confusing language, here is a table that relates the terms for the parts of a data matrix to the various topic areas (I almost said "fields"). I try to use the generic terms from the headings, referring to the rows and columns of a data matrix.


DATA MATRIX	ROW	COLUMN
DATABASE	record	field
ESRI	feature	attribute
SOCIAL SCIENCE	case	variable
ENVIRONMENTAL SCIENCE	sample	measurement
STATISTICS	observation	variable

Moore's law describes the exponential growth in the number of transistors on a chip (and, loosely, in computing power), but there must be another law that expresses how quickly we become accustomed to equally rapid increases in database size. A few decades ago I worked with tables of a few kilobytes, but now I access images millions of times larger.

10.2 Database management systems

The bullets at the beginning of this section list fairly specialized capabilities, only a few of which interest us right now: data models, indices, and query languages. Some readers may find the rest of the section rather dry.

10.3 Storing data in DBMS tables

This brief section reviews the kind of activity we did in GTKAGIS Chapter 9. The organization of a database is similar in its abstraction and formality to the topology of spatial data. Data tables must relate to one another in a predictable and efficient way just as the points of a polyline and the lines enclosing a polygon must bear spatial relationships of a certain kind.

In Figure 10.2, the STATE_FIPS field is used to join Table B to Table A in order to create Table C.
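To make the join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (table_a, table_b), the region column, and the three rows are my own placeholders, not the contents of Figure 10.2; only the idea of matching rows on a shared STATE_FIPS key comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Table A: one row per state, keyed by STATE_FIPS (placeholder rows).
cur.execute("CREATE TABLE table_a (state_fips TEXT PRIMARY KEY, state_name TEXT)")
cur.executemany(
    "INSERT INTO table_a VALUES (?, ?)",
    [("06", "California"), ("36", "New York"), ("48", "Texas")],
)

# Table B: a hypothetical attribute stored separately but sharing the same key.
cur.execute("CREATE TABLE table_b (state_fips TEXT PRIMARY KEY, region TEXT)")
cur.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [("06", "West"), ("36", "Northeast"), ("48", "South")],
)

# "Table C": rows of A and B matched wherever the key values agree.
table_c = cur.execute(
    "SELECT a.state_fips, a.state_name, b.region "
    "FROM table_a AS a JOIN table_b AS b ON a.state_fips = b.state_fips "
    "ORDER BY a.state_fips"
).fetchall()
conn.close()
```

Note that the key is stored as text rather than as a number, so that a leading zero (as in "06") is not lost.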

The text discussing Figure 10.3 is a bit long-winded, but if you follow the lines in the figure you see that Zoning Type and Owner address have been joined to Parcels.

10.4 SQL

I pointed out in GTKAGIS that you used Structured Query Language (SQL) in Exercise 8b and elsewhere. It is a compromise between a lower-level language like C++, in which you might write:

{int temp = x; x = y; y = temp; }

and a high-level language like English. You have used SQL in ArcGIS, and it can be used in Access and a number of other programs. Although it is very powerful, its designers tried to keep it discursive enough so that any reasonably brief statement could be understood.
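For comparison with the C++ fragment, here is a sketch of what a brief SQL statement looks like, run through Python's built-in sqlite3 module. The parcels table, its columns, and its four rows are hypothetical; the point is only that the SELECT statement reads almost like a sentence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, zoning TEXT, area_ha REAL)"
)
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(1, "Residential", 0.4), (2, "Commercial", 1.2),
     (3, "Residential", 2.5), (4, "Industrial", 3.0)],
)

# "Give me the residential parcels larger than one hectare, biggest first."
query = """
    SELECT parcel_id, area_ha
    FROM parcels
    WHERE zoning = 'Residential' AND area_ha > 1.0
    ORDER BY area_ha DESC
"""
large_residential = conn.execute(query).fetchall()
conn.close()
```

The statement says what is wanted, not how to find it; the DBMS decides how to scan the table.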

10.5 Geographic database types and functions

Because computers are so dumb, when cartographic symbols became digital spatial data (I find the term "geospatial" redundant) geographic research focused on unambiguous systems for organizing objects, such as the one shown in Figure 10.5. There is a hierarchy here that bears some relationship to that of the scale hierarchy we know in cartography, and indeed the generalization problem (moving from large scale to small) must relate to this object model.

A cartographer would simply use her judgment, informed by tradition, to create a useful and attractive map. Indeed, over 100 years the USGS evolved elaborate rules developed and discussed by committees to specify exactly how features were to be represented on topographic maps. But a computer must use completely explicit algorithms to represent, interrelate, and generalize map data. To you it might be obvious that a point cannot contain a polygon (Figure 10.6A) but a computer needs a rule.
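As a rough illustration of what "a rule" might look like, here is a sketch in Python. It assumes a deliberately simplified rule of my own (an object can only contain objects of equal or lower dimension); the names Geometry and may_contain are hypothetical, and real GIS software tests actual coordinates with far more elaborate logic.

```python
from enum import IntEnum


class Geometry(IntEnum):
    """Geometry types ordered by their dimension."""
    POINT = 0
    LINE = 1
    POLYGON = 2


def may_contain(container: Geometry, contained: Geometry) -> bool:
    """Return True if `container` could, in principle, contain `contained`.

    Simplifying assumption: containment is only possible when the container's
    dimension is at least that of the contained object.
    """
    return container >= contained
```

With this rule, may_contain(Geometry.POINT, Geometry.POLYGON) is False, which is the obvious-to-us case that the computer has to be told.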

Beyond relations are the simplest methods that can be used to transform objects into other objects, either of the same or different kinds of geometry. These methods should conform to common sense and experience and do not differ among application areas.
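Two such methods can be sketched in a few lines of Python. These are my own minimal examples, not the list given in the text: envelope turns any set of vertices into a rectangle (a polygon), and centroid turns a polygon into a point. Both assume plain planar x, y coordinates, and centroid assumes a simple ring with non-zero area.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def envelope(vertices: List[Point]) -> List[Point]:
    """Line or polygon -> polygon: the axis-aligned bounding rectangle."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]


def centroid(ring: List[Point]) -> Point:
    """Polygon -> point: area-weighted centroid of a simple ring.

    The ring is a list of vertices without a repeated closing vertex.
    """
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))
```

For example, the rectangle [(0, 0), (4, 0), (4, 2), (0, 2)] has its centroid at (2.0, 1.0), as common sense says it should.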

You may well (and justifiably) find these discussions tedious; but beyond recognizing how critical they are to environmental modeling, think about how such thinking forces us to operationalize the sloppy concepts of everyday discourse. Animals have moved and made decisions in space for hundreds of millions of years, but computers can't help us unless we make what is intuitive completely explicit.

10.6 Geographic database design

This section is rather technical because the text needs to address the needs of users, designers, and GIS project planners. Suffice it to say that the lower levels in Figure 10.8 will be common to the application areas of the higher levels. You might think of adding an even lower level consisting of the geometric objects and their digital representations. This distinction is sometimes called semantic (high-level) vs. syntactic (low-level).

10.7 Structuring geographic information

Subsection 10.7.1 drills down further into earlier topics, particularly that of Section 8.2.3; I suspect that the topic has been sufficiently covered earlier for most people. But the next sections are worth perusing because they introduce several elegant approaches not only to the problem of indexing but also to locating, addressing, and even generalizing large databases.

At least study Figure 10.13. It represents two resolutions: Index level 1 is a 2×2 grid that reports the objects in 3 of its 4 cells, while Index level 2 is a finer 4×4 grid that reports objects in 9 of its 16 cells. The fact that this fraction has decreased from 0.75 to 0.56 tells us something about the fractal dimension of the objects. In other words, this indexing technique not only helps organize data but also helps describe data about complex real-world phenomena.

Task: How many cells would Index 3 have, and how many would be occupied by the two objects? What of Index 0?

SIZE	INDEX	CELLS	OCCUPIED
big	0		
↑	1	4	3
↓	2	16	9
small	3		
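If you want to experiment with the idea, here is a minimal Python sketch of a multi-level grid index. It is not the algorithm behind Figure 10.13: the function occupied_cells, the unit-square extent, and the two bounding boxes in the usage note are all my own assumptions, and objects are reduced to their bounding rectangles.

```python
from typing import List, Set, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def occupied_cells(boxes: List[Box], level: int,
                   extent: float = 1.0) -> Set[Tuple[int, int]]:
    """Cells (col, row) of a 2**level x 2**level grid touched by any box.

    The grid covers the square from (0, 0) to (extent, extent).
    """
    n = 2 ** level          # cells per side at this index level
    size = extent / n       # width of one cell
    cells: Set[Tuple[int, int]] = set()
    for xmin, ymin, xmax, ymax in boxes:
        col0, col1 = int(xmin // size), min(int(xmax // size), n - 1)
        row0, row1 = int(ymin // size), min(int(ymax // size), n - 1)
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                cells.add((col, row))
    return cells
```

With two hypothetical objects, boxes = [(0.1, 0.1, 0.3, 0.2), (0.6, 0.55, 0.9, 0.7)], calling len(occupied_cells(boxes, level)) for successive levels shows how the occupied fraction changes as the grid gets finer. The counts will differ from those in the figure because the objects differ.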

The rest of the section is a variation on this "toy" example. You are familiar with the overall idea from using long/lat coordinates as well as the familiar labeled or numbered grid coordinates on a map. When these systems are used at multiple scales you get schemes like quadtrees.
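The multi-scale addressing idea can be sketched as a quadtree-style key: at each level the current square is split into four quadrants, and one digit records which quadrant holds the location. The function quadkey and the digit convention (0 lower-left, 1 lower-right, 2 upper-left, 3 upper-right) are my own assumptions for illustration; published schemes number the quadrants in various ways.

```python
def quadkey(x: float, y: float, depth: int, extent: float = 1.0) -> str:
    """Return one quadrant digit (0-3) per level locating the point (x, y).

    The root square runs from (0, 0) to (extent, extent).
    """
    xmin = ymin = 0.0
    size = extent
    digits = []
    for _ in range(depth):
        size /= 2.0                      # each level halves the cell width
        right = x >= xmin + size         # is the point in the right half?
        upper = y >= ymin + size         # is the point in the upper half?
        digits.append(str(int(right) + 2 * int(upper)))
        if right:
            xmin += size                 # descend into the chosen quadrant
        if upper:
            ymin += size
    return "".join(digits)
```

The useful property is that the key at a coarse level is a prefix of the key at any finer level, so one string serves as an address at every scale.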

10.8 and 10.9 Editing and maintenance of databases

If you are doing environmental science with your own database (one you've created or acquired for research) then keeping track of transactions and versions will be of little concern; but if you are working in a large organization where the data are subject to revision, then you can expect that others will have made changes to the data that may affect your results. If so, be aware of the issues raised in this section.
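At its simplest, a transaction is a group of edits that either all take effect or none do. Here is a minimal sketch using Python's built-in sqlite3 module; the parcels table and the simulated failure are hypothetical, and the long transactions and versioning used in multi-user GIS databases are considerably more involved than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, zoning TEXT)")
conn.execute("INSERT INTO parcels VALUES (1, 'Residential')")
conn.commit()

try:
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if an exception is raised inside it.
    with conn:
        conn.execute("UPDATE parcels SET zoning = 'Commercial' WHERE parcel_id = 1")
        raise RuntimeError("simulated failure before the edit is complete")
except RuntimeError:
    pass

# The half-finished edit was rolled back, so the original value survives.
zoning_after = conn.execute(
    "SELECT zoning FROM parcels WHERE parcel_id = 1"
).fetchone()[0]
conn.close()
```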