Chapter 10 - Creating and maintaining geographic databases
We are taking a few steps backwards in the textbook because this chapter relates (pun intended) more to GTKAGIS Chapter 9. This chapter also enlarges on a number of ideas introduced in Chapter 8, discussed in Unit 4.
10.1 Introduction
To
clear up what may be confusing language, here is a table that relates
the terms for the parts of a data matrix to the various topic areas (I
almost said "fields"). I try to use the generic terms from the
headings, referring to the rows and columns of a data matrix.
| DATA MATRIX | ROW | COLUMN |
|---|---|---|
| DATABASE | record | field |
| ESRI | feature | attribute |
| SOCIAL SCIENCE | case | variable |
| ENVIRONMENTAL SCIENCE | sample | measurement |
| STATISTICS | observation | variable |
Moore's
law describes the exponential growth in the number of transistors on a
chip (and, loosely, in computing speed), but there
must be another law that expresses how quickly we become accustomed to
equally rapid increases in database size. A few decades ago I worked
with tables of a few kilobytes, but now I access images millions of times
larger.
10.2 Database management systems
The bullets at the beginning
of this section list fairly specialized capabilities, only a few of
which
interest us right now: data models, indices, and query languages. Some readers may find the rest of the section rather dry.
10.3 Storing data in DBMS tables
This brief section reviews the kind of activity we did in GTKAGIS Chapter
9. The organization of a database is similar in its abstraction and
formality to the topology of spatial data. Data tables must relate to
one another in a predictable and efficient way just as the points of a
polyline and the lines enclosing a polygon must bear spatial
relationships of a certain kind.
Figure 10.2 STATE_FIPS is used to join Table B to Table A in order to create Table C.
The text discussing Figure 10.3
is a bit long-winded, but if you follow the lines in the figure you see
that Zoning Type and Owner address have been joined to Parcels.
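To make the idea of a join concrete, here is a minimal sketch in Python using its built-in sqlite3 module. Only the key name STATE_FIPS comes from the figure; the table names (table_a, table_b), the other columns, and the population figures are my own placeholders, not the contents of Figure 10.2.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share the key column state_fips.
conn.executescript("""
CREATE TABLE table_a (state_fips TEXT PRIMARY KEY, state_name TEXT);
CREATE TABLE table_b (state_fips TEXT PRIMARY KEY, population INTEGER);
INSERT INTO table_a VALUES ('06', 'California'), ('36', 'New York');
-- Population values are made-up placeholders, not real data.
INSERT INTO table_b VALUES ('06', 100), ('36', 50);
""")

# Join Table B to Table A on the shared key to produce "Table C".
table_c = conn.execute("""
SELECT a.state_fips, a.state_name, b.population
FROM table_a AS a
JOIN table_b AS b ON a.state_fips = b.state_fips
ORDER BY a.state_fips
""").fetchall()

for row in table_c:
    print(row)
```

Each row of the result exists only because the same STATE_FIPS value appears in both source tables, which is why a predictable, shared key matters so much.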
10.4 SQL
I pointed out in GTKAGIS
that you used structured query language in Exercise 8b, and elsewhere.
It's a compromise between what is called a lower-level language like
C++, such as:
{int temp = x; x = y; y = temp; }
and a high-level language like English. You have used SQL in
ArcGIS, and it can be used in Access and a number of other
programs. Although it is very
powerful, its designers tried to keep it discursive enough so that any
reasonably brief statement could be understood.
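As an illustration of how readable a brief SQL statement can be, here is a small sketch run through Python's built-in sqlite3 module. The parcels table, its columns (parcel_id, zoning, area_m2), and its values are hypothetical and are not taken from the textbook or from ArcGIS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical parcel table with made-up values.
conn.execute("CREATE TABLE parcels (parcel_id INTEGER, zoning TEXT, area_m2 REAL)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [
        (1, "Residential", 800.0),
        (2, "Commercial", 1500.0),
        (3, "Residential", 1200.0),
    ],
)

# The statement reads almost like an English sentence:
# "select the parcels where zoning is Residential and area exceeds 1000".
query = """
SELECT parcel_id, area_m2
FROM parcels
WHERE zoning = 'Residential' AND area_m2 > 1000
ORDER BY area_m2 DESC
"""
large_residential = conn.execute(query).fetchall()
print(large_residential)
```

Compare the WHERE clause with the C++ fragment above: you can guess what the query does without knowing how the database carries it out.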
10.5 Geographic database types and functions
Because
computers are so dumb, when cartographic symbols became digital spatial
data (I find the term "geospatial" redundant) geographic research
focused on unambiguous systems for organizing objects, such as the one
shown in Figure 10.5. There is
a hierarchy here that bears some relationship to that of the scale
hierarchy we know in cartography, and indeed the generalization problem
(moving from large scale to small) must relate to this object model.
A
cartographer would simply use her judgment, informed by tradition, to
create a useful and attractive map. Indeed, over 100 years the USGS
evolved elaborate rules developed and discussed by committees to
specify exactly how features were to be represented on topographic
maps. But a computer must use completely explicit algorithms to
represent, interrelate, and generalize map data. To you it might
be obvious that a point cannot contain a polygon (Figure 10.6A) but a computer needs a rule.
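Here is one way such a rule might be written down, as a sketch in Python. It encodes only a necessary condition (a geometry cannot contain something of higher dimension), and the names DIMENSION and may_contain are my own, not part of any GIS package.

```python
# Topological dimension of each simple geometry type.
DIMENSION = {"point": 0, "line": 1, "polygon": 2}

def may_contain(container, contained):
    """Return True if `container` could possibly contain `contained`.

    This checks dimension only: it rules out impossible cases
    (a point containing a polygon) but does not test actual coordinates.
    """
    return DIMENSION[container] >= DIMENSION[contained]

print(may_contain("point", "polygon"))   # a point cannot contain a polygon
print(may_contain("polygon", "point"))   # a polygon may contain a point
```

A real system would follow this quick check with a coordinate-based test, but the point is that even the "obvious" part has to be stated explicitly.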
Beyond
relations are the simplest methods that can be used to transform
objects into other objects, either of the same or different kinds of
geometry. These methods should conform to common sense and experience
and do not differ among application areas.
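Two simple methods of this kind can be sketched in plain Python: one turns a polygon into another polygon (its bounding rectangle, or envelope) and one turns a polygon into a point (its centroid). The function names and the sample square are my own illustration, and the centroid uses the standard area-weighted ("shoelace") formula for a simple, non-self-intersecting polygon.

```python
def envelope(polygon):
    """Polygon -> polygon: bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

def centroid(polygon):
    """Polygon -> point: area-weighted centroid of a simple polygon.

    `polygon` is a list of (x, y) vertices, not repeating the first vertex.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]   # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3 * twice_area), cy / (3 * twice_area))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(envelope(square))
print(centroid(square))
```

Neither method cares whether the polygon is a parcel, a lake, or a census tract, which is the sense in which such methods do not differ among application areas.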
You may well (and justifiably) find these discussions tedious; but
beyond recognizing how critical they are to environmental modeling,
think about how such thinking forces us to operationalize
the sloppy concepts of everyday discourse. Animals have moved
and made decisions in space for billions of years, but computers can't
help us unless we make what is intuitive completely explicit.
10.6 Geographic database design
This section is rather
technical because the text needs to address the needs of users,
designers, and GIS project planners. Suffice it to say that the lower
levels in Figure 10.8 will be
common to the application areas of the higher levels. You might think of
adding an even lower level consisting of the geometric objects
and their digital representations. This distinction is sometimes called
semantic (high-level) v. syntactic (low-level).
10.7 Structuring geographic information
Subsection 10.7.1 drills
down further into earlier topics, particularly that of Section 8.2.3; I
suspect that the topic has been sufficiently covered earlier for most
people. But the next sections are worth perusing because they introduce
several elegant approaches not only to the problem of indexing but also
to locating, addressing, and even generalizing large databases.
At least study Figure 10.13.
It represents two resolutions: Index level 1 is a 2×2 grid
that reports the objects in 3 of its 4 cells, while Index level 2 is a
finer 4×4 grid that reports objects in 9 of its 16 cells. The
fact that this fraction has decreased from .75 to .56 tells us
something about the fractal dimension of the objects. In other words,
this indexing technique not only helps organize data but also helps
describe data about complex real-world phenomena.
Task: How many cells would Index 3 have, and how many would be occupied by the two objects? What of Index 0?
| SIZE | INDEX | CELLS | OCCUPIED |
|---|---|---|---|
| big | 0 | | |
| ↑ | 1 | 4 | 3 |
| ↓ | 2 | 16 | 9 |
| small | 3 | | |
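If you want to experiment with this kind of multi-level grid index, here is a minimal Python sketch. The two rectangles are hypothetical stand-ins, not the objects in Figure 10.13, so the counts it prints will not match the table. The second function applies the usual box-counting estimate of fractal dimension, which is my assumption about what "fractal dimension" means here, to the counts 3 and 9 quoted above.

```python
import math

EXTENT = 8  # side of the square study area, in hypothetical base units

def occupied_cells(boxes, level):
    """Cells of a 2**level x 2**level grid touched by any box.

    Each box is (xmin, ymin, xmax, ymax) in integer base units and is
    treated as half-open, so a box ending on a cell edge does not spill over.
    Valid for levels 0..3 with EXTENT = 8.
    """
    size = EXTENT // (2 ** level)
    cells = set()
    for xmin, ymin, xmax, ymax in boxes:
        for col in range(xmin // size, (xmax - 1) // size + 1):
            for row in range(ymin // size, (ymax - 1) // size + 1):
                cells.add((col, row))
    return cells

def box_counting_dimension(n_coarse, n_fine, scale_ratio=2):
    """Estimate dimension from occupied-cell counts at two resolutions."""
    return math.log(n_fine / n_coarse) / math.log(scale_ratio)

# Two made-up rectangular objects.
boxes = [(0, 0, 3, 3), (5, 4, 8, 6)]
for level in range(4):
    n_occ = len(occupied_cells(boxes, level))
    print(level, n_occ, n_occ / 4 ** level)

# Counts from the text: 3 occupied cells at level 1, 9 at level 2.
print(round(box_counting_dimension(3, 9), 3))
```

An estimate between 1 and 2 would suggest objects that fill more of the plane than a line but less than a solid area, though two resolutions are far too few for a reliable figure.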
The
rest of the section is a variation on this "toy" example. You are
familiar with the overall idea from using long/lat coordinates as well
as the familiar labeled or numbered grid coordinates on a map. When
these systems are used at multiple scales you get schemes like
quadtrees.
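The quadtree idea can be sketched in a few lines of Python: at each level the current square is split into four, and one digit records which quarter holds the point. The function name quadkey, the unit-square extent, and the digit numbering (0 = lower left, 1 = lower right, 2 = upper left, 3 = upper right) are my own conventions for illustration, not those of any particular GIS.

```python
def quadkey(x, y, levels):
    """Return a string of quadrant digits locating (x, y) in the unit square.

    Assumes 0 <= x < 1 and 0 <= y < 1. Each extra level appends one digit,
    so a coarse key is always a prefix of the finer key for the same point.
    """
    xmin, ymin, xmax, ymax = 0.0, 0.0, 1.0, 1.0
    digits = []
    for _ in range(levels):
        xmid = (xmin + xmax) / 2
        ymid = (ymin + ymax) / 2
        digit = 0
        if x >= xmid:      # right half
            digit += 1
            xmin = xmid
        else:              # left half
            xmax = xmid
        if y >= ymid:      # upper half
            digit += 2
            ymin = ymid
        else:              # lower half
            ymax = ymid
        digits.append(str(digit))
    return "".join(digits)

print(quadkey(0.25, 0.75, 1))
print(quadkey(0.25, 0.75, 2))
```

Because the coarse key is a prefix of the fine one, the same address works at every scale, which is what makes the scheme useful for both indexing and generalization.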
10.8 & 10.9 Editing and maintenance of databases
If
you are doing environmental science with your own database (one you've
created or acquired for research) then keeping track of transactions
and versions will be of little concern; but if you are working in a
large organization where the data are subject to revision, then you can
expect that others will have made changes to the data that may affect
your results. If so, be aware of the issues raised in this section.