Chapter 9 - GIS data collection
This chapter is an overview of the problem of collecting data, which is becoming less expensive in GIS. Nevertheless, you need to be clear about the three important forms of data:
- Data matrices containing spatial attributes
- GIS vector data that can be mapped directly
- Raster data that represents continuous fields
9.1 Introduction
The typology of data sources in Table 9.1 is fairly simple: data you collect yourself or data you get from other sources. But a very important source of environmental data is the simple data matrix (often called a "flat file"), such as this extract of rainfall for 30 California cities:
| ID | NAME | X | Y | RAINFALL |
| --- | --- | --- | --- | --- |
| 1 | Eureka | 254 | 650 | 39.6 |
| 2 | Red Bluff | 362 | 602 | 233.3 |
| 3 | Thermal | 726 | 185 | 18.2 |
| ... | | | | |
| 30 | Colosa | 350 | 538 | 15.9 |
Most GIS packages have simple tools for importing these data, but you can also load them into a spreadsheet and plot them in 2D or 3D. As we shall see, however, you need some way of relating the data to (x, y) locations.
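As a minimal sketch of relating a flat file to (x, y) locations, the following Python reads three rows of the rainfall extract and turns each into a point record. The comma-separated layout, the function name `read_points`, and the lowercase field names are assumptions made for illustration; the values are copied from the extract as given.

```python
import csv
import io

# Three rows copied from the rainfall extract; the CSV layout is an assumption.
FLAT_FILE = """ID,NAME,X,Y,RAINFALL
1,Eureka,254,650,39.6
2,Red Bluff,362,602,233.3
3,Thermal,726,185,18.2
"""


def read_points(text):
    """Parse flat-file text into a list of dicts with numeric x, y, rainfall."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        points.append({
            "id": int(row["ID"]),
            "name": row["NAME"],
            "x": float(row["X"]),          # map coordinate, units as in the source table
            "y": float(row["Y"]),
            "rainfall": float(row["RAINFALL"]),
        })
    return points


points = read_points(FLAT_FILE)
for p in points:
    # Each record now carries its own location, so it can be plotted or mapped.
    print(f'{p["name"]:10s} ({p["x"]:.0f}, {p["y"]:.0f})  rainfall={p["rainfall"]}')
```

The attribute (rainfall) stays attached to its coordinates in one record, which is roughly what a GIS does when it imports a table with X and Y columns.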
One aspect of GIS that is steadily improving is the ready availability of data, either raw or spatially referenced, sometimes even as ArcGIS shapefiles. In September 2008 we were able to download Arc data for Hurricane Hanna from NOAA. Hence data capture costs for many purposes are approaching zero, although you may be benefiting from quite expensive efforts on the part of others.
Figure 9.1 is worth studying as a guide to your own research, although you probably won't be digitizing. But GIS data in all their many forms can quickly become bewildering if you don't organize them and delete what you don't want.
9.2 Primary geographic data capture
Most of us will be dealing with vector data, but rasters are the most common primary data source because there are so many sensors, as illustrated in Figure 9.2, which may be confusing at first. Think of the plot as a generally L-shaped "cloud" of sensors: large-pixel sensors at the bottom-right capturing data frequently (think weather), and small-pixel sensors at the top-left being used for special local purposes (think of a plane taking detailed photos of wetland vegetation). What's missing from this already-full plot is spatial extent: the area being viewed.
A simple formula expresses this in general terms:
DATA VOLUME = (SPATIAL EXTENT / SPATIAL RESOLUTION)^D
where D is the dimension of analysis (D = 0 for a point sample, 1 for sampling along a line, 2 for a square region, etc.). The idea is that data capture views a region of a certain extent (max - min coordinates) and divides it into cells of a given resolution. The bigger the extent and the finer the resolution (that is, the smaller the cell size), the more data. You can extend this idea into the temporal realm as well, but at least time has only one dimension! Data volume grows at a very rapid rate; a substantial amount of the floor space of the sprawling USGS EROS Data Center is devoted to storage.
And when you add to this formula the idea of multispectral imagery (imagine sensors capturing images at multiple wavelengths, as in Figure 9.3), volume grows even faster. (And we're not talking here about the even vaster holdings of the military/government so-called intelligence industry.)
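To get a feel for how quickly the data volume formula grows, here is a small Python sketch. The function name `data_volume` and the example extents and cell sizes are hypothetical, and the `bands` multiplier (one full grid per spectral band) is an assumption added for illustration rather than part of the formula in the text.

```python
def data_volume(extent, resolution, dimension, bands=1):
    """Number of cells: (extent / resolution) ** dimension, times spectral bands.

    extent and resolution must be in the same units (e.g. meters).
    bands is an illustrative assumption: one full grid per wavelength.
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    cells_per_axis = extent / resolution
    return bands * cells_per_axis ** dimension


# A hypothetical 10 km square region sampled with 10 m cells (D = 2).
coarse = data_volume(10_000, 10, 2)
# Halving the cell size to 5 m quadruples the number of cells.
fine = data_volume(10_000, 5, 2)
# Four spectral bands at the coarser resolution.
multispectral = data_volume(10_000, 10, 2, bands=4)

print(coarse, fine, multispectral)
```

Note that in two dimensions, halving the cell size costs as much as adding three more spectral bands, which is why resolution choices tend to dominate storage needs.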
One downside of our online course is that we cannot do fieldwork together. These days it is fairly easy to connect a GPS receiver to a laptop and take both outside to do real-time mapping, resulting in a table of (x, y, t) measurements that can be plotted in a spreadsheet or GIS. Here, for example, are the GPS coordinates of a few of the "boundary stones" marking the limits of the US District of Columbia. Some of you could probably do this with your cell phone!
| LONG | LAT |
| --- | --- |
| 77° 9' 34.07" W | 38° 53' 24.44" N |
| 77° 9' 48.60" W | 38° 54' 5.69" N |
| 77° 8' 46.20" W | 38° 54' 42.52" N |
| 77° 8' 45.68" W | 38° 52' 7.49" N |
| 77° 7' 6.14" W | 38° 50' 58.92" N |
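Coordinates in degrees, minutes, and seconds usually have to be converted to signed decimal degrees before a spreadsheet or GIS can plot them. The sketch below shows the standard arithmetic; the function name `dms_to_decimal` is hypothetical, and treating west and south as negative is the common convention assumed here.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Assumes the usual convention: W and S are negative, E and N positive.
    """
    value = degrees + minutes / 60 + seconds / 3600
    if hemisphere.upper() in ("W", "S"):
        value = -value
    return value


# First boundary stone from the table above.
lon = dms_to_decimal(77, 9, 34.07, "W")
lat = dms_to_decimal(38, 53, 24.44, "N")
print(f"x = {lon:.5f}, y = {lat:.5f}")
```

The result is an (x, y) pair with longitude as x and latitude as y, which is the order most plotting tools expect.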
9.3 Secondary geographic data capture
Every technical profession has its history of drudgery, and I came to large-scale GIS when digitizing was peaking as the main means of capturing spatial data. You can skim this section, but if you ever need to do historical research, you may become familiar with these processes.
But give some attention to the discussion surrounding Figure 9.8, particularly because it demonstrates how GIS supports both transformations among forms of representation and movement between fields (left) and objects (right) and back again. I do this all the time in my research. And remember: everything becomes a pixel when it is viewed on a computer.
You should read the discussion of error (which supplements Chapter 6) to get a sense of how many places it can arise in science (not only GIScience). This theme appears in many places in the book and is a useful antidote to the notion that data are always accurate. In general (cf. Figure 6.1) the idea is that
MODEL = DATA + MEASUREMENT ERROR
      = (REALITY + CONCEPTUAL ERROR) + MEASUREMENT ERROR
so there are many places for uncertainty to arise. The examples shown here are illustrated in practice by Chapter 16 of the ArcGIS book, which is optional.
9.4 Obtaining data from external sources (data transfer)
Environmental science may be the most active area of so-called "data transfer" in which data are shared among researchers and managers. Look at Table 9.3 and check off the kinds of data you might want to include in a research project. A web search on any of these keywords will bring up many hits. Look at some of the URLs in Table 9.4 to investigate data of interest to your environmental ideas.
Box 9.2 (War story) When I was teaching urban planning at Boston University in the early 1970s I invited Don Cooke to talk to my class about the arcane topics of geocoding, DIME, and TIGER files. He stayed in New England and became an innovator in a vast and prosperous industry; I went to Africa to have adventures!
9.5-6 Capturing and managing data (and Conclusion)
As I've said at various points, you may not be managing a large GIS project, but in this class you'll be conducting original research that will require you to acquire and manage data. The exercises and assignments are designed to give you experience in these areas; some of the students in my classes use this research as a "pilot" for later, more elaborate projects. Data collection probably won't be a large part of your own research, but you will spend some time acquiring data from secondary sources and even more time managing and reshaping them for your purposes. Even at a small scale you will be exposed to tradeoffs between speed, quality, price, and scale, two dimensions of which were shown in Figure 9.2 and are sketched in Figure 9.17.