Maps in R – Examples

The examples here use several libraries, datasets, and shapefiles that should be downloaded and/or installed, and read in before proceeding. This will make the examples easier to follow, but they will fail if the data sets do not exist or have not been read in. For one of the example data sets, Oregon climate-station data, the data are available in two forms–as a .csv file (orstation.csv), and included as part of the orstations shape file. as a consequence, the individual variables will be referred to by their “full” names (e.g. orstationc$tann).

Required libraries:

maptools (and sp), rgdal, RColorBrewer, classInt, maps, rgeos

Shapefiles:

Data sets (rectangular data sets or data frames:

The easiest way to get all of these data files and shape-file components, is to download a copy of the workspace geog495.RData and “load” it, or read it in. Otherwise, all of them are available to download from the GeogR data page.

[geog495.RData]

Right-click on the above link, and save it to your working folder. (If you’ve forgotten where that is, type getwd() on the command line). Then read it in to R:

load("geog495.RData")

Load the libraries:

library(rgeos)
## rgeos version: 0.3-8, (SVN revision 460)
##  GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921 
##  Polygon checking: TRUE
library(maptools)     # loads sp library too
## Loading required package: sp
## Checking rgeos availability: TRUE
library(RColorBrewer) # creates nice color schemes
library(classInt)     # finds class intervals for continuous variables

Before begining, examine the “structure” of one of the shape file:

Examine the structure and contents of orcounty.shp shapefile. Each of the different approaches yields a slightly different perspective on the shapefile and its data (and all produce a lot of output, which is not echoed here).

summary(orcounty.shp)
attributes(orcounty.shp)
attr(orcounty.shp,"polygons")

Another version of the attributes() function provides information about the attached data (or attributes in the ArcGIS sense). Note the use of the “@” sign to indicate what part of the shape file we want to examine:

attributes(orcounty.shp@data)
## $names
##  [1] "NAME"       "STATE_NAME" "STATE_FIPS" "CNTY_FIPS"  "FIPS"      
##  [6] "AREA"       "POP1990"    "POP1997"    "POP90_SQMI" "HOUSEHOLDS"
## [11] "MALES"      "FEMALES"    "WHITE"      "BLACK"      "AMERI_ES"  
## [16] "ASIAN_PI"   "OTHER"      "HISPANIC"   "AGE_UNDER5" "AGE_5_17"  
## [21] "AGE_18_29"  "AGE_30_49"  "AGE_50_64"  "AGE_65_UP"  "NEVERMARRY"
## [26] "MARRIED"    "SEPARATED"  "WIDOWED"    "DIVORCED"   "HSEHLD_1_M"
## [31] "HSEHLD_1_F" "MARHH_CHD"  "MARHH_NO_C" "MHH_CHILD"  "FHH_CHILD" 
## [36] "HSE_UNITS"  "VACANT"     "OWNER_OCC"  "RENTER_OCC" "MEDIAN_VAL"
## [41] "MEDIANRENT" "UNITS_1DET" "UNITS_1ATT" "UNITS2"     "UNITS3_9"  
## [46] "UNITS10_49" "UNITS50_UP" "MOBILEHOME" "NO_FARMS87" "AVG_SIZE87"
## [51] "CROP_ACR87" "AVG_SALE87"
## 
## $data_types
##  [1] "C" "C" "C" "C" "C" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"
## [18] "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"
## [35] "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N" "N"
## [52] "N"
## 
## $row.names
##  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13"
## [15] "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27"
## [29] "28" "29" "30" "31" "32" "33" "34" "35"
## 
## $class
## [1] "data.frame"

Note that the $data_types part of the output indicates whether individual variables are character (“C” e.g. county names) or numerical (“N” e.g. county area).

1. Some simple maps

R has the capability of plotting some simple maps using the maptools package, which can read and plot ESRI shapefiles. Here are a couple of examples:

Oregon county census data – attribute data in the orcounty.shp shape file

# equal-frequency class intervals -- block 1
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)

This first block of code (above) just does some setting up, while this next block actually produces the map:

# block 2
plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
plot(orcounty.shp, col=colcode, add=T)
title(main="Population 1990",
    sub="Quantile (Equal-Frequency) Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

Here’s what’s going on. First, a temporary variable plotvar is created by copying the values of orcounty.sh@data$POP1990 (from “attribute” data in the shape file). Next the number of colors is set nclr <- 8 and a set of appropriate colors is generated using the brewer.pal() function. The classIntervals() function assigns the individual observations of plotvar to one of nclr equal-frequency class intervals, and then the findColours() function (note the Canadian/U.K. spelling) determines the color for each observation (each county).

In the second block, the first use of the plot() function plots the shapefile within an appropriate range of latitudes and longitudes, while the second use plots colored polygons on top. The rest of block 2 adds a title and legend. It’s a little clunky, and the placement of labels are not great, but it’s quick.

Symbol plot: Oregon climate station data – data in the orstationc.csv file, basemap in orotl.shp

Here’s a map of the Oregon climate station data with the data coming from the rstationc.csv file, and the basemap from orotl.shp

# symbol plot -- equal-interval class intervals
plotvar <- orstationc$tann
nclr <- 8
plotclr <- brewer.pal(nclr,"PuOr")
plotclr <- plotclr[nclr:1] # reorder colors
class <- classIntervals(plotvar, nclr, style="equal")
colcode <- findColours(class, plotclr)

plot(orotl.shp, xlim=c(-124.5, -115), ylim=c(42,47))
points(orstationc$lon, orstationc$lat, pch=16, col=colcode, cex=2)
points(orstationc$lon, orstationc$lat, cex=2)
title("Oregon Climate Station Data -- Annual Temperature",
    sub="Equal-Interval Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
    fill=attr(colcode, "palette"), cex=0.6, bty="n")

Oregon climate station data – locations and data in shape file

Here’s a third map with the Oregon climate station data locations and data coming from the shape file:

# symbol plot -- equal-interval class intervals
plotvar <- orstations.shp@data$pann
nclr <- 5
plotclr <- brewer.pal(nclr,"BuPu")
class <- classIntervals(plotvar, nclr, style="fixed",
fixedBreaks=c(0,200,500,1000,2000,5000))
colcode <- findColours(class, plotclr)
orstations.pts <- orstations.shp@coords # get point data

plot(orotl.shp, xlim=c(-124.5, -115), ylim=c(42,47))
points(orstations.pts, pch=16, col=colcode, cex=2)
points(orstations.pts, cex=2)
title("Oregon Climate Station Data -- Annual Precipitation",
    sub="Fixed-Interval Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
fill=attr(colcode, "palette"), cex=0.6, bty="n")

Notice how the expression orstations.shp@data$pann refers to a specific variable (pann), contained in the data attribute of the shape file. Some other things: This map uses fixed (ahead of making the map) class intervals (fixedBreaks) and the two points() function “calls”: the first plots a colored dot (col=colcode), and the second then just plots a unfilled dot (in black) to put a nice line around each point to make the symbol more obvious.

2. Variations in color scales and representation

This set of examples illustrates some more applications of the maptools package, and some variations in the construction of class intervals for choropleth maps and symbolic representation of the Oregon county-level census data:

Oregon county census data – equal-frequency class intervals

Here, the class intervals are defined such that each class has the same number of observations.

# equal-frequency class intervals
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
plot(orcounty.shp, col=colcode, add=T)
title(main="Population 1990",
  sub="Quantile (Equal-Frequency) Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
  fill=attr(colcode, "palette"), cex=0.6, bty="n")

Oregon county census data – equal-width class intervals

Now, the classes are defined so than each class has the same “width”.

#equal-width class intervals of 1990 population
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
#plotclr <- plotclr[nclr:1] # reorder colors if appropriate
class <- classIntervals(plotvar, nclr, style="equal")
colcode <- findColours(class, plotclr)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
plot(orcounty.shp, col=colcode, add=T)
title(main="Population 1990",
  sub=" Equal-Width Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
  fill=attr(colcode, "palette"), cex=0.6, bty="n")

The same data are being plotted here, the difference is the way in which class intervals are defined. Equal-frequency class intervals create a general gradient of county sizes from the small ones in the northern Willamette Valley to the larger eastern Oregon ones, while the equal-width class intervals emphasize the large differences in the size of the counties.

The bubble plot (on a map)

We tend to be influenced more by the simple area of an object on the page or screen than by whatever is being plotted. For example, here’s the 1990 county population, plotted as polygons.

Notice above that Multnomah county, the most populated one, despite being plotted in a dark color, gets kind of dominated by the larger, less populated counties that surround it. The eastern Oregon counties, the least populated, occupy the greatest area on the map. The solution is the bubble plot:

# bubble plot equal-frequency class intervals
plotvar <- orcounty.shp@data$POP1990
nclr <- 8
plotclr <- brewer.pal(nclr,"BuPu")
#plotclr <- plotclr[nclr:1] # reorder colors if appropriate
max.symbol.size=12
min.symbol.size=1
class <- classIntervals(plotvar, nclr, style="quantile")
colcode <- findColours(class, plotclr)
symbol.size <- ((plotvar-min(plotvar))/
    (max(plotvar)-min(plotvar))*(max.symbol.size-min.symbol.size)
  +min.symbol.size)

plot(orcounty.shp, xlim=c(-124.5, -115), ylim=c(42,47))
orcounty.cntr <- coordinates(orcounty.shp)
points(orcounty.cntr, pch=16, col=colcode, cex=symbol.size)
points(orcounty.cntr, cex=symbol.size)
text(-120, 46.5, "POP1990: Equal-Frequency Class Intervals")
legend(-117, 44, legend=names(attr(colcode, "table")),
  fill=attr(colcode, "palette"), cex=0.6, bty="n")

Here the size of each symbol is calculated using the county population using the code at the end of the first block, and then the symbol is plotted at the centroid of each county, which is located using the coordinates() function. Note that the symbol sizes could be made vary continuously.

Dot maps

Another way to plot data that applies to polygons (counties, states, countries, etc.), is the time-honored dot map:

# maptools dot-density maps
# warning: this can take a little while
plottitle <- "Population 1990, each dot=1000"
orpolys <- SpatialPolygonsDataFrame(orcounty.shp, data=as(orcounty.shp, "data.frame"))
plotvar <- orpolys@data$POP1990/1000.0

dots.rand <- dotsInPolys(orpolys, as.integer(plotvar), f="random")
plot(orpolys, xlim=c(-124.5, -115), ylim=c(42,47))
plot(dots.rand, add=T, pch=19, cex=0.5, col="magenta")
plot(orpolys, add=T)
title(plottitle)

dots.reg <- dotsInPolys(orpolys, as.integer(plotvar), f="regular")
plot(orpolys, xlim=c(-124.5, -115), ylim=c(42,47))
plot(dots.reg, add=T, pch=19, cex=0.5, col="purple")
plot(orpolys, add=T)
title(plottitle)

The first map randomly locates each dot, which looks nicer, but may be misleading in regions where the dots are sparse (i.e. they may look like they are identifying specific places), while the second map clearly shows a stylized depiction of population, but looks kind of strange.