1 Introduction

1.1 Spatial data and geospatial analyses in R

CRAN Spatial taskview: https://cran.r-project.org/web/views/Spatial.html

The bulk of the geospatial/GISci analysis tools are contained in the following packages:

  • maptools: reading and writing spatial data, particularly shapefiles
  • sp: manipulating data in one of the Spatial data classes
  • rgdal: R “bindings” to GDAL (the Geospatial Data Abstraction Library)
  • rgeos: R interface to the GEOS “geometry engine” (overlays, etc.)

The book (“ASDAR2”): R.S. Bivand, E. Pebesma, V. Gómez-Rubio (2013) Applied Spatial Data Analysis with R, 2nd Ed., Springer.

Here’s a .pdf: http://link.springer.com/content/pdf/10.1007%2F978-1-4614-7618-4.pdf (must be on the UO network)

There is also an R package maps that includes a world outline database and methods for plotting it with R’s core graphics.
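As a minimal sketch (assuming the maps package is installed), a world outline can be drawn directly with core graphics:

```r
# quick world outline map using the maps package and core graphics
library(maps)
map("world", col="gray40")                    # draw the world outline database
map("world", "Canada", add=TRUE, col="blue")  # overlay a single named region
```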

1.2 Spatial classes

There are several related “classes” of spatial data in R, each consisting either of the spatial coordinate or geometry data alone, or of the coordinate or geometry data plus an associated data frame:

  • SpatialPoints and SpatialPointsDataFrame
  • SpatialLines and SpatialLinesDataFrame
  • SpatialPolygons and SpatialPolygonsDataFrame
  • SpatialPixels and SpatialPixelsDataFrame
  • SpatialGrid and SpatialGridDataFrame
  • SpatialMultiPoints and SpatialMultiPointsDataFrame

The names of the classes largely describe the kind of information they contain. One way to view the landscape of geospatial data analysis in R is that maptools and rgdal cover reading and writing the spatial data classes, sp handles plotting, conversions, and manipulations (including projections with spTransform()), and rgeos handles geospatial analysis tasks.
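As a minimal sketch of the projection step (the Albers equal-area parameters below are only illustrative assumptions), sp’s spTransform() reprojects any Spatial* object that has a coordinate reference system attached:

```r
# sketch: reprojecting a Spatial* object with sp::spTransform()
# (requires rgdal; the target-projection parameters are illustrative)
library(sp)
library(rgdal)
pts <- SpatialPoints(cbind(c(-123.0, -121.3), c(44.0, 45.5)),
                     proj4string=CRS("+proj=longlat +datum=WGS84"))
pts_aea <- spTransform(pts,
  CRS("+proj=aea +lat_1=43 +lat_2=48 +lon_0=-120 +datum=WGS84"))
coordinates(pts_aea)  # coordinates are now in projected units (m)
```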

Many of the functions in these packages will be exercised below, but these examples are by no means exhaustive.

2 Basic mapping

Here is a very simple example of reading and plotting a couple of shapefiles that illustrates the use of some of the maptools and sp functions:

2.1 Load libraries and read the shape files

Load the maptools package:

library(maptools)

Note that maptools automatically loads the sp package, which could also have been explicitly loaded via library(sp). When maptools or sp is loaded, it also checks for the presence of the rgeos package.

Load two shapefiles, one of Oregon county outlines, the other of climate-station locations:

# read county outlines
otl_path <- "/Users/bartlein/Projects/ESSD/data/shp_files/OR/"
otl_name <- "orotl.shp"
otl_file <- paste(otl_path, otl_name, sep="")
orotl_shp <- readShapeLines(otl_file)
## Warning: use rgdal::readOGR or sf::st_read

Check what kind of spatial data class this shapefile is:

class(orotl_shp)
## [1] "SpatialLinesDataFrame"
## attr(,"package")
## [1] "sp"
summary(orotl_shp)
## Object of class SpatialLinesDataFrame
## Coordinates:
##          min        max
## x -124.55840 -116.46944
## y   41.98779   46.23626
## Is projected: NA 
## proj4string : [NA]
## Data attributes:
##         NAME   
##  Baker    : 1  
##  Benton   : 1  
##  Clackamas: 1  
##  Clatsop  : 1  
##  Columbia : 1  
##  Coos     : 1  
##  (Other)  :30

Note that the shapefile is a SpatialLinesDataFrame that, in addition to the geometry of the counties (the outlines), also contains data (the county names). Plot the county outline shapefile:

plot(orotl_shp)

Now get the climate-station data:

# read stations
orstations_path <- "/Users/bartlein/Projects/ESSD/data/shp_files/OR/"
orstations_name <- "orstations.shp"
orstations_file <- paste(orstations_path, orstations_name, sep="")
orstations_shp <- readShapePoints(orstations_file)
## Warning: use rgdal::readOGR or sf::st_read
# check stations shapefile
class(orstations_shp)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
summary(orstations_shp)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                min      max
## coords.x1 -124.567 -116.967
## coords.x2   42.050   46.150
## Is projected: NA 
## proj4string : [NA]
## Number of points: 92
## Data attributes:
##     station          latitude       longitude        elevation            tjan             tjul      
##  Min.   :350197   Min.   :42.05   Min.   :-124.6   Min.   :   2.00   Min.   :-6.000   Min.   :12.20  
##  1st Qu.:352308   1st Qu.:43.60   1st Qu.:-123.2   1st Qu.:  94.25   1st Qu.:-1.225   1st Qu.:17.30  
##  Median :354564   Median :44.46   Median :-121.7   Median : 462.00   Median : 0.750   Median :19.20  
##  Mean   :354455   Mean   :44.32   Mean   :-121.3   Mean   : 556.99   Mean   : 1.417   Mean   :19.01  
##  3rd Qu.:356663   3rd Qu.:45.28   3rd Qu.:-119.6   3rd Qu.: 863.75   3rd Qu.: 4.150   3rd Qu.:20.20  
##  Max.   :359316   Max.   :46.15   Max.   :-117.0   Max.   :1974.00   Max.   : 8.400   Max.   :26.40  
##                                                                                                      
##       tann             pjan             pjul            pann             Sta    
##  Min.   : 3.200   Min.   : 14.00   Min.   : 4.00   Min.   : 198.0   MLB    : 2  
##  1st Qu.: 8.700   1st Qu.: 39.75   1st Qu.: 7.00   1st Qu.: 299.0   ANT    : 1  
##  Median :10.400   Median : 87.50   Median : 9.00   Median : 531.0   ARL    : 1  
##  Mean   : 9.885   Mean   :141.67   Mean   :10.84   Mean   : 868.6   ASH    : 1  
##  3rd Qu.:11.200   3rd Qu.:239.25   3rd Qu.:12.25   3rd Qu.:1355.8   AST    : 1  
##  Max.   :12.600   Max.   :414.00   Max.   :34.00   Max.   :2517.0   BAN    : 1  
##                                                                     (Other):85  
##                               Name   
##  ANTELOPE 1 N USA-OR            : 1  
##  ARLINGTON USA-OR               : 1  
##  ASHLAND 1 N USA-OR             : 1  
##  ASTORIA WSO           //R USA-O: 1  
##  BAKER FAA AIRPORT USA-OR       : 1  
##  BAKER KBKR USA-OR              : 1  
##  (Other)                        :86

Note that this shapefile also contains some climate data for each station, as well as the station names and locations.

2.2 Plot both shapefiles

Plot the station data:

# plot the station data on top of the outline map
plot(orotl_shp)
points(orstations_shp, pch=16, cex=0.7, col="blue")

More examples of simple maps can be found in the materials for Lecture 5 in GEOG 4/595: http://geog.uoregon.edu/bartlein/courses/geog495/lectures/lec05.html


3 Extract data from a raster

This example demonstrates how to extract data from a raster for a set of target points contained in a .csv file. In this case, the raster is the Ramankutty and Foley potential natural vegetation data set (https://www.nelson.wisc.edu/sage/data-and-models/global-land-use/index.php), and the target points are the Global Charcoal Database (GCDv3) site locations (http://www.gpwg.paleofire.org)

# load packages
library(raster)
library(rasterVis)
## Loading required package: latticeExtra

3.1 Read the data sets – source and target

Use the raster package to read the vegetation data:

# read potential natural vegetation data sage_veg30.nc:
vegtype_path <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
vegtype_name <- "sage_veg30.nc"
vegtype_file <- paste(vegtype_path, vegtype_name, sep="")
vegtype <- raster(vegtype_file, varname="vegtype")
## Loading required namespace: ncdf4
vegtype
## class       : RasterLayer 
## dimensions  : 360, 720, 259200  (nrow, ncol, ncell)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : /Users/bartlein/Projects/ESSD/data/nc_files/sage_veg30.nc 
## names       : vegetation.type 
## zvar        : vegtype

Plot the vegetation data using a rasterVis levelplot:

mapTheme <- rasterTheme(region=rev(brewer.pal(8,"Greens")))
levelplot(vegtype, margin=FALSE, par.settings=mapTheme,
                 main="Potential Natural Vegetation")

Read the charcoal data locations:

# read GCDv3 sites
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "v3i_nsa_globe.csv"
csv_file <- paste(csv_path, csv_name, sep="")
gcdv3 <- read.csv(csv_file) 
str(gcdv3)
## 'data.frame':    703 obs. of  6 variables:
##  $ Site_ID     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Lat         : num  44.7 44.9 45.7 45.9 46.3 ...
##  $ Lon         : num  -111 -110 -115 -114 -115 ...
##  $ Elev        : num  2530 1884 2250 2300 1770 ...
##  $ depo_context: Factor w/ 12 levels "BOSE","BUOR",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ Site_Name   : Factor w/ 703 levels "17940 core","7-M",..: 130 605 75 33 219 530 179 62 55 97 ...
plot(gcdv3$Lon, gcdv3$Lat, pch=16, cex=0.5, col="blue")

In order to use the extract() function from raster, the target points must first be turned into a SpatialPoints object.

# turn into SpatialPoints
gcdv3_coords <- cbind(gcdv3$Lon, gcdv3$Lat)
gcdv3_pts <- SpatialPoints(coords=gcdv3_coords, proj4string = CRS("+proj=longlat +datum=WGS84"))
class(gcdv3_pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
# plot(gcdv3_pts, pch=16, cex=0.5)

Plot the target points on top of the source map:

plt <- levelplot(vegtype, margin=FALSE, par.settings=mapTheme,
                 main="Potential Natural Vegetation")
plt + layer(sp.points(gcdv3_pts, col="blue", pch=16, cex=0.5))

3.2 Extract data at target points

Now extract the data for the target points:

# extract data from the raster at the target points
gcdv3_vegtype <- extract(vegtype, gcdv3_pts, method="simple")
class(gcdv3_vegtype)
## [1] "numeric"
head(gcdv3_vegtype)
## [1] 6 6 6 6 6 6

Make a dataframe of the extracted data that could be saved as a .csv file, and plot it:

pts <- data.frame(gcdv3$Lon, gcdv3$Lat, gcdv3_vegtype)
names(pts) <- c("Lon", "Lat", "vegtype")
head(pts, 10)
##          Lon     Lat vegtype
## 1  -110.6161 44.6628       6
## 2  -110.3467 44.9183       6
## 3  -114.9867 45.7044       6
## 4  -114.2619 45.8919       6
## 5  -114.6503 46.3206       6
## 6  -113.4403 45.8406       6
## 7  -114.3592 48.1658       6
## 8  -123.4600 42.0200       4
## 9  -122.5575 41.3467       6
## 10 -122.4954 41.2075       4
plotclr <- rev(brewer.pal(8,"Greens"))
plotclr <- c("#AAAAAA", plotclr)
cutpts <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
color_class <- findInterval(gcdv3_vegtype, cutpts)
plot(pts$Lon, pts$Lat, col=plotclr[color_class+1], pch=16)
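The pts data frame assembled above could then be written out with write.csv(); the output file name here is only an assumption:

```r
# save the extracted values as a .csv file (output file name is illustrative)
csv_outfile <- paste(csv_path, "gcdv3_vegtype.csv", sep="")
write.csv(pts, csv_outfile, row.names=FALSE)
```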

Plot the extracted data at the target points on top of the source map. If the extraction was successful, the target-point colors should disappear against the background.

plt <- levelplot(vegtype, margin=FALSE, par.settings=mapTheme,
                 main="Potential Natural Vegetation")
plotclr <- rev(brewer.pal(8,"Greens"))

cutpts <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
color_class <- findInterval(gcdv3_vegtype, cutpts)
plt + layer(sp.points(gcdv3_pts, col=plotclr[color_class], pch=16, cex=0.6)) + 
  layer(sp.points(gcdv3_pts, col="black", pch=1, cex=0.6))


3.3 A second example – explicit cell selection

Here’s a second example of extracting values from an array, by referencing the specific cell or array element that a target point falls in. In this example, a netCDF file of bioclimatic variables is read using ncdf4 and the values of mtco (mean temperature of the coldest month) are extracted.

library(ncdf4)
library(lattice)
library(classInt)
library(RColorBrewer)
# set path and filename
ncpath <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
ncname <- "cru10min30_bio.nc"  
ncfname <- paste(ncpath, ncname, sep="")
# open a netCDF file
ncin <- nc_open(ncfname)
print(ncin)
## File /Users/bartlein/Projects/ESSD/data/nc_files/cru10min30_bio.nc (NC_FORMAT_CLASSIC):
## 
##      19 variables (excluding dimension variables):
##         float climatology_bounds[nv,time]   
##         float gdd0[lon,lat]   
##             name: gdd0
##             long_name: Growing-degree days, 0C base
##             units: degdays
##             _FillValue: -99
##         float gdd5[lon,lat]   
##             name: gdd5
##             long_name: Growing-degree days, 5C base
##             units: degdays
##             _FillValue: -99
##         float chill[lon,lat]   
##             name: chill
##             long_name: Number of days below 5C
##             units: days
##             _FillValue: -99
##         float mtco[lon,lat]   
##             name: mtco
##             long_name: Mean temperature coldest month
##             units: C
##             _FillValue: -99
##         float mtwa[lon,lat]   
##             name: mtwa
##             long_name: Mean temperature warmest month
##             units: C
##             _FillValue: -99
##         float mipt[lon,lat]   
##             name: mipt
##             long_name: Priestley-Taylor (alpha) parameter (AE/PE)
##             units: ratio
##             _FillValue: -99
##         float aaetpt[lon,lat]   
##             name: aaetpt
##             long_name: Actual evapotranspiration (AE)
##             units: mm
##             _FillValue: -99
##         float apetpt[lon,lat]   
##             name: apetpt
##             long_name: Potential evapotranspiration (PE)
##             units: mm
##             _FillValue: -99
##         float miptever[lon,lat]   
##             name: miptever
##             long_name: Alpha -- evergreen assimilation period
##             units: ratio
##             _FillValue: -99
##         float aaetever[lon,lat]   
##             name: aaetever
##             long_name: AE -- evergreen assimilation period
##             units: mm
##             _FillValue: -99
##         float apetever[lon,lat]   
##             name: apetever
##             long_name: PE -- evergreen assimilation period
##             units: mm
##             _FillValue: -99
##         float miptdecd[lon,lat]   
##             name: miptdecd
##             long_name: Alpha -- deciduous ssimilation period
##             units: ratio
##             _FillValue: -99
##         float aaetdecd[lon,lat]   
##             name: aaetdecd
##             long_name: AE -- deciduous assimilation period
##             units: mm
##             _FillValue: -99
##         float apetdecd[lon,lat]   
##             name: apetdecd
##             long_name: PE -- deciduous assimilation period
##             units: mm
##             _FillValue: -99
##         float totsnpt[lon,lat]   
##             name: totsnpt
##             long_name: Total snow (from surface water balance)
##             units: mm
##             _FillValue: -99
##         float apr[lon,lat]   
##             name: apr
##             long_name: Annual precipitation (mm)
##             units: mm
##             _FillValue: -99
##         float pjanpann[lon,lat]   
##             name: pjanpann
##             long_name: January/Annual precipitation ratio
##             units: ratio
##             _FillValue: -99
##         float pjulpann[lon,lat]   
##             name: pjulpann
##             long_name: July/Annual precipitation ratio
##             units: ratio
##             _FillValue: -99
## 
##      4 dimensions:
##         lon  Size:720
##             standard_name: longitude
##             long_name: longitude
##             units: degrees_east
##             axis: X
##         lat  Size:360
##             standard_name: latitude
##             long_name: latitude
##             units: degrees_north
##             axis: Y
##         time  Size:12
##             standard_name: time
##             long_name: time
##             units: days since 1900-01-01 00:00:00.0 -0:00
##             axis: T
##             calendar: standard
##             climatology: climatology_bounds
##         nv  Size:2
## 
##     3 global attributes:
##         source: calculated using Sarah Shafer's version of bioclim.f
##         data: e:\Projects\cru\work\cl_2.0_regrid\regions\cru10min30\cru10min30_bio.dat
##         history: P.J. Bartlein, 6 Dec 2007 from 2 Dec 2006 data
# get longitude and latitude
lon <- ncvar_get(ncin,"lon")
nlon <- dim(lon)
head(lon)
## [1] -179.75 -179.25 -178.75 -178.25 -177.75 -177.25
lat <- ncvar_get(ncin,"lat")
nlat <- dim(lat)
head(lat)
## [1] -89.75 -89.25 -88.75 -88.25 -87.75 -87.25
print(c(nlon,nlat))
## [1] 720 360
# get the mtco data
mtco <- ncvar_get(ncin,"mtco")
dlname <- ncatt_get(ncin,"mtco","long_name")
dunits <- ncatt_get(ncin,"mtco","units")
fillvalue <- ncatt_get(ncin,"mtco","_FillValue")
dim(mtco)
## [1] 720 360
mtco[mtco==fillvalue$value] <- NA

Close the netCDF file using the nc_close() function.

# close the netCDF file
nc_close(ncin)

Plot the control data.

# levelplot of the slice
grid <- expand.grid(lon=lon, lat=lat)
cutpts <- c(-50,-40,-30,-20,-10,0,10,20,30,40,50)
levelplot(mtco ~ lon * lat, data=grid, at=cutpts, cuts=11, pretty=TRUE, 
  col.regions=(rev(brewer.pal(10,"RdBu"))))

Now extract the data for the target points:

Get the indices (j’s and k’s) of the grid cell that each target point falls in. This code is basically the same as that used in reshaping a “short” data frame into an array. The function defined and executed within sapply() finds the column (j) or row (k) of the control-data array that a target point falls in; together, the j’s and k’s identify the control-data grid cells containing the individual target points.

j <- sapply(gcdv3$Lon, function(x) which.min(abs(lon-x)))
k <- sapply(gcdv3$Lat, function(x) which.min(abs(lat-x)))
head(cbind(j,k)); tail(cbind(j,k))
##        j   k
## [1,] 139 270
## [2,] 140 270
## [3,] 131 272
## [4,] 132 272
## [5,] 131 273
## [6,] 134 272
##          j   k
## [698,] 219 218
## [699,] 219 218
## [700,] 115 271
## [701,] 114 269
## [702,] 115 269
## [703,] 123 277

Get the data for each j, k combination. The way to do this is to convert the mtco array to a vector, and then calculate an index jk for each target value:

mtco_vec <- as.vector(mtco)
jk <- (k-1)*nlon + j
gcdv3_mtco <- mtco_vec[jk]
head(cbind(j,k,jk,gcdv3_mtco,lon[j],lat[k]))
##        j   k     jk gcdv3_mtco              
## [1,] 139 270 193819      -11.4 -110.75 44.75
## [2,] 140 270 193820      -10.7 -110.25 44.75
## [3,] 131 272 195251       -6.3 -114.75 45.75
## [4,] 132 272 195252       -6.4 -114.25 45.75
## [5,] 131 273 195971       -5.8 -114.75 46.25
## [6,] 134 272 195254       -6.6 -113.25 45.75
gcdv3_mtco[is.na(gcdv3_mtco)] <- -99
pts <- data.frame(gcdv3$Lon, gcdv3$Lat, gcdv3_mtco)
names(pts) <- c("Lon", "Lat", "mtco")
head(pts, 20)
##          Lon      Lat  mtco
## 1  -110.6161 44.66280 -11.4
## 2  -110.3467 44.91830 -10.7
## 3  -114.9867 45.70440  -6.3
## 4  -114.2619 45.89190  -6.4
## 5  -114.6503 46.32060  -5.8
## 6  -113.4403 45.84060  -6.6
## 7  -114.3592 48.16580  -6.3
## 8  -123.4600 42.02000   3.3
## 9  -122.5575 41.34670  -0.6
## 10 -122.4954 41.20750   0.5
## 11 -122.5778 41.38360  -0.6
## 12 -122.5092 41.19130  -0.6
## 13 -110.1700 44.28000 -11.2
## 14 -123.5792 45.82420   4.4
## 15 -123.9067 46.10060   4.9
## 16 -119.9767 33.95639 -99.0
## 17 -127.0500 65.21667 -27.9
## 18 -108.1500 37.41667  -4.6
## 19 -105.5000 37.55000  -6.9
## 20 -112.2167 36.71667  -2.7

Plot the extracted values of mtco. To do this, the colors for plotting the different levels of mtco are generated from an RColorBrewer palette, augmented by gray ("#AAAAAA") to handle missing values (e.g. from marine charcoal records, or from sites “off” the cru10min30 grid).

plotclr <- rev(brewer.pal(10,"RdBu"))
plotclr <- c("#AAAAAA", plotclr)
cutpts <- c(-50,-40,-30,-20,-10,0,10,20,30,40,50)
color_class <- findInterval(gcdv3_mtco, cutpts)
plot(gcdv3$Lon, gcdv3$Lat, col=plotclr[color_class+1], pch=16)


4 Clipping/Trimming/Point-in-polygon analyses

A common problem that arises in dealing with spatial data is the “clipping” or “trimming” problem, in which one wants to know whether one or more points lie within the outline of a particular polygon or, in the case of multiple polygons, which polygon an individual point lies in. More formally, this is known as the “point-in-polygon” problem. Here, the process of clipping the na10km_v2 climate grid to a tree-species shapefile, in particular that for Picea mariana (black spruce), is demonstrated. What we want is a list of climate data set points that lie within the range of P. mariana.

4.1 Read the polygon and target-point files

library(maptools)
library(rgeos)

Read the shapefile for Picea mariana

# read the shapefile for Picea mariana
shp_path <- "/Users/bartlein/Projects/ESSD/data/shp_files/USGS_pp1650/"
shp_name <- "picemari.shp"
shp_file <- paste(shp_path, shp_name, sep="")
picea_shp <- readShapePoly(shp_file)
## Warning: use rgdal::readOGR or sf::st_read

Add a coordinate reference system to the shapefile, and plot it:

proj4string(picea_shp) = CRS("+proj=longlat +datum=WGS84")
class(picea_shp)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
plot(picea_shp)

Read the na10km_v2 points from a .csv file (for illustration; in practice it would be more efficient to read them from a netCDF file).

# read the na10km_v2 points (as a .csv file)
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "na10km_v2.csv"
csv_file <- paste(csv_path, csv_name, sep="")
na10km_v2 <- read.csv(csv_file)
str(na10km_v2)
## 'data.frame':    277910 obs. of  7 variables:
##  $ x     : num  2690000 2700000 2710000 2720000 2730000 2740000 2750000 2760000 2770000 2780000 ...
##  $ y     : num  -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 ...
##  $ lon   : num  -77.3 -77.2 -77.1 -77 -77 ...
##  $ lat   : num  5.12 5.1 5.08 5.05 5.03 ...
##  $ mask  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ etopo1: int  67 61 67 26 43 56 30 57 134 652 ...
##  $ srtm30: int  78 39 52 22 79 49 32 50 110 318 ...

For convenience in display (because the na10km_v2 grid includes Europe and wraps around the dateline), reduce it to only the points west of 45W:

na10km_v2 <- na10km_v2[na10km_v2$lon <= -45.0, ]

Make a SpatialPoints object from the na10km_v2 data-point locations, and plot it:

# make a SpatialPoints object of the data-point locations
na10km_v2_coords <- cbind(na10km_v2$lon, na10km_v2$lat)
na10km_v2_pts <- SpatialPoints(coords=na10km_v2_coords, proj4string = CRS("+proj=longlat +datum=WGS84"))
class(na10km_v2_pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
plot(na10km_v2_pts, pch=16, cex=0.4)

4.2 Overlay the points onto the polygons

Now overlay the points onto the polygons. Note that the input shapefile was a SpatialPolygonsDataFrame object, so convert it to a simpler SpatialPolygons object first. The over() function (from sp) takes a little while to run.

# overlay the two shape files
picea_poly <- as(picea_shp, "SpatialPolygons")
class(picea_poly)
## [1] "SpatialPolygons"
## attr(,"package")
## [1] "sp"
picea_pts <- over(na10km_v2_pts, picea_poly)
class(picea_pts)
## [1] "integer"

Make a dataframe from the “trimmed” points and plot it on top of the na10km_v2 points:

picea_pts2 <- data.frame(na10km_v2_coords[!is.na(picea_pts), ] )
plot(na10km_v2_pts, pch=16, cex=0.4)
points(picea_pts2, pch=16, cex=0.1, col="green")


5 Gridding or rasterizing point data

Another common task is to grid or rasterize a set of point data, creating a gridded data set of counts, averages, minima, maxima, etc. This can be illustrated using the FPA-FOD daily fire-start data set: Spatial wildfire occurrence data for the United States, 1992-2013/Fire Program Analysis Fire-Occurrence Database [FPA_FOD_20150323] (3rd Edition) (Short, K.C., 2014, Earth Syst. Sci. Data, 6, 1-27) – http://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.3/. The idea here is to calculate the number and average area of the fires that occurred in the 0.5-degree grid cells that cover the conterminous U.S.

5.1 Read the data sets, and create empty rasters

library(raster)
library(rasterVis)

Read the fire-start data:

# read the FPA-FOD fire-start data
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "fpafod_1992-2013.csv"
csv_file <- paste(csv_path, csv_name, sep="")
fpafod <- read.csv(csv_file) # takes a while
str(fpafod)
## 'data.frame':    1727476 obs. of  14 variables:
##  $ datasource    : Factor w/ 1 level "fpafod": 1 1 1 1 1 1 1 1 1 1 ...
##  $ sourceid      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ latitude      : num  40 38.9 39 38.6 38.6 ...
##  $ longitude     : num  -121 -120 -121 -120 -120 ...
##  $ year          : int  2005 2004 2004 2004 2004 2004 2004 2005 2005 2004 ...
##  $ mon           : int  2 5 5 6 6 6 7 3 3 7 ...
##  $ day           : int  2 12 31 28 28 30 1 8 15 1 ...
##  $ daynum        : int  33 133 152 180 180 182 183 67 74 183 ...
##  $ area_ha       : num  0.0405 0.1012 0.0405 0.0405 0.0405 ...
##  $ cause_original: int  9 1 5 1 1 1 1 5 5 1 ...
##  $ cause1        : int  2 1 2 1 1 1 1 2 2 1 ...
##  $ cause2        : int  8 1 5 1 1 1 1 5 5 1 ...
##  $ stateprov     : Factor w/ 52 levels "AK","AL","AR",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ agency        : Factor w/ 11 levels "BIA","BLM","BOR",..: 6 6 6 6 6 6 6 6 6 6 ...

Convert the fire-start data to a SpatialPointsDataFrame

fpafod_coords <- cbind(fpafod$longitude, fpafod$latitude)
fpafod_pts <- SpatialPointsDataFrame(coords=fpafod_coords, data=data.frame(fpafod$area_ha))
names(fpafod_pts) <- "area_ha"

Create a couple of empty rasters to hold the rasterized data:

# create (empty) rasters
cell_size <- 0.5
lon_min <- -128.0; lon_max <- -65.0; lat_min <- 25.5; lat_max <- 50.5
ncols <- ((lon_max - lon_min)/cell_size)+1; nrows <- ((lat_max - lat_min)/cell_size)+1 
us_fire_counts <- raster(nrows=nrows, ncols=ncols, xmn=lon_min, xmx=lon_max, ymn=lat_min, ymx=lat_max, res=0.5, crs="+proj=longlat +datum=WGS84")
us_fire_counts
## class       : RasterLayer 
## dimensions  : 50, 126, 6300  (nrow, ncol, ncell)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -128, -65, 25.5, 50.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
us_fire_area <- raster(nrows=nrows, ncols=ncols, xmn=lon_min, xmx=lon_max, ymn=lat_min, ymx=lat_max, res=0.5, crs="+proj=longlat +datum=WGS84")
us_fire_area
## class       : RasterLayer 
## dimensions  : 50, 126, 6300  (nrow, ncol, ncell)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -128, -65, 25.5, 50.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

5.2 Rasterize the data

Now rasterize the data, first as counts (the number of fires in each grid cell), and then as the average size of the fires in each grid cell. Also plot the data (both on a log10 scale):

# rasterize
us_fire_counts <- rasterize(fpafod_coords, us_fire_counts, fun="count")
us_fire_counts
## class       : RasterLayer 
## dimensions  : 50, 126, 6300  (nrow, ncol, ncell)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -128, -65, 25.5, 50.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       : layer 
## values      : 1, 8755  (min, max)
plot(log10(us_fire_counts), col=brewer.pal(9,"BuPu"), sub="log10 Number of Fires")

us_fire_area <- rasterize(fpafod_pts, us_fire_area, fun=mean)
us_fire_area
## class       : RasterBrick 
## dimensions  : 50, 126, 6300, 2  (nrow, ncol, ncell, nlayers)
## resolution  : 0.5, 0.5  (x, y)
## extent      : -128, -65, 25.5, 50.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
## data source : in memory
## names       :          ID,     area_ha 
## min values  : 1.29922e+05, 4.04686e-03 
## max values  :  1677519.00,    18210.87
plot(log10(us_fire_area$area_ha), col=brewer.pal(9,"YlOrRd"), sub="log10 Mean Area")

5.3 Write the rasterized data out as a netCDF file

Write the two rasterized data sets out as variables in a netCDF data set. Create some variables, and replace the R NAs with netCDF fill values:

# make necessary vectors and arrays
lon <- seq(lon_min+0.25, lon_max-0.25, by=cell_size)
lat <- seq(lat_max-0.25, lat_min+0.25, by=-1*cell_size)
print(c(length(lon), length(lat)))
## [1] 126  50
fillvalue <- 1e32
us_fire_counts2 <- t(as.matrix(us_fire_counts$layer, nrows=ncols, ncols=nrows))
dim(us_fire_counts2)
## [1] 126  50
us_fire_counts2[is.na(us_fire_counts2)] <- fillvalue

us_fire_area2 <- t(as.matrix(us_fire_area$area_ha, nrows=ncols, ncols=nrows))
dim(us_fire_area2)
## [1] 126  50
us_fire_area2[is.na(us_fire_area2)] <- fillvalue

Write out a netCDF file:

# write out a netCDF file
library(ncdf4)

# path and file name, set dname
ncpath <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
ncname <- "us_fires.nc"  
ncfname <- paste(ncpath, ncname, sep="")

# create and write the netCDF file -- ncdf4 version
# define dimensions
londim <- ncdim_def("lon","degrees_east",as.double(lon)) 
latdim <- ncdim_def("lat","degrees_north",as.double(lat)) 

# define variables

dname <- "fpafod_counts" 
dlname <- "Number of fires, 1992-2013"
v1_def <- ncvar_def(dname,"1",list(londim,latdim),fillvalue,dlname,prec="single")
dname <- "fpafod_mean_area" 
dlname <- "Average Fire Size, 1992-2013"
v2_def <- ncvar_def(dname,"ha",list(londim,latdim),fillvalue,dlname,prec="single")

# create netCDF file and put arrays
ncout <- nc_create(ncfname,list(v1_def, v2_def),force_v4=TRUE)

# put variables
ncvar_put(ncout,v1_def,us_fire_counts2)
ncvar_put(ncout,v2_def,us_fire_area2)

# put additional attributes into dimension and data variables
ncatt_put(ncout,"lon","axis","X") 
ncatt_put(ncout,"lat","axis","Y")

# add global attributes
ncatt_put(ncout,0,"title","FPA-FOD Fires")
ncatt_put(ncout,0,"institution","USFS")
ncatt_put(ncout,0,"source","http://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.3/")
ncatt_put(ncout,0,"references", "Short, K.C., 2014, Earth Syst. Sci. Data, 6, 1-27")
history <- paste("P.J. Bartlein", date(), sep=", ")
ncatt_put(ncout,0,"history",history)
ncatt_put(ncout,0,"Conventions","CF-1.6")

# Get a summary of the created file:
ncout
## File /Users/bartlein/Projects/ESSD/data/nc_files/us_fires.nc (NC_FORMAT_NETCDF4):
## 
##      2 variables (excluding dimension variables):
##         float fpafod_counts[lon,lat]   (Contiguous storage)  
##             units: 1
##             _FillValue: 1.00000003318135e+32
##             long_name: Number of fires, 1992-2013
##         float fpafod_mean_area[lon,lat]   (Contiguous storage)  
##             units: ha
##             _FillValue: 1.00000003318135e+32
##             long_name: Average Fire Size, 1992-2013
## 
##      2 dimensions:
##         lon  Size:126
##             units: degrees_east
##             long_name: lon
##             axis: X
##         lat  Size:50
##             units: degrees_north
##             long_name: lat
##             axis: Y
## 
##     6 global attributes:
##         title: FPA-FOD Fires
##         institution: USFS
##         source: http://www.fs.usda.gov/rds/archive/Product/RDS-2013-0009.3/
##         references: Short, K.C., 2014, Earth Syst. Sci. Data, 6, 1-27
##         history: P.J. Bartlein, Sun Jan  7 14:35:10 2018
##         Conventions: CF-1.6
# close the file, writing data to disk
nc_close(ncout)


6 Interpolating/regridding

Another common task involves transferring values from one gridded data set to another. When the grids are identical, this is trivial, but when the “target” grid differs from the source or “control” grid, interpolation is required (as it also is when the target points are irregularly distributed). The most widely used method for interpolation within a control grid is bilinear interpolation, which involves finding the control grid points that surround a particular target point, and then simultaneously (linearly) interpolating in the x- and y-directions. The method is implemented in the raster package, and a relatively fast version is implemented in the fields package.
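As a hedged sketch of the idea (the raster and target points below are synthetic, not the data used later in this section), bilinear interpolation at arbitrary target points can be done with raster’s extract() using method="bilinear":

```r
# sketch: bilinear interpolation of raster values at irregular target points
library(raster)
r <- raster(nrows=180, ncols=360, xmn=-180, xmx=180, ymn=-90, ymx=90,
            crs="+proj=longlat +datum=WGS84")
values(r) <- 1:ncell(r)                             # synthetic control data
targets <- cbind(c(-123.05, 0.4), c(44.72, 51.3))   # lon, lat of target points
extract(r, targets, method="bilinear")  # interpolates in x and y simultaneously
```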

The example here uses a lower-resolution version of the ETOPO1 global DEM as the source file, and the locations of the (land-only) points in the na10km_v2 grid as the targets. These are read in here from a .csv file, but they could also have come from a netCDF file.

6.1 Open the “control” netCDF and target .csv files

Load the appropriate packages.

# load the ncdf4 package
library(ncdf4)
library(lattice)
library(fields)

Read the etopo1 netCDF file. This particular file is one in which the original 1-minute data have been aggregated (by averaging) to six minutes, or one-tenth of a degree (to speed up the execution of the examples). In practice, one would work with the original higher-resolution data. Do some setup, open the file, and list its contents.

# set path and filename
ncpath <- "/Users/bartlein/Projects/ESSD/data/nc_files/"
ncname <- "etopo1_ig_06min.nc"  
ncfname <- paste(ncpath, ncname,  sep="")
dname <- "elev"  
# open a netCDF file
ncin <- nc_open(ncfname)
print(ncin)
## File /Users/bartlein/Projects/ESSD/data/nc_files/etopo1_ig_06min.nc (NC_FORMAT_CLASSIC):
## 
##      1 variables (excluding dimension variables):
##         short elev[lon,lat]   
##             long_name: elevation
##             units: m
##             valid_min: -10360
##             valid_max: 6365
##             scale_factor: 1
##             add_offset: 0
##             _FillValue: -32768
##             source: z in etopo1_ig.nc -- averaged
## 
##      2 dimensions:
##         lon  Size:3600
##             standard_name: longitude
##             long_name: longitude
##             units: degrees_east
##             axis: X
##         lat  Size:1800
##             standard_name: latitude
##             long_name: latitude
##             units: degrees_north
##             axis: Y
## 
##     7 global attributes:
##         title: 1-Minute Gridded Global Relief Data (ETOPO1)
##         data: grid-registered data -- ETOPO1_Ice_g.grd
##         institution: http://www.ngdc.noaa.gov/ngdc.html
##         source: http://www.ngdc.noaa.gov/mgg/global/global.html
##         history: This file created by P.J. Bartlein, 28 Nov 2008
##         comment: average over 7x7 1-min cells
##         Conventions: CF-1.0

Get the “control” longitudes and latitudes.

# get longitude and latitude
lon <- ncvar_get(ncin,"lon")
nlon <- dim(lon)
head(lon)
## [1] -179.95 -179.85 -179.75 -179.65 -179.55 -179.45
lat <- ncvar_get(ncin,"lat")
nlat <- dim(lat)
head(lat)
## [1] -89.95 -89.85 -89.75 -89.65 -89.55 -89.45
print(c(nlon,nlat))
## [1] 3600 1800
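Bilinear interpolation requires that the control-grid coordinates increase monotonically, so it is worth verifying the ordering and spacing before interpolating. A minimal check; the `lon` vector here is a synthetic stand-in generated with `seq()`, matching the values just read from the file.

```r
# check that the control grid is ascending and regularly spaced
# (lon here stands in for the vector read from the netCDF file)
lon <- seq(-179.95, 179.95, by = 0.1)
dlon <- diff(lon)
all(dlon > 0)   # coordinates must be strictly increasing
range(dlon)     # should show a single, constant spacing (here 0.1 degrees)
```

The same check applies to `lat`; a descending coordinate (some files store latitude north-to-south) would have to be reversed, along with the corresponding array dimension, before interpolating.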

Now read the elevations, and various attributes:

# get elevations
etopo1_array <- ncvar_get(ncin,dname)
dlname <- ncatt_get(ncin,dname,"long_name")
dunits <- ncatt_get(ncin,dname,"units")
fillvalue <- ncatt_get(ncin,dname,"_FillValue")
dsource <- ncatt_get(ncin, "elev", "source")
dim(etopo1_array)
## [1] 3600 1800
# get global attributes
title <- ncatt_get(ncin,0,"title")
institution <- ncatt_get(ncin,0,"institution")
datasource <- ncatt_get(ncin,0,"source")
history <- ncatt_get(ncin,0,"history")
Conventions <- ncatt_get(ncin,0,"Conventions")

Close the netCDF file using the nc_close() function.

# close the netCDF file
nc_close(ncin)

Produce a quick map to check that the data make sense. (Always do this!)

# levelplot of elevations
grid <- expand.grid(lon=lon, lat=lat)
cutpts <- c(-7000, -6000, -4000, -2000, 0, 500, 1000, 1500, 2000, 3000, 4000, 5000)
levelplot(etopo1_array ~ lon * lat, data=grid, at=cutpts, cuts=11, pretty=TRUE, 
  col.regions=topo.colors(12))

Open and read the .csv file containing the “target” points.

# read na10km_v2 grid-point locations -- land-points only
csv_path <- "/Users/bartlein/Projects/ESSD/data/csv_files/"
csv_name <- "na10km_v2_pts.csv"
csv_file <- paste(csv_path, csv_name, sep="")
na10km_v2 <- read.csv(csv_file) 
str(na10km_v2)
## 'data.frame':    277910 obs. of  5 variables:
##  $ x   : num  2690000 2700000 2710000 2720000 2730000 2740000 2750000 2760000 2770000 2780000 ...
##  $ y   : num  -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 -4510000 ...
##  $ lon : num  -77.3 -77.2 -77.1 -77 -77 ...
##  $ lat : num  5.12 5.1 5.08 5.05 5.03 ...
##  $ mask: int  1 1 1 1 1 1 1 1 1 1 ...

Get the number of target points:

# get number of target points
ntarg <- dim(na10km_v2)[1]
ntarg
## [1] 277910

6.2 Interpolation

Set up to do the interpolation. Generate x’s and y’s for the target grid. In practice, these might be read in from a netCDF file. Here we need them to define an array that will hold the interpolated values.

# set dimensions of output array, and define "marginal" x- and y-values
nx <- 1078; ny <- 900
x <- seq(-5770000, 5000000, by=10000)
y <- seq(-4510000, 4480000, by=10000)

# define a vector and array that will contain the interpolated values
interp_var <- rep(NA, ntarg)
interp_mat <- array(NA, dim=c(nx, ny))

Get the array subscripts for each target point. This is similar to the work done in reshaping a “short” data frame to an array, as in Week 4.

# get array subscripts for each target point
j <- sapply(na10km_v2$x, function(c) which.min(abs(x-c)))
k <- sapply(na10km_v2$y, function(c) which.min(abs(y-c)))
head(cbind(j,k,na10km_v2$lon,na10km_v2$lat))
##        j k                   
## [1,] 847 1 -77.29202 5.124395
## [2,] 848 1 -77.20858 5.099891
## [3,] 849 1 -77.12515 5.075297
## [4,] 850 1 -77.04173 5.050615
## [5,] 851 1 -76.95832 5.025843
## [6,] 852 1 -76.87492 5.000983
tail(cbind(j,k,na10km_v2$lon,na10km_v2$lat))
##              j   k                  
## [277905,] 1073 900 4.212445 47.09974
## [277906,] 1074 900 4.238897 47.00791
## [277907,] 1075 900 4.265386 46.91605
## [277908,] 1076 900 4.291913 46.82415
## [277909,] 1077 900 4.318477 46.73222
## [277910,] 1078 900 4.345078 46.64025
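Because the na10km_v2 grid is regular, the subscripts could also be computed arithmetically from the grid origin and spacing, rather than searched for with `which.min()`. A sketch; `x0`, `y0`, and the 10-km spacing match the `seq()` calls above, and the two target points are the first two rows of the output above.

```r
# for a regular grid, subscripts follow directly from the origin and spacing
x0 <- -5770000; y0 <- -4510000; spacing <- 10000
xtarg <- c(2690000, 2700000); ytarg <- c(-4510000, -4510000)  # first two target points
j_chk <- round((xtarg - x0) / spacing) + 1
k_chk <- round((ytarg - y0) / spacing) + 1
cbind(j_chk, k_chk)  # j_chk = 847, 848; k_chk = 1, 1 -- same as which.min() above
```

The `which.min()` approach is slower but more general: it also works when the grid spacing is irregular.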

Now do bilinear interpolation using the fields package:

# bilinear interpolation from fields package
control_dat <- list(x=lon, y=lat, z=etopo1_array)
interp_var <- interp.surface(control_dat, cbind(na10km_v2$lon,na10km_v2$lat))

# put interpolated values into a 2-d array
interp_mat[cbind(j,k)] <- interp_var[1:ntarg]

Get a quick map:

grid <- expand.grid(x=x, y=y)
cutpts <- c(-7000, -6000, -4000, -2000, 0, 500, 1000, 1500, 2000, 3000, 4000, 5000)
levelplot(interp_mat ~ x * y, data=grid, at=cutpts, cuts=11, pretty=TRUE, 
  col.regions=topo.colors(12))

At this point, interp_mat could be written out as a variable in a netCDF file (along with dimension and attribute data). Also make a data frame of the interpolated data, which could be written out as a .csv file:

# make a dataframe of the interpolated elevations
out_df <- data.frame(cbind(na10km_v2$x, na10km_v2$y, na10km_v2$lon, na10km_v2$lat, interp_var))
names(out_df) <- c("x", "y", "lon", "lat", "elev")
head(out_df); tail(out_df)
##         x        y       lon      lat     elev
## 1 2690000 -4510000 -77.29202 5.124395 64.91837
## 2 2700000 -4510000 -77.20858 5.099891 87.74857
## 3 2710000 -4510000 -77.12515 5.075297 69.43258
## 4 2720000 -4510000 -77.04173 5.050615 52.84844
## 5 2730000 -4510000 -76.95832 5.025843 54.49635
## 6 2740000 -4510000 -76.87492 5.000983 54.86496
##              x       y      lon      lat     elev
## 277905 4950000 4480000 4.212445 47.09974 471.2087
## 277906 4960000 4480000 4.238897 47.00791 383.3458
## 277907 4970000 4480000 4.265386 46.91605 381.3808
## 277908 4980000 4480000 4.291913 46.82415 404.7478
## 277909 4990000 4480000 4.318477 46.73222 345.8679
## 277910 5000000 4480000 4.345078 46.64025 313.1289
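Writing `interp_mat` to a netCDF file would follow the same ncdf4 pattern used earlier (define dimensions and a variable, create the file, write the data, close). A minimal sketch, in which the output path, fill value, and attribute text are placeholders:

```r
# write the interpolated elevations to a new netCDF file -- a minimal sketch;
# the filename and attribute text below are placeholders
library(ncdf4)
ncfname_out <- file.path(tempdir(), "na10km_v2_elev.nc")

# dimensions: the projected x- and y-values of the na10km_v2 grid
nx <- 1078; ny <- 900
x <- seq(-5770000, 5000000, by = 10000)
y <- seq(-4510000, 4480000, by = 10000)
xdim <- ncdim_def("x", "m", as.double(x))
ydim <- ncdim_def("y", "m", as.double(y))

# the elevation variable, with a fill value for ocean/missing points
fillvalue <- 1e32
elev_def <- ncvar_def("elev", "m", list(xdim, ydim), fillvalue,
                      longname = "elevation, interpolated from ETOPO1", prec = "float")

# if run standalone, stand in an empty array for the interpolated values
if (!exists("interp_mat")) interp_mat <- array(NA_real_, dim = c(nx, ny))

# create the file, write the array (NAs become the fill value), and close
ncout <- nc_create(ncfname_out, list(elev_def))
ncvar_put(ncout, elev_def, interp_mat)
ncatt_put(ncout, 0, "title", "na10km_v2 elevations (placeholder title)")
nc_close(ncout)
```

Further global attributes (institution, source, history, Conventions) would be added with additional `ncatt_put()` calls, as in the fire-counts example earlier.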

[Back to top]