Spatial Visualization with R – Part 2 – ( Working with ShapeFiles)
Share
We are going to work with shape files using R .
Shapefiles
Shapefiles are a geospatial representation of data . The file format is developed by ESRI (GIS software company) . The shapefile format can store data in the form of points , lines, or polygons. Shapefiles are used to represent administrative borders like country , state , roads , rivers etc.
We are using raster package , it is used to download various administrative boundaries for any country . If you know ISO code for the country of interest , you can download shapefiles of the country and plot it.
We installed package “raster” as :
install.packages(“raster”)
We load package “raster” as :
library(raster)
We are using getData() function to get shapefiles of country.
getData()
It is used to get geographic data for anywhere in the world. First , we download the data to store in the object .
The syntax of getData() is :
getData(name , download =TRUE , path=’ ‘ , …)
name – It used ‘GADM’ , ‘countries’ , ‘SRTM’, ‘alt’ and ‘worldclim’ values .
download – If its value is TRUE , data will be downloaded . If it is FALSE , we cannot download data files .
path – It indicates where to store the data . The default value is the current working directory .
country – It specifies by three letter ISO codes.
The name argument has different values .
If name is ‘alt’ or ‘GADM’ , you must provide ‘Country=’ argument .
We can see ISO codes of all countries by using following code :
getData(“ISO3”)
If we are using name as “GADM” , we must provide the level of administrative subdivision . If the level is 0 then it shows country-wise division , If level is 1 then it shows first level division .
If name is “SRTM” , we must provide ‘lon’ and ‘lat’ arguments . This represent SRTM tile between the numbers.
If name is “worldclim” , we must provide arguments var and resolution res. The valid values of var are ‘tmin’,’tmax’,’prec’ and ‘bio’ . The valid values of res are 0.5 , 2.5 , 5 and 10 .
Contact at TJT@TechnicalJockey.com , if you are looking for an Instructor Based Online Training or Corporate Training in R !
We want to represent France data in map. We first load the shapefile of France in france object . We used level = 0 gives only national boundaries .
france<-getData(‘GADM’, country=’FRA’, level=0) ##Get the Province Shapefile for France
Then , we plot france data as :
plot(france)
We are using level = 1 to give the boudaries one level below the national boundaries . It gives province level boundaries .
france<-getData(‘GADM’, country=’FRA’, level=1)
plot(france)
We are using level = 2 to give county boundaries.
france<-getData(‘GADM’, country=’FRA’, level=2)
plot(france)
We can also find map of Spain by using ISO code ESP .
spain<-getData(‘GADM’,country=’ESP’ , level=1)
plot(spain)
We want to plot map of Switzerland by using ISO code “CHE” . We used ‘alt’ method for name argument. We used mask argument as TRUE , it does not show neighboring countries in map .
x<-getData(‘alt’, country=’CHE’, mask=TRUE)
plot(x)
We want to plot map of Spain
spain<-getData(“alt”,country=’ESP’,mask=TRUE)
plot(spain)
Contact at TJT@TechnicalJockey.com , if you are looking for an Instructor Based Online Training or Corporate Training in R !
We can also plot world map in shape files as :
climate <- getData(‘worldclim’, var=’bio’, res=2.5)
The variable “bio” in var=bio , means it is extraction 19 layers of different bioclimatic variables at the lon/lat coordinates provided.
The code below returns a raster with the 18 bioclimate variables covering the whole world with a resolution of 2.5 minutes of degrees:
BIO1 = Annual Mean Temperature
BIO2 = Mean Diurnal Range (Mean of monthly (max temp – min temp))
BIO3 = Isothermality (BIO2/BIO7) (* 100)
BIO4 = Temperature Seasonality (standard deviation *100)
BIO5 = Max Temperature of Warmest Month
BIO6 = Min Temperature of Coldest Month
BIO7 = Temperature Annual Range (BIO5-BIO6)
BIO8 = Mean Temperature of Wettest Quarter
BIO9 = Mean Temperature of Driest Quarter
BIO10 = Mean Temperature of Warmest Quarter
BIO11 = Mean Temperature of Coldest Quarter
BIO12 = Annual Precipitation
BIO13 = Precipitation of Wettest Month
BIO14 = Precipitation of Driest Month
BIO15 = Precipitation Seasonality (Coefficient of Variation)
BIO16 = Precipitation of Wettest Quarter
BIO17 = Precipitation of Driest Quarter
BIO18 = Precipitation of Warmest Quarter
BIO19 = Precipitation of Coldest Quarter
climate <- getData(‘worldclim’, var=’bio’, res=2.5)
plot(climate$bio1, main=”Annual Mean Temperature”)
We are giving an example of “SRTM” data to plot in map.
“SRTM” returns go elevation data .
spain<-getData(“GADM”,country=’ESP’,level=0)
The first argument specifies the dataset . ‘GADM’ returns the global administrative boundaries.
The second argument provides ‘ESP’ as the ISO code represents Spain.
The third argument specifies level of administrative subdivision . We used level equals to 0 for country specific boundary.
spain1<-getData(“SRTM”,lon=-5 , lat=42)
The first argument specifies SRTM go elevation data. The second argument specifies the lon(longitude) of the SRTM tile . The third argument specifies the lat(latitude) of the SRTM tile.
plot(spain1,main=”Elevation”)
We plot SRTM data in map .
plot(spain,add=TRUE )
We add Spain boundaries by using “add” argument . We add title of the plot by using “main” argument.
We want to plot USA map .We can get shapefiles of USA by using getData() .
us<-getData(‘GADM’, country=’USA’, level=2) #Get the County Shapefile for the US
We plot USA map by using following code :
plot(us)
We can check the names of columns of us dataset .
names(us)
We check the name of all states of USA .
unique(us$NAME_1)
We can plot state-wise plot of USA .
new<-subset(us,NAME_1==”Idaho” | NAME_1==”Nevada” | NAME_1==”California” | NAME_1== “Oregon”)
plot(new)
We can plot Texas map as:
texas<-subset(us,NAME_1==”Texas”)
plot(texas)
We create a subset of crime dataset. We want to see offense in our map . First , we check the list of offenses in crime data . We used unique() function to find unique values of offense.
unique(crime$offense)
We create a subset dataset violent_crimes which does not include offenses “robbery” , “rape” and “aggravated assault”.
violent_crimes <- subset(crime, offense != “robbery” & offense != “rape” & offense != “aggravated assault”)
We check the top observations of violent_crimes as :
We check the structure of violent_crimes . It shows 72461 observations and 17 variables .
str(violent_crimes)
We check the levels of offense variable in violent_crimes dataset .
levels(violent_crimes$offense)
When we check unique values of offense in violent_crimes data . It shows only four offenses.
We need to delete extra offenses of levels of offense variable .
violent_crimes$offense <- factor( violent_crimes$offense, levels =c(“murder”, “burglary”,
“auto theft”, “theft”))
Now , it shows only four offenses in our data .
We subset lon and lat values in following code :
violent_crimes <- subset(violent_crimes,
-95.39681 <= lon & lon <= -95.34188 &
29.73631 <= lat & lat <= 29.78400)
We check the structure of violent_crimes data . It shows 5050 observations and 17 variables .
str(violent_crimes)
We check the top observations in violent_crimes dataset .
We set background theme by using theme_set() function .
theme_set(theme_bw(16))
We create a map plot of Houston area and we choose black-white color and used legend of offense to show in top left corner of map .
HoustonMap <- qmap(“houston”, zoom = 14, color = “bw”, legend = “topleft”)
We add points of latitude and longitude from the violent_crimes dataset in the map.
HoustonMap +
geom_point(aes(x = lon, y = lat, colour = offense, size = offense),data = violent_crimes)
In above plot , we cannot see clearly which area can be highlighted for particular offense due to overplotting and point size . We cannot really get a feel for what crimes are taking place and where .
If we want to see average crime rate . We can get a good idea of the spatial distribution of violent_crimes by using a contour plot . We used ggmap() function to plot raster object produced by get_map object.
Contact at TJT@TechnicalJockey.com , if you are looking for an Instructor Based Online Training or Corporate Training in R !
Contour Heat Maps
A contoured heat map is another way to demonstrate data density . The contour heatmap uses two-dimensional kernel density estimation in order to create the contoured heatmap of the spatial point data.
houston <- get_map(location = “houston”, zoom = 13)
The syntax of ggmap() is –
ggmap(ggmap,extent)
ggmap – an object of class get_map
extent – space to plot the map are – “normal” , “device” or “panel”
We want to plot the map in device of Houston area.
houstonMap<-ggmap(houston, extent = “device”)
To perform a two-dimensional kernel density estimation using sts_density2d() function .The arguments of stat_density2d are –
fill – fill area level-wise
alpha – visually down transparency for less important data .
bins – combine various data point together to form .
geom – to describe shape in map
scale_fill_gradient() – it is used to create two colour gradient (low – high)
We will plot contour map over ggmap as:
houstonMap +
stat_density2d(aes(x = lon, y = lat, fill = ..level..,alpha=..level..), bins = 10, geom = “polygon”, data = crime) +
scale_fill_gradient(low = “black”, high = “red”)+
ggtitle(“Map of Crime Density in Houston”)