This walkthrough is part of the ECLR page. A 32-minute video walkthrough is available on YouTube.
In this walkthrough we will create some maps; that is, we will take some data and make their features visible on a map.
We need to load some libraries.
# install.packages(c("tidyverse","readxl")) # only use this line once on your computer (and not at all in Comp labs)
library(tidyverse) # for almost all data handling tasks
library(readxl) # to import Excel data
library(ggplot2) # for nice graphs
In this walkthrough we will only scratch the surface of what is possible in terms of mapping data. A great resource you may want to refer to if you do this more seriously is Geocomputation with R.
We shall use some data on road traffic accidents in Greater Manchester. The government provides detailed data on these. The datafile comes from the Government Data website. Download the "Road Safety Data - Accident 1979 2020.csv" file. This file contains information on every recorded road traffic accident between 2010 and 2021 in Greater Manchester. Save this file (STATS19AccDataJan2010Dec2021forGMServers.csv) in your working directory and load it into your environment.
setwd("C:/YOUR/DIRECTORY")
accdata <- read.csv("STATS19AccDataJan2010Dec2021forGMServers.csv")
Let's look at the structure of the dataset.
head(accdata)
## Accident.Index Year Severity NumberVehicles NumberCasualties OutputDate Day
## 1 102262412010 2010 3 2 1 01/01/2010 6
## 2 102262562010 2010 3 2 1 01/01/2010 6
## 3 102264322010 2010 3 2 1 01/01/2010 6
## 4 107264182010 2010 3 3 1 01/01/2010 6
## 5 114261842010 2010 3 1 1 01/01/2010 6
## 6 114263302010 2010 3 1 1 01/01/2010 6
## OutputTime Easting Northing LocalAuthority Road1Class Road1Number
## 1 13:10 382347 390025 102 5 5166
## 2 11:10 381892 390582 102 7 0
## 3 17:30 385840 403134 102 4 664
## 4 13:49 377762 403302 107 4 666
## 5 01:55 355982 404620 114 4 577
## 6 02:25 362380 407476 114 5 5239
## CarriagewayType SpeedLimit JunctionDetail JunctionControl Road2Class
## 1 3 50 6 2 3
## 2 6 30 3 4 7
## 3 3 30 3 4 7
## 4 3 30 3 4 1
## 5 6 30 0 0 0
## 6 6 30 0 0 0
## Road2Number PedCrossingHumanControl PedCrossingPhysicalFacilities
## 1 5103 0 0
## 2 0 0 0
## 3 0 0 0
## 4 60 0 0
## 5 0 0 0
## 6 0 0 0
## LightingCondition WeatherCondition RoadSurface SpecialConditions
## 1 1 1 4 0
## 2 1 1 4 0
## 3 4 1 2 0
## 4 3 9 1 0
## 5 7 9 1 0
## 6 4 1 4 0
## CarriagewayHazard PlaceReported
## 1 0 1
## 2 0 1
## 3 0 2
## 4 0 2
## 5 0 2
## 6 0 1
accdata has 45,626 observations (or rows) and 27 variables (or columns). Each row represents an accident, and for each accident we have the following information.
names(accdata)
## [1] "Accident.Index" "Year"
## [3] "Severity" "NumberVehicles"
## [5] "NumberCasualties" "OutputDate"
## [7] "Day" "OutputTime"
## [9] "Easting" "Northing"
## [11] "LocalAuthority" "Road1Class"
## [13] "Road1Number" "CarriagewayType"
## [15] "SpeedLimit" "JunctionDetail"
## [17] "JunctionControl" "Road2Class"
## [19] "Road2Number" "PedCrossingHumanControl"
## [21] "PedCrossingPhysicalFacilities" "LightingCondition"
## [23] "WeatherCondition" "RoadSurface"
## [25] "SpecialConditions" "CarriagewayHazard"
## [27] "PlaceReported"
Let's look at a particular accident, say the 201st accident in our dataset.
accdata[201,] # this uses [rows, cols]; as cols is left empty it picks all columns
## Accident.Index Year Severity NumberVehicles NumberCasualties OutputDate Day
## 201 104268512010 2010 3 2 2 22/01/2010 6
## OutputTime Easting Northing LocalAuthority Road1Class Road1Number
## 201 21:40 392197 404013 104 6 0
## CarriagewayType SpeedLimit JunctionDetail JunctionControl Road2Class
## 201 6 30 3 4 7
## Road2Number PedCrossingHumanControl PedCrossingPhysicalFacilities
## 201 0 0 0
## LightingCondition WeatherCondition RoadSurface SpecialConditions
## 201 4 1 1 0
## CarriagewayHazard PlaceReported
## 201 0 1
The accident took place in 2010, involved 2 vehicles and happened in an area with a 30mph speed limit. But then there are some variable values which as such don't make sense. What does it mean that the LightingCondition was "4", or the WeatherCondition was "1"? To figure out what this means you need to look at the Data Dictionary. From there we can find out that it was dark but street lights were present and lit (LightingCondition = 4) and that the weather was fine without high winds (WeatherCondition = 1).
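Rather than looking up each code by hand, you can translate the codes into readable labels with named lookup vectors. A minimal sketch: the vectors below only contain the two codes decoded above, so a complete version would list every code from the Data Dictionary.

```r
# Minimal sketch: decode numeric codes via named lookup vectors.
# Only the two labels confirmed above are included; a complete
# version would cover every code in the Data Dictionary.
lighting.labels <- c("4" = "Darkness: street lights present and lit")
weather.labels <- c("1" = "Fine without high winds")

accdata$LightingLabel <- lighting.labels[as.character(accdata$LightingCondition)]
accdata$WeatherLabel <- weather.labels[as.character(accdata$WeatherCondition)]
accdata[201, c("LightingLabel", "WeatherLabel")]
```

Codes without an entry in the lookup vector simply come back as NA, which makes it easy to spot which codes you still have to decode.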
The information we will use in this walkthrough is the location of the accident, which is encapsulated in the Easting and Northing variables.
accdata[201,c("Easting","Northing")]
## Easting Northing
## 201 392197 404013
You will need to find a tool which helps you to locate this place (Easting 392197, Northing 404013). You can search the internet for a useful reference. One such possible place is this Grid Reference Finder. This place is on Chamber Rd in Greater Manchester.
Using such a tool to identify where Easting 0, Northing 0 is, you will find that point to be in the Atlantic Ocean to the south west of Great Britain.
The Easting and Northing reference is basically a measure of distance from that (0,0) reference point.
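If you ever need conventional longitude/latitude coordinates instead, the sf package (used later in this walkthrough) can convert them. A sketch, assuming these Easting/Northing values are on the British National Grid (EPSG:27700), the standard projection for UK grid references:

```r
library(sf)

# treat Easting/Northing as British National Grid coordinates (an assumption)
acc.sf <- st_as_sf(accdata, coords = c("Easting", "Northing"), crs = 27700)
# reproject to longitude/latitude (WGS 84)
acc.ll <- st_transform(acc.sf, crs = 4326)
st_coordinates(acc.ll[201, ]) # location of the accident discussed above
```

The resulting longitude/latitude pair can then be checked against any online map.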
Here we are using the extremely powerful ggplot function. This is a general plotting package in R and not really specialised for mapping. However, you will see that the Easting and Northing information is sufficient to display geographic features of this dataset.
ggplot(accdata, aes(x = Easting, y=Northing)) + geom_point(size=0.5)
You can basically see where in Manchester accidents happened. In essence you can see the Manchester Road network. The next image adds a few flourishes to the picture.
ggplot(accdata, aes(x = Easting, y=Northing,color = Road1Class)) +
geom_point(size=0.5) +
ggtitle("Locations of vehicle accidents in Greater Manchester")
Darker roads (with lower Road1Class category) are main roads. You can clearly identify the motorway ring road.
A great tool to visualise the differences between regional entities (e.g. countries) is to produce a map where a color scale represents the values of some statistic. The first time you create maps, we are afraid, you will have to struggle a little. But the rewards are great!
We start by loading a few libraries we will need to work with maps. We follow the advice of Lovelace et al. and use the tmap package. Recall that you will have to install these packages first in case you haven't done so yet.
library(tmap) # mapping package
library(sf) # required to deal with shape files
library(spData) # delivers shape files
library(maps) # additional map utilities
In order to display maps we need a crucial ingredient: we need to tell R how the countries or regions we wish to display are shaped. Such information is saved in shape files. The spData package has shape information for countries preloaded, and you can access it by calling world:
d_world <- world # save country shape files
head(world)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -18.28799 xmax: 180 ymax: 83.23324
## Geodetic CRS: WGS 84
## # A tibble: 6 × 11
## iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 FJ Fiji Oceania Oceania Melanesia Sove… 1.93e4 8.86e5 70.0
## 2 TZ Tanzania Africa Africa Eastern … Sove… 9.33e5 5.22e7 64.2
## 3 EH Western S… Africa Africa Northern… Inde… 9.63e4 NA NA
## 4 CA Canada North Am… Americas Northern… Sove… 1.00e7 3.55e7 82.0
## 5 US United St… North Am… Americas Northern… Coun… 9.51e6 3.19e8 78.8
## 6 KZ Kazakhstan Asia Asia Central … Sove… 2.73e6 1.73e7 71.6
## # ℹ 2 more variables: gdpPercap <dbl>, geom <MULTIPOLYGON [°]>
Let's see what information is contained in this file.
names(d_world)
## [1] "iso_a2" "name_long" "continent" "region_un" "subregion" "type"
## [7] "area_km2" "pop" "lifeExp" "gdpPercap" "geom"
There are a number of variables and pieces of information associated with each country, such as the name, a two-letter abbreviation (iso_a2), but also its size (area_km2) and its population (pop). The core of the geographic information is in the geom variable. Let's look at Norway.
nor.geom <- d_world$geom[d_world$name_long == "Norway"]
nor.geom
## Geometry set for 1 feature
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 4.992078 ymin: 58.07888 xmax: 31.29342 ymax: 80.65714
## Geodetic CRS: WGS 84
There is a lot of detail (not all shown). But it is useful to understand how the geographic information is stored. It is stored as polygons, where the coordinates of the edge points are saved.
length(nor.geom[[1]])
## [1] 4
nor.geom[[1]][[1]]
## [[1]]
## [,1] [,2]
## [1,] 15.14282 79.67431
## [2,] 13.71852 79.66039
## [3,] 13.17077 80.01046
## [4,] 10.44453 79.65239
## [5,] 11.22231 78.86930
## [6,] 13.17060 78.02493
## [7,] 14.66956 77.73565
## [8,] 13.76259 77.38035
## [9,] 15.91315 76.77045
## [10,] 17.11820 76.80941
## [11,] 17.59441 77.63796
## [12,] 18.47172 77.82669
## [13,] 19.02737 78.56260
## [14,] 21.54383 78.95611
## [15,] 18.25183 79.70175
## [16,] 16.99085 80.05086
## [17,] 15.52255 80.01608
## [18,] 15.14282 79.67431
You can see here that, for the purpose of a world map, the shape of Norway is saved in 4 polygons, and further that the first of these is described by 18 edge points.
d_nor <- d_world %>% filter(name_long == "Norway")
map1 <- tm_shape(d_nor) + # basic map
tm_borders() # adds borders
map1
Here you can indeed see that Norway is described by four shapes.
How many shapes are required to describe the Philippines? Repeat what we did for Norway: create an object which isolates the geom information for the Philippines and then check its length, using length(nor.geom[[1]]) as a template.
Of course, just creating a map of one country is somewhat boring. Let's create a map for only the countries in Africa. Replace the XXXX with the appropriate code.
d_africa <- d_world %>% filter(XXXX == "Africa")
map2 <- tm_shape(XXXX) + # basic map
tm_borders() # adds borders
map2
So far we are merely displaying geographic information. Let's add some other information and colour the countries according to some variable, for instance the GDP per capita (gdpPercap in the d_world file).
map3 <- tm_shape(d_africa) +
tm_borders() +
tm_fill(col = "gdpPercap") +
tm_style("cobalt") +
tm_layout(title = "GDP per capita across Africa")
map3
As usual there are multiple ways in which you can plot information. The ggplot package also has a function with which to plot information in shape files. Here is how you can create a similar plot using ggplot.
ggplot() +
geom_sf(data = d_africa, aes(fill = gdpPercap)) +
ggtitle("Per Capita GDP in Africa") +
coord_sf()
This is already very interesting. You should try and replace the GDP per capita information with the other variables contained in the d_world dataset.
This was fairly straightforward as the world dataset presented all of this information on a platter for us. What if you wish to show regions in the UK rather than countries? And what if you want to show some other variable?
In the next section you will learn how to do that. But from this example you should already be able to see what we need for that purpose: shape information for the geographic entities (as stored in the $geom variable) and data on the variable we wish to display.
Let's work on an example where we want to use geographic entities for which we first need to import the shape file. The package that will facilitate this is the sf package.
library(sf)
On this occasion we will want to display information for small areas in the UK, in particular Lower layer Super Output Areas (LSOAs). In order to find the shape files for these areas you will have to search the internet for such a file, for instance search for "UK LSOA shape file" and you will eventually find a place, such as data.gov.uk.
From there you can download a shp file. Here we downloaded the "Full_Clipped" file. This will download a zipped folder called "infuse_lsoa_lyr_2011_clipped". You need to unzip/extract this entire folder into your working directory. In your working directory you will then have a folder called "infuse_lsoa_lyr_2011_clipped", and in that folder will be four files with extensions dbf, prj, shp and shx. Make sure all these files are in that folder. Then you can load the shape file using the following line.
uk.lsoa <- st_read("C:/YOUR/WORKING/DIRECTORY/infuse_lsoa_lyr_2011_clipped/infuse_lsoa_lyr_2011_clipped.shp")
This should have loaded 42,619 LSOAs, and for each you have four pieces of information: the geo_code, which is a unique identifier; geo_label, which is a name; geo_labelw, which is a secondary label (NA for most LSOAs); and finally geometry, which contains the multipolygon information for each LSOA.
As there are so many LSOAs the above file is rather big. We have selected the LSOAs which we want to use below and saved these in the NEcities.lsoa object, which you can load from the ECLR page. Place that file in your working directory.
load("NEcities_lsoa.Rdata")
For instance the first entry is:
NEcities.lsoa[1,]
## Simple feature collection with 1 feature and 3 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 335745.4 ymin: 389306.6 xmax: 336494.7 ymax: 390117
## Projected CRS: OSGB36 / British National Grid
## geo_code geo_label geo_labelw geometry
## 1 E01006512 Liverpool 031A <NA> MULTIPOLYGON (((336103.4 38...
You could plot this with any of the mapping techniques above, for instance using ggplot.
ggplot() +
geom_sf(data = uk.lsoa) + # or data = NEcities.lsoa
ggtitle("LSOAs in the UK") +
coord_sf()
But there is no need to do this yet. And if you tried it with data = uk.lsoa it would take quite a long time. We will create a useful map shortly.
Now that we have the geometry information for LSOAs we can merge it with some information characterising these areas. Here we shall import information on the level of deprivation in each LSOA. The index of multiple deprivation (IMD) is a measure which attempts to express how deprived (on a number of issues) an area is. Performing a web search for "UK index of multiple deprivation" will lead you to a number of places where you can download this information, for instance this www.gov.uk site from which you can download the 2019 version of the IMD ("File 1: Index of multiple deprivation"). Download the xlsx file and save it to your working directory.
The actual data are in the "IMD2019" worksheet.
imd <- read_excel("C:/Rcode/ECLR/data/IMD2019_Index_of_Multiple_Deprivation.xlsx", sheet = "IMD2019")
names(imd) <- make.names(names(imd)) # this removes spaces from names
str(imd)
## tibble [32,844 × 6] (S3: tbl_df/tbl/data.frame)
## $ LSOA.code..2011. : chr [1:32844] "E01000001" "E01000002" "E01000003" "E01000005" ...
## $ LSOA.name..2011. : chr [1:32844] "City of London 001A" "City of London 001B" "City of London 001C" "City of London 001E" ...
## $ Local.Authority.District.code..2019. : chr [1:32844] "E09000001" "E09000001" "E09000001" "E09000001" ...
## $ Local.Authority.District.name..2019. : chr [1:32844] "City of London" "City of London" "City of London" "City of London" ...
## $ Index.of.Multiple.Deprivation..IMD..Rank : num [1:32844] 29199 30379 14915 8678 14486 ...
## $ Index.of.Multiple.Deprivation..IMD..Decile: num [1:32844] 9 10 5 3 5 3 2 3 3 3 ...
The variable names are all on the long side, so let's change the names to shorter ones.
names(imd) <- c("geo_code", "LSOA.name", "LAD.code", "LAD.name", "IMD.rank", "IMD.decile")
str(imd)
## tibble [32,844 × 6] (S3: tbl_df/tbl/data.frame)
## $ geo_code : chr [1:32844] "E01000001" "E01000002" "E01000003" "E01000005" ...
## $ LSOA.name : chr [1:32844] "City of London 001A" "City of London 001B" "City of London 001C" "City of London 001E" ...
## $ LAD.code : chr [1:32844] "E09000001" "E09000001" "E09000001" "E09000001" ...
## $ LAD.name : chr [1:32844] "City of London" "City of London" "City of London" "City of London" ...
## $ IMD.rank : num [1:32844] 29199 30379 14915 8678 14486 ...
## $ IMD.decile: num [1:32844] 9 10 5 3 5 3 2 3 3 3 ...
Note how we gave the LSOA code the same name as that code has in the Shape file. This will make the next step easier.
We now have two datasets (uk.lsoa and imd) which both have information on LSOAs: the former contains the geometry, the latter the IMD. As both have the geo_code variable we can use it to merge the data.
Before merging we shall reduce the dataset somewhat. As the imd dataset also has information on local authorities (e.g. "Liverpool") we will first create a subset of these.
imd.north <- imd %>% filter(LAD.name %in% c("Sheffield", "Leeds", "Liverpool", "Newcastle upon Tyne"))
How many LSOAs are in this dataset?
Now we shall merge the data with NEcities.lsoa. In fact NEcities.lsoa only contains geometries for these four northern cities.
north.lsoa <- merge(NEcities.lsoa, imd.north)
str(north.lsoa)
## Classes 'sf' and 'data.frame': 1300 obs. of 9 variables:
## $ geo_code : chr "E01006512" "E01006513" "E01006514" "E01006515" ...
## $ geo_label : chr "Liverpool 031A" "Liverpool 060A" "Liverpool 037A" "Liverpool 037B" ...
## $ geo_labelw: chr NA NA NA NA ...
## $ LSOA.name : chr "Liverpool 031A" "Liverpool 060A" "Liverpool 037A" "Liverpool 037B" ...
## $ LAD.code : chr "E08000012" "E08000012" "E08000012" "E08000012" ...
## $ LAD.name : chr "Liverpool" "Liverpool" "Liverpool" "Liverpool" ...
## $ IMD.rank : num 14664 11173 3299 1875 330 ...
## $ IMD.decile: num 5 4 2 1 1 8 2 3 2 2 ...
## $ geometry :sfc_MULTIPOLYGON of length 1300; first list element: List of 1
## ..$ :List of 1
## .. ..$ : num [1:338, 1:2] 336103 336103 336103 336104 336105 ...
## ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA
## ..- attr(*, "names")= chr [1:8] "geo_code" "geo_label" "geo_labelw" "LSOA.name" ...
We have now a new object with the merged information on these LSOAs.
ggplot() +
geom_sf(data = north.lsoa, aes(fill = IMD.decile)) +
ggtitle("IMD, 2019, Northern Cities") +
coord_sf()
As the four cities are quite dispersed the map is not so informative yet. It would be better to plot four separate maps. In ggplot this is easily achieved using the facet method.
ggplot() +
geom_sf(data = north.lsoa, aes(fill = IMD.decile)) +
facet_wrap(vars(LAD.name)) +
ggtitle("IMD, 2019, Northern Cities") +
coord_sf()
It appears that the default of the facet_wrap function is to keep identical scales for all four subplots. On many occasions this is a very sensible default. Here, however, it isn't. Going to the help function (type ?facet_wrap in the console) it appears as if one may want to try the option scales = "free" in facet_wrap.
ggplot() +
geom_sf(data = north.lsoa, aes(fill = IMD.decile)) +
facet_wrap(vars(LAD.name), scales = "free") +
ggtitle("IMD, 2019, Northern Cities") +
coord_sf()
If you do that you will receive an error message: "Error in draw_panels(): ! facet_wrap() can't use free scales with coord_sf()."
So, it appears that free scales do not work with coord_sf. There are multiple ways to deal with this. Here we use the following strategy: first we create subsets of north.lsoa, one for each city; then we create four separate pictures, which we then arrange into one bigger one.
she <- north.lsoa %>% filter(LAD.name == "Sheffield")
leeds <- north.lsoa %>% filter(LAD.name == "Leeds")
liv <- north.lsoa %>% filter(LAD.name == "Liverpool")
newc <- north.lsoa %>% filter(LAD.name == "Newcastle upon Tyne")
p.she <- ggplot() +
geom_sf(data = she, aes(fill = IMD.decile),size = 0) +
ggtitle("IMD, 2019, Sheffield")
p.leeds <- ggplot() +
geom_sf(data = leeds, aes(fill = IMD.decile),linewidth = 0.01) +
ggtitle("IMD, 2019, Leeds")
p.liv <- ggplot() +
geom_sf(data = liv, aes(fill = IMD.decile)) +
ggtitle("IMD, 2019, Liverpool")
p.newc <- ggplot() +
geom_sf(data = newc, aes(fill = IMD.decile)) +
ggtitle("IMD, 2019, Newcastle")
Now we have the four graphs and want to combine them into one. You can use the grid.arrange function from the gridExtra package for this purpose.
library(gridExtra)
p.imd <- grid.arrange(p.she, p.leeds, p.liv, p.newc, nrow = 2)
Low deciles (dark shading) correspond to the most deprived areas.
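If you would rather have the most deprived areas drawn in the brightest colour, one option (a sketch using ggplot2's built-in continuous viridis scale) is to reverse the direction of the fill scale:

```r
# reverse the continuous viridis scale so that decile 1 (most deprived)
# is mapped to the bright end of the palette
p.liv.rev <- ggplot() +
  geom_sf(data = liv, aes(fill = IMD.decile)) +
  scale_fill_viridis_c(direction = -1, name = "IMD decile") +
  ggtitle("IMD, 2019, Liverpool (reversed scale)")
p.liv.rev
```

The same scale_fill_viridis_c() layer can be added to each of the four city plots before arranging them with grid.arrange.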
In this walkthrough you learned how to create maps in R. Importantly, you need a shape file that describes the geography of the entities you want to show. You can then colour these on a map according to certain variables. Usually you will use tmap or ggplot to create the map. Enjoy this new skill!
This walkthrough is part of the ECLR page.
Lovelace, R., Nowosad, J. and Muenchow, J., Geocomputation with R, https://r.geocompx.org/.