Multivariate Dot-Density Maps in R with sf & ggplot2

By Paul Campbell | May 2, 2018

Background

Last June I did a blog post about building dot-density maps in R using UK Census data. It has proven to be a fairly popular post, most likely because the maps look like something you'd expect to see in the Tate Modern…

[Image: dot-density map made in R]

Not only do these maps look beautiful, but there is a strong argument that they represent data better than the more common choropleth method of filling each geographical region with a single colour based on a single variable.

The pièce de résistance of dot-density mapping is that it does not over-emphasise large but sparsely populated areas: colour coverage is dictated by the count, not by the size of the area.

When applied to election mapping, this gives a fairer assessment of the ‘popular vote’ when compared to a standard choropleth map that will fill entire constituencies with the colour of the winning party, regardless of how close the contest was or how many people voted.

Good examples of this are the webcomic xkcd's and cartographer Kenneth Field's recent interpretations of the 2016 US Presidential Election (see images below, respectively), both of which set Twitter alight with debate.


You can also have a gander at the mapping tool we built last year to look at UK election results in a dot-density context.

And for a more enlightened discussion of the troubles and strifes of accurate mapping, check out the latest FT Chart Doctor article, where the FT's Alan Smith talks with Professor Mark Monmonier, author of the classic book How to Lie with Maps. (While you're at it, have a look at Steven Bernard's piece on how the FT's always-on-point maps are made.)


R’s New Spatial Workflow

With all of this in mind, I thought it would be a good time to update the previous blog post, this time using the relatively new simple features (sf) R package. sf makes geospatial analysis a lot easier within a tidy framework, ergo it works seamlessly with the tidyverse: each feature's geometry is stored as a single element of a list-column, so one row of a data frame holds both a feature's attributes and its shape. No more fortifying malarkey.
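To see what that looks like in practice, here is a minimal sketch with two made-up square polygons (the `toy` data and the `square()` helper are hypothetical, purely for illustration). The shapes live in a `geometry` list-column, and an ordinary dplyr verb such as `filter()` keeps each row's shape attached to its attributes.

```r
library(sf)
library(dplyr)

# hypothetical helper: a unit square with its lower-left corner at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}

# made-up data: one row per area, attributes alongside a geometry list-column
toy <- st_sf(
  name     = c("A", "B"),
  votes    = c(120, 80),
  geometry = st_sfc(square(0), square(2), crs = 4326)
)

class(toy$geometry)              # "sfc_POLYGON" "sfc": a list of polygons
big <- filter(toy, votes > 100)  # an ordinary dplyr verb...
class(big)                       # ...and the result is still an sf object
```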

This means we can go from raw data to dot-density map with a lot less code and stress than ever before. So here's a quick demo of how to get it done, this time as a map of the 2017 UK General Election results in London constituencies…


Load Packages and Get Some Data

First, let's get some election data and a constituency-level shapefile, then select/rename the columns we need in each and join them together.

I filter each dataset to the London region, but if you're doing this yourself and want to map another region, you can simply switch London out for the region of your choice and carry on.
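One way to keep the two filters in sync is to hold the region name in a single variable and use it in both. A minimal sketch, assuming the column names used in the code below (`region_name` in the results CSV, `REGN` in the shapefile); the `region` variable and the two toy tables are hypothetical stand-ins for the real data:

```r
library(dplyr)

region <- "London"  # change this one value to map a different region

# hypothetical stand-ins for the election results and the shapefile attributes
results <- tibble(ons_id = c("E1", "S1"), region_name = c("London", "Scotland"))
shapes  <- tibble(PCONCODE = c("E1", "S1"), REGN = c("London", "Scotland"))

# the same region label is applied to both sources
results_region <- filter(results, region_name == region)
shapes_region  <- filter(shapes, REGN == region)
```

The label has to be spelled identically in both sources, so it may be worth checking `unique()` on both region columns before picking a value.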

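library(tidyverse) # geom_sf() needs ggplot2 >= 3.0.0 (at the time of writing, the dev version: devtools::install_github('tidyverse/ggplot2'))
library(sf)
extrafont::loadfonts(device = "win") # register fonts imported via extrafont::font_import() (e.g. Iosevka); Windows only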

# election results filtered to London region
ge_data <- read_csv("http://researchbriefings.files.parliament.uk/documents/CBP-7979/HoC-GE2017-constituency-results.csv") %>% 
  filter(region_name == "London") %>% 
  select(ons_id, constituency_name, first_party, Con = con, Lab = lab, LD = ld, UKIP = ukip, Green = green)

# shapefile filtered to London region
# data available here: https://www.dropbox.com/s/4iajcx25grpx5qi/uk_650_wpc_2017_full_res_v1.8.zip?dl=0
uk <- st_read("../../data/blog_data/uk_650_wpc_2017_full_res_v1.8.shp", stringsAsFactors = FALSE, quiet = TRUE) %>% 
  st_transform(4326) %>% 
  filter(REGN == "London") %>% 
  select(ons_id = PCONCODE)

# merge the data
sf_data <- left_join(ge_data, uk, by = "ons_id") %>% 
  st_as_sf() # ge_data (a plain tibble) is on the left, so the join drops the sf class; make it an sf object again

head(sf_data)
## Simple feature collection with 6 features and 8 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -0.2050868 ymin: 51.34552 xmax: 0.2176442 ymax: 51.56706
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 9
##   ons_id   constituency_name     first_party   Con   Lab    LD  UKIP Green
##   <chr>    <chr>                 <chr>       <int> <int> <int> <int> <int>
## 1 E140005~ Barking               Lab         10711 32319   599  3031   724
## 2 E140005~ Battersea             Lab         22876 25292  4401   357   866
## 3 E140005~ Beckenham             Con         30632 15545  4073     0  1380
## 4 E140005~ Bermondsey and Old S~ Lab          7581 31161 18189   838   639
## 5 E140005~ Bethnal Green and Bow Lab          7576 42969  2982   894  1516
## 6 E140005~ Bexleyheath and Cray~ Con         25113 16040  1201  1944   601
## # ... with 1 more variable: geometry <MULTIPOLYGON [°]>
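The lost sf class comes from the join keeping the class of its left-hand table. A small sketch with made-up data (`votes` and `shapes` are hypothetical stand-ins for `ge_data` and `uk`) shows both the `st_as_sf()` fix and the alternative of putting the sf object on the left:

```r
library(sf)
library(dplyr)

# hypothetical stand-ins for ge_data (plain tibble) and uk (sf object)
votes  <- tibble(ons_id = c("X1", "X2"), Lab = c(300, 150))
shapes <- st_sf(
  ons_id   = c("X1", "X2"),
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), crs = 4326)
)

joined <- left_join(votes, shapes, by = "ons_id")
inherits(joined, "sf")         # FALSE: the result takes the class of the left table
joined_sf <- st_as_sf(joined)  # finds the geometry list-column and rebuilds an sf object
inherits(joined_sf, "sf")      # TRUE

# alternatively, put the sf object on the left so the sf class is kept
inherits(left_join(shapes, votes, by = "ons_id"), "sf")  # TRUE
```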

Generating Coordinates for each Dot

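Here we create a data frame with the number of dots we want plotted in each constituency for each party. Dividing each vote count by 100 means that each dot will represent 100 votes. We then apply a random rounding algorithm to the resulting decimals: each value is rounded up with a probability equal to its fractional part, so the expected dot count equals the exact value and we avoid the systematic bias in overall dot counts that always rounding down (or always rounding to the nearest whole number) would introduce. Then we plug this data into a purrr::map_df call and let it pipe its way to a nice tidy tibble with coordinate columns and a categorical column for the political party assignment of each dot. Finally, we randomise the order of rows with slice, again to avoid any bias in plotting order; otherwise the party whose dots are bound last would always be drawn on top.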
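To see why random rounding matters, here is a minimal sketch comparing it with deterministic rounding on a single repeated value. `stochastic_round()` is a hypothetical, stripped-down version of the idea (not the credited implementation below), and 2.3 stands for something like 230 votes at 100 votes per dot:

```r
set.seed(42)

# hypothetical minimal random rounding: round up with probability equal to
# the fractional part, otherwise round down
stochastic_round <- function(x) floor(x) + (runif(length(x)) < x - floor(x))

x <- rep(2.3, 1e5)         # e.g. 230 votes / 100 votes per dot, many times over

mean(floor(x))             # 2: always rounding down loses 0.3 dots every time
mean(round(x))             # 2: rounding to nearest loses it too
mean(stochastic_round(x))  # about 2.3: right on average
```

Whole numbers pass through unchanged, since their fractional part is zero and the round-up test can never succeed.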

It took me a while to figure out how to do the final stage in one pipe. The tricky part was realising that the 'geometry set' produced by the st_sample stage (generating the coordinates) has the top-level 'geometry type' GEOMETRY, so before we can scrape the coordinates with the st_coordinates function, we must first simplify the geometry type to POINT with the st_cast function…
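The effect of st_cast() on st_coordinates() is easy to see with a small made-up geometry set. This sketch uses MULTIPOINT features as an assumption (the exact type st_sample() returns can differ between sf versions): before casting, st_coordinates() appends an extra L1 index column; after casting to POINT it returns just X and Y, which is what a two-name setNames(c("lon", "lat")) step expects.

```r
library(sf)

# made-up geometry set: two MULTIPOINT features holding 5 points in total
mp <- st_sfc(
  st_multipoint(rbind(c(0, 0), c(1, 1))),
  st_multipoint(rbind(c(2, 2), c(3, 3), c(4, 4))),
  crs = 4326
)

colnames(st_coordinates(mp))   # "X" "Y" "L1": an extra index column

pts <- st_cast(mp, "POINT")    # split every multipoint into individual points
length(pts)                    # 5
colnames(st_coordinates(pts))  # "X" "Y"
```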

# credit to Jens von Bergmann for this algo https://github.com/mountainMath/dotdensity/blob/master/R/dot-density.R
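random_round <- function(x) {
  v <- as.integer(x)                          # integer part
  r <- x - v                                  # fractional part
  test <- runif(length(r), 0.0, 1.0)          # one uniform draw per value
  add <- rep(as.integer(0), length(r))
  add[r > test] <- as.integer(1)              # round up with probability equal to the fractional part
  value <- v + add
  ifelse(is.na(value) | value < 0, 0, value)  # return the counts, with NA/negative values set to 0
}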

# data frame of number of dots to plot for each party (1 for every 100 votes)
num_dots <- as.data.frame(sf_data) %>% 
  select(Con:Green) %>% 
  mutate_all(~ . / 100) %>%   # funs() is deprecated in current dplyr; a formula lambda does the same
  mutate_all(random_round)

# generates data frame with coordinates for each point + what party it is associated with
sf_dots <- map_df(names(num_dots), 
                  ~ st_sample(sf_data, size = num_dots[[.x]], type = "random") %>% # generate the points in each polygon
                    st_cast("POINT") %>%                                          # cast the geom set as 'POINT' data
                    st_coordinates() %>%                                          # pull out coordinates into a matrix
                    as_tibble() %>%                                               # convert to tibble
                    setNames(c("lon","lat")) %>%                                  # set column names
                    mutate(Party = .x)                                            # add categorical party variable
                  ) %>% 
  slice(sample(1:n())) # once map_df binds rows randomise order to avoid bias in plotting order

head(sf_dots)
## # A tibble: 6 x 3
##       lon   lat Party
##     <dbl> <dbl> <chr>
## 1 -0.0721  51.5 Lab  
## 2 -0.431   51.6 Lab  
## 3 -0.0686  51.6 Green
## 4 -0.0118  51.5 Lab  
## 5 -0.293   51.6 Lab  
## 6 -0.321   51.6 Con
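Because the real pipeline needs the shapefile download, here is a self-contained toy run of the same steps that makes it easy to check the dot counts. The two square "constituencies", the parties A and B, and their dot counts are all made up, and the squares have no CRS (planar coordinates) to keep things simple:

```r
library(sf)
library(dplyr)
library(purrr)

set.seed(1)

# hypothetical helper: a unit square with its lower-left corner at (x0, 0)
square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy_areas <- st_sf(id = 1:2, geometry = st_sfc(square(0), square(2)))

# made-up dots per party (columns) for each area (rows)
toy_num_dots <- data.frame(A = c(3, 2), B = c(1, 4))

toy_dots <- map_df(names(toy_num_dots),
                   ~ st_sample(toy_areas, size = toy_num_dots[[.x]], type = "random") %>%
                     st_cast("POINT") %>%       # make sure we have plain points
                     st_coordinates() %>%       # X/Y matrix
                     as_tibble() %>%
                     setNames(c("x", "y")) %>%
                     mutate(Party = .x)) %>%    # tag each dot with its party
  slice(sample(n()))                            # shuffle rows so no party is always drawn on top

count(toy_dots, Party)  # 5 dots for A (3 + 2) and 5 for B (1 + 4)
```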

We’re now ripe for plotting with ggplot2.


Visualise the Votes

Here's my ggplot2 code for the map output. Plotting this many points on a standard-sized image won't be particularly insightful, as there will be severe over-plotting. So play around with your image size until it's looking good, then adjust the text and legend sizes to compensate for the enlarged plot.
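If a poster-sized PNG isn't practical, another lever (a swapped-in alternative, not what the code below uses) is to shrink the points and make them semi-transparent so overlapping dots still read. A minimal sketch with made-up uniform random dots; the `size` and `alpha` values are guesses to tune by eye:

```r
library(ggplot2)

set.seed(7)
# made-up dots: 1,000 points in a unit square, split between two parties
fake_dots <- data.frame(
  lon   = runif(1000),
  lat   = runif(1000),
  Party = sample(c("Con", "Lab"), 1000, replace = TRUE)
)

p_small <- ggplot(fake_dots, aes(lon, lat, colour = Party)) +
  geom_point(size = 0.3, alpha = 0.7) +   # small, partly transparent points
  scale_colour_manual(values = c(Con = "#0087DC", Lab = "#DC241F")) +
  coord_equal() +
  theme_void()
```

The trade-off is that very small points can wash out the colour contrast that makes a dot-density map readable, so a larger canvas may still be the better option.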

# colour palette for our party points
pal <- c("Con" = "#0087DC", "Lab" = "#DC241F", "LD" = "#FCBB30", "UKIP" = "#70147A", "Green" = "#78B943")

# plot it and save as png big enough to avoid over-plotting of the points
p <- ggplot() +
  geom_sf(data = sf_data, fill = "transparent", colour = "white") +
  geom_point(data = sf_dots, aes(lon, lat, colour = Party)) +
  scale_colour_manual(values = pal) +
  coord_sf(crs = 4326, datum = NA) +
  theme_void(base_family = "Iosevka", base_size = 48) +  # Iosevka must be installed and registered (see extrafont above)
  labs(x = NULL, y = NULL,
       title = "UK General Election 2017\n",
       subtitle = "London Constituencies\n1 dot = 100 votes",
       caption = "Map by Culture of Insight @PaulCampbell91 | Data Sources: House of Commons Library, Alasdair Rae") +
  guides(colour = guide_legend(override.aes = list(size = 18))) +
  theme(legend.position = c(0.82, 1.03), legend.direction = "horizontal",
        plot.background = element_rect(fill = "#212121", color = NA), 
        panel.background = element_rect(fill = "#212121", color = NA),
        legend.background = element_rect(fill = "#212121", color = NA),
        legend.key = element_rect(fill = "#212121", colour = NA),
        plot.margin = margin(1, 1, 1, 1, "cm"),
        text =  element_text(color = "white"),
        title =  element_text(color = "white"),
        plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(size = 32)
  )

ggsave("../../static/img/party_points.png", plot = p, dpi = 320, width = 80, height = 70, units = "cm")
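For a sense of scale, converting the ggsave() dimensions to pixels (centimetres / 2.54 × dpi) shows just how large this image is:

```r
# width and height from the ggsave() call above, in cm, at 320 dpi
dims_cm <- c(width = 80, height = 70)
dims_px <- round(dims_cm / 2.54 * 320)
dims_px  # width 10079, height 8819: roughly an 89-megapixel PNG
```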

The results should look something like this…

[Image: R sf dot-density election map, London, UK]

Job’s a good’un. Let’s compare it to a choropleth map of London seat winners…

ggplot() +
  geom_sf(data = sf_data, aes(fill = first_party), colour = "white") +
  scale_fill_manual(values = pal, name = "Seat Winner") +
  coord_sf(crs = 4326, datum = NA) +
  theme_void() +
  theme(legend.position = c(0.8, 0.9), legend.direction = "horizontal")

Which map do we think is more insightful? Luckily we don't have to choose one or the other; we can use both! No one map will be able to give you all the answers, so I find it's best to combine techniques for maximum insight. The choropleth gives us a clear indication of who won where, and the dot-density map looks under the hood, giving us an idea of the count and diversity of votes within each constituency.


That's all for now. I know I didn't go into great detail about the code, so if you have any questions or want to kick off a heated mapping debate, please do leave a comment below or catch me on Twitter.


A few shout outs


A presto!
