4 Maps

There are numerous ways to make a map with plotly – each with it’s own strengths and weaknesses. Generally speaking the approaches fall under two categories: integrated or custom. Integrated maps leverage plotly.js’ built-in support for rendering a basemap layer. Currently there are two supported ways of making integrated maps: either via Mapbox or via an integrated d3.js powered basemap. The integrated approach is convenient if you need a quick map and don’t necessarily need sophisticated representations of geo-spatial objects. On the other hand, the custom mapping approach offers complete control since you’re providing all the information necessary to render the geo-spatial object(s). Section 4.2 covers making sophisticated maps (e.g., cartograms) using the sf R package, but it’s also possible to make custom plotly maps via other tools for geo-computing (e.g., sp, ggmap, etc).

It’s worth noting that plotly aims to be a general purpose visualization library, and thus, doesn’t aim to be the most fully featured geo-spatial visualization toolkit. That said, there are benefits to using plotly-based maps since the mapping APIs are very similar to the rest of plotly, and you can leverage larger plotly ecosystem (e.g., linking views client side like Figure 16.23). However, if you run into limitations with plotly’s mapping functionality, there is a very rich set of tools for interactive geospatial visualization in R, including but not limited to: leaflet, mapview, mapedit, tmap, and mapdeck (Robin Lovelace 2019).

4.1 Integrated maps

4.1.1 Overview

If you have fairly simple latitude/longitude data and want to make a quick map, you may want to try one of plotly’s integrated mapping options (i.e., plot_mapbox() and plot_geo()). Generally speaking, you can treat these constructor functions as a drop-in replacement for plot_ly() and get a dynamic basemap rendered behind your data. Furthermore, all the scatter-based layers we learned about in Section 3 work as you’d expect it to with plot_ly().¹² For example, Figure 4.1 uses plot_mapbox() and add_markers() to create a bubble chart:

plot_mapbox(maps::canada.cities) %>%
  add_markers(
    x = ~long, 
    y = ~lat, 
    size = ~pop, 
    color = ~country.etc,
    colors = "Accent",
    text = ~paste(name, pop),
    hoverinfo = "text"
  )

FIGURE 4.1: A mapbox powered bubble chart showing the population of various cities in Canada. For the interactive, see https://plotly-r.com/interactives/mapbox-bubble.html

The Mapbox basemap styling is controlled through the layout.mapbox.style attribute. The plotly package comes with support for 7 different styles, but you can also supply a custom URL to a custom mapbox style. To obtain all the pre-packaged basemap style names, you can grab them from the official plotly.js schema():

styles <- schema()$layout$layoutAttributes$mapbox$style$values
styles
#>  [1] "basic"             "streets"          
#>  [3] "outdoors"          "light"            
#>  [5] "dark"              "satellite"        
#>  [7] "satellite-streets" "open-street-map"  
#>  [9] "white-bg"          "carto-positron"   
#> [11] "carto-darkmatter"  "stamen-terrain"   
#> [13] "stamen-toner"      "stamen-watercolor"

Any one of these values can be used for a mapbox style. Figure 4.2 demonstrates the satellite earth imagery basemap.

layout(
  plot_mapbox(), 
  mapbox = list(style = "satellite")
)

FIGURE 4.2: Zooming in on earth satellite imagery using plot_mapbox(). For the interactive, see https://plotly-r.com/interactives/satellite.html

Figure 4.3 demonstrates how to create an integrated plotly.js dropdown menu to control the basemap style via the layout.updatemenus attribute. The idea behind an integrated plotly.js dropdown is to supply a list of buttons (i.e., menu items) where each button invokes a plotly.js method with some arguments. In this case, each button uses the relayout method to modify the layout.mapbox.style attribute.¹³

style_buttons <- lapply(styles, function(s) {
  list(
    label = s, 
    method = "relayout", 
    args = list("mapbox.style", s)
  )
})
layout(
  plot_mapbox(), 
  mapbox = list(style = "dark"),
  updatemenus = list(
    list(y = 0.8, buttons = style_buttons)
  )
)

FIGURE 4.3: Providing a dropdown menu to control the styling of the mapbox baselayer. For the interactive, see https://plotly-r.com/interactives/mapbox-style-dropdown.html

The other integrated mapping solution in plotly is plot_geo(). Compared to plot_mapbox(), this approach has support for different mapping projections, but styling the basemap is limited and can be more cumbersome. Figure 4.4 demonstrates using plot_geo() in conjunction with add_markers() and add_segments() to visualize flight paths within the United States. Whereas plot_mapbox() is fixed to a mercator projection, the plot_geo() constructor has a handful of different projection available to it, including the orthographic projection which gives the illusion of the 3D globe.

Click to show code

library(plotly)
library(dplyr)
# airport locations
air <- read.csv(
  'https://plotly-r.com/data-raw/airport_locations.csv'
)
# flights between airports
flights <- read.csv(
  'https://plotly-r.com/data-raw/flight_paths.csv'
)
flights$id <- seq_len(nrow(flights))

# map projection
geo <- list(
  projection = list(
    type = 'orthographic',
    rotation = list(lon = -100, lat = 40, roll = 0)
  ),
  showland = TRUE,
  landcolor = toRGB("gray95"),
  countrycolor = toRGB("gray80")
)

plot_geo(color = I("red")) %>%
  add_markers(
    data = air, x = ~long, y = ~lat, text = ~airport,
    size = ~cnt, hoverinfo = "text", alpha = 0.5
  ) %>%
  add_segments(
    data = group_by(flights, id),
    x = ~start_lon, xend = ~end_lon,
    y = ~start_lat, yend = ~end_lat,
    alpha = 0.3, size = I(1), hoverinfo = "none"
  ) %>%
  layout(geo = geo, showlegend = FALSE)

FIGURE 4.4: Using the integrated orthographic projection to visualize flight patterns on a ‘3D’ globe. For the interactive, see https://plotly-r.com/interactives/geo-flights.html

One nice thing about plot_geo() is that it automatically projects geometries into the proper coordinate system defined by the map projection. For example, in Figure 4.5 the simple line segment is straight when using plot_mapbox() yet curved when using plot_geo(). It’s possible to achieve the same effect using plot_ly() or plot_mapbox(), but the relevant marker/line/polygon data has to be put into an sf data structure before rendering (see Section 4.2.1 for more details).

map1 <- plot_mapbox() %>% 
  add_segments(x = -100, xend = -50, y = 50, yend = 75) %>%
  layout(
    mapbox = list(
      zoom = 0,
      center = list(lat = 65, lon = -75)
    )
  )

map2 <- plot_geo() %>% 
  add_segments(x = -100, xend = -50, y = 50, yend = 75) %>%
  layout(geo = list(projection = list(type = "mercator")))

library(htmltools)
browsable(tagList(map1, map2))

A comparison of plotly’s integrated mapping solutions: plot_mapbox() (top) and plot_geo() (bottom). The plot_geo() approach will transform line segments to correctly reflect their projection into a non-cartesian coordinate system.

FIGURE 4.5: A comparison of plotly’s integrated mapping solutions: plot_mapbox() (top) and plot_geo() (bottom). The plot_geo() approach will transform line segments to correctly reflect their projection into a non-cartesian coordinate system.

4.1.2 Choropleths

In addition to scatter traces, both of the integrated mapping solutions (i.e., plot_mapbox() and plot_geo()) have an optimized choropleth trace type (i.e., the choroplethmapbox and choropleth trace types). Comparatively speaking, choroplethmapbox is more powerful because you can fully specify the feature collection using GeoJSON, but the choropleth trace can be a bit easier to use if it fits your use case.

Figure 4.6 shows the population density of the U.S. via the choropleth trace using the U.S. state data from the datasets package (R Core Team 2016). By simply providing a z attribute, plotly_geo() objects will try to create a choropleth, but you’ll also need to provide locations and a locationmode. It’s worth noting that the locationmode is currently limited to countries and US states, so if you need to a different geo-unit (e.g., counties, municipalities, etc), you should use the choroplethmapbox trace type and/or use a “custom” mapping approach as discussed in Section 4.2.

density <- state.x77[, "Population"] / state.x77[, "Area"]

g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  lakecolor = toRGB('white')
)

plot_geo() %>%
  add_trace(
    z = ~density, text = state.name, span = I(0),
    locations = state.abb, locationmode = 'USA-states'
  ) %>%
  layout(geo = g)

FIGURE 4.6: A map of U.S. population density using the state.x77 data from the datasets package.

Choroplethmapbox is more flexible than choropleth because you supply your own GeoJSON definition of the choropleth via the geojson attribute. Currently this attribute must be a URL pointing to a geojson file. Moreover, the location should point to a top-level id attribute of each feature within the geojson file. Figure 4.7 demonstrates how we could visualize the same information as Figure 4.6, but this time using choroplethmapbox.

plot_ly() %>%
  add_trace(
    type = "choroplethmapbox",
    # See how this GeoJSON URL was generated at
    # https://plotly-r.com/data-raw/us-states.R
    geojson = paste(c(
      "https://gist.githubusercontent.com/cpsievert/",
      "7cdcb444fb2670bd2767d349379ae886/raw/",
      "cf5631bfd2e385891bb0a9788a179d7f023bf6c8/", 
      "us-states.json"
    ), collapse = ""),
    locations = row.names(state.x77),
    z = state.x77[, "Population"] / state.x77[, "Area"],
    span = I(0)
  ) %>%
  layout(
    mapbox = list(
      style = "light",
      zoom = 4,
      center = list(lon = -98.58, lat = 39.82)
    )
  ) %>%
  config(
    mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"),
    # Workaround to make sure image download uses full container
    # size https://github.com/plotly/plotly.js/pull/3746
    toImageButtonOptions = list(
      format = "svg", 
      width = NULL, 
      height = NULL
    )
  )

FIGURE 4.7: Another map of U.S. population density, this time using choroplethmapbox with a custom GeoJSON file.

Figures 4.6 and 4.7 aren’t an ideal way to visualize state population a graphical perception point of view. We typically use the color in choropleths to encode a numeric variable (e.g., GDP, net exports, average SAT score, etc) and the eye naturally perceives the area that a particular color covers as proportional to its overall effect. This ends up being misleading since the area the color covers typically has no sensible relationship with the data encoded by the color. A classic example of this misleading effect in action is in US election maps – the proportion of red to blue coloring is not representative of the overall popular vote (Newman 2016).

Cartograms are an approach to reducing this misleading effect and grants another dimension to encode data through the size of geo-spatial features. Section 4.2.2 covers how to render cartograms in plotly using sf and cartogram.

4.2 Custom maps

4.2.1 Simple features (sf)

The sf R package is a modern approach to working with geo-spatial data structures based on tidy data principles (Pebesma 2018; Wickham 2014 b). The key idea behind sf is that it stores geo-spatial geometries in a list-column of a data frame. This allows each row to represent the real unit of observation/interest – whether it’s a polygon, multi-polygon, point, line, or even a collection of these features – and as a result, works seamlessly inside larger tidy workflows.¹⁴ The sf package itself does not really provide geo-spatial data – it provides the framework and utilities for storing and computing on geo-spatial data structures in an opinionated way.

There are numerous packages for accessing geo-spatial data as simple features data structures. A couple notable examples include rnaturalearth and USAboundaries. The rnaturalearth package is better for obtaining any map data in the world via an API provided by https://www.naturalearthdata.com/ (South 2017). The USAboundaries package is great for obtaining map data for the United States at any point in history (Mullen and Bratt 2018). It doesn’t really matter what tool you use to obtain or create an sf object – once you have one, plot_ly() knows how to render it:

library(rnaturalearth)
world <- ne_countries(returnclass = "sf")
class(world)
#> [1] "sf"    "data.frame"
plot_ly(world, color = I("gray90"), stroke = I("black"), span = I(1))

FIGURE 4.8: Rendering all the world’s countries using plot_ly() and the ne_countries() function from the rnaturalearth package.

How does plot_ly() know how to render the countries? It’s because the geo-spatial features are encoded in special (geometry) list-column. Also, meta-data about the geo-spatial structure are retained as special attributes of the data. Figure 4.9 augments the print method for sf to data frames to demonstrate that all the information needed to render the countries (i.e., polygons) in Figure 4.8 is contained within the world data frame. Note also, that sf provides special dplyr methods for this special class of data frame so that you can treat data manipulations as if it were a ‘tidy’ data structure. One thing about this method is that the special ‘geometry’ column is always retained – if we try to just select the name column, then we get both the name and the geometry.

library(sf)
world %>%
  select(name) %>%
  print(n = 4)

FIGURE 4.9: A diagram of a simple features data frame. The geometry column tracks the spatial features attached to each row in the data frame.

There are actually 4 different ways to render sf objects with plotly: plot_ly(), plot_mapbox(), plot_geo(), and via ggplot2’s geom_sf(). These functions render multiple polygons using a single trace by default, which is fast, but you may want to leverage the added flexibility of multiple traces. For example, a given trace can only have one fillcolor, so it’s impossible to render multiple polygons with different colors using a single trace. For this reason, if you want to vary the color of multiple polygons, make sure the split by a unique identifier (e.g. name), as done in Figure 4.10. Note that, as discussed for line charts in Figure 3.2, using multiple traces automatically adds the ability to filter name via legend entries.

canada <- ne_states(country = "Canada", returnclass = "sf")
plot_ly(canada, split = ~name, color = ~provnum_ne)

FIGURE 4.10: Using split and color to create a choropleth map of provinces in Canada.

Another important feature for maps that may require you to split multiple polygons into multiple traces is the ability to display a different hover-on-fill for each polygon. By providing text that is unique within each polygon and specifying hoveron='fills', the tooltip behavior is tied to the trace’s fill (instead of displayed at each point along the polygon).

plot_ly(
  canada, 
  split = ~name, 
  color = I("gray90"), 
  text = ~paste(name, "is \n province number", provnum_ne),
  hoveron = "fills",
  hoverinfo = "text",
  showlegend = FALSE
)

FIGURE 4.11: Using split, text, and hoveron='fills' to display a tooltip specific to each Canadian province.

Although the integrated mapping approaches (plot_mapbox() and plot_geo()) can render sf objects, the custom mapping approaches (plot_ly() and geom_sf()) are more flexible because they allow for any well-defined mapping projection. Working with and understanding map projections can be intimidating for a causal map maker. Thankfully, there are nice resources for searching map projections in a human-friendly interface, like http://spatialreference.org/. Through this website, one can search desirable projections for a given portion of the globe and extract commands for projecting their geo-spatial objects into that projection. One way to perform the projection is to supply the relevant PROJ4 command to the st_transform() function in sf (PROJ contributors 2018).

# filter the world sf object down to canada
canada <- filter(world, name == "Canada")
# coerce cities lat/long data to an official sf object
cities <- st_as_sf(
  maps::canada.cities, 
  coords = c("long", "lat"),
  crs = 4326
)

# A PROJ4 projection designed for Canada
# http://spatialreference.org/ref/sr-org/7/
# http://spatialreference.org/ref/sr-org/7/proj4/
moll_proj <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 
+units=m +no_defs"

# perform the projections
canada <- st_transform(canada, moll_proj)
cities <- st_transform(cities, moll_proj)

# plot with geom_sf()
p <- ggplot() + 
  geom_sf(data = canada) +
  geom_sf(data = cities, aes(size = pop), color = "red", alpha = 0.3)
ggplotly(p)

FIGURE 4.12: The population of various Canadian cities rendered on a custom basemap using a Mollweide projection.

Some geo-spatial objects have an unnecessarily high resolution for a given visualization. In these cases, you may want to consider simplifying the geo-spatial object to improve the speed of the R code and responsiveness of the visualization. For example, we could recreate Figure 4.8 with a much higher resolution by specifying scale = "large" in ne_countries() this gives us a sf object with over 50 times more spatial coordinates than the default scale. The higher resolution allows us to zoom in better on more complex geo-spatial regions, but it allow leads to slower R code, larger HTML files, and slower responsiveness. Sievert (2018 b) explores this issue in more depth and demonstrates how to use the st_simplify() function from sf to simplify features before plotting them.

sum(rapply(world$geometry, nrow))
#> [1] 10586

world_large <- ne_countries(scale = "large", returnclass = "sf")
sum(rapply(world_large$geometry, nrow))
#> [1] 548121

Analogous to the discussion surrounding 3.2, it pays to be aware of the tradeoffs involved with rendering plotly graphics using one or many traces, and knowledgeable about how to leverage either approach. Specifically, by default, plotly attempts to render all simple features in a single trace, which is performant, but doesn’t have a lot of interactivity.

plot_mapbox(
  world_large, 
  color = NA, 
  stroke = I("black"), 
  span = I(0.5)
)

For those interested in learning more about geocomputation in R with sf and other great R packages like sp and raster, Robin Lovelace (2019) provides lots of nice and freely available learning resources (Pebesma and Bivand 2005; Hijmans 2019).

4.2.2 Cartograms

Cartograms distort the size of geo-spatial polygons to encode a numeric variable other than the land size. There are numerous types of cartograms and they are typically categorized by their ability to preserve shape and maintain contiguous regions. Cartograms has been shown to be an effective approach to both encode and teach about geo-spatial data, though the effects certainly vary by cartogram type (Nusrat S, Alam MJ, Kobourov S. 2018). The R package cartogram provides an interface to several popular cartogram algorithms (Jeworutzki 2018). A number of other R packages provide cartogram algorithms, but the great thing about cartogram is that all the functions can take an sf (or sp) object as input and return an sf object. This makes it incredibly easy to go from raw spatial objects, to transformed objects, to visual. Figure 4.13 demonstrates a continuous area cartogram of US population in 2014 using a rubber sheet distortion algorithm from James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer (1985).

library(cartogram)
library(albersusa)

us_cont <- cartogram_cont(usa_sf("laea"), "pop_2014")

plot_ly(us_cont) %>% 
  add_sf(
    color = ~pop_2014, 
    split = ~name, 
    span = I(1),
    text = ~paste(name, scales::number_si(pop_2014)),
    hoverinfo = "text",
    hoveron = "fills"
  ) %>%
  layout(showlegend = FALSE) %>%
  colorbar(title = "Population \n 2014")

FIGURE 4.13: A cartogram of US population in 2014. A cartogram sizes the area of geo-spatial objects proportional to some metric (e.g., population).

Figure 4.14 demonstrates a non-continuous Dorling cartogram of US population in 2014 from Dorling, D (1996). This cartogram does not try to preserve the shape of polygons (i.e., states), but instead uses circles instead to represent each geo-spatial object, then encodes the variable of interest (i.e., population) using the area of the circle.

us <- usa_sf("laea")
us_dor <- cartogram_dorling(us, "pop_2014")

plot_ly(stroke = I("black"), span = I(1)) %>% 
  add_sf(
    data = us, 
    color = I("gray95"),
    hoverinfo = "none"
  ) %>%
  add_sf(
    data = us_dor, 
    color = ~pop_2014,
    split = ~name, 
    text = ~paste(name, scales::number_si(pop_2014)), 
    hoverinfo = "text", 
    hoveron = "fills"
  ) %>%
  layout(showlegend = FALSE)

FIGURE 4.14: A dorling cartogram of US population in 2014. A dorling cartogram sizes the circles proportional to some metric (e.g., population).

Figure 4.15 demonstrates a non-continuous cartogram of US population in 2014 from Olson, J. M. (1976). In contrast to the Dorling cartogram, this approach does preserve the shape of polygons. The implementation behind Figure 4.15 is to simply take the implementation of Figure 4.14 and change cartogram_dorling() to cartogram_ncont().

FIGURE 4.15: A non-contiguous cartogram of US population in 2014 that preserves shape.

A popular class of contiguous cartograms that do not preserve shape are sometimes referred to as tile cartograms (aka tilegrams). At the time of writing, there doesn’t seem to be a great R package for computing tilegrams, but Pitch Interactive provides a nice web service where you can generate tilegrams from existing or custom data https://pitchinteractiveinc.github.io/tilegrams/. Moreover, the service allows you to download a TopoJSON file of the generated tilegram, which we can read in R and convert into an sf object via geojsonio (Chamberlain and Teucher 2018). Figure 4.16 demonstrates a tilegram of U.S. Population in 2016 exported directly from Pitch’s free web service.

library(geojsonio)
tiles <- geojson_read("~/Downloads/tiles.topo.json", what = "sp")
tiles_sf <- st_as_sf(tiles)
plot_ly(tiles_sf, split = ~name)

FIGURE 4.16: A tile cartogram of U.S. population in 2016.

References

Chamberlain, Scott, and Andy Teucher. 2018. Geojsonio: Convert Data from and to ’Geojson’ or ’Topojson’. https://CRAN.R-project.org/package=geojsonio.

Dorling, D. 1996. “Area Cartograms: Their Use and Creation.” Concepts and Techniques in Modern Geography (CATMOG).

Hijmans, Robert J. 2019. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.

James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer. 1985. “An Algorithm to Construct Continuous Area Cartograms.” The Professional Geographer.

Jeworutzki, Sebastian. 2018. Cartogram: Create Cartograms with R. https://CRAN.R-project.org/package=cartogram.

Mullen, Lincoln A., and Jordan Bratt. 2018. “USAboundaries: Historical and Contemporary Boundaries of the United States of America.” Journal of Open Source Software 3 (23): 314. https://doi.org/10.21105/joss.00314.

Newman, Mark. 2016. “Maps of the 2016 Us Presidential Election Results.” Blog. http://www-personal.umich.edu/~mejn/election/2016/.

Nusrat S, Alam MJ, Kobourov S. 2018. “Evaluating Cartogram Effectiveness.” IEEE Trans Vis Comput Graph. https://doi.org/10.1109/TVCG.2016.2642109.

Olson, J. M. 1976. “Noncontiguous Area Cartograms.” The Professional Geographer.

Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.

Pebesma, Edzer J., and Roger S. Bivand. 2005. “Classes and Methods for Spatial Data in R.” R News 5 (2): 9–13. https://CRAN.R-project.org/doc/Rnews/.

PROJ contributors. 2018. PROJ Coordinate Transformation Software Library. Open Source Geospatial Foundation. https://proj4.org/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Robin Lovelace, Jannes Muenchow, Jakub Nowosad. 2019. Geocomputation with R. Chapman and Hall/CRC.

Sievert, Carson. 2018b. “Learning from and Improving Upon Ggplotly Conversions.” Blog. https://blog.cpsievert.me/2018/01/30/learning-improving-ggplotly-geom-sf/.

South, Andy. 2017. Rnaturalearth: World Map Data from Natural Earth. https://CRAN.R-project.org/package=rnaturalearth.

Wickham, Hadley. 2014b. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.

Unfortunately, non-scatter traces currently don’t work with plot_mapbox()/ plot_geo() meaning that, for one, raster (i.e., heatmap) maps are not natively supported.↩︎
To see more examples of creating and using plotly.js’ integrated dropdown functionality to modify graphs, see https://plot.ly/r/dropdowns/↩︎
This is way more intuitive compared to older workflows based on, say using ggplot2::fortify() to obtain a data structure where a row to represents particular point along a feature and having another column track which point belongs to each feature (for example).↩︎