There are numerous ways to make a map with plotly – each with it’s own strengths and weaknesses. Generally speaking the approaches fall under two categories: integrated or custom. Integrated maps leverage plotly.js’ built-in support for rendering a basemap layer. Currently there are two supported ways of making integrated maps: either via Mapbox or via an integrated d3.js powered basemap. The integrated approach is convenient if you need a quick map and don’t necessarily need sophisticated representations of geo-spatial objects. On the other hand, the custom mapping approach offers complete control since you’re providing all the information necessary to render the geo-spatial object(s). Section 4.2 covers making sophisticated maps (e.g., cartograms) using the sf R package, but it’s also possible to make custom plotly maps via other tools for geo-computing (e.g., sp, ggmap, etc).
It’s worth noting that plotly aims to be a general purpose visualization library, and thus, doesn’t aim to be the most fully featured geo-spatial visualization toolkit. That said, there are benefits to using plotly-based maps since the mapping APIs are very similar to the rest of plotly, and you can leverage larger plotly ecosystem (e.g., linking views client side like Figure 16.23). However, if you run into limitations with plotly’s mapping functionality, there is a very rich set of tools for interactive geospatial visualization in R, including but not limited to: leaflet, mapview, mapedit, tmap, and mapdeck (Robin Lovelace 2019).
4.1 Integrated maps
If you have fairly simple latitude/longitude data and want to make a quick map, you may want to try one of plotly’s integrated mapping options (i.e.,
plot_geo()). Generally speaking, you can treat these constructor functions as a drop-in replacement for
plot_ly() and get a dynamic basemap rendered behind your data. Furthermore, all the scatter-based layers we learned about in Section 3 work as you’d expect it to with
plot_ly().12 For example, Figure 4.1 uses
add_markers() to create a bubble chart:
The Mapbox basemap styling is controlled through the
layout.mapbox.style attribute. The plotly package comes with support for 7 different styles, but you can also supply a custom URL to a custom mapbox style. To obtain all the pre-packaged basemap style names, you can grab them from the official plotly.js
styles <- schema()$layout$layoutAttributes$mapbox$style$values styles #>  "basic" "streets" #>  "outdoors" "light" #>  "dark" "satellite" #>  "satellite-streets" "open-street-map" #>  "white-bg" "carto-positron" #>  "carto-darkmatter" "stamen-terrain" #>  "stamen-toner" "stamen-watercolor"
Any one of these values can be used for a mapbox style. Figure 4.2 demonstrates the satellite earth imagery basemap.
Figure 4.3 demonstrates how to create an integrated plotly.js dropdown menu to control the basemap style via the
layout.updatemenus attribute. The idea behind an integrated plotly.js dropdown is to supply a list of buttons (i.e., menu items) where each button invokes a plotly.js method with some arguments. In this case, each button uses the relayout method to modify the
The other integrated mapping solution in plotly is
plot_geo(). Compared to
plot_mapbox(), this approach has support for different mapping projections, but styling the basemap is limited and can be more cumbersome. Figure 4.4 demonstrates using
plot_geo() in conjunction with
add_segments() to visualize flight paths within the United States. Whereas
plot_mapbox() is fixed to a mercator projection, the
plot_geo() constructor has a handful of different projection available to it, including the orthographic projection which gives the illusion of the 3D globe.
Click to show code
library(plotly) library(dplyr) # airport locations air <- read.csv( 'https://plotly-r.com/data-raw/airport_locations.csv' ) # flights between airports flights <- read.csv( 'https://plotly-r.com/data-raw/flight_paths.csv' ) flights$id <- seq_len(nrow(flights)) # map projection geo <- list( projection = list( type = 'orthographic', rotation = list(lon = -100, lat = 40, roll = 0) ), showland = TRUE, landcolor = toRGB("gray95"), countrycolor = toRGB("gray80") ) plot_geo(color = I("red")) %>% add_markers( data = air, x = ~long, y = ~lat, text = ~airport, size = ~cnt, hoverinfo = "text", alpha = 0.5 ) %>% add_segments( data = group_by(flights, id), x = ~start_lon, xend = ~end_lon, y = ~start_lat, yend = ~end_lat, alpha = 0.3, size = I(1), hoverinfo = "none" ) %>% layout(geo = geo, showlegend = FALSE)
One nice thing about
plot_geo() is that it automatically projects geometries into the proper coordinate system defined by the map projection. For example, in Figure 4.5 the simple line segment is straight when using
plot_mapbox() yet curved when using
plot_geo(). It’s possible to achieve the same effect using
plot_mapbox(), but the relevant marker/line/polygon data has to be put into an sf data structure before rendering (see Section 4.2.1 for more details).
map1 <- plot_mapbox() %>% add_segments(x = -100, xend = -50, y = 50, yend = 75) %>% layout( mapbox = list( zoom = 0, center = list(lat = 65, lon = -75) ) ) map2 <- plot_geo() %>% add_segments(x = -100, xend = -50, y = 50, yend = 75) %>% layout(geo = list(projection = list(type = "mercator"))) library(htmltools) browsable(tagList(map1, map2))
In addition to scatter traces, both of the integrated mapping solutions (i.e.,
plot_geo()) have an optimized choropleth trace type (i.e., the choroplethmapbox and choropleth trace types). Comparatively speaking, choroplethmapbox is more powerful because you can fully specify the feature collection using GeoJSON, but the choropleth trace can be a bit easier to use if it fits your use case.
Figure 4.6 shows the population density of the U.S. via the choropleth trace using the U.S. state data from the datasets package (R Core Team 2016). By simply providing a
plotly_geo() objects will try to create a choropleth, but you’ll also need to provide
locations and a
locationmode. It’s worth noting that the
locationmode is currently limited to countries and US states, so if you need to a different geo-unit (e.g., counties, municipalities, etc), you should use the choroplethmapbox trace type and/or use a “custom” mapping approach as discussed in Section 4.2.
Choroplethmapbox is more flexible than choropleth because you supply your own GeoJSON definition of the choropleth via the
geojson attribute. Currently this attribute must be a URL pointing to a geojson file. Moreover, the
location should point to a top-level id attribute of each feature within the geojson file. Figure 4.7 demonstrates how we could visualize the same information as Figure 4.6, but this time using choroplethmapbox.
plot_ly() %>% add_trace( type = "choroplethmapbox", # See how this GeoJSON URL was generated at # https://plotly-r.com/data-raw/us-states.R geojson = paste(c( "https://gist.githubusercontent.com/cpsievert/", "7cdcb444fb2670bd2767d349379ae886/raw/", "cf5631bfd2e385891bb0a9788a179d7f023bf6c8/", "us-states.json" ), collapse = ""), locations = row.names(state.x77), z = state.x77[, "Population"] / state.x77[, "Area"], span = I(0) ) %>% layout( mapbox = list( style = "light", zoom = 4, center = list(lon = -98.58, lat = 39.82) ) ) %>% config( mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"), # Workaround to make sure image download uses full container # size https://github.com/plotly/plotly.js/pull/3746 toImageButtonOptions = list( format = "svg", width = NULL, height = NULL ) )
Figures 4.6 and 4.7 aren’t an ideal way to visualize state population a graphical perception point of view. We typically use the color in choropleths to encode a numeric variable (e.g., GDP, net exports, average SAT score, etc) and the eye naturally perceives the area that a particular color covers as proportional to its overall effect. This ends up being misleading since the area the color covers typically has no sensible relationship with the data encoded by the color. A classic example of this misleading effect in action is in US election maps – the proportion of red to blue coloring is not representative of the overall popular vote (Newman 2016).
Cartograms are an approach to reducing this misleading effect and grants another dimension to encode data through the size of geo-spatial features. Section 4.2.2 covers how to render cartograms in plotly using sf and cartogram.
4.2 Custom maps
4.2.1 Simple features (sf)
The sf R package is a modern approach to working with geo-spatial data structures based on tidy data principles (Pebesma 2018; Wickham 2014b). The key idea behind sf is that it stores geo-spatial geometries in a list-column of a data frame. This allows each row to represent the real unit of observation/interest – whether it’s a polygon, multi-polygon, point, line, or even a collection of these features – and as a result, works seamlessly inside larger tidy workflows.14 The sf package itself does not really provide geo-spatial data – it provides the framework and utilities for storing and computing on geo-spatial data structures in an opinionated way.
There are numerous packages for accessing geo-spatial data as simple features data structures. A couple notable examples include rnaturalearth and USAboundaries. The rnaturalearth package is better for obtaining any map data in the world via an API provided by https://www.naturalearthdata.com/ (South 2017). The USAboundaries package is great for obtaining map data for the United States at any point in history (Mullen and Bratt 2018). It doesn’t really matter what tool you use to obtain or create an sf object – once you have one,
plot_ly() knows how to render it:
plot_ly() know how to render the countries? It’s because the geo-spatial features are encoded in special (geometry) list-column. Also, meta-data about the geo-spatial structure are retained as special attributes of the data. Figure 4.9 augments the print method for sf to data frames to demonstrate that all the information needed to render the countries (i.e., polygons) in Figure 4.8 is contained within the
world data frame. Note also, that sf provides special dplyr methods for this special class of data frame so that you can treat data manipulations as if it were a ‘tidy’ data structure. One thing about this method is that the special ‘geometry’ column is always retained – if we try to just select the
name column, then we get both the name and the geometry.
There are actually 4 different ways to render sf objects with plotly:
plot_geo(), and via ggplot2’s
geom_sf(). These functions render multiple polygons using a single trace by default, which is fast, but you may want to leverage the added flexibility of multiple traces. For example, a given trace can only have one
fillcolor, so it’s impossible to render multiple polygons with different colors using a single trace. For this reason, if you want to vary the color of multiple polygons, make sure the
split by a unique identifier (e.g.
name), as done in Figure 4.10. Note that, as discussed for line charts in Figure 3.2, using multiple traces automatically adds the ability to filter
name via legend entries.
Another important feature for maps that may require you to
split multiple polygons into multiple traces is the ability to display a different hover-on-fill for each polygon. By providing
text that is unique within each polygon and specifying
hoveron='fills', the tooltip behavior is tied to the trace’s fill (instead of displayed at each point along the polygon).
Although the integrated mapping approaches (
plot_geo()) can render sf objects, the custom mapping approaches (
geom_sf()) are more flexible because they allow for any well-defined mapping projection. Working with and understanding map projections can be intimidating for a causal map maker. Thankfully, there are nice resources for searching map projections in a human-friendly interface, like http://spatialreference.org/. Through this website, one can search desirable projections for a given portion of the globe and extract commands for projecting their geo-spatial objects into that projection. One way to perform the projection is to supply the relevant PROJ4 command to the
st_transform() function in sf (PROJ contributors 2018).
# filter the world sf object down to canada canada <- filter(world, name == "Canada") # coerce cities lat/long data to an official sf object cities <- st_as_sf( maps::canada.cities, coords = c("long", "lat"), crs = 4326 ) # A PROJ4 projection designed for Canada # http://spatialreference.org/ref/sr-org/7/ # http://spatialreference.org/ref/sr-org/7/proj4/ moll_proj <- "+proj=moll +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs" # perform the projections canada <- st_transform(canada, moll_proj) cities <- st_transform(cities, moll_proj) # plot with geom_sf() p <- ggplot() + geom_sf(data = canada) + geom_sf(data = cities, aes(size = pop), color = "red", alpha = 0.3) ggplotly(p)
Some geo-spatial objects have an unnecessarily high resolution for a given visualization. In these cases, you may want to consider simplifying the geo-spatial object to improve the speed of the R code and responsiveness of the visualization. For example, we could recreate Figure 4.8 with a much higher resolution by specifying
scale = "large" in
ne_countries() this gives us a sf object with over 50 times more spatial coordinates than the default scale. The higher resolution allows us to zoom in better on more complex geo-spatial regions, but it allow leads to slower R code, larger HTML files, and slower responsiveness. Sievert (2018b) explores this issue in more depth and demonstrates how to use the
st_simplify() function from sf to simplify features before plotting them.
Analogous to the discussion surrounding 3.2, it pays to be aware of the tradeoffs involved with rendering plotly graphics using one or many traces, and knowledgeable about how to leverage either approach. Specifically, by default, plotly attempts to render all simple features in a single trace, which is performant, but doesn’t have a lot of interactivity.
For those interested in learning more about geocomputation in R with sf and other great R packages like sp and raster, Robin Lovelace (2019) provides lots of nice and freely available learning resources (Pebesma and Bivand 2005; Hijmans 2019).
Cartograms distort the size of geo-spatial polygons to encode a numeric variable other than the land size. There are numerous types of cartograms and they are typically categorized by their ability to preserve shape and maintain contiguous regions. Cartograms has been shown to be an effective approach to both encode and teach about geo-spatial data, though the effects certainly vary by cartogram type (Nusrat S, Alam MJ, Kobourov S. 2018). The R package cartogram provides an interface to several popular cartogram algorithms (Jeworutzki 2018). A number of other R packages provide cartogram algorithms, but the great thing about cartogram is that all the functions can take an sf (or sp) object as input and return an sf object. This makes it incredibly easy to go from raw spatial objects, to transformed objects, to visual. Figure 4.13 demonstrates a continuous area cartogram of US population in 2014 using a rubber sheet distortion algorithm from James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer (1985).
library(cartogram) library(albersusa) us_cont <- cartogram_cont(usa_sf("laea"), "pop_2014") plot_ly(us_cont) %>% add_sf( color = ~pop_2014, split = ~name, span = I(1), text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills" ) %>% layout(showlegend = FALSE) %>% colorbar(title = "Population \n 2014")
Figure 4.14 demonstrates a non-continuous Dorling cartogram of US population in 2014 from Dorling, D (1996). This cartogram does not try to preserve the shape of polygons (i.e., states), but instead uses circles instead to represent each geo-spatial object, then encodes the variable of interest (i.e., population) using the area of the circle.
us <- usa_sf("laea") us_dor <- cartogram_dorling(us, "pop_2014") plot_ly(stroke = I("black"), span = I(1)) %>% add_sf( data = us, color = I("gray95"), hoverinfo = "none" ) %>% add_sf( data = us_dor, color = ~pop_2014, split = ~name, text = ~paste(name, scales::number_si(pop_2014)), hoverinfo = "text", hoveron = "fills" ) %>% layout(showlegend = FALSE)
Figure 4.15 demonstrates a non-continuous cartogram of US population in 2014 from Olson, J. M. (1976). In contrast to the Dorling cartogram, this approach does preserve the shape of polygons. The implementation behind Figure 4.15 is to simply take the implementation of Figure 4.14 and change
A popular class of contiguous cartograms that do not preserve shape are sometimes referred to as tile cartograms (aka tilegrams). At the time of writing, there doesn’t seem to be a great R package for computing tilegrams, but Pitch Interactive provides a nice web service where you can generate tilegrams from existing or custom data https://pitchinteractiveinc.github.io/tilegrams/. Moreover, the service allows you to download a TopoJSON file of the generated tilegram, which we can read in R and convert into an sf object via geojsonio (Chamberlain and Teucher 2018). Figure 4.16 demonstrates a tilegram of U.S. Population in 2016 exported directly from Pitch’s free web service.
Chamberlain, Scott, and Andy Teucher. 2018. Geojsonio: Convert Data from and to ’Geojson’ or ’Topojson’. https://CRAN.R-project.org/package=geojsonio.
Dorling, D. 1996. “Area Cartograms: Their Use and Creation.” Concepts and Techniques in Modern Geography (CATMOG).
Hijmans, Robert J. 2019. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.
James A. Dougenik, Nicholas R. Chrisman, Duane R. Niemeyer. 1985. “An Algorithm to Construct Continuous Area Cartograms.” The Professional Geographer.
Jeworutzki, Sebastian. 2018. Cartogram: Create Cartograms with R. https://CRAN.R-project.org/package=cartogram.
Mullen, Lincoln A., and Jordan Bratt. 2018. “USAboundaries: Historical and Contemporary Boundaries of the United States of America.” Journal of Open Source Software 3 (23): 314. https://doi.org/10.21105/joss.00314.
Newman, Mark. 2016. “Maps of the 2016 Us Presidential Election Results.” Blog. http://www-personal.umich.edu/~mejn/election/2016/.
Nusrat S, Alam MJ, Kobourov S. 2018. “Evaluating Cartogram Effectiveness.” IEEE Trans Vis Comput Graph. https://doi.org/10.1109/TVCG.2016.2642109.
Olson, J. M. 1976. “Noncontiguous Area Cartograms.” The Professional Geographer.
Pebesma, Edzer. 2018. Sf: Simple Features for R. https://CRAN.R-project.org/package=sf.
Pebesma, Edzer J., and Roger S. Bivand. 2005. “Classes and Methods for Spatial Data in R.” R News 5 (2): 9–13. https://CRAN.R-project.org/doc/Rnews/.
PROJ contributors. 2018. PROJ Coordinate Transformation Software Library. Open Source Geospatial Foundation. https://proj4.org/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Robin Lovelace, Jannes Muenchow, Jakub Nowosad. 2019. Geocomputation with R. Chapman and Hall/CRC.
Sievert, Carson. 2018b. “Learning from and Improving Upon Ggplotly Conversions.” Blog. https://blog.cpsievert.me/2018/01/30/learning-improving-ggplotly-geom-sf/.
South, Andy. 2017. Rnaturalearth: World Map Data from Natural Earth. https://CRAN.R-project.org/package=rnaturalearth.
Wickham, Hadley. 2014b. “Tidy Data.” The Journal of Statistical Software 59 (10). http://www.jstatsoft.org/v59/i10/.
Unfortunately, non-scatter traces currently don’t work with
plot_geo()meaning that, for one, raster (i.e., heatmap) maps are not natively supported.↩︎
This is way more intuitive compared to older workflows based on, say using
ggplot2::fortify()to obtain a data structure where a row to represents particular point along a feature and having another column track which point belongs to each feature (for example).↩︎