# 32 The data-plot-pipeline

As Chapter 2.1 first introduced, we can express multi-layer plotly graphs as a sequence (or, more specifically, a directed acyclic graph) of dplyr data manipulations and mappings to visuals. For example, to create Figure 32.1, we could group txhousing by city to ensure the first layer of add_lines() draws a different line for each city, then filter() down to Houston so that the second call to add_lines() draws only Houston.

allCities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none")

allCities %>%
  filter(city == "Houston") %>%
  add_lines(name = "Houston")
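One way to see which data each step of this pipeline carries is plotly_data(), which returns the data frame currently attached to a plotly object. The sketch below assumes the plotly package is installed (attaching it also attaches ggplot2, which provides txhousing); the object name `houston` is chosen here only for illustration.

```r
library(plotly)

allCities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none")

# `houston` is an illustrative name, not part of the original example
houston <- allCities %>%
  filter(city == "Houston") %>%
  add_lines(name = "Houston")

# the first plot still carries every city ...
length(unique(plotly_data(allCities)$city))

# ... while the filtered plot carries only the Houston rows
unique(plotly_data(houston)$city)
```

Any layer added to `houston` from this point on would see only the Houston rows.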

Sometimes the directed acyclic graph property of a magrittr pipeline can be too restrictive for certain types of plots. In this example, after filtering the data down to Houston, there is no way to recover the original data inside the pipeline. The add_fun() function helps to work around this restriction (see note 1): it applies a function to the plotly object but does not affect the data associated with the plotly object. This effectively provides a way to isolate data transformations within the pipeline. Figure 32.2 uses this idea to highlight both Houston and San Antonio.

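allCities %>%
  add_fun(function(plot) {
    plot %>% filter(city == "Houston") %>%
      add_lines(name = "Houston")
  }) %>%
  add_fun(function(plot) {
    plot %>% filter(city == "San Antonio") %>%
      add_lines(name = "San Antonio")
  })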
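The isolation idea can be sketched without plotly at all. The toy model below is a simplified illustration in base R, not plotly's actual implementation: a "plot" is just a list holding the current data and the layers added so far, and every `toy_*` name, along with `highlight`, is hypothetical. The helper `toy_add_fun()` applies a layer function, keeps whatever layers it added, and then restores the data that was attached before the call.

```r
# A toy "plot": the current data plus the layers added so far.
# Simplified model for illustration only, not plotly's implementation.
toy_plot <- function(data) list(data = data, layers = list())

# Toy layer: record the data the layer would draw.
toy_add_layer <- function(plot, name) {
  plot$layers[[name]] <- plot$data
  plot
}

# Toy data transformation: narrows the data carried by the plot.
toy_filter <- function(plot, keep) {
  plot$data <- plot$data[keep(plot$data), , drop = FALSE]
  plot
}

# Sketch of the add_fun() idea: apply a layer function, keep its layers,
# but restore the data that was attached before the call.
toy_add_fun <- function(plot, fun, ...) {
  old_data <- plot$data
  plot <- fun(plot, ...)
  plot$data <- old_data
  plot
}

d <- data.frame(
  city = c("Houston", "San Antonio", "Austin"),
  median = c(1, 2, 3)
)

# A layer function: filter to one city, then add a layer for it.
highlight <- function(plot, which_city) {
  plot <- toy_filter(plot, function(df) df$city == which_city)
  toy_add_layer(plot, which_city)
}

p <- toy_plot(d)
p <- toy_add_fun(p, highlight, "Houston")
p <- toy_add_fun(p, highlight, "San Antonio")

nrow(p$data)               # all three rows are still attached
p$layers[["San Antonio"]]  # the second layer saw only its own city
```

Because the data is restored after each call, the second `highlight` call can still find San Antonio even though the first call filtered down to Houston.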

It is useful to think of the function supplied to add_fun() as a “layer” function: a function that accepts a plot object as input, possibly applies a transformation to the data, and maps that data to visual objects. To make layering functions more modular, flexible, and expressive, add_fun() allows you to pass additional arguments to a layer function. Figure 32.3 makes use of this pattern by creating reusable functions for layering a particular city as well as the first, second, and third quartiles of median monthly house sales (by city).

# reusable function for highlighting a particular city
layer_city <- function(plot, name) {
  plot %>% filter(city == name) %>% add_lines(name = name)
}

# reusable function for plotting overall median & IQR
layer_iqr <- function(plot) {
  plot %>%
    group_by(date) %>%
    summarise(
      q1 = quantile(median, 0.25, na.rm = TRUE),
      m = median(median, na.rm = TRUE),
      q3 = quantile(median, 0.75, na.rm = TRUE)
    ) %>%
    add_lines(y = ~m, name = "median", color = I("black")) %>%
    add_ribbons(
      ymin = ~q1, ymax = ~q3,
      name = "IQR", color = I("black")
    )
}

allCities %>%
  add_fun(layer_iqr) %>%
  add_fun(layer_city, "San Antonio")
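Extra arguments can be passed to a layer function by position or by name. The sketch below is one possible variation, assuming the plotly package is installed; `layer_city_colored` and its `color` argument are hypothetical names introduced here for illustration and do not appear in the original example.

```r
library(plotly)

allCities <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none")

# hypothetical variant of layer_city() with an extra styling argument
layer_city_colored <- function(plot, name, color = "black") {
  plot %>%
    filter(city == name) %>%
    add_lines(name = name, color = I(color))
}

p <- allCities %>%
  # positional city name plus a named styling argument
  add_fun(layer_city_colored, "Austin", color = "red") %>%
  # named city name, default color
  add_fun(layer_city_colored, name = "Dallas")

p
```

Keeping styling choices as arguments means one layer function can serve several highlights rather than being rewritten for each city.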

A layering function does not have to be a data-plot-pipeline itself. The only requirement on a layering function is that its first argument is a plot object and that it returns a plot object. This provides an opportunity to, say, fit a model to the plot data, extract the model components you desire, and map those components to visuals. Furthermore, since plotly’s add_*() functions don’t require a data.frame, you can supply those components directly to attributes (as long as they are well-defined), as done in Figure 32.4 via the forecast package (Hyndman 2018).

library(forecast)
layer_forecast <- function(plot) {
  d <- plotly_data(plot)
  series <- with(d,
    ts(median, frequency = 12, start = c(2000, 1), end = c(2015, 7))
  )
  fore <- forecast(ets(series), h = 48, level = c(80, 95))
  plot %>%
    add_ribbons(x = time(fore$mean), ymin = fore$lower[, 2],
                ymax = fore$upper[, 2], color = I("gray95"),
                name = "95% confidence", inherit = FALSE) %>%
    add_ribbons(x = time(fore$mean), ymin = fore$lower[, 1],
                ymax = fore$upper[, 1], color = I("gray80"),
                name = "80% confidence", inherit = FALSE) %>%
    add_lines(x = time(fore$mean), y = fore$mean, color = I("blue"),
              name = "prediction")
}

txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none") %>%
  add_fun(layer_forecast)
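The same pattern should work with other models as well. As a smaller illustration that needs nothing beyond plotly and base R, the sketch below fits a straight-line trend with lm() and passes the fitted values directly to add_lines(). The function name `layer_trend` is hypothetical, and a linear trend is used only to keep the example short, not because it is a good model for these data.

```r
library(plotly)

# hypothetical layer function: fit a straight-line trend to the plot's data
layer_trend <- function(plot) {
  d <- as.data.frame(plotly_data(plot))
  fit <- lm(median ~ date, data = d)  # lm() drops rows with missing values
  # two end points are enough to draw a straight line
  grid <- data.frame(date = range(d$date))
  trend <- unname(predict(fit, newdata = grid))
  add_lines(
    plot, x = grid$date, y = trend,
    color = I("red"), name = "linear trend", inherit = FALSE
  )
}

p <- txhousing %>%
  group_by(city) %>%
  plot_ly(x = ~date, y = ~median) %>%
  add_lines(alpha = 0.2, name = "Texan Cities", hoverinfo = "none") %>%
  add_fun(layer_trend)

p
```

As in layer_forecast(), the layer returns a plot object and supplies plain vectors rather than formulas, so inherit = FALSE is used to avoid picking up the x and y mappings set in plot_ly().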

1. Credit to Winston Chang and Hadley Wickham for this idea. add_fun() is very much like the layer_f() function in ggvis.