## 2.3 Bars & histograms

The `add_bars()` and `add_histogram()` functions wrap the bar and histogram plotly.js trace types. The main difference between them is that bar traces require bar heights (both `x` and `y`), whereas histogram traces require just a single variable, and plotly.js handles binning in the browser.^{7} Perhaps confusingly, both functions can be used to visualize the distribution of either a numeric or a discrete variable, so essentially the only difference between them is where the binning occurs.

Figure 2.21 compares the default binning algorithm in plotly.js to a few different algorithms available in R via the `hist()` function. Although plotly.js has the ability to customize histogram bins via `xbins`/`ybins`, R has diverse facilities for estimating the optimal number of bins in a histogram that we can easily leverage.^{8} The `hist()` function alone allows us to reference three famous algorithms by name (Sturges 1926; Freedman and Diaconis 1981; Scott 1979), but there are also packages (e.g., the **histogram** package) that extend this interface to incorporate more methodology (Mildenberger, Rozenholc, and Zasada 2009). The `price_hist()` function below wraps the `hist()` function to obtain the binning results, and maps those bins to a plotly version of the histogram using `add_bars()`.

```
library(plotly)

# Binning done in the browser by plotly.js
p1 <- plot_ly(diamonds, x = ~price) %>% add_histogram(name = "plotly.js")

# Binning done in R: hist() computes the bins, add_bars() draws them
price_hist <- function(method = "FD") {
  h <- hist(diamonds$price, breaks = method, plot = FALSE)
  plot_ly(x = h$mids, y = h$counts) %>% add_bars(name = method)
}

subplot(
  p1, price_hist(), price_hist("Sturges"), price_hist("Scott"),
  nrows = 4, shareX = TRUE
)
```
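To see what `hist()` hands back before any plotting happens, the following minimal sketch inspects the binning results directly. It uses simulated, right-skewed values as a stand-in for `diamonds$price` (an assumption made so that the block runs with base R alone), and the names `price` and `n_bins` are illustrative only.

```r
# Simulated, right-skewed "prices": a stand-in for diamonds$price (assumption)
set.seed(1)
price <- rlnorm(5000, meanlog = 8, sdlog = 1)

# plot = FALSE returns the binning results without drawing anything
h <- hist(price, breaks = "FD", plot = FALSE)

# The pieces that price_hist() maps to bars: bin midpoints and counts
str(h[c("breaks", "mids", "counts")])

# Number of bins actually produced under each named rule
n_bins <- sapply(
  c(Sturges = "Sturges", Scott = "Scott", FD = "FD"),
  function(m) length(hist(price, breaks = m, plot = FALSE)$counts)
)
n_bins
```

Note that `hist()` treats the number suggested by a named rule as a target and then rounds the breakpoints to "pretty" values, so the resulting bin counts are approximate rather than exact.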

Figure 2.22 demonstrates two ways of creating a basic bar chart. Although the visual results are the same, it's worth noting the difference in implementation. The `add_histogram()` function sends all of the observed values to the browser and lets plotly.js perform the binning. It takes more human effort to perform the binning in R, but doing so has the benefit of sending less data to, and requiring less computation from, the web browser. In this case, we have only about 50,000 records, so there isn't much of a difference in page load times or page size. However, with 1 million records, page load time more than doubles and page size nearly doubles.^{9}

```
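# Binning in the browser: every observed value of cut is sent to plotly.js
p1 <- plot_ly(diamonds, x = ~cut) %>% add_histogram()

# Binning in R: only one count per level of cut is sent
p2 <- diamonds %>%
  dplyr::count(cut) %>%
  plot_ly(x = ~cut, y = ~n) %>%
  add_bars()

subplot(p1, p2) %>% hide_legend()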
```
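The difference in how much data each approach hands to the browser can be sketched without plotly at all. The block below is only an illustration under stated assumptions: it simulates 1 million values of a five-level factor (a stand-in for a larger version of `diamonds$cut`) and compares the number of values a histogram trace would need with the number of rows left after counting in R. It says nothing about actual page size or load time.

```r
# Simulated stand-in for a 1-million-record version of diamonds$cut (assumption)
set.seed(1)
levels_cut <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
cut <- factor(sample(levels_cut, 1e6, replace = TRUE), levels = levels_cut)

# Histogram trace: one value per record has to be sent
n_raw <- length(cut)

# Bar trace after binning in R: one row per category
counts <- as.data.frame(table(cut), responseName = "n")
n_binned <- nrow(counts)

c(raw = n_raw, binned = n_binned)
```

Here `table()` plays the role that `dplyr::count()` plays above; either way, the binned version carries one count per level no matter how many records there are.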

### 2.3.1 Multiple numeric distributions

It is often useful to see how a numeric distribution changes with respect to a discrete variable. When using bars to visualize multiple numeric distributions, I recommend plotting each distribution on its own axis, rather than trying to overlay them on a single axis.^{10} This is where the `subplot()` infrastructure, and its support for trellis displays, comes in handy. Figure 2.23 shows a trellis display of diamond price by diamond clarity. Note how the `one_plot()` function defines what to display on each panel, then a split-apply-recombine strategy is employed to generate the trellis display.

```
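# One panel: price distribution for a single clarity level, labelled at the top.
# No trace type is given, so plot_ly() infers a histogram from the numeric x.
one_plot <- function(d) {
  plot_ly(d, x = ~price) %>%
    add_annotations(
      ~unique(clarity), x = 0.5, y = 1,
      xref = "paper", yref = "paper", showarrow = FALSE
    )
}

# Split by clarity, build one panel per piece, recombine into a trellis
diamonds %>%
  split(.$clarity) %>%
  lapply(one_plot) %>%
  subplot(nrows = 2, shareX = TRUE, titleX = FALSE) %>%
  hide_legend()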
```
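The split-apply-recombine pattern used above is easier to follow when each step returns something you can print. The sketch below applies the same three steps to a small, hypothetical data frame (the values and the `one_summary()` helper are made up for illustration), returning a summary per piece instead of a plot.

```r
# Hypothetical toy data standing in for diamonds
d <- data.frame(
  clarity = factor(c("SI1", "VS1", "SI1", "VS2", "VS1", "SI1")),
  price   = c(326, 340, 554, 2757, 2759, 405)
)

# Split: a named list with one data frame per level of clarity
pieces <- split(d, d$clarity)
names(pieces)

# Apply: any function of a single piece; here a one-row summary
one_summary <- function(piece) {
  data.frame(
    clarity = unique(piece$clarity),
    n = nrow(piece),
    median_price = median(piece$price)
  )
}

# Recombine: stack the per-piece results into one data frame
result <- do.call(rbind, lapply(pieces, one_summary))
result
```

In the trellis example, `one_plot()` takes the place of `one_summary()` and `subplot()` takes the place of `rbind()`.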