19 Working with JSON

JavaScript and other web technologies can be intimidating and time-consuming to learn, but by borrowing some knowledge of R’s data structures¹, we can get up and running with useful examples fairly quickly. JavaScript Object Notation (JSON) is a popular data-interchange format that JavaScript uses to work with data. As it turns out, working with JSON in JS is somewhat similar to working with list()s in R: both are recursive and heterogeneous data structures that have similar semantics for accessing values. In JSON, there are three basic building blocks: objects, arrays, and primitive data types (e.g., number, string, boolean, null). Note that undefined is a primitive value in JS, but it is not a valid JSON value.

Loosely speaking, a JSON array is similar to an un-named list() in R and a JSON object is similar to a named list(). In fact, if you’re already comfortable creating and subsetting named and un-named list()s in R, you can transfer some of that knowledge to JSON arrays and objects.
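To make these building blocks concrete, here is a minimal sketch that parses a JSON string with the built-in JSON.parse() and inspects the pieces. The JSON text and its field names (name, values, flag, missing) are made up for illustration.

```javascript
// A hypothetical JSON string containing an object, an array, and primitives
var txt = '{"name": "hello", "values": [1, 2, 3], "flag": true, "missing": null}';

// JSON.parse() turns JSON text into JS objects and arrays
var parsed = JSON.parse(txt);

// Objects hold named values, much like a named list() in R
var name = parsed.name;                        // "hello"

// Arrays hold ordered, un-named values, much like an un-named list() in R
var isArr = Array.isArray(parsed.values);      // true
var len = parsed.values.length;                // 3

// JSON.stringify() goes the other way; properties set to undefined are dropped
var out = JSON.stringify({ a: 1, b: undefined });  // '{"a":1}'

console.log(name, isArr, len, out);
```

Since values can themselves be objects or arrays, the structure is recursive, just as a list() in R can contain other list()s.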

19.1 Assignment, subsetting, and iteration

In R, the <- operator assigns a value to a name, and the [[ operator extracts a list element by index:

arr <- list("hello", "world", 10)
arr[[1]]
#> [1] "hello"

In JS, the = operator assigns a value to a name. When assigning a new name, you should include the var keyword (or similar, such as let or const) to avoid creating a global variable. The [ operator extracts array elements by index, but be careful: indexing in JS starts at 0 (not 1)!

var arr = ["hello", "world", 10];
arr[0]
// "hello"
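As a small sketch of what zero-based indexing implies in practice, the last element sits at index length - 1, and indexing past the end returns undefined rather than throwing an error. In R, arr[[4]] on the equivalent list would instead fail with a subscript out of bounds error.

```javascript
var arr = ["hello", "world", 10];

var first = arr[0];                // "hello"
var last = arr[arr.length - 1];    // 10, since the last index is length - 1
var beyond = arr[3];               // undefined, no error is thrown

console.log(first, last, beyond);
```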

In R, the $ and [[ operators can be used to extract list elements by name. The difference is that $ does partial matching of names, while [[ requires the exact name.

obj <- list(x = c("hello", "world"), zoo = 10)
obj$z
#> [1] 10
obj[["zoo"]]
#> [1] 10

In JS, the . and [ operators can be used to extract object values by name. In either case, the name must match exactly.

var obj = {
  x: ["hello", "world"],
  zoo: 10
};
obj.zoo
// 10
obj['zoo']
// 10
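The following sketch illustrates two consequences of exact matching and of the two access styles. An inexact name yields undefined instead of a partial match, and bracket notation accepts a name stored in a variable, much like obj[[key]] in R. The variable names partial, key, viaKey, and nested are only for illustration.

```javascript
var obj = {
  x: ["hello", "world"],
  zoo: 10
};

// No partial matching: an inexact name gives undefined rather than 10
var partial = obj.z;       // undefined

// Bracket notation accepts any expression that evaluates to a name
var key = "zoo";
var viaKey = obj[key];     // 10

// Accessors can be chained to reach nested values
var nested = obj.x[1];     // "world"

console.log(partial, viaKey, nested);
```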

Unlike R list()s, arrays and objects in JS come with properties and methods that can be accessed via the . operator. Arrays, in particular, have a length property and a map() method for applying a function to each array element:

arr.length
// 3
arr.map(function(item) { return item + 1; });
// ["hello1", "world1", 11]

In R, both the lapply() and purrr::map() family of functions provide a similar functional interface. Also, note that operators like + in JS do even more type coercion than R, so although item + 1 works for strings in JS, it would throw an error in R (and that’s ok; most of the time you probably don’t want to add a string to a number)! If, instead, you only wanted to add 1 to numeric values, you could use is.numeric() in R within an if else statement:

purrr::map(arr, function(item) {
  if (is.numeric(item)) item + 1 else item
})
#> [[1]]
#> [1] "hello"
#> 
#> [[2]]
#> [1] "world"
#> 
#> [[3]]
#> [1] 11

In JS, you can use the typeof operator to get the data type, together with the conditional (ternary) operator (condition ? exprT : exprF), to achieve the same task:

arr.map(function(item) {
  return typeof item === "number" ? item + 1 : item;
});
// ["hello", "world", 11]
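To see why the type check matters, here is a short sketch of what typeof returns for each element and how + behaves depending on its operands. The variable names types, a, b, and c are only for illustration.

```javascript
var arr = ["hello", "world", 10];

// typeof returns a string naming the type of each value
var types = arr.map(function(item) { return typeof item; });
// ["string", "string", "number"]

// + concatenates when either operand is a string, and adds two numbers otherwise
var a = "hello" + 1;    // "hello1"
var b = 10 + 1;         // 11
var c = "10" + 1;       // "101", not 11

console.log(types, a, b, c);
```

The last case is the one most likely to surprise R users: a number stored as a string is silently concatenated rather than added.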

There are a handful of other useful array and object methods, but to keep things focused, we’ll only cover what’s required to comprehend Section 20. A couple of examples in that section use the filter() method, which, like map(), applies a function to each array element, but expects that function to return a logical (boolean) value and returns only the elements for which it is true:

arr.filter(function(item) { return typeof item === "string"; });
// ["hello", "world"]
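Because filter() and map() both return a new array, they can be chained. The sketch below, with the illustrative name bumped, keeps only the numeric elements and then adds 1 to each, leaving the original array untouched.

```javascript
var arr = ["hello", "world", 10];

// Keep only the numbers, then add 1 to each of them
var bumped = arr
  .filter(function(item) { return typeof item === "number"; })
  .map(function(item) { return item + 1; });
// [11]

// The original array is not modified by filter() or map()
var originalLength = arr.length;   // 3

console.log(bumped, originalLength);
```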

19.2 Mapping R to JSON

In R, unlike JSON, there is no distinction between scalars and vectors of length 1. That means there is ambiguity as to what a vector of length 1 in R should map to in JSON. The jsonlite package defaults to an array of length 1, but this can be avoided by setting auto_unbox = TRUE.

jsonlite::toJSON("A string in R")
#> ["A string in R"]
jsonlite::toJSON("A string in R", auto_unbox = TRUE)
#> "A string in R"
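From the JS side, the difference between these two outputs is easy to see by parsing each one. This is a minimal sketch; the variable names boxed and unboxed are only for illustration.

```javascript
// The two JSON strings produced by jsonlite above
var boxed = JSON.parse('["A string in R"]');
var unboxed = JSON.parse('"A string in R"');

var boxedIsArray = Array.isArray(boxed);   // true
var unboxedType = typeof unboxed;          // "string"

// The same property can mean different things for the two values
var boxedLength = boxed.length;            // 1, the number of array elements
var unboxedLength = unboxed.length;        // 13, the number of characters

console.log(boxedIsArray, unboxedType, boxedLength, unboxedLength);
```

Code that consumes the JSON therefore has to know which of the two forms to expect.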

It’s worth noting that plotly.js, which consumes JSON objects, has specific expectations and rules about scalars versus arrays of length 1. If you’re calling the plotly.js library directly in JS, as we’ll see later in Section 20, you’ll need to be mindful of the difference between scalars and arrays of length 1. Some attributes, like text and marker.size, accept both scalars and arrays and apply different rules based on the difference. Some other attributes, like x, y, and z only accept arrays and will error out if given a scalar. To learn about these rules and expectations, you can use the schema() function from R to inspect plotly.js’ specification as shown in Figure 19.1. Note that attributes with a val_type of 'data_array' require an array while attributes with an arrayOk: true field accept either scalars or arrays.

plotly::schema()

FIGURE 19.1: Using the plotly schema() to obtain more information about expected attribute types. For the interactive, see https://plotly-r.com/interactives/json-schema.html
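As a rough sketch of how these rules might be respected when building a trace by hand in JS, the example below wraps scalars in arrays for x and y while leaving marker.size as a scalar. The asArray() helper is hypothetical and not part of plotly.js, and the trace itself is only a plain object; nothing is rendered here.

```javascript
// Hypothetical helper (not part of plotly.js): wrap a scalar in an array
function asArray(value) {
  return Array.isArray(value) ? value : [value];
}

// A sketch of a trace object describing a single point
var trace = {
  type: "scatter",
  mode: "markers",
  x: asArray(1),           // [1], since x requires an array
  y: asArray(2),           // [2], since y requires an array
  marker: { size: 20 }     // a scalar is accepted here, as is an array
};

console.log(JSON.stringify(trace));
```

Values that are already arrays pass through asArray() unchanged, so the helper can be applied without first checking the input.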

In JSON, unlike R, there is no distinction between a heterogeneous and homogeneous collection of data types. In other words, in R, there is an important difference between list(1, 2, 3) and c(1, 2, 3) (the latter is an atomic vector and has a different set of rules). In JSON, there is no strict notion of a homogeneous collection, so working with JSON arrays is essentially like being forced to use list() in R. This subtle fact can lead to some surprising results when trying to serialize R vectors as JSON arrays. For instance, if you wanted to create a JSON array, say [1,"a",true] using R objects, you may be tempted to do the following:

jsonlite::toJSON(c(1, "a", TRUE))
#> ["1","a","TRUE"] 

But this actually creates an array of strings instead of the array with a number, string, and boolean that we desire. The problem lies in the fact that c() coerces the collection of values into an atomic vector, which in this case is a character vector. Instead, you should use list() over c():

jsonlite::toJSON(list(1, "a", TRUE), auto_unbox = TRUE)
#> [1,"a",true]
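To check what JS actually receives in each case, we can parse both JSON strings and inspect the element types. This is a minimal sketch; the names fromC, fromList, and typeOf are only for illustration.

```javascript
// The JSON produced from c(1, "a", TRUE) and from list(1, "a", TRUE)
var fromC = JSON.parse('["1","a","TRUE"]');
var fromList = JSON.parse('[1,"a",true]');

function typeOf(item) { return typeof item; }

var typesC = fromC.map(typeOf);        // ["string", "string", "string"]
var typesList = fromList.map(typeOf);  // ["number", "string", "boolean"]

// The difference matters as soon as the values are used
var sumC = fromC[0] + 1;               // "11", string concatenation
var sumList = fromList[0] + 1;         // 2, numeric addition

console.log(typesC, typesList, sumC, sumList);
```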

  1. If you’d like a nice, succinct overview of the topic, see http://adv-r.had.co.nz/Data-structures.html