fromJSON nested json object parsed as data.frame instead of list of lists (simplifyDataFrame = TRUE)
Considering this simple json example:
[
{
"array": [
1,
2,
3
],
"data.frame": [
{
"name": "John",
"age": 30,
"car": "car"
},
{
"name": "John",
"age": 30,
"car": "car"
}
],
"boolean": true,
"number": 123,
"object": {
"a": "b",
"c": "d",
"e": "f"
},
"string": "Hello World"
},
{
"array": [
1,
2,
3
],
"data.frame": [
{
"name": "John",
"age": 30,
"car": "car"
},
{
"name": "John",
"age": 30,
"car": "car"
}
],
"boolean": true,
"number": 123,
"object": {
"a": "b",
"c": "d",
"e": "f"
},
"string": "Hello World"
}
]
I get the following structure in R using fromJSON
json <- '[{"array":[1,2,3],"data.frame":[{"name":"John","age":30,"car":"car"},{"name":"John","age":30,"car":"car"}],"boolean":true,"number":123,"object":{"a":"b","c":"d","e":"f"},"string":"Hello World"},{"array":[1,2,3],"data.frame":[{"name":"John","age":30,"car":"car"},{"name":"John","age":30,"car":"car"}],"boolean":true,"number":123,"object":{"a":"b","c":"d","e":"f"},"string":"Hello World"}]'
df <- jsonlite::fromJSON(json)
'data.frame': 2 obs. of 6 variables:
$ array :List of 2
..$ : int 1 2 3
..$ : int 1 2 3
$ data.frame:List of 2
..$ :'data.frame': 2 obs. of 3 variables:
.. ..$ name: chr "John" "John"
.. ..$ age : int 30 30
.. ..$ car : chr "car" "car"
..$ :'data.frame': 2 obs. of 3 variables:
.. ..$ name: chr "John" "John"
.. ..$ age : int 30 30
.. ..$ car : chr "car" "car"
$ boolean : logi TRUE TRUE
$ number : int 123 123
$ object :'data.frame': 2 obs. of 3 variables:
..$ a: chr "b" "b"
..$ c: chr "d" "d"
..$ e: chr "f" "f"
$ string : chr "Hello World" "Hello World"
instead I am aiming to get the following structure:
'data.frame': 2 obs. of 6 variables:
$ array :List of 2
..$ : int 1 2 3
..$ : int 1 2 3
$ data.frame:List of 2
..$ :'data.frame': 2 obs. of 3 variables:
.. ..$ name: chr "John" "John"
.. ..$ age : int 30 30
.. ..$ car : chr "car" "car"
..$ :'data.frame': 2 obs. of 3 variables:
.. ..$ name: chr "John" "John"
.. ..$ age : int 30 30
.. ..$ car : chr "car" "car"
$ boolean : logi TRUE TRUE
$ number : int 123 123
$ object :List of 2
..$ :List of 3
.. ..$ a: chr "b"
.. ..$ c: chr "d"
.. ..$ e: chr "f"
..$ :List of 3
.. ..$ a: chr "b"
.. ..$ c: chr "d"
.. ..$ e: chr "f"
$ string : chr "Hello World" "Hello World"
where the object field is a list of lists instead of data.frame. I've played with simplifyDataFrame
but this obviously changes how the data.frame field is converted (and the top level class). Instead I would like to keep the data.frame field to be structured as shown above, while having the option to convert a nested object as a list of lists.
In the end I want to create a tibble from above json looking like:
> json %>% jsonlite::fromJSON() %>% tibble::as_tibble()
# A tibble: 2 x 6
array data.frame boolean number object string
<list> <list> <lgl> <int> <list> <chr>
1 <int [3]> <data.frame [2 × 3]> TRUE 123 <list [3]> Hello World
2 <int [3]> <data.frame [2 × 3]> TRUE 123 <list [3]> Hello World
(Above tibble would produce the initial json using toJSON(auto_unbox = TRUE, pretty=TRUE))
I'm not sure if I'm overlooking some options or wether this is the intended fixed behaviour.
I could obviously transform from the current structure to the desired structure doing
df[sapply(df, class) == "data.frame"] <- lapply(df[sapply(df, class) == "data.frame"], purrr::transpose)
but it seems to me it would be more appropriate when handled by jsonlite.
I can't use dplyr's mutate_if(is.data.frame, purrr::transpose) since it internally wants to convert to a tibble which is not possible because
df %>% mutate_if(is.data.frame,purrr::transpose)
#> Error: Column `object` must be a 1d atomic vector or a list
also in the latest development version of tibble (1.4.99) where columns are allowed to be data.frames this fails
df %>% mutate_if(is.data.frame,purrr::transpose)
#> Error: Evaluation error: Attempt to query lazy column with non-natural slicing index.
Stumbling over the same issue. Has there been any progress and/or is there a way to tell jsonlite to parse nested structures as either lists or tibbles?
You can use fromJSON(json, simplifyVector = FALSE) or you can use jsonlite::parse_json() which uses this as default.
Cool, thanks.
A bit out of bounds, but: Is there a way to also tell mongolite to parse this way within $find()? I know that there's handler, however I'm totally not familiar with handlers and couldn't get it to work for ending up with the desired nested structure.
The easiest way may be to use iter() see https://jeroen.github.io/mongolite/query-data.html#iterating
@svdwoude this is indeed intended behavior as described in section 2.4.4 of the paper. There is currently no built-in option to simplify the top-level dataframe, but not the columns.