jsonlite icon indicating copy to clipboard operation
jsonlite copied to clipboard

fromJSON nested json object parsed as data.frame instead of list of lists (simplifyDataFrame = TRUE)

Open svdwoude opened this issue 7 years ago • 6 comments

Considering this simple json example:

[
  {
    "array": [
      1,
      2,
      3
    ],
    "data.frame": [
      {
        "name": "John",
        "age": 30,
        "car": "car"
      },
      {
        "name": "John",
        "age": 30,
        "car": "car"
      }
    ],
    "boolean": true,
    "number": 123,
    "object": {
      "a": "b",
      "c": "d",
      "e": "f"
    },
    "string": "Hello World"
  },
  {
    "array": [
      1,
      2,
      3
    ],
    "data.frame": [
      {
        "name": "John",
        "age": 30,
        "car": "car"
      },
      {
        "name": "John",
        "age": 30,
        "car": "car"
      }
    ],
    "boolean": true,
    "number": 123,
    "object": {
      "a": "b",
      "c": "d",
      "e": "f"
    },
    "string": "Hello World"
  }
]

I get the following structure in R using fromJSON

json <- '[{"array":[1,2,3],"data.frame":[{"name":"John","age":30,"car":"car"},{"name":"John","age":30,"car":"car"}],"boolean":true,"number":123,"object":{"a":"b","c":"d","e":"f"},"string":"Hello World"},{"array":[1,2,3],"data.frame":[{"name":"John","age":30,"car":"car"},{"name":"John","age":30,"car":"car"}],"boolean":true,"number":123,"object":{"a":"b","c":"d","e":"f"},"string":"Hello World"}]'

df <- jsonlite::fromJSON(json)
'data.frame':	2 obs. of  6 variables:
 $ array     :List of 2
  ..$ : int  1 2 3
  ..$ : int  1 2 3
 $ data.frame:List of 2
  ..$ :'data.frame':	2 obs. of  3 variables:
  .. ..$ name: chr  "John" "John"
  .. ..$ age : int  30 30
  .. ..$ car : chr  "car" "car"
  ..$ :'data.frame':	2 obs. of  3 variables:
  .. ..$ name: chr  "John" "John"
  .. ..$ age : int  30 30
  .. ..$ car : chr  "car" "car"
 $ boolean   : logi  TRUE TRUE
 $ number    : int  123 123
 $ object    :'data.frame':	2 obs. of  3 variables:
  ..$ a: chr  "b" "b"
  ..$ c: chr  "d" "d"
  ..$ e: chr  "f" "f"
 $ string    : chr  "Hello World" "Hello World"

instead I am aiming to get the following structure:

'data.frame':	2 obs. of  6 variables:
 $ array     :List of 2
  ..$ : int  1 2 3
  ..$ : int  1 2 3
 $ data.frame:List of 2
  ..$ :'data.frame':	2 obs. of  3 variables:
  .. ..$ name: chr  "John" "John"
  .. ..$ age : int  30 30
  .. ..$ car : chr  "car" "car"
  ..$ :'data.frame':	2 obs. of  3 variables:
  .. ..$ name: chr  "John" "John"
  .. ..$ age : int  30 30
  .. ..$ car : chr  "car" "car"
 $ boolean   : logi  TRUE TRUE
 $ number    : int  123 123
 $ object    :List of 2
  ..$ :List of 3
  .. ..$ a: chr "b"
  .. ..$ c: chr "d"
  .. ..$ e: chr "f"
  ..$ :List of 3
  .. ..$ a: chr "b"
  .. ..$ c: chr "d"
  .. ..$ e: chr "f"
 $ string    : chr  "Hello World" "Hello World"

where the object field is a list of lists instead of data.frame. I've played with simplifyDataFrame but this obviously changes how the data.frame field is converted (and the top level class). Instead I would like to keep the data.frame field to be structured as shown above, while having the option to convert a nested object as a list of lists.

In the end I want to create a tibble from above json looking like:

> json %>% jsonlite::fromJSON() %>% tibble::as_tibble()
# A tibble: 2 x 6
  array     data.frame           boolean number object     string     
  <list>    <list>               <lgl>    <int> <list>     <chr>      
1 <int [3]> <data.frame [2 × 3]> TRUE       123 <list [3]> Hello World
2 <int [3]> <data.frame [2 × 3]> TRUE       123 <list [3]> Hello World

(Above tibble would produce the initial json using toJSON(auto_unbox = TRUE, pretty=TRUE))

I'm not sure if I'm overlooking some options or wether this is the intended fixed behaviour.

svdwoude avatar Aug 29 '18 00:08 svdwoude

I could obviously transform from the current structure to the desired structure doing

df[sapply(df, class) == "data.frame"] <- lapply(df[sapply(df, class) == "data.frame"], purrr::transpose)

but it seems to me it would be more appropriate when handled by jsonlite.

I can't use dplyr's mutate_if(is.data.frame, purrr::transpose) since it internally wants to convert to a tibble which is not possible because

df %>% mutate_if(is.data.frame,purrr::transpose)
#> Error: Column `object` must be a 1d atomic vector or a list

also in the latest development version of tibble (1.4.99) where columns are allowed to be data.frames this fails

df %>% mutate_if(is.data.frame,purrr::transpose)
#> Error: Evaluation error: Attempt to query lazy column with non-natural slicing index.

svdwoude avatar Aug 29 '18 21:08 svdwoude

Stumbling over the same issue. Has there been any progress and/or is there a way to tell jsonlite to parse nested structures as either lists or tibbles?

jankowtf avatar Jan 10 '20 12:01 jankowtf

You can use fromJSON(json, simplifyVector = FALSE) or you can use jsonlite::parse_json() which uses this as default.

jeroen avatar Jan 10 '20 12:01 jeroen

Cool, thanks.

A bit out of bounds, but: Is there a way to also tell mongolite to parse this way within $find()? I know that there's handler, however I'm totally not familiar with handlers and couldn't get it to work for ending up with the desired nested structure.

jankowtf avatar Jan 10 '20 13:01 jankowtf

The easiest way may be to use iter() see https://jeroen.github.io/mongolite/query-data.html#iterating

jeroen avatar Jan 10 '20 13:01 jeroen

@svdwoude this is indeed intended behavior as described in section 2.4.4 of the paper. There is currently no built-in option to simplify the top-level dataframe, but not the columns.

jeroen avatar Jan 10 '20 22:01 jeroen