Learning nested JSON
Currently, when I run the code, the only schema returned is the one for "root", even though I have over 100,000 documents with many nested attributes.
The following files are currently in vectorizers:
basevectorizer.py
boolvectorizer.py
numbervectorizer.py
stringvectorizer.py
timestampvectorizer.py
Is there something I'm not quite understanding when it comes to "learning" JSON deeper than 'root'?
The code should automatically learn the schema of nested documents. There was a bug in the sample code, which I just fixed, that might have caused the issue. Use vectorizer.extend(docs) to learn the schema, where docs is a list of JSON documents, or use vectorizer.extend([doc]) to learn the schema incrementally, one document at a time.
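To illustrate what learning a nested schema in one batch versus incrementally means, here is a minimal, library-independent sketch. SchemaLearner, json_type, and the "[]" path marker for array items are hypothetical names invented for this illustration. They are not the library's classes or its internal representation; only the extend(docs) / extend([doc]) calling pattern mirrors the advice above.

```python
from collections import defaultdict


def json_type(value):
    """Map a parsed JSON value to a JSON type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "null"  # None (and anything unrecognized) is treated as null


class SchemaLearner:
    """Hypothetical stand-in for illustration, not the library's vectorizer."""

    def __init__(self):
        # Maps a path tuple such as ("root", "a", "b") to the set of
        # JSON type names observed at that path.
        self.types = defaultdict(set)

    def extend(self, docs):
        """Update the learned schema with a list of parsed JSON documents."""
        for doc in docs:
            self._walk(doc, ("root",))

    def _walk(self, value, path):
        self.types[path].add(json_type(value))
        if isinstance(value, dict):
            # Recurse into every attribute of a nested object.
            for key, child in value.items():
                self._walk(child, path + (key,))
        elif isinstance(value, list):
            # All array items share one path component.
            for item in value:
                self._walk(item, path + ("[]",))


docs = [
    {"a": {"b": 1, "c": "x"}, "d": True},
    {"a": {"b": 2.5}, "e": [1, 2]},
]

# Batch learning: pass the whole list at once.
batch = SchemaLearner()
batch.extend(docs)

# Incremental learning: pass one document at a time, wrapped in a list.
incremental = SchemaLearner()
for doc in docs:
    incremental.extend([doc])
```

In this sketch both styles end up with the same learned paths, including the nested ones under "a". Note that the documents are parsed objects (dicts), not raw JSON strings.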
Hello arsarabi,
Thank you for making your code available.
I've also had no luck learning nested attributes. Do I need to define a vectorizer of type "object" to be able to learn nested JSON objects?
Suppose I have a set of documents that match the following schema:
{
  "nestedobject": {
    "stringattr1": "some string",
    "numberattr1": 42,
    "stringattr2": "another string"
  },
  "stringattr3": "a third string",
  "booleanattr1": true
}
...do I need to define additional vectorizers beyond those you provide in the sample code?
If I use only the vectorizers provided in the sample code, the only learned features are:
0: root has "booleanattr1"
1: root has "stringattr3"
2: root has "nestedobject"
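For comparison, a plain recursive walk over the sample document above reaches six attribute paths, three of them below "nestedobject". This is only a sketch that does not use the library at all: attribute_paths and the dotted path format are hypothetical, chosen for illustration, and say nothing about how the library names its features.

```python
import json

# The sample document from this post, parsed into Python objects.
sample = json.loads("""
{
  "nestedobject": {
    "stringattr1": "some string",
    "numberattr1": 42,
    "stringattr2": "another string"
  },
  "stringattr3": "a third string",
  "booleanattr1": true
}
""")


def attribute_paths(value, prefix="root"):
    """Yield a dotted path for every attribute reachable from value."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}"
            yield path
            # Descend into nested objects; scalar values yield nothing more.
            yield from attribute_paths(child, path)


paths = sorted(attribute_paths(sample))
for path in paths:
    print(path)
```

Only three of these six paths correspond to the learned features listed above; the three under root.nestedobject are the ones that are missing.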
Thank you in advance for answering this (very basic) usage question :).
Hello,
It has been a while since I worked on this, but I believe it should work with nested JSON out of the box if you follow the usage steps. Could you provide sample code that reproduces the issue? Thanks!