Register handlers for dicts
Consider the following dictionary:
requirements: Dict[UrnId, RequirementData]
where
@dataclass
class UrnId:
    urn: str
    id: str
With keys=False, the output is:
"requirements": {
    "UrnId(urn='ms-001', id='REQ_ms001_101')": {
With keys=True, the output is:
"requirements": {
    "json://\"ms-001:REQ_ms001_101\"": {
Neither of them is really useful, as I would like to serialize it as something like:
"requirements": {
"ms-001:REQ_ms001_101": {
Can I override this behaviour somehow?
Is there a way to override serialization of dicts in general?
Ideally, I would like to modify the dict upon serialization based on key so that urn is one level and id another.
So that,
"requirements": {
    "UrnId(urn='U1', id='I1')": {...}
    "UrnId(urn='U1', id='I2')": {...}
becomes (i.e. adding one more level):
"requirements": {
    "U1": {
        "I1": {...},
        "I2": {...}
    }
Potential code:
I'm not sure whether there is a way to call flatten through the handlers in place of <call flatten recursively here>.
class CustomDictHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        if isinstance(obj, dict):
            new_dict = {}
            for key, value in obj.items():
                if isinstance(key, UrnId):
                    if key.urn not in new_dict:
                        new_dict[key.urn] = {}
                    new_dict[key.urn][key.id] = <call flatten recursively here>(value)
            return new_dict
        return jsonpickle.handlers.BaseHandler.flatten(self, obj, data)
At the moment I don't believe that's possible in general by hooking into jsonpickle. That being said, it may be possible to define a custom __getstate__ on UrnId that returns what you're looking for when encoded.
Ok. I tried adding __getstate__ to the UrnId class, but it does not change the key for me, unless I did something wrong.
Did you mean it for the key representation without "json://"?
I ask because I looked at https://jsonpickle.readthedocs.io/en/latest/api.html#object.getstate and can't see how a __getstate__ on UrnId could manipulate the dictionary structure.
Or is there a way to manipulate the dictionary structure? If so, would you mind elaborating a little to get me started?
I can give you an example __getstate__ if you can send me an entire minimal reproducible example (MRE) to get the output that you are currently getting, and an example of the output you want from that script.
Here are a few notes in case they help.
I'll preface this with this important detail about the library's purpose -- jsonpickle isn't really geared towards customizing some of these more aesthetic aspects of serialization.
jsonpickle's primary use case is being able to reconstitute objects through json. Because of this, we will always need to embed some amount of out-of-band metadata in order to accomplish that goal.
Another important note is that you're using dicts with specialized objects as their keys. That's something that JSON itself cannot represent, so jsonpickle has to do something that's both general-purpose and capable enough to reconstitute the objects. That's why we have to serialize objects to embedded json inside strings: complex object keys require special handling.
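To illustrate the point about JSON keys, here is a minimal sketch using only the standard library json module (not jsonpickle). The frozen=True flag is an assumption added here so that the dataclass is hashable and can be used as a dict key; the original UrnId does not use it.

```python
import json
from dataclasses import dataclass


# frozen=True is an assumption for this sketch: it makes instances hashable.
@dataclass(frozen=True)
class UrnId:
    urn: str
    id: str


key = UrnId("ms-001", "REQ_ms001_101")

# Plain JSON only allows string keys, so the standard encoder rejects the object.
try:
    json.dumps({key: "value"})
    rejected = False
except TypeError:
    rejected = True

# Stringifying the key first (here with repr) yields a valid JSON document.
stringified = json.dumps({repr(key): "value"})
```

The stringified key is the default dataclass repr, which is the same shape as the keys=False output shown at the top of this thread.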
Now, with that out of the way, there is one thing you can leverage about jsonpickle's behavior to get pretty close to your first example:
"requirements": {
"ms-001:REQ_ms001_101": {
When jsonpickle hits the UrnId objects in the dict's keys it will use repr(obj) to stringify the object when creating its corresponding json key.
We use embedded json when using json_str = jsonpickle.encode(requirements, keys=True) and that expects you to also use jsonpickle.decode(json_str, keys=True) in order to reconstitute the objects.
But, if you'd like to keep the json representation simpler what you can do is rely on jsonpickle for serializing the values of the dict (its guts) and then handle restoring the keys yourself. This might be a good enough middle ground for your use case.
Also, maybe you don't even care about decoding the objects, and all you care about is the encoded representation. If that's the case you can omit the from_repr() staticmethod and skip the latter part of the test function below.
So, here's the basic approach / workaround:
- Don't use keys=True so that jsonpickle uses repr(key) on the dict keys.
- Implement __repr__ so that the dict keys look nicer.
- Restore dict keys manually from the __repr__ string after loading from jsonpickle to restore a copy of the original dict.
import json
from dataclasses import dataclass

import jsonpickle


@dataclass
class Requirement:
    package: str
    version: tuple


@dataclass
class UrnId:
    urn: str
    id: str

    def __hash__(self):
        return hash(repr(self))

    def __repr__(self):
        # This string becomes the JSON key when keys=False (the default).
        return f'{self.urn}:{self.id}'

    @staticmethod
    def from_repr(string):
        if isinstance(string, UrnId):
            return string
        try:
            urn, new_id = string.split(':', 1)
        except ValueError:
            return None
        return UrnId(urn, new_id)


def test_dataclass_custom_restoration():
    """Restore objects manually to simplify the JSON representation"""
    requirements = {
        UrnId('ms-001', 'REQ_ms001_101'): Requirement('pkz', [1, 0, 2]),
        UrnId('ms-002', 'REQ_ms002_201'): Requirement('pkz', [1, 1, 0]),
        UrnId('ms-003', 'REQ_ms003_301'): Requirement('pkz', [2, 0, 0]),
    }
    encoded = jsonpickle.encode(requirements)
    # If all you care about is the JSON output you can stop here.
    # If you need to restore objects from the JSON above, continue below.
    decoded = jsonpickle.decode(encoded)
    # Reconstitute the top-level dict keys
    new_requirements = {}
    for key, value in decoded.items():
        new_key = UrnId.from_repr(key)
        if new_key is None:
            continue
        new_requirements[new_key] = value
    assert requirements == new_requirements
I consider this a nice middle ground because the resulting JSON looks like the following:
{
"ms-001:REQ_ms001_101": {"py/object": "__main__.Requirement", "package": "pkz", "version": [1, 0, 2]},
"ms-002:REQ_ms002_201": {"py/object": "__main__.Requirement", "package": "pkz", "version": [1, 1, 0]},
"ms-003:REQ_ms003_301": {"py/object": "__main__.Requirement", "package": "pkz", "version": [2, 0, 0]}
}
It's simpler, and while it could be simpler, it's not too bad. Any further simplification of the data will require you to handle the serialization yourself.
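As a rough illustration of what handling the serialization yourself could look like when you only need the encoded output, here is a sketch that bypasses jsonpickle and uses the standard library's json and dataclasses.asdict. The frozen=True flag and the "urn:id" key format are assumptions made for this sketch, and the result cannot be decoded back into the original objects. (jsonpickle.encode also accepts unpicklable=False, which omits the py/object metadata, if you would rather stay within jsonpickle.)

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Requirement:
    package: str
    version: tuple


# frozen=True is an assumption for this sketch: it makes instances hashable.
@dataclass(frozen=True)
class UrnId:
    urn: str
    id: str


requirements = {
    UrnId("ms-001", "REQ_ms001_101"): Requirement("pkz", (1, 0, 2)),
    UrnId("ms-002", "REQ_ms002_201"): Requirement("pkz", (1, 1, 0)),
}

# Build a plain dict: string keys, and values reduced to their fields only.
plain = {
    f"{key.urn}:{key.id}": asdict(value)
    for key, value in requirements.items()
}

# Encode-only output: no py/object metadata is included.
encoded = json.dumps(plain, indent=2)
```

The trade-off is that every type has to be converted explicitly, which is exactly the general-purpose work jsonpickle normally does for you.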
I've closed this issue for now since it doesn't seem like there's anything actionable left to do, but please feel free to continue the conversation if you have any questions or discussion topics.
I can give you an example __getstate__ if you can send me an entire minimal reproducible example (MRE) to get the output that you are currently getting, and an example of the output you want from that script.
@Theelx I totally missed your response. Sorry about that. Reading @davvid's answer now.
@davvid Thank you for the very informative answer.
I realized as I started reading your reply that I didn't state that we are not deserializing the data (so we can skip from_repr). The JSON output is for 3rd party tools and it should be as clean as possible, since they have no use for Python metadata meant for deserialization. For enumerations I could handle this using a custom handler.
With your solution we get:
{
"ms-001:REQ_ms001_101": {"py/object": "__main__.Requirement", "package": "pkz", "version": [1, 0, 2]},
"ms-002:REQ_ms002_201": {"py/object": "__main__.Requirement", "package": "pkz", "version": [1, 1, 0]},
"ms-003:REQ_ms003_301": {"py/object": "__main__.Requirement", "package": "pkz", "version": [2, 0, 0]}
}
but py/object and package are really not wanted. However, this is an improvement.
However, it does not solve the challenge of restructuring the dicts (as mentioned initially), so that
"requirements": {
    "UrnId(urn='U1', id='I1')": {...}
    "UrnId(urn='U1', id='I2')": {...}
becomes (i.e. adding one more level):
"requirements": {
    "U1": {
        "I1": {...},
        "I2": {...}
    }
Any further simplification of the data will require you to handle the serialization yourself.
Do you mean without jsonpickle, or manually in some other way? @Theelx mentioned __getstate__; could that solve our dilemma somehow?
And just to confirm, there will not be a new feature implemented that allows us to register handlers for dicts?
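For reference, the restructuring asked about above (urn on one level, id on the next) can be sketched as a plain preprocessing step that runs before any encoder. This is not a jsonpickle feature: nest_by_urn is a hypothetical helper name, frozen=True is an assumption to make the keys hashable, and the standard library json module is used for the final encoding.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Requirement:
    package: str
    version: tuple


# frozen=True is an assumption for this sketch: it makes instances hashable.
@dataclass(frozen=True)
class UrnId:
    urn: str
    id: str


def nest_by_urn(requirements):
    """Hypothetical helper: group a dict keyed by UrnId into {urn: {id: fields}}."""
    nested = {}
    for key, value in requirements.items():
        # Create the urn level on first sight, then file the value under its id.
        nested.setdefault(key.urn, {})[key.id] = asdict(value)
    return nested


requirements = {
    UrnId("U1", "I1"): Requirement("pkz", (1, 0, 2)),
    UrnId("U1", "I2"): Requirement("pkz", (1, 1, 0)),
    UrnId("U2", "I1"): Requirement("pkz", (2, 0, 0)),
}

nested = nest_by_urn(requirements)
encoded = json.dumps({"requirements": nested}, indent=2)
```

Because the nesting happens on ordinary dicts, the same nested structure could also be passed to jsonpickle.encode instead of json.dumps if other values still need jsonpickle's handling.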