Metadata API: Metadata "factory" constructors misbehave with inheritance
class DerivedMetadata(Metadata):
def ok(self):
print("ok")
md = DerivedMetadata.from_file("root.json")
md.ok() # fails because md type is Metadata, not DerivedMetadata
This seems to happen because the construction path goes
- DerivedMetadata.from_file()
- DerivedMetadata.from_bytes()
- JsonDeserializer.deserialize() here deserializer doesn't know the correct class anymore and calls Metadata.from_dict()
- Metadata.from_dict()
Well spotted! The easiest fix would probably be to pass the class to JsonDeserializer.deserialize(), like so:
diff --git a/tuf/api/serialization/json.py b/tuf/api/serialization/json.py
index b043310f..050ef0dc 100644
--- a/tuf/api/serialization/json.py
+++ b/tuf/api/serialization/json.py
@@ -25,15 +25,22 @@ from tuf.api.serialization import (
SignedSerializer,
)
+# TODO: Adopt in MetadataDeserializer and define below generic types there
+from typing import Type, TypeVar
+
+# Generic type for **instances** of subclasses of Metadata
+TMetadata = TypeVar("TMetadata", bound=Metadata)
+
+# Generic type for **subclasses** of Metadata
+CMetadata = Type[TMetadata]
class JSONDeserializer(MetadataDeserializer):
"""Provides JSON to Metadata deserialize method."""
- def deserialize(self, raw_data: bytes) -> Metadata:
+ def deserialize(self, cls: CMetadata, raw_data: bytes) -> TMetadata:
"""Deserialize utf-8 encoded JSON bytes into Metadata object."""
try:
json_dict = json.loads(raw_data.decode("utf-8"))
- metadata_obj = Metadata.from_dict(json_dict)
+ metadata_obj = cls.from_dict(json_dict)
except Exception as e:
raise DeserializationError from e
An alternative would be to require subclasses of Metadata use their custom MetadataDeserializer implementation. This seems cleaner, but maybe unnecessarily complex as long as the subclass can use the base Metadata from_dict factory.
@jku, was there a specific use case for a DerivedMetadata class? Do you think we can move forward with the patch from above?
@jku, was there a specific use case for a
DerivedMetadataclass?
Nothing urgent at all: I was just experimenting with providing some editing helpers for Metadata. All of that is possible via other designs.
The fix seems right, although we could maybe improve the existing TypeVar in metadata.py instead of adding a new one -- I used T as the type name there thinking it would never be used out of context but your TMetadata might be better...
EDIT: oops, it's not the same typevar actually, never mind me
Speaking of T...
https://github.com/theupdateframework/python-tuf/blob/8ae944ccb256ba0aa13a62f691fd79d4e16c11ad/tuf/api/metadata.py#L77
... is this recursive definition intentional? That is, Metadata being a subclass of either of "Root", "Timestamp", "Snapshot", "Targets", but also containing them?
At least from my understanding class Metadata(Generic[T]): means "Metadata is a Generic[T]", or can it also indicate that "Metadata has a Generic[T]"? (typing is still fairly new to me)
so it should mean Metadata is a Generic, and it's generic over the type T -- this is often used to mean that the Generic is container for type T, as it is here (even if our container is a bit unique in that there is always one and only one T in Metadata).
@jku, unless you need this resolved quickly, I think it would be a good self-contained task for someone to get started with python-tuf development. AFAICS what's missing is:
- apply/revise patch from above
- adopt
MetadataDeserializeraccordingly (beware of type check case handling and cyclic-imports!) - add a test
Sounds good to me.
Maybe I should point out that e.g. the return value annotation change for deserialize() is not really required: annotating as Metadata is likely just fine (as long as the correct constructor gets called).
Hi, I would like to take up this issue.
We just recently revamped some of this de/serialization logic in securesystemslib. Our goal was to make it useable for python-tuf and in-toto, for different metadata containers (traditional and new dsse envelopes), and for different payloads.
We realized that we get less coupling and more flexibility, if we use custom deserializers per class-to-be-deserialized-into instead of adding more options to an existing deserializer.
To still allow some code re-use, we separated JSON deserialization from calling the class constructor. That is, the generic JSONDeserializer just does json.loads plus some common error handling, which can be used by e.g. a library-specific AnyMetadataDeserializer, which in addition knows which constructor (factories) to call.
We should consider switching python-tuf to the new de/serialization facility from securesystemslib, especially if we want to make DSSE available here. Until then I advice not touching the python-tuf de/serializers, unless we need to resolve the inheritance issue before that.
The switch is probably not a good first issue. So I'm removing the label. WDYT, @jku?
I'm sure you're right... I'm not very familiar with serialisation ideas: happy to discuss (maybe after OSS EU) if you want comments