serializer TypedSerializer should only include effectively used types

Is your feature request related to a problem? Please describe.

We'd like to use Eclipse Serializer to serialize objects to a database, to be deserialized later by a potentially different instance. We therefore need the TypedSerializer, but the output it produces is huge and seems to include all class information the serializer knows about. Example: a sample object is serialized to about 2.5 kB of binary data using the normal Serializer but to 65 kB with the TypedSerializer, even though not many other classes had been serialized before and the cache seemed to include only around 150 known classes.

Describe the solution you'd like

I'd like to see a strategy that includes class information only for the classes actually encountered in the serialized data, to reduce the amount of extra data the TypedSerializer adds.

This would open up many new use cases for Eclipse Serializer.

Dec 07 '23 12:12 mkeller-ergon

Many thanks for your proposal. You are right, that would open up many new use cases.

As the type-information is required to ensure correct deserialization we would need the possibility to export the complete type-information from the serializer and to import that information into the de-serializer.

Exporting the type-information is already possible but requires a more convenient API.
Importing type-information is not yet possible; we need to extend the foundation to allow serializer creation with a provided type dictionary.

Dec 11 '23 13:12 hg-ms

I'm considering integrating Eclipse Serializer into a messaging framework. To do so, I would like to put the serialized bytes into the message payload, extract the type information, and pass it along in message headers. I have been looking at the TypedSerializer implementation but am still struggling with how to extract the correct type information.

Are there any further docs that I could study?

Cheers,

Simon

Feb 05 '24 21:02 zambrovski

@zambrovski: Unfortunately, there is no further documentation beyond the source code and the reference manual https://docs.eclipsestore.io/manual/serializer/index.html. If you have a limited set of classes to serialize, you may consider using the simpler Serializer implementation instead of the TypedSerializer. In that case no type-information needs to be exchanged. The drawback is that every class that should be (de)serialized must be registered at initialization. If this simple serializer is not sufficient, you could write a custom implementation that provides the type-information separately from the serialized data. As you don’t want the type-information to be included in the serialized data, this would be a simplified version of the TypedSerializer.

Here are some tips on how to do this.

Type information is stored in the TypeDictionary, which is provided by the PersistenceManager. You can register a PersistenceTypeDefinitionRegistrationObserver at the TypeDictionary to collect new PersistenceTypeDefinitions during serialization. This is what the SerializerTypeInfoStrategy implementations do.
As the serialize and deserialize methods don’t need to include the type-information in the serialized output they can be as simple as in the Serializer.java implementation.
PersistenceTypeDefinitions can be converted to text using the PersistenceTypeDictionaryAssembler.
The TypeDefinitionBuilder can be used to create PersistenceTypeDefinitions from text.
The TypeDefinitionImporter is used to import PersistenceTypeDefinitions.
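
To illustrate the observer idea from the first tip, here is a minimal, library-free sketch. It does not use the Eclipse Serializer API: SketchTypeDictionary, ensureType and addObserver are hypothetical stand-ins made up for this example, and the "definition" is just an id plus the class name. It only shows the pattern of collecting definitions that are registered for the first time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TypeObserverSketch {

    // Hypothetical stand-in for a type dictionary; NOT the Eclipse Serializer API.
    public static final class SketchTypeDictionary {
        private final Map<String, Long> typeIds = new LinkedHashMap<>();
        private final List<Consumer<String>> observers = new ArrayList<>();
        private long nextId = 1;

        // Observers are called once for every type that is registered for the first time.
        public void addObserver(Consumer<String> observer) {
            observers.add(observer);
        }

        // Returns the id of the type; registers it and notifies observers on first sight.
        public long ensureType(Class<?> type) {
            String name = type.getName();
            Long known = typeIds.get(name);
            if (known != null) {
                return known; // already known: no notification
            }
            long id = nextId++;
            typeIds.put(name, id);
            String definition = id + " " + name; // simplified textual "definition"
            for (Consumer<String> observer : observers) {
                observer.accept(definition);
            }
            return id;
        }
    }

    public static void main(String[] args) {
        SketchTypeDictionary dictionary = new SketchTypeDictionary();
        List<String> newDefinitions = new ArrayList<>();
        dictionary.addObserver(newDefinitions::add);

        dictionary.ensureType(String.class);
        dictionary.ensureType(Integer.class);
        dictionary.ensureType(String.class); // seen before, so nothing is collected

        System.out.println(newDefinitions);
    }
}
```

Note that an observer of this kind only sees types that are new to the dictionary, not every type that occurs in a given serialization call.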

Feb 07 '24 13:02 hg-ms

Very interesting. I'll try this out.

The remaining question for me is which types are used during the current serialization. If I understand correctly, the TypeDictionary contains all types seen by the Serializer so far.

The more crucial information is which types are in the "message". Knowing that, I could send this information as metadata with the message, receive it on the other side, register the types with the deserializer dynamically, and deserialize the message (assuming all of this goes well, of course).
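
To make the idea concrete, here is a rough sketch of what I have in mind, in plain Java without any Eclipse Serializer calls. Everything in it is an assumption for illustration: the Envelope class, the header name "x-serializer-types", and the flat list of objects that stands in for a real traversal of the serialized object graph. The payload bytes are passed in as-is; producing them is out of scope here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MessageTypesSketch {

    // Hypothetical header name; any key agreed on by sender and receiver would do.
    public static final String TYPES_HEADER = "x-serializer-types";

    // Hypothetical message envelope: payload bytes plus string headers.
    public static final class Envelope {
        public final byte[] payload;
        public final Map<String, String> headers;

        Envelope(byte[] payload, Map<String, String> headers) {
            this.payload = payload;
            this.headers = headers;
        }
    }

    // Collects the distinct runtime class names of the given objects in encounter order.
    // A flat list stands in for walking the real object graph.
    public static Set<String> typesUsed(List<?> objects) {
        Set<String> names = new LinkedHashSet<>();
        for (Object object : objects) {
            if (object != null) {
                names.add(object.getClass().getName());
            }
        }
        return names;
    }

    // Attaches only the types of this message as a comma-separated header.
    public static Envelope wrap(byte[] payload, List<?> objects) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put(TYPES_HEADER, String.join(",", typesUsed(objects)));
        return new Envelope(payload, Collections.unmodifiableMap(headers));
    }

    public static void main(String[] args) {
        Envelope envelope = wrap(new byte[] {1, 2, 3}, Arrays.asList("a", 42, "b"));
        System.out.println(envelope.headers.get(TYPES_HEADER));
    }
}
```

The point is that the header is computed per message, independently of whatever the dictionary has accumulated from earlier messages.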

Feb 08 '24 15:02 zambrovski