Graph model for domain metadata
Description
In TG there is a subsystem responsible for generation of domain metadata. It was created specifically for EQL.
EQL3 metadata has the following structure:
- Unordered set of entity metadata instances, where each instance:
- Has a set of property metadata instances corresponding to its declared properties.
- Composite/Union property metadata includes a set of subproperty metadata instances.
This structure resembles a graph, although it is more akin to a forest.
There are several places where it is replicated:
-
EqlPropertyMetadata,EqlEntityMetadata,EqlEntityMetadataHolder. -
QuerySourceInfoProvider,QuerySourceInfo,AbstractQuerySourceItem. -
AbstractRetrievalModel(less resemblance, but could benefit by reusing a similar abstraction)
Each implementation of the structure is different and, as a result, supports only specialised operations. One consequence is that more effort is required to understand, use and maintain them.
It is proposed to introduce a single abstraction for modelling entities and their properties -- a graph. Graphs are well-established data structures, and there are Java libraries that provide them:
The previous example could be modelled with a graph as follows:
This structure has the following differences:
- Properties are represented as graph arcs instead of tree nodes.
- The type of a property is given by the tail of its arc (i.e., near the arrow) rather than being a part of it.
Note that such graphs could easily accomodate loops and directed cycles.
The advantages of this abstraction is the ability to reduce all operations to graph traversal, which should lend itself nicely to reuse and provide clarity of expression.
Expected outcome
A graph model for domain metadata is evaluated and adopted if deemed to be fitting and:
- Exhibits reasonable memory consumption (TBD quantitavely).
- Graph traversal can be clearly expressed and arbitrary traversal rules can be applied.
This is an attempt to answer the following questions:
- How to structure top-level metadata?
- Graph
- Forest
- How to structure property metadata?
- Should it include the enclosing type?
- Should it include the property type?
The following criteria should be used to compare alternatives:
- Space complexity of metadata structures.
- Time complexity of metadata operations.
- Metadata interface -- simplicity is desirable.
Metadata Overview
In general, metadata describes:
- Types
- Entities
- Nature (persistent, synthetic, union)
- Nature-specific metadata
- Persistent: table name
- Union: query model for each member property
- Synthetic: query models
- Information from annotations (title, desc)
- Composite
- Primitive
- Collectional
- Other (
PropertyDescriptor, ...)
- Entities
- Properties
- Nature
- Nature-specific information
- Persistent: column name
- Calculated: expression
- Collectional: link property
- Crit-only: mnemonics
- Auxiliary information (requiredness, key member, ...)
Metadata Structure Options
Type metadata can be expressed with the following ADT:
TypeMetadata = EntityMetadata
| CompositeTypeMetadata
| PrimitiveTypeMetadata
| CollectionalTypeMetadata
| OtherTypeMetadata
Let Tn be the total number of types and Pn the average number of properties in a type.
Options for TypeMetadata structure (referred to as TMn):
-
TM1 Stores a set of
PropertyMetadata(applicable only to[Entity,CompositeType]Metadata).This may not necessarily be a set but any data structure.
Let
TM1_S = O(Tn * Pn)be the space required to store properties.Let
TM1_Ptbe the time required to retrieve a property given this structure. Most likelyMap String PropertyMetadatawill be used, thusTM1_Pt = O(1).
Options for PropertyMetadata structure (referred to as PMn):
-
PM1 Stores enclosing
EntityMetadatain a field.Increases space complexity by
PM1_S = Tn * Pn. -
PM2 Stores property type
TypeMetadatain a field.Increases space complexity by
PM2_S = Tn * Pn.
Metadata Operations
Assume that DomainMetadata is the entry point for accessing metadata.
Operations on metadata (referred to as OPn):
-
OP1 Retrieve type metadata by class.
DomainMetadata.typeMetadata :: Class -> TypeMetadata -
OP2 Retrieve property metadata given its enclosing type and name.
If TM1, then:
EntityMetadata.propertyMetadata :: String -> PropertyMetadata CompositeTypeMetadata.propertyMetadata :: String -> PropertyMetadataOtherwise:
DomainMetadata.propertyMetadata :: EntityMetadata -> String -> PropertyMetadata DomainMetadata.propertyMetadata :: CompositeTypeMetadata -> String -> PropertyMetadata -
OP3 Access enclosing type of a property.
If PM1, then:
PropertyMetadata.enclosingType :: EntityMetadata | CompositeTypeMetadataOtherwise:
DomainMetadata.enclosingType :: PropertyMetadata -> EntityMetadata | CompositeTypeMetadata -
OP4 Access the type of a property.
If PM2, then:
PropertyMetadata.type :: TypeMetadataOtherwise:
DomainMetadata.propertyType :: PropertyMetadata -> TypeMetadata
TM1, PM1 and PM2 share the following:
- If present:
-
Metadata interface: simple.
OP2, OP3 and OP4 don't require access to
DomainMetadata. -
Space complexity: potential overhead if the same relationships are stored in the top-level metadata structure.
-
Time complexity: reduced.
No need to access
DomainMetadatawhich may perform the operaion inefficiently.
-
- If absent:
-
Metadata interface: increased complexity.
OP2, OP3 and OP4 require access to
DomainMetadata. -
More dynamic control over type-property relationships since they are not tied to
PropertyMetadata. Although, this advantage seems to bring no benefit, as type-property relationships remain fixed after metadata generation.
-
Top-level metadata structure
In describing how each structure performs operations outlined above it is assumed that neither of PM1, PM2, TM1 are present, as their presence would remove the need for the top-level metadata structure to support these operations.
Graph
Internal representation:
-
TypeMetadataas nodes. -
PropertyMetadataas arcssource -> target, wheresourceis the enclosing type andtargetis the property type.
Operations:
-
OP1 Achievable by additionally introducing
Map Class TypeMetadata. -
OP2 Achievable by:
a. Retrieve
[Entity,CompositeType]Metadatathrough OP1 if not given it directly. b. FindPropertyMetadatawith the given name among outgoing edges. -
OP3 Find
sourceof the arc. -
OP4 Find
targetof the arc.
Criteria:
-
Space complexity:
A * (Tn * Pn) + Tn + M.A-- space required for a single arc:-
source :: TypeMetadata -
target :: TypeMetadata -
value :: PropertyMetadata
M = Tn * 2-- space required for efficient OP1.Lower bound:
O(A * (Tn * Pn) + Tn). Upper bound:O(A * (Tn * Pn) + Tn + PM1_S + PM2_S + TM1_S).Leaves choices of TM1, PM1, PM2 open.
-
-
Time complexity. Let
NtandEtbe the time required to find a node and an arc in a graph respectively.
-
OP1 --
O(1) -
OP2
- With TM1:
O(1) + TM1_Pt - Without TM1:
O(1) + Nt + O(Pn / 2)
- With TM1:
-
OP3
- With PM1:
O(1) - Without PM1:
O(1) + Et + O(1)
- With PM1:
-
OP4
- With PM2:
O(1) - Without PM2:
O(1) + Et + O(1)
- With PM2:
Forest
Internal representation: Set TypeMetadata.
This representation requires TM1 to be able to access properties from types.
Operations:
-
OP1 Achievable by changing the internal representation to
Map Class TypeMetadata. -
OP2 Achievable by: a. Retrieve
[Entity,CompositeType]Metadatathrough OP1 if not given it directly. b. FindPropertyMetadatawith the given name in the set of properties. -
OP3 Search through all
TypeMetadatafor a type that encloses the given property.Inefficient without PM1.
-
OP4 Impossible without PM2.
Criteria:
-
Space complexity:
D + K + L.D-- space required for the top-level structure:-
Set TypeMetadata:Tn. -
Map Class TypeMetadata:Tn * 2.
K-- optional overhead that depends on the structure of property metadata. Given that we want both effiency of OP3 and possibility of OP4,K = PM1_S + PM2_S.L-- space required by TM1 to make this data structure work at all.L = TM1_SLower bound:
O(Tn + PM1_S + PM2_S + TM1_S). Upper bound:O(Tn + PM1_S + PM2_S + TM1_S).Strictly requires TM1, PM2. Requires PM1 for efficiency of OP3.
-
-
Time complexity:
-
OP1 --
O(1). -
OP2 --
O(1) + TM1_Pt, due to TM1. -
OP3
- With PM1:
O(1). - Without PM1:
O(Tn * Pn).
- With PM1:
-
OP4 --
O(1), due to PM2.
Conclusion
- Graph offers more freedom in choosing the structure for type and property metadata:
- Space complexity of type and property metadata can be reduced at the cost of increased time
complexity for operations. Although, this cost will depend on the graph implementation (
NtandEt).
- Space complexity of type and property metadata can be reduced at the cost of increased time
complexity for operations. Although, this cost will depend on the graph implementation (
- Forest imposes some structure options on type and property metadata: TM1, PM2. PM1 is almost as strictly imposed, since its absence results in highly inefficient OP3. Forest operations fair better in terms of time complexity thanks to the increased space complexity.