Unexpected results with multiple optionals at the same level
It seems like having multiple optionals in a matcha query returns unexpected results. E.g. in this example (from an internal app), the query returns different results depending on which optional comes first:
> (matcha/construct-1 {:notation ?notation
:name ?name
:company-number ?company_number
:company-uri ?company_uri
:expiry-date ?expiry_date}
[[uri skos:notation ?notation]
[uri reg:holder ?holder]
[?holder foaf:name ?name]
(matcha/optional [[?holder org:identifier ?company_number]
[?holder owl:sameAs ?company_uri]])
(matcha/optional [[uri reg:expiryDate ?expiry_date]])]
matcha-db)
> {:notation "CBDU195810",
:name "1ST WASTE MANAGEMENT CONSULTANTS LTD",
:company-number "03736942",
:company-uri #object[java.net.URI 0x2cbb354 "http://business.data.gov.uk/id/company/03736942"],
:expiry-date _0}
vs
> (matcha/construct-1 {:notation ?notation
:name ?name
:company-number ?company_number
:company-uri ?company_uri
:expiry-date ?expiry_date}
[[uri skos:notation ?notation]
[uri reg:holder ?holder]
[?holder foaf:name ?name]
(matcha/optional [[uri reg:expiryDate ?expiry_date]])
(matcha/optional [[?holder org:identifier ?company_number]
[?holder owl:sameAs ?company_uri]])]
matcha-db)
> {:notation "CBDU195810",
:name "1ST WASTE MANAGEMENT CONSULTANTS LTD",
:company-number _0,
:company-uri _1,
:expiry-date #object[java.time.LocalDate 0x2802054f "2020-09-20"]}
Not sure what the problem is, but this result was surprising.
Rewriting this as nested optionals should work (nested optionals is explicitly supported like in SPARQL):
(matcha/construct-1 {:notation ?notation
:name ?name
:company-number ?company_number
:company-uri ?company_uri
:expiry-date ?expiry_date}
[[uri skos:notation ?notation]
[uri reg:holder ?holder]
[?holder foaf:name ?name]
(matcha/optional [[?holder org:identifier ?company_number]
[?holder owl:sameAs ?company_uri]
(matcha/optional [[uri reg:expiryDate ?expiry_date]])])]
matcha-db))
It looks like there may be a problem with your initial query though, probably when we parse the query forms for optionals, and lift their variables out a level. We might not be expecting two or more optionals at the same level, and I think we should.
I suspect they might be stomping on each other; but will need to view the query macro expansions.
This is still a problem on 0.1.12
Some data
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix drl: <http://environment.data.gov.uk/asset-management/def/drl/> .
@prefix drl-aims: <http://environment.data.gov.uk/asset-management/def/drl/aims/> .
@prefix drl-edit: <http://environment.data.gov.uk/asset-management/def/drl/edit/> .
@prefix drl-edit-type: <http://environment.data.gov.uk/asset-management/def/drl/edit/type/> .
@prefix drl-id: <http://environment.data.gov.uk/asset-management/id/drl/> .
drl-id:Beacon-optional-fields a drl:Attribute;
drl:attributeOf drl-id:Beacon;
drl:delivery "fff";
drl:format drl-id:format-8978f27f-d038-4b51-a49c-233628c1881e;
drl:lastEdit drl-edit:f1de288f-3943-4b88-bc94-b76169462a8f;
drl:mandatory false;
drl:mapping "fff";
dcterms:description "Optional Fields"@en;
rdfs:label "Optional Fields"@en .
drl-edit:f1de288f-3943-4b88-bc94-b76169462a8f drl-edit:type drl-edit-type:created .
drl-id:format-8978f27f-d038-4b51-a49c-233628c1881e a drl:ReferenceFormat;
drl:referenceCode "fff";
drl:referenceType "fff";
rdfs:label "Reference" .
drl-id:Beacon-picklist-attribute a drl:Attribute;
drl:attributeOf drl-id:Beacon;
drl:delivery "11";
drl:format drl-id:AssetCategoryLand;
drl:lastEdit drl-edit:c1183385-ea7a-45b5-a800-f6406267cd50;
drl:mandatory false;
drl:mapping "111";
drl:purpose "11";
dcterms:description "asdf"@en;
rdfs:label "Picklist Attribute"@en .
drl-edit:c1183385-ea7a-45b5-a800-f6406267cd50 drl-edit:type drl-edit-type:created .
There are two Attributes in the data, and there are three optional things.
- Mandatory present in both of these things
- Purpose is present in just one
- Edit date is not present in either
I've tried building with non-nested and nested optionals.
Not Nested
It looks like build use the first matching optional,
Putting mandatory first excludes the purpose on the item that should have it
(matcha/build [:id ?id]
{:mandatory ?mandatory
:purpose ?purpose
:edit-date ?editDate}
[[?id rdf:a drldef:Attribute]
[?id drldef:last-edit ?edit]
[?edit drl-edit:type ?editType]
(matcha/optional
[[?id drldef:mandatory ?mandatory]])
(matcha/optional
[[?id drldef:purpose ?purpose]])
(matcha/optional
[[?edit drl-edit:date ?editDate]])]
db)
=>
({:id
#object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
:mandatory false}
{:id
#object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"],
:mandatory false})
Swapping purpose and mandatory returns the same item with it's purpose but without the mandatory
(matcha/build [:id ?id]
{:mandatory ?mandatory
:purpose ?purpose
:edit-date ?editDate}
[[?id rdf:a drldef:Attribute]
[?id drldef:last-edit ?edit]
[?edit drl-edit:type ?editType]
(matcha/optional
[[?id drldef:purpose ?purpose]])
(matcha/optional
[[?id drldef:mandatory ?mandatory]])
(matcha/optional
[[?edit drl-edit:date ?editDate]])]
db)
=>
({:id
#object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
:purpose "11"}
{:id
#object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"],
:mandatory false})
If I put the date optional first it matches the next one whichever it is following the same pattern as above.
Nested Optionals
Nested it seems to short circuit out when the first one in the vector does not match
(matcha/build [:id ?id]
{:mandatory ?mandatory
:purpose ?purpose
:edit-date ?editDate}
[[?id rdf:a drldef:Attribute]
[?id drldef:last-edit ?edit]
[?edit drl-edit:type ?editType]
(matcha/optional
[[?edit drl-edit:date ?editDate]
(matcha/optional
[[?id drldef:mandatory ?mandatory]
(matcha/optional
[[?id drldef:purpose ?purpose]])])])]
db)
=>
({:id
#object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"]}
{:id
#object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"]})
(matcha/build [:id ?id]
{:mandatory ?mandatory
:purpose ?purpose
:edit-date ?editDate}
[[?id rdf:a drldef:Attribute]
[?id drldef:last-edit ?edit]
[?edit drl-edit:type ?editType]
(matcha/optional
[[?id drldef:purpose ?purpose]
(matcha/optional [[?id drldef:mandatory ?mandatory]])])]
db)
=>
({:id
#object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
:mandatory false,
:purpose "11"}
{:id
#object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"]})
I tried nesting deeper and deeper and all sorts of different permutations without any success.
Got it!
Somewhat.
I am able to achieve to match correctly with a non-nested series of all permutation starting from the biggest and working down. This is unwieldy with my three optionals but works
(matcha/build [:id ?id]
{:mandatory ?mandatory
:purpose ?purpose
:edit-date ?editDate}
[[?id rdf:a drldef:Attribute]
[?id drldef:last-edit ?edit]
[?edit drl-edit:type ?editType]
(matcha/optional [[?id drldef:mandatory ?mandatory]
[?id drldef:purpose ?purpose]
[?edit drl-edit:date ?editDate]])
(matcha/optional [[?id drldef:mandatory ?mandatory]
[?id drldef:purpose ?purpose]])
(matcha/optional [[?id drldef:mandatory ?mandatory]
[?edit drl-edit:date ?editDate]])
(matcha/optional [[?id drldef:purpose ?purpose]
[?edit drl-edit:date ?editDate]])
(matcha/optional [[?id drldef:mandatory ?mandatory]])
(matcha/optional [[?id drldef:purpose ?purpose]])
(matcha/optional [[?edit drl-edit:date ?editDate]])]
db)
=>
({:id
#object[java.net.URI 0x2aee0e8b "http://environment.data.gov.uk/asset-management/id/drl/Beacon-picklist-attribute"],
:mandatory false,
:purpose "11"}
{:id
#object[java.net.URI 0x69843f6a "http://environment.data.gov.uk/asset-management/id/drl/Beacon-optional-fields"],
:mandatory false})
The final bit about permutations there is interesting. I think I know roughly what causes it, but it'll need some time to investigate how exactly to fix it.
Essentially I think the problem is that the optionals interfer with each other (making them not optional to each other).
For example:
(grafter.matcha.alpha/build [:id ?id] [[?id :p ?o] (grafter.matcha.alpha/optional [[?id :p2 ?o2]])])
Expands correctly to:
(->>
(do
(seq
(clojure.core.logic.pldb/with-db
(grafter.matcha.alpha/index-if-necessary
db-or-idx__50404__auto__)
(clojure.core.logic/run*
[?id]
(clojure.core.logic/fresh
[?o ?o2]
(grafter.matcha.alpha/triple ?id :p ?o)
(clojure.core.logic/conda
[(clojure.core.logic/conde
[(grafter.matcha.alpha/triple ?id :p2 ?o2)])]
[clojure.core.logic/succeed]))))))
(grafter.matcha.alpha/unify-solutions '[?id])
(grafter.matcha.alpha/replace-vars-with-vals [:id '?id])
(grafter.matcha.alpha/group-subjects)
seq)
But this:
(grafter.matcha.alpha/construct [:id ?id]
[[?id :p ?o]
(grafter.matcha.alpha/optional [[?id :p2 ?o2]])
(grafter.matcha.alpha/optional [[?id :p3 ?o2]])])
Expands to this:
(->>
(do
(seq
(clojure.core.logic.pldb/with-db
(grafter.matcha.alpha/index-if-necessary
db-or-idx__50404__auto__)
(clojure.core.logic/run*
[?id]
(clojure.core.logic/fresh
[?o ?o2]
(grafter.matcha.alpha/triple ?id :p ?o)
(clojure.core.logic/conda
[(clojure.core.logic/conde
[(grafter.matcha.alpha/triple ?id :p2 ?o2)])]
[(clojure.core.logic/conde
[(grafter.matcha.alpha/triple ?id :p3 ?o2)])]
[clojure.core.logic/succeed]))))))
(grafter.matcha.alpha/unify-solutions '[?id])
(grafter.matcha.alpha/replace-vars-with-vals [:id '?id])
(grafter.matcha.alpha/group-subjects)
seq)
I think that fresh o2 should be lexically scoped over each clause of the conde. This is basically what I meant by the comment:
probably when we parse the query forms for optionals, and lift their variables out a level. We might not be expecting two or more optionals at the same level, and I think we should. I suspect they might be stomping on each other;
So I suspect it needs to be something like:
(->>
(do
(seq
(clojure.core.logic.pldb/with-db
(grafter.matcha.alpha/index-if-necessary
db-or-idx__50404__auto__)
(clojure.core.logic/run*
[?id]
(clojure.core.logic/fresh
[?o]
(grafter.matcha.alpha/triple ?id :p ?o)
(clojure.core.logic/conda
[(clojure.core.logic/conde
[(clojure.core.logic/fresh [?o2] (grafter.matcha.alpha/triple ?id :p2 ?o2))])]
[(clojure.core.logic/fresh [?o2] (clojure.core.logic/conde [(grafter.matcha.alpha/triple ?id :p3 ?o2)]))]
[clojure.core.logic/succeed]))))))
(grafter.matcha.alpha/unify-solutions '[?id])
(grafter.matcha.alpha/replace-vars-with-vals [:id '?id])
(grafter.matcha.alpha/group-subjects)
seq)