Server.Java icon indicating copy to clipboard operation
Server.Java copied to clipboard

TPF algorithm for index data source is incorrect if a requested triple pattern contains a variable multiple times

Open hartig opened this issue 10 years ago • 13 comments

The algorithm that generates TPF responses for the index data source is incorrect for cases in which the requested triple pattern contains a specific variable multiple times. For instance, suppose the client requests triple pattern (?x, foaf:knows, ?x) with variable ?x explicitly mentioned in the HTTP request (using both, the subject parameter and the object parameter). In response to this request, the existing algorithm would return all triples (from the dataset) that have foaf:knows in the predicate position; hence, the algorithm ignores the additional constraint that subject and object must be equivalent.

hartig avatar Dec 25 '15 16:12 hartig

Depends on the semantics of TPF i presume. Did we include this in the formalization?

mielvds avatar Dec 28 '15 09:12 mielvds

Yes we did. In the ISWC paper we (i) defined the notion of matching triples for a given triple pattern and (ii) formalized triple pattern selectors in terms of this notion. By these definitions, a triple (ex:Bob,foaf:knows,ex:Alice) is not a matching triple for the triple pattern (?x,foaf:knows,?x) and, thus, this triple must not be included in the data of the TPF of that triple pattern.

BTW, what we did not capture formally is the possibility of omitting elements of a triple pattern in a request.

@RubenVerborgh

hartig avatar Dec 28 '15 11:12 hartig

Yeah, that's what I was wondering about. {null, p, null} =?= {?x, p, ?x}

mielvds avatar Dec 28 '15 11:12 mielvds

Triple patterns with multiple occurrences of the same variable are part of TPF. They can have the following forms in theory:

  • ?s ?p/<p> ?s (plausible)
  • ?s ?s ?o/<o> (implausible)
  • ?s/<s> ?p ?p (implausible)
  • ?s ?s ?s (extremely implausible)

It is not decided yet how this will be encoded at the HTTP interface level. Currently, an empty or non-specified value for any of the components means “variable”. Hence, omissions are the same as (unique) variables.

However, this mechanism is evidently not sufficient to say that two variables are the same. For this, we could either use explicit variables (?x) or blank nodes (_:x).

Note that, in any case, the server is free to return more data than its selector prescribes, so even though (ex:Bob,foaf:knows,ex:Alice) is not a match of (?x,foaf:knows,?x), the server is free to include it in any fragment (as long as the triple is part of the dataset). This allows us to, for instance, also return labels.

RubenVerborgh avatar Dec 28 '15 11:12 RubenVerborgh

@mielvds (null,p,null) is to be interpreted as (?x,p,?y) where ?x and ?y are two arbitrary variables such that ?x != ?y.

@RubenVerborgh Regarding the encoding of variables that are specified explicitly at the HTTP interface level, the original code of Java.Server assumed they are given as strings that start with a question mark (e.g., see line 196 in https://github.com/LinkedDataFragments/Server.Java/blob/27d48fd6211f732b606625270f908b977b0e2d85/src/org/linkeddatafragments/servlet/BasicLdfServlet.java#L196) and turned them silently into unspecified variables. The code in my recent refactoring assumes the same encoding (strings that start with a question mark), but interprets such parameters as explicitly specified variables (see line 33 in https://github.com/LinkedDataFragments/Server.Java/blob/master/src/org/linkeddatafragments/util/TriplePatternElementParser.java#L33).

@RubenVerborgh Regarding your comment that "the server is free to return more data than its selector prescribes," please note that, by the formalization in our ISWC paper, adding more data is not possible.

hartig avatar Dec 28 '15 12:12 hartig

Regarding encoding, I think that the _ and ? cases should be the same. Blank variables are scoped to the entire request (IRI), so their name matters as well.

Agree on the formalization and still stand by that theoretical definition of TPF. Strictly speaking, a TPF with more data isn't a TPF. However, in practice, clients can (and should) be relaxed about this. From the formalization viewpoint, the extra data could just be metadata.

RubenVerborgh avatar Dec 28 '15 20:12 RubenVerborgh

I agree that blank nodes in TPF requests should be dealt with in the same way as (specific, named) variables are dealt with. Therefore, I would extend the TriplePatternElement interface by adding the following three methods:

  • boolean isNamedVariable(): returns true if the element is a specific variable that has a name (i.e., it is denoted by a string that begins with a question mark)
  • boolean isUnnamedVariable(): returns true if the element is a specific variable that does not have a name (i.e., it is denoted by a blank node)
  • TermType asAnonymousVariable(): returns a representation of the element as a blank node (assuming it is a specific, but non-named variable).

@RubenVerborgh, @mielvds OK?

hartig avatar Dec 28 '15 22:12 hartig

Ok. I just wonder if we can just call isUnnamedVariable() isBlank() or isAnonymous() instead, since that's what we are talking about. I agree it makes sense because of the variable context, but on the other hand, a new 'name' might be confusing.

At least it could be isAnonymousVariable() to be consistent with asAnonymousVariable()?

mielvds avatar Dec 29 '15 08:12 mielvds

Can't the parser just autoconvert non-named variables to named variables? After all, ?subject= is a shortcut for ?subject=?x.

RubenVerborgh avatar Dec 29 '15 09:12 RubenVerborgh

@RubenVerborgh I think that automatically converting unspecified variables into named variables is not a good idea because it may cause problems if the named variable that you would assign to an omitted parameter happens to be the value of another parameter in the request. For instance, if we interpret subject= as subject=?x, we have a problem if the HTTP request contains object=?x. Similarly, such an automatic naming of unspecified variables may cause problems for something like the bindings-restricted TPF interface that Carlos and I are working on.

hartig avatar Dec 30 '15 11:12 hartig

Would this also be a problem if the assignment was done intelligently? Like: always generate a variable name that is not already used?

RubenVerborgh avatar Dec 30 '15 12:12 RubenVerborgh

If it was only for pure TPFs, such an intelligent assignment would work (but it might have a slight impact on the server performance because it is extra work). However, I do not have an idea of how to do it in a generic way; that is, in a way that would be guaranteed to be correct even when used by external extensions for TP-based interfaces that assume additional parameters with variables (such as our bindings-restricted TPF interface).

hartig avatar Dec 30 '15 13:12 hartig

The TriplePatternElement interface has been extended as discussed above. However, the actual bug that this ticket is about still exists.

hartig avatar Jan 04 '16 22:01 hartig