jena-fuseki-access - Propagate request/service context

Open vtermanis opened this issue 3 years ago • 7 comments

What

Process the request/service context in AccessControlledDataset the same way the (parent) query processor does, i.e. honour the server => dataset => endpoint context value priority order.
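To illustrate the priority order, here is a minimal, self-contained sketch that uses plain maps rather than Jena's own Context class. The class name ContextLayers, the merge helper, and the key "example:queryTimeout" are hypothetical and only stand in for the real machinery; the point is that a value set at the endpoint level wins over the dataset level, which in turn wins over the server level.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ContextLayers {
    /**
     * Merge context layers in priority order: later layers override earlier ones.
     * Hypothetical helper for illustration only, not part of Jena.
     */
    @SafeVarargs
    static Map<String, Object> merge(Map<String, Object>... layers) {
        Map<String, Object> effective = new LinkedHashMap<>();
        for (Map<String, Object> layer : layers) {
            effective.putAll(layer); // a later layer replaces any earlier value for the same key
        }
        return effective;
    }

    public static void main(String[] args) {
        // "example:queryTimeout" is a made-up key; the text#index key is the one used in this report.
        Map<String, Object> server = Map.of("example:queryTimeout", 30000L);
        Map<String, Object> dataset = Map.of("http://jena.apache.org/text#index", "the-text-index");
        Map<String, Object> endpoint = Map.of("http://jena.apache.org/text#index", false);

        // server => dataset => endpoint: the endpoint value for text#index takes effect.
        Map<String, Object> effective = merge(server, dataset, endpoint);
        System.out.println(effective);
    }
}
```

In this sketch the endpoint layer replaces the dataset's text index setting with false, which mirrors the use-case described in this report, while the server-level timeout is inherited unchanged.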

Why

Standard SPARQL queries honour the context of the endpoint (see docs) as well as a request-specific timeout set via the timeout URL parameter. These two features allow one to:

  • Override context values specified at server and/or dataset level (e.g. disable access to text indexing or change the default timeout for the endpoint)
  • Specify per-request timeouts

(See the bottom of this summary for an example. See also the mailing list thread.)

How

  • Use QueryExec to create the QueryExecution instead of QueryExecutionFactory (the latter does not consider the HttpAction's context)
  • Use the same timeout parameter logic for per-request timeouts as in the base SPARQL query processor
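As a rough illustration of per-request timeout handling, the sketch below combines a timeout URL parameter with a configured timeout. It rests on assumptions that this thread does not confirm: that the parameter is given in seconds, that malformed or negative values are ignored, and that a request may only shorten the configured timeout. The class and method names are hypothetical and this is not the code from SPARQLQueryProcessor.

```java
public class RequestTimeout {
    /**
     * Compute the effective timeout in milliseconds.
     *
     * @param timeoutParam     raw value of the "timeout" URL parameter, assumed to be seconds; may be null
     * @param configuredMillis timeout from the effective context, or a negative value if none is configured
     */
    static long effectiveTimeoutMillis(String timeoutParam, long configuredMillis) {
        if (timeoutParam == null || timeoutParam.isBlank()) {
            return configuredMillis; // no per-request value: keep the configured timeout
        }
        long requestedMillis;
        try {
            requestedMillis = (long) (Double.parseDouble(timeoutParam) * 1000);
        } catch (NumberFormatException e) {
            return configuredMillis; // assumption: malformed values are ignored
        }
        if (requestedMillis < 0) {
            return configuredMillis; // assumption: negative values are ignored
        }
        if (configuredMillis < 0) {
            return requestedMillis; // nothing configured: use the request's value
        }
        // assumption: the request can shorten, but not extend, the configured timeout
        return Math.min(requestedMillis, configuredMillis);
    }

    public static void main(String[] args) {
        System.out.println(effectiveTimeoutMillis("5", 30000L));
        System.out.println(effectiveTimeoutMillis("60", 30000L));
    }
}
```

Whichever rule the base processor actually applies, the aim of this change is that AccessControlledDataset endpoints apply the same one.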

Note: I am not convinced that the way I've updated AccessControlledDataset (and promoted a helper method from SPARQLQueryProcessor.java to public) is the right way to go. But I can confirm that the following use-case works:

  1. Define a dataset DS1 with jena:text indexing enabled
  2. Define a dataset DS2, with AccessControlledDataset wrapping DS1. (The actual access rules are irrelevant here.)
  3. Define service A exposing a query endpoint for DS1, with extended context: ja:context [ ja:cxtName "http://jena.apache.org/text#index" ; ja:cxtValue false ] ;
  4. Define service B the same as A, but for DS2

Current behaviour (pre-patch):

  • jena:text is exposed only via the query endpoint of B. SPARQL queries against A do not match text-indexed properties.

Expected behaviour (post patch):

  • Neither A nor B match text-indexed properties

vtermanis May 03 '22 10:05

From users@ thread https://lists.apache.org/thread/h0c81qjl8oc83yl2xf7xvt4l0pw4grrf

It looks like there is an issue as to whether the text dataset should push down the context setting for the index or not. Requiring endpoint configuration isn't so user-friendly.

This PR may contain a change worth making anyway, but jena-fuseki-access itself relies on a context setting (DataAccessCtl.symAuthorizationService), so allowing endpoints to modify the context may be a security risk. This needs investigation - it's been a long time since I looked at the code!

An outstanding question from email:

  • Doesn't that [setting the context to an illegal value] cause warnings in the Fuseki log?

It'll need some tests.

afs May 03 '22 11:05

  • Doesn't that [setting the context to an illegal value] cause warnings in the Fuseki log?

(copied from thread reply):

Yes it does - three (expected) warnings from TextQueryPF:

  1. Context setting 'symbol:http://jena.apache.org/text#index'is not a TextIndex
  2. Failed to find the text index : tried context and as a text-enabled dataset
  3. No text index - no text search performed

It'll need some tests.

I'd be happy to assist with this (time permitting) if I'm pointed in the right direction.

vtermanis May 03 '22 15:05

Apologies - I updated the description since it had a mistake. The expected behaviour section has changed to:

Current behaviour (pre-patch):

  • jena:text is exposed only via the query endpoint of B. SPARQL queries against A do not match text-indexed properties.

Expected behaviour (post patch):

  • Neither A nor B match text-indexed properties

vtermanis May 03 '22 15:05

(Changing to draft as suggested, since the best way to address this is still being debated.)

vtermanis May 04 '22 08:05

@vtermanis -- there have been bug fixes for general context handling with queries, and these are now on the main branch. I don't think they relate directly to your report, but I wanted to make sure you are working against the right state of the codebase.

afs Jul 14 '22 13:07

@afs thanks for the heads-up, do you mean:

  • https://github.com/apache/jena/pull/1444

Is that the main one, or are there others in particular? (Though it will probably be clear what has changed once I rebase and look at the query servlet and processor code.)

vtermanis Jul 28 '22 19:07

#1444 is about transactions.

#1375 and parts of some others around that time.

afs Jul 29 '22 07:07