Consider Datalog-like logic variable based JOINs

Open justjake opened this issue 3 years ago • 2 comments

I don't have the time to write a detailed proposal at the moment, but I think you should seriously consider adding Datalog-like logic variables as an alternative syntax for JOIN. The Datalog style can substantially simplify graph traversal use cases compared to SQL.

Here's an example of a traversal in Datalog, versus the equivalent SQL (SQLite dialect):

image

(Generated by my toy Datalog to SQL compiler [github])

Related concepts on the subject:

  • The existing SPARQL query language tries to mash up SQL-like syntax with Datalog's join syntax
  • Logica is a Datalog-like language from Google that compiles to several SQL dialects: https://logica.dev/
  • Mentat is a Datalog -> SQLite compiler for a Datomic-like Datalog dialect. It was originally developed by Mozilla, now independent: https://github.com/qpdb/mentat
  • Well-publicized tutorial on the Clojure/Datomic dialect: http://www.learndatalogtoday.org/

justjake avatar Jun 28 '22 01:06 justjake

Thanks for the issue! I'm a big fan of Datalog and have followed Logica for a while.

How would you see this working for simple joins? Do you think it's possible to design something that is familiar enough for users with only relational experience? What would a design without access to WITH RECURSIVE look like?

My sense is that analytical tables are increasingly de-normalized, and so highly complex joins ("find 1st cousins given parent-child relationships") are less important than making more standard joins easy, and allowing for more complex join conditions ("join orders & addresses, without exploding on duplicate addresses").
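As a rough illustration of the "join orders & addresses, without exploding on duplicate addresses" case, here is a minimal Python sketch. The function name `join_first_match`, the column names, the sample rows, and the choice to keep the first address seen per key are all assumptions made for illustration; nothing here is PRQL syntax or a proposed design.

```python
def join_first_match(orders, addresses, key):
    """Left-join orders to addresses, keeping at most one address per key.

    Hypothetical helper: "first address wins" is an assumed tie-break rule.
    """
    # Collapse duplicates up front so each key maps to a single address row.
    first_address = {}
    for addr in addresses:
        first_address.setdefault(addr[key], addr)

    joined = []
    for order in orders:
        # Orders with no matching address are kept, just without address columns.
        match = first_address.get(order[key], {})
        joined.append({**match, **order})
    return joined


orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c2"},
]
addresses = [
    {"customer_id": "c1", "city": "Oslo"},
    {"customer_id": "c1", "city": "Bergen"},  # duplicate key, would double order 1 in a plain join
]
result = join_first_match(orders, addresses, "customer_id")
```

The output has exactly one row per order, which is the property a plain equi-join loses when the right-hand side has repeated keys.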

I'll check out Percival — that looks really cool. I just saw the page was live — awesome!

max-sixty avatar Jun 28 '22 02:06 max-sixty

I don't know Datalog, but this seems like quite a concise way of expressing joins?

@justjake Could you help me understand by expressing your example above in the form of a function that takes a table as input?

In pseudo-python:


edge = { 'a': [...], 'b': [...] }

def join_path(table):
    ...  # ?

join_path(edge)

We need this because, ultimately, our join must be defined as a function that is applied to a whole table.
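To make the function-over-a-table framing concrete, here is a minimal sketch in Python. It assumes the traversal is a two-hop path, written in Datalog as `path(x, z) :- edge(x, y), edge(y, z)`; the actual rule in the image is not reproduced in this thread, so that rule, the sample data, and the column-dict table layout are assumptions. The point it tries to show is that the shared logic variable `y` plays the role of the join condition.

```python
def join_path(table):
    """Two-hop paths over an edge table (assumed rule):
    path(x, z) :- edge(x, y), edge(y, z)
    """
    rows = list(zip(table["a"], table["b"]))

    # Index edges by source so the shared variable y becomes a lookup key.
    by_source = {}
    for src, dst in rows:
        by_source.setdefault(src, []).append(dst)

    # Each edge (x, y) pairs with every edge (y, z) that starts at the same y.
    pairs = [(x, z) for x, y in rows for z in by_source.get(y, [])]
    return {"a": [x for x, _ in pairs], "b": [z for _, z in pairs]}


# Sample data (made up): a chain 1 -> 2 -> 3 -> 4.
edge = {"a": [1, 2, 3], "b": [2, 3, 4]}
paths = join_path(edge)
```

Because the result has the same shape as the input, `join_path` can be applied again to its own output, which is one way to read "a function that is applied to a whole table".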

aljazerzen avatar Jun 29 '22 17:06 aljazerzen