
Plain run functions for basic Python scripts

Open relaxin101 opened this issue 1 year ago • 8 comments

Hello, I'm trying to write some scripts where I wanted to try using logica, because the plain SQL WITH ... AS syntax got a bit heavy.

While looking through the tutorials, I couldn't find a plain function to take a connection and a program and just run it in a python script, so I created my own:

import psycopg2
import sqlite3

# Import paths assumed for the logica package; adjust to your install:
from logica.compiler import universe
from logica.parser_py import parse
from logica.common import psql_logica


def logica_psql(rules: str, target: str, conn: psycopg2.extensions.connection):
    """
    rules: The logica program
    target: The target predicate to be selected
    conn: A psycopg2 connection
    return: a cursor containing the results
    """
    program = universe.LogicaProgram(parse.ParseFile(rules, import_root=None)['rule'])
    sql = program.FormattedPredicateSql(target)
    return psql_logica.PostgresExecute(sql, conn)


def logica_sqlite3(rules: str, target: str, conn: sqlite3.Connection):
    """
    rules: The logica program
    target: The target predicate to be selected
    conn: A sqlite3 connection
    return: a cursor containing the results
    """
    program = universe.LogicaProgram(parse.ParseFile(rules, import_root=None)['rule'])
    sql = program.FormattedPredicateSql(target)
    cursor = conn.cursor()
    cursor.execute(sql)
    return cursor

Now I want to make sure that a) I didn't miss some built-in functionality that handles this (if not, does it make sense to add these kinds of standard functions?), and b) I didn't make some major mistake in the logic here, especially in the postgres function, since I got lost in how postgres is handled in the library.

Thanks for any help :D

relaxin101 avatar Feb 06 '25 10:02 relaxin101

Hello @relaxin101 ! 😁 Did you try RunPredicateFromString? That one would determine the engine from the program, but it also accepts a connection.

Your functions look good. RunPredicateFromString catches errors and produces a nice message in the case of, say, syntax errors. But if you prefer to debug your code in a separate context, these functions are perfect.

EvgSkv avatar Feb 06 '25 23:02 EvgSkv

Hello @relaxin101 ! 😁 Did you try RunPredicateFromString? That one would determine the engine from the program, but it also accepts a connection.

Your functions look good. RunPredicateFromString catches errors and produces a nice message in the case of, say, syntax errors. But if you prefer to debug your code in a separate context, these functions are perfect.

Thanks for the quick answer! I checked it out, and I think I'd prefer a cursor instead of a dataframe, but I knew I'd missed something... thanks for the pointer! Two questions though:

- This sqlite parsing (essentially inserting newlines, no?) seems kinda weird; shouldn't that be handled by the SQL compiler and not by a run function? Will my code break by not handling that?
- What purpose does the psql_execute function serve? It isn't used in the RunQueryPandas function as far as I could tell.

Final note: Do you want/need help with some code cleanup? e.g. parser_py.parse.ParseFile() taking the actual program string as an input was quite confusing initially

relaxin101 avatar Feb 06 '25 23:02 relaxin101

(sorry for delay, saw the question only today)

This sqlite parsing (essentially inserting newlines, no?) seems kinda weird, shouldn't that be handled by the sql compiler and not by a run function? Will my code break by not handling that?

Do you refer to this parsing? That one is indeed hack-y. The reason it's done is that most SQL dialects allow you to run multiple statements and return the result of the last one. However, SQLite is the only one (among the supported engines) strict enough to require you to send statements one by one. The Logica compiler can break the result up by statements, and that's used when running workflows in CoLab. But this function is for running a single long statement, and it special-cases SQLite, as the other engines can run the statement just fine.

On a philosophical note: readable, intuitive code is important to me, but I have come to the conclusion that sharp edges are unavoidable. And in this case in particular, I am not certain that there is a right way of handling the difference between engines, and especially that it's easy to figure out what the right way is. So I am at peace with this weirdness of SQLite handling :-)

And this made me realize: your SQLite running function has a limitation. It won't run programs with statements that write to the DB, for the reason described above. Example: CoLab.

What purpose does the psql_execute function serve? It isn't used in the RunQueryPandas function as far as I could tell

Again, I'm not sure which function you refer to :-) This one? Looks like it creates a connection, while RunQueryPandas does not.

Do you want/need help with some code cleanup? e.g. parser_py.parse.parseFile() taking the actual string as an input was quite confusing initially

Thanks for asking! Code cleanup is not a high priority at this point. Small comments here and there may help, but a refactoring will probably not be worth it. I'd rather invest in taking Logica as a tool and applying it to domain-specific problems that could make communities excited about the opportunities it opens.

EvgSkv avatar Feb 11 '25 18:02 EvgSkv

No worries, I missed your answer as well after all 😅

ad SQLite parsing: Yes, that's what I'm referring to, thanks for the clarification!

ad PSQL execution: Yes, that's also the function I was referencing, but I was more confused by this segment. What purpose does this logical table/record/whatever have? It seems postgres-specific, and I can't wrap my head around what purpose it's supposed to serve.

Thanks for asking! Code cleanup is not a high priority at this point. Small comments here and there may help, but a refactoring will probably not be worth it. I'd rather invest in taking Logica as a tool and applying it to domain-specific problems that could make communities excited about the opportunities it opens.

Ok, thank you for clarifying. The reason I'm asking is that I'm considering incorporating logica into my Bachelor's thesis and was wondering how stable the project structure is, since I think I'll kinda have to hook into some internals. But if changes are going to remain minor for the time being, that should be fine. In case vector similarity search is something you want to incorporate in the near future, please let me know; that'd be particularly interesting to me :)

relaxin101 avatar Feb 13 '25 09:02 relaxin101

Yes, that's also the function I was referencing, but I was more confused by this segment. What purpose does this logical table/record/whatever have? It seems postgres-specific, and I can't wrap my head around what purpose it's supposed to serve.

Alright, let's quote that piece of code here:

    for t in types:
      if t != 'logicarecord893574736':  # Empty record.
        psycopg2.extras.register_composite(t, cursor, globally=True)

We are registering the types that are used in the program with psycopg2. This is done so that, when the data is retrieved into a dataframe, composite types in the output become Python dictionaries. Otherwise, as far as I recall, they are not easy to read from Python.

The reason I'm asking is that I'm considering to incorporate logica into my Bachelor's thesis and was wondering how stable the project structure would be since I think I'll kinda have to hook into some internals. But if changes are going to remain minor for the time being, that should be fine. In case vector similarity search is something you want to incorporate in the near future, please let me know, that'd be particularly interesting to me :)

Yeah, I think it's safe to say that no major changes are coming in the near months. It could also be practical, for the purposes of a Bachelor's thesis, to work with a snapshot of the Logica code.

Is your Bachelor's about vector similarity search? Is it in the RAG-for-LLMs area, or something else? I was thinking in this direction, but didn't have any specific plans yet. In any case, I'd be very curious to read your Bachelor's thesis!

Let me know if you have further questions!

EvgSkv avatar Feb 19 '25 06:02 EvgSkv

We are registering the types that are used in the program with psycopg2. This is done so that, when the data is retrieved into a dataframe, composite types in the output become Python dictionaries. Otherwise, as far as I recall, they are not easy to read from Python.

Ahhhh, ok that makes a lot of sense, thank you!

Yeah, I think it's safe to say that no major changes are coming in the near months. It could also be practical, for the purposes of a Bachelor's thesis, to work with a snapshot of the Logica code.

Yes, that's basically what I did; I just wanted to make sure there wasn't anything big coming. A service I used just deprecated their entire API recently ._.

Is your Bachelor's about vector similarity search? Is it in the RAG-for-LLMs area, or something else? I was thinking in this direction, but didn't have any specific plans yet. In any case, I'd be very curious to read your Bachelor's thesis!

Phew, I'm still figuring out myself what it's going to end up as. The basic idea was to use the structure of a Knowledge Graph as the basis for a GNN over the text embeddings of a document corpus, trying to produce a kind of "context-aware" similarity search: i.e., if two documents are linked in the graph, searching for words that appear in only one of them should bump the ranking of the other as well, just because of their influence on each other's embeddings.

How does that tie in to logica, you might ask? I essentially just found the idea of not having to write another graph query in SQL very appealing, because the data source I'm accessing is a relational database, and I thought it might be worth integrating. However, I honestly don't know if the logica aspect will make the final cut, as the GNN stuff is getting more tedious than I expected, and I stumbled over logica after most SQL mappings were already written :(

If you're still interested, I'd be glad to get your thoughts ^^

Let me know if you have further questions!

Will do, thank you very much!

relaxin101 avatar Mar 13 '25 16:03 relaxin101

Sounds like a challenging project :-) If/when you publish anything on this, feel free to drop a link here, especially if you end up making any use of Logica 😁

EvgSkv avatar Mar 26 '25 11:03 EvgSkv

Actually, I'm trying to write the main query for the vector search using logica. I'm kinda stuck on the following though:

I'm using a vector extension to sqlite3, sqlite_vec. This extension exposes additional vector column types and functions to sqlite3, which I am using to order the results. I tried writing this in logica, but I got kinda stuck. This is the basic idea:

        sql = logica_compile(f"""
            @Engine("sqlite");
            @OrderBy(Result, "vec_distance_cosine(?, text_embedding)");
            @Limit(Result, {page_size});
            Result(uuid, type) :-
              node_embeddings(rowid: rowid, text_embedding: text_embedding),
              nodes(rowid: rowid, uuid: uuid, type: type),
              type == "document";
            """,
            "Result")
        # pagination hack since there seems to be no @Offset annotation:
        sql = sql[:-1] + f" OFFSET {page_offset};"
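For context, the OFFSET hack works because SQLite's LIMIT clause accepts a trailing OFFSET; the mechanism, sketched with the stdlib sqlite3 module (the docs table here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs(id INTEGER)")
conn.executemany("INSERT INTO docs VALUES (?)", [(i,) for i in range(10)])

# Page 2 of the results, three rows per page:
page_size, page_offset = 3, 3
page = conn.execute(
    "SELECT id FROM docs ORDER BY id LIMIT ? OFFSET ?",
    (page_size, page_offset),
).fetchall()
# page == [(3,), (4,), (5,)]
```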

Is there a way to also get a column "distance" in my Result predicate? My idea was something like this:

Result(uuid, type, distance: vec_distance_cosine(?, text_embedding)) :- ...

However, that doesn't work, because logica obviously doesn't have any value vec_distance_cosine in the rule body. Using it as a predicate, same as the others, doesn't seem to work either, because logica doesn't know the function. So I guess I'm either looking for a way to escape the segment vec_distance_cosine(?, text_embedding) and tell logica not to worry about it and just treat it as a valid value, or for a way to expose the function vec_distance_cosine to logica. (With that running, I'd like to add some additional logical filtering on top, which I think will illustrate logica's strengths nicely for my thesis.)
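On the "exposing the function" direction: since the generated SQL ultimately runs on the caller's connection, any SQL function registered on that connection is visible to the query. A minimal sketch with the stdlib sqlite3 module, using a pure-Python stand-in for sqlite_vec's vec_distance_cosine (the JSON-encoded vectors and the cosine implementation are illustrative, not the extension's actual storage format):

```python
import json
import math
import sqlite3

def cosine_distance(a_json, b_json):
    # Toy stand-in for sqlite_vec's vec_distance_cosine.
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

conn = sqlite3.connect(":memory:")
# Any Python callable can be exposed as a SQL function on this connection:
conn.create_function("vec_distance_cosine", 2, cosine_distance)

dist = conn.execute(
    "SELECT vec_distance_cosine('[1, 0]', '[0, 1]')"
).fetchone()[0]
# orthogonal vectors -> cosine distance 1.0
```

With the real extension you would load it via conn.enable_load_extension / conn.load_extension instead of create_function; the point is just that the function's availability is a property of the connection the SQL runs on.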

relaxin101 avatar Mar 28 '25 15:03 relaxin101