sqlparse icon indicating copy to clipboard operation
sqlparse copied to clipboard

Extracting table names from more complex queries

Open michaelshobbs opened this issue 11 years ago • 6 comments

I've been working with the extract_table_names.py example for about a day now to try and parse the following queries. I have been able to get sqlparse to return all the table names but it also returns the order by and group by IdentifierLists. Any thoughts would be appreciated.

SELECT a.time_updated_server/1000,
content,
nick,
name
FROM table1 a
JOIN table2 b ON a.sender_id = b.user_id
JOIN table3 c ON a.channel_id = c.channel_id
JOIN table4 d ON c.store_id = d.store_id
WHERE sender_id NOT IN
  (SELECT user_id
   FROM table5
   WHERE store_id IN ('agent_store:1',
                                     'ask:1'))
   AND to_timestamp(a.time_updated_server/1000)::date >= '2014-05-01'
   GROUP BY 1,2,3,4
   HAVING sum(1) > 500
   ORDER BY 1 ASC
CREATE TEMPORARY TABLE test_table1
  AS (select * from test_table2)
SELECT count(*) from test_table1
WHERE id in (SELECT id from test_table2)

michaelshobbs avatar Dec 06 '14 21:12 michaelshobbs

I have something that 'works' for my test cases (including the example sql). I'd love some feedback as I know this probably isn't the 'right' way. This also handles multiple statements in one string.

import itertools
import sqlparse

from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML


def is_subselect(parsed):
    if not parsed.is_group():
        return False
    for item in parsed.tokens:
        if item.ttype is DML and item.value.upper() == 'SELECT':
            return True
    return False


def extract_from_part(parsed):
    from_seen = False
    for item in parsed.tokens:
        if item.is_group():
            for x in extract_from_part(item):
                yield x
        if from_seen:
            if is_subselect(item):
                for x in extract_from_part(item):
                    yield x
            elif item.ttype is Keyword and item.value.upper() in ['ORDER', 'GROUP', 'BY', 'HAVING']:
                from_seen = False
                StopIteration
            else:
                yield item
        if item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True


def extract_table_identifiers(token_stream):
    for item in token_stream:
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                value = identifier.value.replace('"', '').lower()
                yield value
        elif isinstance(item, Identifier):
            value = item.value.replace('"', '').lower()
            yield value


def extract_tables(sql):
    # let's handle multiple statements in one sql string
    extracted_tables = []
    statements = list(sqlparse.parse(sql))
    for statement in statements:
        if statement.get_type() != 'UNKNOWN':
            stream = extract_from_part(statement)
            extracted_tables.append(set(list(extract_table_identifiers(stream))))
    return list(itertools.chain(*extracted_tables))

michaelshobbs avatar Dec 06 '14 22:12 michaelshobbs

Thanks I will be using this :+1:

ussoftwarepro avatar Apr 11 '17 20:04 ussoftwarepro

Great job, however during time something must be changed and I had some problems. This is a revisited code, it seems to work, but I didn't study it very deeply:

import itertools
import sqlparse

from sqlparse.sql import IdentifierList, Identifier
from sqlparse.tokens import Keyword, DML


def is_subselect(parsed):
    if not parsed.is_group:
        return False
    for item in parsed.tokens:
        if item.ttype is DML and item.value.upper() == 'SELECT':
            return True
    return False


def extract_from_part(parsed):
    from_seen = False
    for item in parsed.tokens:
        if item.is_group:
            for x in extract_from_part(item):
                yield x
        if from_seen:
            if is_subselect(item):
                for x in extract_from_part(item):
                    yield x
            elif item.ttype is Keyword and item.value.upper() in ['ORDER', 'GROUP', 'BY', 'HAVING', 'GROUP BY']:
                from_seen = False
                StopIteration
            else:
                yield item
        if item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True


def extract_table_identifiers(token_stream):
    for item in token_stream:
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                value = identifier.value.replace('"', '').lower()                
                yield value
        elif isinstance(item, Identifier):
            value = item.value.replace('"', '').lower()
            yield value


def extract_tables(sql):
    # let's handle multiple statements in one sql string
    extracted_tables = []
    statements = list(sqlparse.parse(sql))
    for statement in statements:
        if statement.get_type() != 'UNKNOWN':
            stream = extract_from_part(statement)
            extracted_tables.append(set(list(extract_table_identifiers(stream))))
    return list(itertools.chain(*extracted_tables))

DuccioFabbri avatar Aug 06 '19 08:08 DuccioFabbri

@DuccioFabbri

Thank you! This was so helpful for parsing ~150 SQL queries, some of which are rather complex.

I found two scenarios that didn't work well

select a from foo join (select a from bar) as b on b.id=foo.id

with mycte as (select id from foo) select b.id from bar as b join foo as f on b.id=f.id

I enhanced your code (that you posted here) by adding unit tests.

az0 avatar May 26 '20 23:05 az0

Thank you for contribute your code! I'm also working on a project which need to extract the tables form sql queries. I quickly test your code with a simple query, but could not get any output. Can you give me some help on what I missed?Appreciate on your help!``

I think we are getting off topic

extract_tables(sql)

Have you tried printing the return value?

Again, I think this discussion should continue elsewhere. If you are new to Python, try a forum for beginners.

az0 avatar May 28 '20 00:05 az0

Hi Guys,

Sorry this might be an old thread. I am trying to parse a statement similar to the following

SELECT * FROM (SELECT * from t) s But the output using the code #https://github.com/andialbrecht/sqlparse/issues/157#issuecomment-518572450 is as follows: ['(select']

Have tried a lot of different changes to the code, but no success.

tejkm avatar Mar 18 '21 11:03 tejkm