
Connector reads 0 rows although Cluster returned results

Open pimonteiro opened this issue 1 year ago • 18 comments

As the title says, I'm having an issue where the SQL connector returns 0 rows from .fetchall(), while the cluster that ran the query returned approx. 151,653 rows - retrieved from cache in 500 ms.

    with adb_sql.connect(
        server_hostname=db_creds_dict["db_name"],
        http_path=db_creds_dict["db_path"],
        access_token=db_creds_dict["db_token"],
    ) as conn:
        with conn.cursor() as cursor:
            if parameters is None:
                cursor.execute(query)
            else:
                cursor.execute(query.format(*parameters))
            data = pd.DataFrame(
                cursor.fetchall(),
                columns=[desc[0] for desc in cursor.description],
            )
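A side note on the snippet above: interpolating parameters with query.format() splices raw values into the SQL text, which is fragile and injection-prone. The point can be seen without a live connection - this illustrative sketch shows how a value containing a quote silently rewrites the query (the parameterized alternative in the comment assumes a connector version that documents server-side parameter binding; check your version's docs):

```python
# Illustrative only: why str.format-based interpolation is fragile.
query = "SELECT * FROM users WHERE name = '{}'"

# A value containing a quote silently changes the SQL that reaches the warehouse.
malicious = "x' OR '1'='1"
print(query.format(malicious))
# SELECT * FROM users WHERE name = 'x' OR '1'='1'

# Safer, if your connector version supports native parameters (an assumption -
# verify against the docs for the release you have installed):
#     cursor.execute("SELECT * FROM users WHERE name = :name", {"name": value})
```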

In other situations where the same query returns fewer rows (e.g. ~10 s to return 28k rows), the connector reads the results correctly.

Python: 3.8.12
Package version: 3.1.1
Databricks SQL Warehouse configuration:

  • DBSQL v. 2023.50
  • Serverless
  • X-Small

pimonteiro avatar Apr 01 '24 13:04 pimonteiro

@andrefurlan-db thoughts?

benc-db avatar Apr 01 '24 15:04 benc-db

@pimonteiro

  • if you run other queries - do you see the same behavior? Do you think it may depend on the row count?
  • if you run the same query but limit the row count explicitly (say, to 10 rows or so) - does the behavior change?
  • did you check whether the result of cursor.fetchall() itself is empty, or only that the pandas dataframe is empty?
  • if cursor.fetchall() actually returns data but the dataframe is empty - have you checked what's in cursor.description?
  • if you pass use_cloud_fetch=False to adb_sql.connect(...) - does it change anything?
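The checklist above boils down to locating where the rows disappear. A hypothetical helper sketching that triage logic (the function name and messages are made up for illustration; they are not part of the connector API):

```python
# Hypothetical triage helper: given what fetchall() and cursor.description
# returned, suggest where the data is being lost.
def diagnose(rows, description):
    if not description:
        return "no result set metadata - the query may not have produced a result set"
    if not rows:
        return "metadata present but 0 rows - suspect result fetching (try use_cloud_fetch=False)"
    return "rows fetched - if the DataFrame is empty, check how it is constructed"
```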

kravets-levko avatar Apr 01 '24 17:04 kravets-levko

  • I think it does depend on the row count. I did a test where I executed the problematic query with a limit of 100/1,000/10,000 rows and it worked;
  • cursor.fetchall() itself is empty;
  • cursor.description contains info about the columns;
  • using the flag use_cloud_fetch=False seems to work. From what I know, it should work better with cloud fetch on, given the number of rows, correct?

pimonteiro avatar Apr 03 '24 09:04 pimonteiro

Thank you @pimonteiro! Yes, CloudFetch should improve handling of very large results. I asked you to disable it to narrow down the scope of the issue. If you're able to get your data with use_cloud_fetch=False (and considering that everything is fine with smaller results), then there's probably a bug in the CloudFetch-related code. I need to poke around, and when I have further questions I'll get back to you.

P.S. I see you provided the library version and warehouse details - which is very nice, thank you! Can you please also tell us whether you're using an AWS or Azure workspace? Thanks!

kravets-levko avatar Apr 03 '24 10:04 kravets-levko


Sounds good. In the meantime we'll try to reduce the query result size and optimize it so we can continue development. I assume there's nothing wrong with disabling use_cloud_fetch for now?

And lastly, I'm using an Azure workspace. That should have been in the opening line of the ticket - I apologize :)

pimonteiro avatar Apr 03 '24 10:04 pimonteiro

Yes, you can just disable CloudFetch and carry on. You'll still be able to get large results, just perhaps less efficiently than with CloudFetch enabled.

kravets-levko avatar Apr 03 '24 12:04 kravets-levko

Any update on this?

pimonteiro avatar Apr 16 '24 10:04 pimonteiro

Hi, we're facing the same issue. Any update on this?

akshay-s-ciq avatar May 02 '24 13:05 akshay-s-ciq

@pimonteiro @akshay-s-ciq sorry, not many updates for now. I'm still trying to figure this out. However, we recently received a very similar bug report, but for the Go driver. We're still not sure what's going on.

Also, considering all of the above, may I ask you to run the same query using the Node.js connector? If either of you volunteers to help, I can prepare a test project for you.

kravets-levko avatar May 28 '24 18:05 kravets-levko

@pimonteiro @akshay-s-ciq We have a few hypotheses about what may be causing your issues. We're currently checking them, but you can help us. We just released v3.2.0, which, among other changes, adds more debug logging. You can help by using v3.2.0, enabling debug logging (add import logging + logging.basicConfig(level=logging.DEBUG)), and, when you see the issue again, collecting and sending us the log output. Thank you so much!

P.S. Don't forget to re-enable CloudFetch - use_cloud_fetch=True
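The logging setup described above can be sketched as follows. Targeting the "databricks.sql" logger is an inference from the logger names visible in the debug output later in this thread (databricks.sql.thrift_backend, databricks.sql.cloudfetch.download_manager, etc.):

```python
import logging

# Enable verbose logging for everything; CloudFetch activity will show up
# in the output.
logging.basicConfig(level=logging.DEBUG)

# Optionally target just the connector's loggers to reduce noise from
# other libraries (the logger name matches the module paths seen in the
# debug output).
logging.getLogger("databricks.sql").setLevel(logging.DEBUG)
```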

kravets-levko avatar Jun 14 '24 09:06 kravets-levko

I was also facing the same issue and use_cloud_fetch=False helped.

If it helps, here is what I get with debug logging enabled:

DEBUG - 2024-06-27 15:08:46,637- databricks.sql.thrift_backend - attempt_request:396 - Sending request: ExecuteStatement(<REDACTED>)
DEBUG - 2024-06-27 15:08:52,019- databricks.sql.thrift_backend - attempt_request:408 - Received response: TExecuteStatementResp(<REDACTED>)
DEBUG - 2024-06-27 15:08:52,020- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:08:57,122- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:08:57,123- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:09:02,258- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:02,260- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetOperationStatus(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,450- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetOperationStatusResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,451- databricks.sql.thrift_backend - attempt_request:396 - Sending request: GetResultSetMetadata(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,584- databricks.sql.thrift_backend - attempt_request:408 - Received response: TGetResultSetMetadataResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,585- databricks.sql.thrift_backend - attempt_request:396 - Sending request: FetchResults(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,735- databricks.sql.thrift_backend - attempt_request:408 - Received response: TFetchResultsResp(<REDACTED>)
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:159 - Initialize CloudFetch loader, row set start offset: 0, file list:
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:166 - - start row offset: 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,736- databricks.sql.utils - __init__:166 - - start row offset: 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - add_file_links:52 - ResultFileDownloadManager.add_file_links: start offset 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - add_file_links:52 - ResultFileDownloadManager.add_file_links: start offset 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.utils - _create_next_table:234 - CloudFetchQueue: Trying to get downloaded file for row 0
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:123 - ResultFileDownloadManager: removing past handlers, current offset: 0
DEBUG - 2024-06-27 15:09:05,737- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:132 - - checking result link: start 0, row count: 53424, current offset: 0
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _remove_past_handlers:132 - - checking result link: start 53424, row count: 8865, current offset: 0
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _schedule_downloads:144 - ResultFileDownloadManager: schedule downloads
DEBUG - 2024-06-27 15:09:05,738- databricks.sql.cloudfetch.download_manager - _schedule_downloads:149 - - start: 0, row count: 53424
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _schedule_downloads:149 - - start: 53424, row count: 8865
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _find_next_file_index:161 - ResultFileDownloadManager: trying to find file for row 0
DEBUG - 2024-06-27 15:09:05,740- databricks.sql.cloudfetch.download_manager - _find_next_file_index:177 - - found file: start 0, row count 53424
.
.
.
/python3.11/site-packages/databricks/sql/thrift_backend.py", line 408, in attempt_request
    logger.debug(
Message: 'Received response: TCloseSessionResp(<REDACTED>)'
Arguments: ()
DEBUG - 2024-06-27 15:10:05,748 - databricks.sql.cloudfetch.downloader - is_file_download_successful:62 - Cloud fetch download timed out after 60 seconds for link representing rows 0 to 53424
DEBUG - 2024-06-27 15:10:05,749 - databricks.sql.cloudfetch.download_manager - get_next_downloaded_file:112 - ResultFileDownloadManager: cannot find file for row index 0
DEBUG - 2024-06-27 15:10:05,751 - databricks.sql.utils - _create_next_table:244 - CloudFetchQueue: Cannot find downloaded file for row 0
DEBUG - 2024-06-27 15:10:05,771 - databricks.sql.thrift_backend - attempt_request:396 - Sending request: CloseOperation(<REDACTED>)
DEBUG - 2024-06-27 15:10:05,967 - databricks.sql.thrift_backend - attempt_request:408 - Received response: TCloseOperationResp(<REDACTED>)

Python 3.11.9, databricks-sql-connector-3.2.0

rnakshay avatar Jun 27 '24 13:06 rnakshay

Do you have a VPN or Proxy on the machine running the connector? It most likely cannot complete the SSL handshake between your machine and the DBFS URIs provided by CloudFetch. Do you have SSL / Proxy traffic logs?


kthejoker avatar Jun 27 '24 20:06 kthejoker

Hmm, we do use a VPN, but I'm not sure about the details. In my case, the issue above happens only when I'm in Python debug mode. When run end to end, the issue doesn't occur.

rnakshay avatar Jun 28 '24 08:06 rnakshay

Interesting - is there any kind of significant lag in debug mode between execution and fetch? Maybe if the connection is returned to the pool you can no longer retrieve session details like row counts, even though you can still retrieve the data.


kthejoker avatar Jun 28 '24 19:06 kthejoker

Hmm, I don't think so. We use the pandas read_sql function with databricks_sql.connect for the connection. For now this issue shows up only in local debug runs in my case; I'll use the use_cloud_fetch=False option and keep an eye out for updates from others. Thank you for your help.

rnakshay avatar Jul 02 '24 09:07 rnakshay

Hi guys 👋 First - thank you all for your reports, logs, and all the other information you provided on this issue. We also collected other reports from Databricks customers, and this helped us identify several problems in the CloudFetch code. We've now made a first attempt at fixing what we've found so far: databricks/databricks-sql-python#405. Hopefully this fixes all the major issues we know about. I'll let you know when we release the fix. Thank you for your patience!

kravets-levko avatar Jul 02 '24 10:07 kravets-levko

Now we do a first attempt to fix what we've found so far - https://github.com/databricks/databricks-sql-python/pull/405

~I can confirm that this replaces the 0-row response with a MaxRetryError for me (see https://github.com/databricks/databricks-sql-python/issues/413), which is better but maybe not optimal.~

Edit: never mind - this fixes it. The linked issue concerned a specific case combining large queries with SSL verification that had to be disabled.

rth avatar Jul 11 '24 15:07 rth

Hello here 👋 We just released v3.3.0, which includes a refactoring of the CloudFetch-related code. We think it should fix your issues, so please give it a try and let me know whether it helped (remember to enable CloudFetch via use_cloud_fetch=True). If you still see any issues, please enable debug logging (see https://github.com/databricks/databricks-sql-python/issues/383#issuecomment-2167627778) and share the log output. Thank you!
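Before re-enabling CloudFetch it's worth confirming you're actually running >= 3.3.0. A minimal sketch of a version gate - the helper below is hypothetical (not part of the connector), and it assumes plain dotted release strings with no pre-release suffixes; the installed version can typically be read via importlib.metadata.version("databricks-sql-connector"):

```python
def version_at_least(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically (e.g. '3.10.0' > '3.3.0').

    Assumes plain numeric components; no pre-release/dev suffix handling.
    """
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return to_tuple(installed) >= to_tuple(required)

# Possible usage (assumes the package is installed):
#     from importlib.metadata import version
#     assert version_at_least(version("databricks-sql-connector"), "3.3.0")
```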

kravets-levko avatar Jul 17 '24 17:07 kravets-levko