dbt-sqlserver icon indicating copy to clipboard operation
dbt-sqlserver copied to clipboard

Deadlocks on target server when more than 1 dbt thread

Open deanja opened this issue 4 years ago • 2 comments

Expected Behaviour

dbt can run many tests in parallel without causing database deadlocks. dbt can query information_schema and do DDL on separate threads without deadlocks.

Actual Behaviour

Intermittent deadlocks. So far only observed on dbt build (with tests) and dbt test.

('40001', '[40001] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 62) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (1205) (SQLFetch)')

An example deadlock log, showing implicated queries/ddl, is pasted at the end of this issue.

Steps to reproduce

dbt test --threads 8

or

dbt build--threads 8

Yesterday I could reliably get it to fail with 2 threads. Today it's running fine with 8 sometimes. Oooouch! Seems slightly easier to recreate when there are some bigger relations (10M+ rows) to test.

Configuration

A set of small generic dbt tests - I have ~50 tests DAG does not affect dbt test. For dbt build I have ~4 paths through the DAG. dbt 1.0.0 SQL Server 2019 ODBC Driver 2017

Possible causes

The "what changed recently " cause is that from dbt 0.21.1 test creates a lot more tables, as a workaround for there being no nested CTE functionality in SQL Server.

Cause in SQL Server. Not sure, but seems the engine has it's own means of ensuring integrity when querying INFORMATION_SCHEMA and making DDL changes at the same time. In DML, the query would simply get blocked (delayed) while the write completes, not cause a deadlock, except in some edge cases involving index updates. Things were fine pre 0.21.1 probably because of lower volume of operations per second - most tests are quick, and they can take advantage of parallelism.

A root cause is that there's not nested CTE in SQL Server. dbt-core tests and packages like dbt-expectations increasingly use nested CTEs.

Other information

Current workaround: limit dbt threads to 1.

Some discussion of possible causes and solutions on dbt slack

Concern in back of my mind is that we could also see this error then doing dbt run, given a busy enough DAG. I don't know the inner workings of dbt, but if they involve querying INFORMATION_SCHEMA and making DDL changes at the same time on another thread then there is cause for concern. No symptoms so far though :)

Have also been getting SQL engine errors that the name of some test tables exceed SQL Server's maximum of 128 characters. Also intermittent, and only surfaces if there was some other error in the SQL that caused the test run to fail.

Resources

Extract from a deadlock log from SQL Server:

<deadlock>
...
      </executionStack>
      <inputbuf>
select
      table_catalog as [database],
      table_name as [name],
      table_schema as [schema],
      case when table_type = 'BASE TABLE' then 'table'
           when table_type = 'VIEW' then 'view'
           else table_type
      end as table_type

    from [dw2].information_schema.tables
    where table_schema like 'dbt_test__audit'
     </inputbuf>
...
      <inputbuf>
  
  USE [dw2];
  if object_id ('"dbt_test__audit"."source_not_null_relation_column"','U') is not null
      begin
      drop table "dbt_test__audit"."source_not_null_relation_column"
      end
   </inputbuf>
</deadlock>

The log was found using: (credit to credit )

DECLARE @xelfilepath NVARCHAR(260)
SELECT @xelfilepath = dosdlc.path
FROM sys.dm_os_server_diagnostics_log_configurations AS dosdlc;
SELECT @xelfilepath = @xelfilepath + N'system_health_*.xel'
 DROP TABLE IF EXISTS  #TempTable
 SELECT CONVERT(XML, event_data) AS EventData
        INTO #TempTable FROM sys.fn_xe_file_target_read_file(@xelfilepath, NULL, NULL, NULL)
         WHERE object_name = 'xml_deadlock_report'
SELECT EventData.value('(event/@timestamp)[1]', 'datetime2(7)') AS UtcTime, 
            CONVERT(DATETIME, SWITCHOFFSET(CONVERT(DATETIMEOFFSET, 
      EventData.value('(event/@timestamp)[1]', 'VARCHAR(50)')), DATENAME(TzOffset, SYSDATETIMEOFFSET()))) AS LocalTime, 
            EventData.query('event/data/value/deadlock') AS XmlDeadlockReport
     FROM #TempTable
     ORDER BY UtcTime DESC;

Run it in SSMS and you'll get nice links to the xml for each deadlock.

deanja avatar Dec 23 '21 04:12 deanja

Maybe this should be optional? Perhaps set in profiles.yml.

mikaelene avatar Jan 28 '22 07:01 mikaelene

Isn’t part of the problems with the temporary creation of tables in Test that different tests create the same temporary table? That’s how I read the get_relation part of it

SportsTribution avatar Apr 25 '22 18:04 SportsTribution