sp_BlitzLock - Duplicate available plans
Version of the script 8.28
What is the current behavior?
There are at least 3 scenarios where we can get dupliate entries for available plans from sp_BlitzLock:
- Procedure involved in deadlock has many queries
- Procedure is called with different parameters
- The query involves many tables and deadlock may occur on any of them (I think)
Setup:
DROP TABLE IF EXISTS dbo.LockTest
CREATE TABLE dbo.LockTest (
Id INT NOT NULL PRIMARY KEY,
Value INT NOT NULL
);
GO
CREATE OR ALTER PROC p_UpdateLockTest (@p1 int) AS
BEGIN
SELECT count(1) c into #t2 from sys.indexes
UPDATE dbo.LockTest
SET Value = Value + 1
WHERE Id = @p1;
END
GO
INSERT INTO dbo.LockTest (Id, Value) VALUES (1, 100), (2, 200);
GO
--clean out plan cache to better see the issue
DBCC FREEPROCCACHE()
Session A:
BEGIN TRAN;
EXEC p_UpdateLockTest @p1 = 1
GO
EXEC p_UpdateLockTest @p1 = 2
--ROLLBACK
Session B:
BEGIN TRAN;
EXEC p_UpdateLockTest @p1 = 2
GO
EXEC p_UpdateLockTest @p1 = 1
--ROLLBACK
- Set up table, procedure and insert data
- Open 2 sessions A and B
- In session A start the transaction and execute the first procedure
- In session B start the transaction and execute the first procedure
- In session A execute the second procedure. It will hang
- In session B execute the second procedure. It will succeed or become a deadlock victim
- execute sp_BlitzLock and observe the available plans
This reproducesr shows first 2 of the issues mentioned. The last one is quite hard to reproduce for me. Sorry.
Even though only one procedure was involved we can see it duplicated 6 times in my case. Some entries are obviosly duplicate of others. Other seem to provide data for unrelated query from the same procedure. You can see the statistics and statement offfsets are different.
What is the expected behavior? Only one row is returned for involved statement
Which versions of SQL Server and which OS are affected by this issue? Did this work in previous versions of our procedures? Reproduced in SQL 2017, 2019 Pretty sure this is independent from the version
I have potential fix for this issue since I stumbled into this in production already: https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit/commit/dd8bc10ef9d6ae433c0646e589e99b00fc727710 If you are interested I can make a pull request and/or explain it a little
@Tisit sure, I've assigned the issue to you and we'd welcome the pull request.
@Tisit I'd be really careful with this one. I ran into the same issue while writing the original code for it, and found quite often that the statement start and end details would filter out things they shouldn't in more complex scenarios than your test rig. I deliberately chose to be over-inclusive in the results to not miss anything. I do agree it could be better, but the plan cache is a fickle friend.
Also, these should be integers, not Unicode strings
stmtstart = ca.dp.value('@stmtstart', 'nvarchar(131)'),
stmtend = ca.dp.value('@stmtend', 'nvarchar(131)')
@erikdarlingdata Not sure what you expect from me now. I can believe I'm missing something, but on the systems I tested my changes I didn't notice any problems.
One thing I believe might be/has been the problem is that according to sys.dm_exec_query_stats documentation statement_end_offset:
For versions before SQL Server 2014 (12.x), a value of -1 indicates the end of the batch.
I don't account for that here and I don't work with anything older tha 2016 (Lucky me). But looking at others scripts that I use, I always special cased end_offset like this:
CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset
Speculatively I could built this in.
BTW I will be away from my main PC for a couple of weeks, so any update from my side needs to wait until I'm back. But I don't think we're under any SLA here ;)
@Tisit no expectations (aside from using the correct data types 😃) or SLAs on this.
It's entirely possible that your way is more correct (potentially losing some plans from results) than mine (including everything and returning too many results).
After all, it's the plan cache, and the plan cache is a hell crammed with blessings on its best day.
I also have no objections to this code going in (aside from using the incorrect data types 😃) and then narrowing the cases where things are lost later.
Just one question, though (I haven't tested your branch): is the return type of the query text still an XML clickable column after you do this: MAX(TRY_CAST(dr.query_xml AS nvarchar(MAX)))?
It does make life easier to retain that, even if it's only on the select out after your grouping query does its thing.
Thanks, and enjoy the time away!