IdentityServer4 icon indicating copy to clipboard operation
IdentityServer4 copied to clipboard

TokenCleanupService using full table scan

Open edmacdonald opened this issue 5 years ago • 7 comments

We have tens of millions of rows in the PersistedGrants table and this block is resulting in a full scan during clean up:

https://github.com/IdentityServer/IdentityServer4/blob/07898a61dc22ce9a95e5fc5611fa674feac4230e/src/EntityFramework.Storage/src/TokenCleanup/TokenCleanupService.cs#L77-L81

Sorting by the same column used in the predicate fixes it

                var expiredGrants = await _persistedGrantDbContext.PersistedGrants
                    .Where(x => x.Expiration < DateTime.UtcNow)
                    .OrderBy(x => x.Expiration)
                    .Take(_options.TokenCleanupBatchSize)
                    .ToArrayAsync();

edmacdonald avatar Dec 17 '20 15:12 edmacdonald

Submitted PR 5092

edmacdonald avatar Dec 17 '20 15:12 edmacdonald

Hmm, I guess there is an index on the key, but given that that's the ordering it still needs to do a full table scan, despite there being a index on expiration?

brockallen avatar Dec 30 '20 19:12 brockallen

It's not always a full scan, but it does read many more rows than it needs to (see the query plan linked in the PR).

If the optimizer chose to use the index on Expiration, it would have to read all the expired rows, and then re-order them based on the key before taking the top TokenCleanupBatchSize. This would be memory intensive and probably spill over into tempdb. The optimizer (wisely) decides against this and instead runs through the key index in the correct order, throwing away records that don't meet the criteria, and stopping when it has returned TokenCleanupBatchSize number of records.

In application terms, the job has to read and skip over all the older but longer living tokens (refresh) to get to the newer expired (access) tokens. Also, any time there are fewer expired records than TokenCleanupBatchSize it will do a full scan. This will happen regularly if the clean up job is keeping up.

Ordering and filtering by the Expiration, the query will only need to read TokenCleanupBatchSize rows.

edmacdonald avatar Dec 30 '20 22:12 edmacdonald

Yep, the sort by expiration is certainly the better choice. We'll get that merged in soon.

brockallen avatar Dec 31 '20 02:12 brockallen

I'm having the exact same issue as described by @edmacdonald, and did some tests with the purposed fix.
It actually gives a great performance boost! Is there any expectation of this being merged really soon ? we are start facing some nasty problems . I'm seeing that this is still waiting for a review in PR https://github.com/IdentityServer/IdentityServer4/pull/5092 image

Many thanks

sandcar avatar Mar 19 '21 12:03 sandcar

@sandcar I see they fixed it in the new code here but just didn't merge this PR for some reason.

edmacdonald avatar Mar 31 '21 18:03 edmacdonald

thanks @edmacdonald for the info . @brockallen is there any chance that this PR will be merged soon ?

sandcar avatar Apr 01 '21 10:04 sandcar