OpenCue Optimize FIND_JOBS_BY

This is an experiment to optimize FIND_JOBS_BY_SHOW which is the heaviest query in our use-case.

Environment / Test data

CentOS8 VM, Broadwell 24 cores, PostgreSQL13 1 instance
~8K jobs, ~8K layers, ~10M frames (1 layer per job, 1K/10K frames per layer)
extract 500 jobs sorted with int_priority and ts_started

Original FIND_JOBS_BY_SHOW

SQL https://gist.github.com/splhack/6905e35defd6bcbd92aeb82c7e645e41
Analyze result 249ms https://gist.github.com/splhack/bd7c5221acb43e095dffb3e8ddafd472

Optimized FIND_JOBS_BY_SHOW (~7 times faster than original)

SQL https://gist.github.com/splhack/8faead2d482f3dde2cba8226f6115dda
Analyze result 36ms https://gist.github.com/splhack/f93f4e784649a94bd9fdfc50129a9349

Explanation

Use CTE to sort job and job_resource first with some light filters (job state, pause, os, show ID, facility ID) and add ROW_NUMBER as rank
Filter with job, folder and layer calculation on the "sorted" jobs

Caveat

There are no guarantees that CTE provides the "sorted" result to outer query, thus the outer query should have had ORDER BY rank, and it makes the query slow.

Theory

Sort jobs with priority (and ts_started for FIFO #1060)
- O(N x log N) merge sort if we do everytime
- O(log N) binary search for insert if we can keep sorted list
Filter jobs
- Worst case O(N) linear search on the sorted jobs to check whether the utilization of job, folder and layer exceeds the maximum number of cores and gpus or not

Nov 30 '21 05:11 splhack

I like where this is going. One question, does the mentioned caveat means the benchmark is not actually comparing apples to apples? I mean, would the result of the proposed implementation be good enough to dispatch jobs in a somehow ordered fashion or do we need to add ORDER BY rank to make it usable?

Dec 01 '21 17:12 DiegoTavares

One question, does the mentioned caveat means the benchmark is not actually comparing apples to apples?

The benchmark results (EXPLAIN ANALYZE) are exact apples to apples comparison. Exact same environment, exact same data. The only difference is the SQL, FIND_JOBS_BY_SHOW vs FIND_JOBS_BY_SHOW_OPTIMIZED.

would the result of the proposed implementation be good enough to dispatch jobs in a somehow ordered fashion

Yes, it is. At least, in the range of amount data (~8K jobs, ~8K layers), the query result are the exact same. 'The exact same' means that both FIND_JOBS_BY_SHOW and FIND_JOBS_BY_SHOW_OPTIMIZED returns the exact same entities in the exact same order.

The caveat is because PostgreSQL (or any relational database) does not guarantee the outer query can see the exact same order with the inner query order.

But, with PostgreSQL version 13 and most likely earlier versions, the CTE (Common Table Expression, WITH clause query) will generate a temporary dataset in memory or materialize it to storage in the specified order because the CTE query requires to set the rank with ROW_NUMBER using the int_priority (and ts_started) order.

Thus, (I guess) in the current PostgreSQL implementation, the outer query just uses the temporary dataset as is. There are no technical reason that the outer query will see the temporary dataset in different order than the CTE query.

Dec 01 '21 18:12 splhack

I can add dispatcher.find_jobs_optimization_enabled property to switch the SQL, just like #1060

Dec 01 '21 18:12 splhack

Agreed, this looks good.

I can add dispatcher.find_jobs_optimization_enabled property to switch the SQL, just like #1060

I think this is a good idea for now.

Dec 04 '21 22:12 bcipriano

Optimize FIND_JOBS_BY_SHOW

Environment / Test data

Original FIND_JOBS_BY_SHOW

Optimized FIND_JOBS_BY_SHOW (~7 times faster than original)

Explanation

Caveat

Theory