Add `selected_message_queues` flag to filter message queues and improve performance
What does this PR do?
This PR adds a `selected_message_queues` flag to allow watching only a specific list of message queues and improve performance.
Motivation
(Customer request)
Sometimes you only want to monitor messages on specific IBM i system message queues. With the current implementation, the query runs against every MESSAGE_QUEUE_NAME instead of first filtering on the queues of interest and then fetching the message queue info for those. We suspect this missing filter is responsible for the high CPU usage on IBM i hosts, so this PR adds it.
Benchmark
To check that the feature actually improves performance, I ran some tests:
We monitor the QSYSOPR message queue and the CECUSER (the user) message queue (the names are not important, we just need two different message queues). We create messages in the CECUSER message queue and measure the execution time of three queries: filtered on QSYSOPR, filtered on CECUSER, and unfiltered. We will be satisfied if the QSYSOPR time remains nearly constant while the CECUSER and unfiltered times both increase.
The query is
SELECT MESSAGE_QUEUE_NAME, MESSAGE_QUEUE_LIBRARY, COUNT(*), SUM(CASE WHEN SEVERITY >= 50 THEN 1 ELSE 0 END) FROM QSYS2.MESSAGE_QUEUE_INFO {message_queues_filter} GROUP BY MESSAGE_QUEUE_NAME, MESSAGE_QUEUE_LIBRARY
And the message_queues_filter will take the value:
- `WHERE MESSAGE_QUEUE_NAME IN ('QSYSOPR')` for the QSYSOPR query.
- `WHERE MESSAGE_QUEUE_NAME IN ('CECUSER')` for the CECUSER query.
- An empty string for the unfiltered query.
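As a rough illustration of how the filter could be derived from a list of selected queues, here is a minimal Python sketch. The function names `build_message_queues_filter` and `build_query` are hypothetical and are not taken from the actual change; only the query text and the three filter values come from this description.

```python
# Query template copied from this PR description.
QUERY_TEMPLATE = (
    "SELECT MESSAGE_QUEUE_NAME, MESSAGE_QUEUE_LIBRARY, COUNT(*), "
    "SUM(CASE WHEN SEVERITY >= 50 THEN 1 ELSE 0 END) "
    "FROM QSYS2.MESSAGE_QUEUE_INFO {message_queues_filter} "
    "GROUP BY MESSAGE_QUEUE_NAME, MESSAGE_QUEUE_LIBRARY"
)


def build_message_queues_filter(selected_message_queues):
    """Return a WHERE clause for the given queue names, or '' for no filter.

    Hypothetical helper: the real implementation may differ.
    """
    if not selected_message_queues:
        return ""
    # Double any single quote so a queue name cannot break out of the literal.
    quoted = ", ".join(
        "'{}'".format(name.replace("'", "''")) for name in selected_message_queues
    )
    return "WHERE MESSAGE_QUEUE_NAME IN ({})".format(quoted)


def build_query(selected_message_queues=None):
    """Fill the template with the filter for the selected queues."""
    return QUERY_TEMPLATE.format(
        message_queues_filter=build_message_queues_filter(selected_message_queues)
    )


if __name__ == "__main__":
    print(build_query(["QSYSOPR"]))
    print(build_query())  # unfiltered query
```

Returning an empty string when nothing is selected keeps the unfiltered behaviour unchanged for users who do not set the flag.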
With that setup in place, here are the query timings:
(Tests ran on IBM i 7.4 PowerVM POWER9 LPAR 1 vCPU 2048 MB RAM)
| # CECUSER Jobs | QSYSOPR query time | CECUSER query time | No filter query time |
|---|---|---|---|
| 0 | 0.67s | 0.36s | 0.45s |
| 220 | 0.40s | 0.53s | 0.48s |
| 960 | 0.60s | 0.62s | 0.62s |
| 2150 | 0.43s | 0.72s | 0.72s |
| 3600 | 0.38s | 0.97s | 0.91s |
| 6700 | 0.43s | 1.24s | 1.20s |
| 9300 | 0.48s | 1.70s | 1.57s |
| 10830 | 0.35s | 1.66s | 1.64s |
| 12000 | 0.46s | 1.86s | 1.85s |
| 66400 | 0.86s | 8.46s | 8.64s |
The results match our expectations: the QSYSOPR query time stays roughly constant (between 0.35s and 0.86s), while the CECUSER and unfiltered query times grow with the number of jobs, up to about 8.5s. A filtered query on a small queue therefore puts less load on the CPU.
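To make the trend in the table easier to read, the sketch below computes the ratio of the unfiltered query time to the QSYSOPR-filtered query time for each row. The numbers are copied from the table above; the function name `unfiltered_to_qsysopr_ratio` is only illustrative.

```python
# Timings in seconds copied from the benchmark table:
# (CECUSER jobs, QSYSOPR filter, CECUSER filter, no filter)
RESULTS = [
    (0, 0.67, 0.36, 0.45),
    (220, 0.40, 0.53, 0.48),
    (960, 0.60, 0.62, 0.62),
    (2150, 0.43, 0.72, 0.72),
    (3600, 0.38, 0.97, 0.91),
    (6700, 0.43, 1.24, 1.20),
    (9300, 0.48, 1.70, 1.57),
    (10830, 0.35, 1.66, 1.64),
    (12000, 0.46, 1.86, 1.85),
    (66400, 0.86, 8.46, 8.64),
]


def unfiltered_to_qsysopr_ratio(results):
    """Map each job count to (unfiltered time / QSYSOPR-filtered time)."""
    return {
        jobs: no_filter / qsysopr
        for jobs, qsysopr, _cecuser, no_filter in results
    }


if __name__ == "__main__":
    for jobs, ratio in unfiltered_to_qsysopr_ratio(RESULTS).items():
        print(f"{jobs:>6} jobs: unfiltered / QSYSOPR = {ratio:.2f}")
```

At 66400 jobs the unfiltered query takes roughly ten times as long as the QSYSOPR query. With few messages the ratio is close to 1 (and below 1 in the first row), so small differences between rows should be read with caution.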
Additional Notes - Scripts for reproducibility
IBM i is not a Unix-like OS, so I think it's important to detail the scripts used, both for rigor and for reproducibility purposes.
To create the jobs on the VM the following command was used:
max=1000
index=0
while [ "$index" -lt "$max" ] ; do
  let index+=1
  echo "Job $index"
  system "SBMJOB JOBD(QBATCH) JOB(WSYS) JOBQ(QBATCH) CMD(WRKSYSSTS)"
done
To measure the time taken during a query the following command was used:
qsh -c "db2 -t \"select distinct current_timestamp from sysibm.sysdummy1;\";" | head -n 4 |tail -n 1; qsh -c "db2 \"$query\""; qsh -c "db2 -t \"select distinct current_timestamp from sysibm.sysdummy1;\";" | head -n 4 |tail -n 1
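The command above brackets the query with two timestamps. For anyone who would rather time the query from a client, the same idea can be sketched in Python. This is an assumption-laden example, not what was used for the benchmark: `time_call` is a hypothetical helper, and `run_query` stands for any zero-argument callable that executes the query through whatever connection you already have.

```python
import time


def time_call(run_query, repeats=3):
    """Run `run_query` several times; return (best, mean) duration in seconds.

    `run_query` is a placeholder for a callable that executes the SQL query.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return min(durations), sum(durations) / len(durations)


if __name__ == "__main__":
    # Stand-in workload so the example runs without a database.
    best, mean = time_call(lambda: sum(range(100_000)))
    print(f"best={best:.4f}s mean={mean:.4f}s")
```

Repeating the measurement and reporting the best and mean durations helps separate the effect of the filter from run-to-run noise.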
Review checklist (to be filled by reviewers)
- [ ] Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
- [ ] PR title must be written as a CHANGELOG entry (see why)
- [ ] Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
- [ ] PR must have `changelog/` and `integration/` labels attached
The validations job has failed; please review the Files changed tab for possible suggestions to resolve.
Codecov Report
Merging #12808 (f8ddbea) into master (eecf6d8) will increase coverage by 0.00%. The diff coverage is 100.00%.
| Flag | Coverage Δ | |
|---|---|---|
| ibm_i | 82.28% <100.00%> (+0.70%) | :arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.