skyscraper icon indicating copy to clipboard operation
skyscraper copied to clipboard

Restructure skipExisting checks to reuse previous values

Open J-Swift opened this issue 1 year ago • 4 comments

Another bottleneck : D

For a large db.xml + a sizeable gamelist this got really expensive to keep doing the same checks. Restructured it to save information that can be reused.

This took my local skipExisting check from ~5mins to ~5seconds

J-Swift avatar Aug 10 '24 15:08 J-Swift

Also, there is still room to improve it but I can't quite figure out the algorithm.

Right now, we iterate over the queue for every gamelist entry, and we call N removes on the queue for each found game. I think It would be better to do a full iteration of the queue, building a cache of "index N has this filepath", simliar to my previous PR. Then do a full cycle of the gamelist to find all indexes to remove. Then remove those in a tight loop (or in batch? is that possible? I havent done c++ in 20 years and never messed with QT).

J-Swift avatar Aug 10 '24 15:08 J-Swift

That must be some gamelist! Your suggested change of this PR saves one if. I tested it but noticed no measurable difference between the "now" and your changes. However, my gamelist for the platform only has just below 700 entries.

Can you provide evidence for your claim (5min --> 5secs)? Do you compile against Qt6 standard or against Qt5 standard (see manual installation in the readme)? What hardware do you use, incl. storage interface, is the storage network attached?

Gemba avatar Aug 12 '24 09:08 Gemba

Are you generating for ES? I didn’t mention but thats what I was fixing, the other frontends I just updated for consistency.

Ive been doing a lot of updating so Ill have to see if I still have a copy of the cache that was showing the pathological case. Note that its not so much that we are saving an if as much as we are saving an if N times (the size of the queue) as well as handling worst case misses (ie the scenarios where we have to run through the whole queue). My gamelist was ~1500 entries with a lot of misses, resource cache probably 100k entries, and ~7k quickida. My only evidence is that it was doing it before I updated it and now its not : D

Im running on mac via docker (dockerfile taken from other PR). I believe thats QT5.

On Mon, Aug 12, 2024 at 5:33 AM Gemba @.***> wrote:

That must be some gamelist! Your suggested change of this PR saves one if. I tested it but noticed no measurable difference between the "now" and your changes. However, my gamelist for the platform only has just below 700 entries.

Can you provide evidence for your claim (5min --> 5secs)? Do you compile against Qt6 standard or against Qt5 standard (see manual installation in the readme)? What hardware do you use, incl. storage interface, is the storage network attached?

— Reply to this email directly, view it on GitHub https://github.com/Gemba/skyscraper/pull/83#issuecomment-2283503448, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAFTHTTG7YEQG4RMYLC2EODZRB6OBAVCNFSM6AAAAABMJ5OKOOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOBTGUYDGNBUHA . You are receiving this because you authored the thread.Message ID: @.***>

J-Swift avatar Aug 12 '24 12:08 J-Swift

Yes I did try the ES frontend to reproduce your issue. I saw also your other minor changes. And yes, the docker file / update_skyskraper.sh script uses Qt5 standard. This takes the qt6 changes on QList/QVector out of the mix.

Seems it is primary caused by a large quickids.xml, db.xml and resource cache. If you have this context it would help to find the real culprit.

The nested loop remove.at() is another issue -as you spotted- which can be optimized, from M*N/2 (in average) to M+N (worst case).

Gemba avatar Aug 14 '24 07:08 Gemba

Sorry about the delay, got sidetracked on other projects. I dont have the backup to compare any more

J-Swift avatar Sep 23 '24 21:09 J-Swift

Thanks for keeping me posted. I will close the PR now. Feel free to reopen, when you have evidence.

Gemba avatar Sep 27 '24 17:09 Gemba