Supertask with many tasks kills UI performance
I'm running Hashtopolis 0.12.0 on a fairly beefy server; the host itself does not do any cracking. Recently I added a supertask with a few hundred tasks, and after it had been running for a while I could no longer log in to the UI, as the web interface stopped responding in any reasonable time.
I managed to narrow the problem down to a suboptimal database access pattern: the same query is repeated over and over, fetching one row of data for each task in the supertask. Each query also uses a DB join (which in itself is not a problem).
The query causing the issue is:
SELECT `File`.`fileId` AS `File.fileId`, `File`.`filename` AS `File.filename`,
       `File`.`size` AS `File.size`, `File`.`isSecret` AS `File.isSecret`,
       `File`.`fileType` AS `File.fileType`, `File`.`accessGroupId` AS `File.accessGroupId`,
       `File`.`lineCount` AS `File.lineCount`,
       `FileTask`.`fileTaskId` AS `FileTask.fileTaskId`, `FileTask`.`fileId` AS `FileTask.fileId`,
       `FileTask`.`taskId` AS `FileTask.taskId`
FROM File
INNER JOIN FileTask ON File.fileId = FileTask.fileId
WHERE FileTask.taskId = '563'
ORDER BY File.fileId ASC
Would it be possible to change this query to perform a bulk get instead of sequential fetching? Unfortunately I'm not any good with PHP, so I can't write a fix myself.
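For illustration, a bulk variant could look roughly like the sketch below. This is only an assumption about how the fix might be shaped (the PHP layer would have to build the IN list from the supertask's task IDs and then group the result set by taskId); the table and column names come from the query above, and the task IDs in the IN list are placeholders.

-- Hypothetical bulk get: fetch the file rows for many tasks in one round trip
-- instead of issuing one query per task; results can then be split by taskId.
SELECT `File`.`fileId` AS `File.fileId`, `File`.`filename` AS `File.filename`,
       `File`.`size` AS `File.size`, `File`.`isSecret` AS `File.isSecret`,
       `File`.`fileType` AS `File.fileType`, `File`.`accessGroupId` AS `File.accessGroupId`,
       `File`.`lineCount` AS `File.lineCount`,
       `FileTask`.`fileTaskId` AS `FileTask.fileTaskId`, `FileTask`.`fileId` AS `FileTask.fileId`,
       `FileTask`.`taskId` AS `FileTask.taskId`
FROM File
INNER JOIN FileTask ON File.fileId = FileTask.fileId
WHERE FileTask.taskId IN ('563', '564', '565')  -- all task IDs of the supertask
ORDER BY FileTask.taskId ASC, File.fileId ASC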
Attaching a DB log covering about a minute of runtime. The log is nearly 16 MB because the query is repeated so many times; only the FileTask.taskId argument changes.
Thank you!
Additional info: I just noticed that this problem halts Hashtopolis operations completely, as the clients can't connect to the server and get HTTP timeouts.
I consistently run supertasks with 140 subtasks and see no issue. I'm not saying you are wrong or not experiencing the problem, just offering my experience to see if we can home in on the number of subtasks that triggers it.
The supertask I'm trying to run has over 10K subtasks in total :) Please see the attached log file; it should be reasonably self-explanatory.
Cheers
Jirka
I know that with larger numbers of tasks some requests are not ideal. They are mostly like this because of how Hashtopolis grew over time. The main reason it halts completely is that for every new request from an agent or the UI, the server has to load everything again, as there is no long-running service that could cache values over time.
The question here is: why do you need 10k tasks? Is it even realistic that these tasks will ever be completed? It typically does not make sense to throw thousands of masks from some PACK analysis into a supertask without limiting yourself at some point (depending on the capabilities of the system and the hash algorithm).
Thanks for the reply! Yes, it's realistic that the tasks will complete soon. Our hardware managed to get through about half of them in a week, so it's not too bad for our use case.
I'm not proposing a major overhaul of the system, but I'm hoping that the sequential loading from the DB could be fixed easily (I know the fix is straightforward in other languages, as I have done it myself multiple times; I'm just not a PHP person at all).