Supertask with many tasks kills UI performance

Open JirkaV opened this issue 3 years ago • 5 comments

I'm running Hashtopolis 0.12.0 on a fairly beefy server; the host itself does not do any cracking. Recently I added a supertask with a few hundred tasks, and after it had been running for a while I could no longer log in to the UI; the web interface no longer responds in any reasonable time.

I managed to narrow the problem down to a suboptimal way of querying the database: there are a lot of repeated queries, one per task in the supertask, each fetching the file data for that single task. Each of the queries also uses a DB join (which in itself is not a problem).

The query causing the issue is:

SELECT `File`.`fileId` AS `File.fileId`, `File`.`filename` AS `File.filename`, `File`.`size` AS `File.size`,
       `File`.`isSecret` AS `File.isSecret`, `File`.`fileType` AS `File.fileType`,
       `File`.`accessGroupId` AS `File.accessGroupId`, `File`.`lineCount` AS `File.lineCount`,
       `FileTask`.`fileTaskId` AS `FileTask.fileTaskId`, `FileTask`.`fileId` AS `FileTask.fileId`,
       `FileTask`.`taskId` AS `FileTask.taskId`
FROM File
INNER JOIN FileTask ON File.fileId = FileTask.fileId
WHERE FileTask.taskId = '563'
ORDER BY File.fileId ASC

Would it be possible to change this query to perform a bulk get instead of fetching task by task? Unfortunately I'm not any good with PHP, so I can't write a fix myself.
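
For illustration, here is roughly what I have in mind, sketched with plain PDO since I don't know the Hashtopolis DB layer at all. The function name and everything around the query are made up; only the query shape is the point:

<?php
// Hypothetical sketch, not Hashtopolis code: fetch the files for ALL tasks
// of the supertask in a single round trip, then group the rows per task in
// PHP, instead of issuing one JOIN query per taskId.
function getFilesForTasks(PDO $pdo, array $taskIds): array {
    if (count($taskIds) === 0) {
        return [];
    }
    // One "?" placeholder per task id, e.g. "?,?,?"
    $placeholders = implode(',', array_fill(0, count($taskIds), '?'));
    $stmt = $pdo->prepare(
        "SELECT File.*, FileTask.taskId
           FROM File
           INNER JOIN FileTask ON File.fileId = FileTask.fileId
          WHERE FileTask.taskId IN ($placeholders)
          ORDER BY FileTask.taskId ASC, File.fileId ASC"
    );
    $stmt->execute(array_values($taskIds));

    $filesByTask = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $filesByTask[(int) $row['taskId']][] = $row;
    }
    return $filesByTask;
}

That is one query for the whole supertask instead of one per task; for 10k+ ids the IN list would probably need to be chunked into batches, but even batches of a few hundred would cut the query count by orders of magnitude.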

Attaching a DB log from about a minute of runtime. The log is nearly 16 MB because the query is repeated many times in there; only the FileTask.taskId argument changes.

Thank you!

general.log.zip

JirkaV · Sep 13 '22 11:09

Additional info: I just noticed that this problem halts Hashtopolis operations completely, as the clients can't connect to the server and get HTTP timeouts.

JirkaV · Sep 13 '22 15:09

I consistently run supertasks with 140 subtasks and see no issue. Not saying you are wrong or not experiencing the problem, just throwing my experience out there to see if we can home in on the number of subtasks that causes the problem.

Matrix20085 · Sep 13 '22 23:09

> I consistently run supertasks with 140 subtasks and see no issue. Not saying you are wrong or not experiencing the problem, just throwing my experience out there to see if we can home in on the number of subtasks that causes the problem.

The supertask I'm trying to run has over 10K subtasks in total :) Please see the attached log file; it should be reasonably self-explanatory.

Cheers

     Jirka

JirkaV · Sep 14 '22 06:09

I know that with a larger number of tasks, some requests are not ideal. They are mostly like this because of how Hashtopolis grew over time. The main reason it halts completely is that for every new request from an agent or the UI, the server has to load everything again, as there is no service running that could cache some values over time.
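
To illustrate that last point: with a shared in-memory cache such as APCu, a value computed during one request could be reused by the following ones. A rough hypothetical sketch only (not how Hashtopolis currently works, and the function name is made up):

<?php
// Hypothetical illustration only: APCu keeps values in shared memory that
// survives across PHP requests, so repeated lookups can skip the database.
function getTaskFilesCached(PDO $pdo, int $taskId): array {
    $key = 'task-files-' . $taskId;
    $rows = apcu_fetch($key, $hit);
    if ($hit) {
        return $rows; // served from memory, no DB query at all
    }
    $stmt = $pdo->prepare(
        'SELECT File.* FROM File
         INNER JOIN FileTask ON File.fileId = FileTask.fileId
         WHERE FileTask.taskId = ? ORDER BY File.fileId ASC'
    );
    $stmt->execute([$taskId]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    apcu_store($key, $rows, 60); // keep for 60 seconds
    return $rows;
}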

The question here is: why do you need 10k tasks? Is it even realistic that these tasks will ever be completed? It typically does not make sense to just throw thousands of masks from some PACK analysis into a supertask without limiting yourself at some point (depending on the capabilities of the system and the hash algorithm).

s3inlc · Sep 14 '22 11:09

Thanks for the reply! Yes, it's realistic that the tasks will complete soon. Our HW got through about half of them in a week, so it's not too bad for our use case.

I'm not proposing a major overhaul of the system, just hoping that the sequential loading from the DB can be fixed easily (I know the fix is easy in other languages, as I have done it myself multiple times; I'm just not a PHP person at all).

JirkaV · Sep 14 '22 11:09