ShokoServer

Command Request and background workers refactor

Open maxpiva opened this issue 9 years ago • 20 comments

Update CommandRequest from

        public int CommandRequestID { get; private set; }
        public int Priority { get; set; }
        public int CommandType { get; set; }
        public string CommandID { get; set; }
        public string CommandDetails { get; set; }
        public DateTime DateTimeUpdated { get; set; }

We add the following

    public string CommandDisplayName { get; set; } // Display name used in GUIs
    public int MaxParallelTasks { get; set; } // Max number of parallel tasks this command supports.
    public string ParallelIdentifier { get; set; } // For example, in the case of importing files this could be the drive name, so requests with the same ParallelIdentifier cannot be parallelized.
    public string GroupName { get; set; } // For example: hasher, updater, images, anidb, etc. Different tasks get different names.
    public string BatchIdentifier { get; set; } // Unique identifier, created with Guid.NewGuid(); things added in the same batch should share the same identifier.

CommandRequest Interface will only have the Process Method.

A BaseCommandRequest child class could be made with helper functions, such as populating the command request's initial values.

CommandRequest will be extended into the commands we have today, as children of CommandRequest or BaseCommandRequest; the child classes can have any number of extra properties they need to function. We use Json.NET to serialize the child CommandRequest class and store it in CommandDetails. We do this in the repository's save function: we clear CommandDetails, then serialize the object into CommandDetails using Json.NET serializer settings (TypeNameHandling = Auto), which serializes the object with the correct child type. On retrieval of a CommandRequest from the database, we deserialize and return the child object instead of the original plain CommandRequest entity.
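As a sketch of the save/load round-trip described above (the HashFileCommandRequest child class and the helper names are hypothetical, not actual ShokoServer code):

```csharp
using Newtonsoft.Json;

// Hypothetical child class with an extra property needed to do its work.
public class HashFileCommandRequest : CommandRequest
{
    public string FilePath { get; set; }
}

public static class CommandRequestSerializer
{
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        // Auto emits a "$type" hint only where the runtime type differs from
        // the declared type, so the correct child type round-trips.
        TypeNameHandling = TypeNameHandling.Auto
    };

    // On save: clear CommandDetails, then serialize the child object into it.
    public static void Save(CommandRequest cr)
    {
        cr.CommandDetails = null;
        cr.CommandDetails = JsonConvert.SerializeObject(cr, typeof(CommandRequest), Settings);
    }

    // On retrieval: deserialize and return the child object instead of the
    // plain CommandRequest entity.
    public static CommandRequest Load(string commandDetails)
    {
        return JsonConvert.DeserializeObject<CommandRequest>(commandDetails, Settings);
    }
}
```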

Any jmmserver worker should be split in two parts: the batch part (the foreach, for, etc.), and the loop body, whose per-item work is converted into command requests.
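For illustration, a worker split that way might look like this (ICommandQueue and HashFileCommandRequest are hypothetical names; only the batch loop stays in the worker, the per-file work becomes a queued request):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public void QueueImportFolderScan(IEnumerable<string> files, ICommandQueue queue)
{
    // Everything enqueued in one pass shares the same BatchIdentifier.
    string batchId = Guid.NewGuid().ToString();

    foreach (string file in files)
    {
        queue.Enqueue(new HashFileCommandRequest
        {
            FilePath = file,
            GroupName = "hasher",
            BatchIdentifier = batchId,
            // e.g. the drive name, so two hashes never hit the same disk at once
            ParallelIdentifier = Path.GetPathRoot(file),
            MaxParallelTasks = 2
        });
    }
}
```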

Of course, a thread dispatcher needs to be made, along with a good API to obtain information on global status, group status, running requests, etc.
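One possible shape for such a dispatcher, honouring MaxParallelTasks per GroupName and never running two requests with the same ParallelIdentifier concurrently (a sketch only; persistence, error handling, and the status API are omitted):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class CommandDispatcher
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _groupLimits = new();
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _parallelLocks = new();

    public async Task DispatchAsync(CommandRequest cr)
    {
        // Limit concurrency per group (hasher, anidb, images, ...).
        var group = _groupLimits.GetOrAdd(cr.GroupName,
            _ => new SemaphoreSlim(Math.Max(1, cr.MaxParallelTasks)));

        // Serialize requests that share a ParallelIdentifier (e.g. same drive).
        var identity = _parallelLocks.GetOrAdd(cr.ParallelIdentifier ?? string.Empty,
            _ => new SemaphoreSlim(1));

        await group.WaitAsync();
        await identity.WaitAsync();
        try
        {
            // Process() is the single method on the CommandRequest interface.
            await Task.Run(() => cr.Process());
        }
        finally
        {
            identity.Release();
            group.Release();
        }
    }
}
```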

Any thoughts?

maxpiva avatar Sep 24 '16 02:09 maxpiva

The move from XML to JSON is a good thing. Also, I understand that you want to split this:

for (int i = 0; i < 10; i++)
{
    command[i].execute();
}

to :

command[0].execute();
command[1].execute();
...
command[9].execute();

?

Do we get errors because half-done tasks are run again after a restart? Maybe a better (less CPU/time-consuming) approach would be for each command to have its own ID (PID) and to keep, say, the last 1000 executed commands in the DB, so that when you run a half-done command you can check it against the list of completed commands?

Also, the conversion won't resolve the situation where a command has been converted into a list of commands and someone shuts down JMM in the middle. Or won't you save the in-progress state of the process, and just restart it from the beginning?

Do we know how big those iteration commands can get?

bigretromike avatar Oct 06 '16 13:10 bigretromike

Using this thread for the Command Worker refactors as talked about on Discord.

The goal is to remove the three separate command workers and instead have a single queue that can run multiple commands at once.

Assigned @maxpiva as he'll be working on this.

ElementalCrisis avatar Aug 16 '17 05:08 ElementalCrisis

Any progress? Currently hashing is EXTREMELY slow because it is basically single-threaded. The same applies to linking files.

t3hk0d3 avatar Jan 03 '24 19:01 t3hk0d3

Linking files is dependent on AniDB's rate limits, so it can't be a ton faster. It's faster by nature of some serious overall performance improvements. The queue refactor is going to take a while. Some things will happen first, with others coming later. I'm still not certain how I want to do the scheduling, whether to have chains of commands happen in a line for more user-friendly progress indication or to just throw it in a bucket and do things mostly in the order they are queued with some other rules (the way it is now). Hashing is likely one of the first things to be in the new queue. I'm not sure it will help much, though, as hashing is bottlenecked on CPU and IO.

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

Oh apparently I never made an issue detailing my plans and thoughts regarding the queue rewrite. The plan is to use Quartz.net with a custom scheduler

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

It's also much more currently WIP than 2018 lol. Progress was made only a few months ago. Then I shifted to the filter rewrite

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

@da3dsoul

Linking files is dependent on AniDB's rate limits, so it can't be a ton faster. It's faster by nature of some serious overall performance improvements.

For me personally, moving files from the import folder into the destination folder during linking takes 99% of the time.

Hashing is likely one of the first things to be in the new queue. I'm not sure it will help much, though, as hashing is bottlenecked on CPU and IO.

With modern SSDs, the CPU is the biggest bottleneck. However, spreading the work over several cores with multi-threading would be a good-enough solution.
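Spreading the hashing over cores could be as simple as the sketch below (HashFile and StoreHash are placeholders for the real per-file hash and persistence calls; the degree of parallelism is a guess, not a tuned value):

```csharp
using System;
using System.Threading.Tasks;

var options = new ParallelOptions
{
    // One worker per logical core; IO may still be the limit on spinning disks.
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(filePaths, options, path =>
{
    string hash = HashFile(path); // CPU-bound per file
    StoreHash(path, hash);
});
```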

t3hk0d3 avatar Jan 03 '24 20:01 t3hk0d3

I would check your setup. If it's on the same disk, it's almost instant for me.

Hashing is supposed to be multi-threaded, but I've seen complaints about the speed of rhash (the hash library we use)

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

@da3dsoul

I would check your setup. If it's on the same disk, it's almost instant for me

It is the same disk, but the folders are mounted inside the docker container separately, so the OS does a hard copy (because it sees them as different filesystems).

Also think about people who mount a remote NAS folder for import.

t3hk0d3 avatar Jan 03 '24 20:01 t3hk0d3

I'd change that then; you can non-destructively edit import folders after setting your docker volumes to a single, common root directory.

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

@da3dsoul

Oh apparently I never made an issue detailing my plans and thoughts regarding the queue rewrite. The plan is to use Quartz.net with a custom scheduler

I would suggest looking into Rx.NET (or any other reactive framework). In my experience, any IO-bound work becomes sooo much faster.

t3hk0d3 avatar Jan 03 '24 20:01 t3hk0d3

I'm not sure how well reactive programming would work in Shoko. It works great when dealing with APIs, but most of Shoko is old and.... synchronous.

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

I think the queue workers are kind of a separate subsystem, and because you are planning to redesign it from scratch, it seems like a good candidate.

t3hk0d3 avatar Jan 03 '24 20:01 t3hk0d3

I have no idea how a reactive framework would help with a queue. Can you elaborate?

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

Queue consumers could be implemented as bounded async sinks running on separate thread pools.

This would allow flexible concurrency based on the type of task in the queue:

  • CPU-bound tasks could be executed on a thread pool of 2× CPU cores
  • IO-bound tasks on a thread pool of 6× CPU cores (as an example)
  • Throttled tasks (such as AniDB requests) on a single-threaded pool

It's also possible to have just a single thread pool and restrict concurrency per task type, but it would be a bit more complicated.
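As a sketch of those per-task-type consumers, TPL Dataflow ActionBlocks are one way to get bounded async sinks (Rx.NET would look similar); the concurrency numbers just mirror the examples above and are not tuned:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// CPU-bound sink: 2x cores, with back-pressure when the buffer fills.
var cpuBound = new ActionBlock<CommandRequest>(cr => cr.Process(),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 2 * Environment.ProcessorCount,
        BoundedCapacity = 1000 // SendAsync awaits when full
    });

// IO-bound sink: 6x cores.
var ioBound = new ActionBlock<CommandRequest>(cr => cr.Process(),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 6 * Environment.ProcessorCount,
        BoundedCapacity = 1000
    });

// Throttled sink: one at a time, for rate-limited AniDB calls.
var throttled = new ActionBlock<CommandRequest>(cr => cr.Process(),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });
```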

t3hk0d3 avatar Jan 03 '24 20:01 t3hk0d3

Pretty sure Quartz can do all of that

da3dsoul avatar Jan 03 '24 20:01 da3dsoul

I don't have much experience with Quartz.NET, but I've spent some time with the Java version. From my experience and the Quartz.NET documentation (https://www.quartz-scheduler.net/documentation/quartz-3.x/configuration/reference.html#threadpool), you can only have a single thread pool for all tasks 🥲

t3hk0d3 avatar Jan 03 '24 20:01 t3hk0d3

Well, you have a single thread pool for all of Quartz, but you can subdivide it via the scheduler and job type. I'm not privy to how that might matter, but I don't see a problem with it.
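One way to get separate pools would be running two scheduler instances with different concurrency settings, roughly like this (a sketch against Quartz.NET 3.x; treat the exact property keys as an assumption to check against the configuration reference):

```csharp
using System.Collections.Specialized;
using Quartz;
using Quartz.Impl;

var hasherProps = new NameValueCollection
{
    ["quartz.scheduler.instanceName"] = "HasherScheduler",
    ["quartz.threadPool.maxConcurrency"] = "4"
};

var anidbProps = new NameValueCollection
{
    ["quartz.scheduler.instanceName"] = "AniDBScheduler",
    ["quartz.threadPool.maxConcurrency"] = "1" // serialize rate-limited AniDB calls
};

IScheduler hasher = await new StdSchedulerFactory(hasherProps).GetScheduler();
IScheduler anidb = await new StdSchedulerFactory(anidbProps).GetScheduler();
```

Per job type, Quartz also has the [DisallowConcurrentExecution] attribute to stop two instances of the same job running at once.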

da3dsoul avatar Jan 03 '24 21:01 da3dsoul

Well you have a single thread pool for all of Quartz, but you can subdivide it via the scheduler and job type.

AFAIK you can't control job execution concurrency at that level. But I'm no expert in Quartz.NET.

I'm not privy to how that might matter, but I don't see a problem with it

Throttled API requests (such as AniDB) and the hasher (or any other CPU/IO-bound work) should likely have different concurrency factors.

t3hk0d3 avatar Jan 03 '24 21:01 t3hk0d3

Well, I'll find out the hard way when I try to write the scheduler for the hasher. It'll be annoying, since I've put a decent bit of effort into Quartz, but I'm not against finding another option if that happens. I just don't want to write my own job queue and job store. We tried, and that's why it's so terrible.

da3dsoul avatar Jan 03 '24 21:01 da3dsoul