Contributing: Documentation of python client
Hey, I'd like to start writing the documentation for the Python client. I have already published a module, pyconductor, on PyPI. Currently it is just a filtered branch of the Conductor repository.
I am planning on writing the documentation as well as adding some functionality that reflects our use case.
I am not sure what the best way is to tie this back to the original project, and I am looking for guidance on that.
Hey @samhattangady, that would be great. The best way to link back is to maintain the Python client within the Conductor repo. That way, all contributions to the Python client stay in one place. Also, users can easily verify the authenticity of the client among the many others that may be available on PyPI. We'd be glad to work with you through this process. Thanks for your efforts.
Sure. I will get started on this.
I am looking to convert several of the methods in ConductorWorker to internal methods. The way I see it, only __init__ and start should be public methods. All the other methods (execute and poll_and_execute) should be internal.
However, these changes will be backward incompatible.
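To make the incompatibility concrete, here is a rough sketch of the proposed layout. WorkerSketch, its placeholder method bodies, and the server URL are hypothetical stand-ins, not the actual ConductorWorker code; only the method names come from the current client.

```python
class WorkerSketch:
    """Hypothetical stand-in for ConductorWorker; bodies are placeholders."""

    def __init__(self, server_url):
        self.server_url = server_url

    def start(self, task_type, exec_function):
        # Public entry point. The real client would keep polling;
        # this sketch polls once so that it terminates.
        self._poll_and_execute(task_type, exec_function)

    def _poll_and_execute(self, task_type, exec_function):
        # Internal: fetch a task (faked here) and hand it to _execute.
        task = {"taskType": task_type}
        return self._execute(task, exec_function)

    def _execute(self, task, exec_function):
        # Internal: run the user's function on one task.
        return exec_function(task)


# The URL below is an assumed placeholder.
worker = WorkerSketch("http://localhost:8080/api")

# The old public names no longer exist, so existing callers would
# fail with AttributeError after the rename.
old_names_present = [
    hasattr(worker, name) for name in ("execute", "poll_and_execute")
]
```

Any code that currently calls execute or poll_and_execute directly would need to change, which is the breakage I mean.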
I would also like to add a consume method, which executes one task (or a fixed number of tasks) and then ends. I am using this for some tasks that can be run from cron or some other scheduler. Specifically, things like downloading and uploading over the local network, which I run at night so that they do not strain the network.
Please let me know your thoughts.
@samhattangady Breaking backwards compatibility all at once would be troublesome for a lot of users. I'd suggest keeping separate major versions of the client. That is, the existing client could be 1.x and the new client with breaking changes could be 2.x. This should be stated explicitly in the client versions, so that users don't accidentally switch from one to the other.
> I would also like to add a consume method, which executes one task (or a fixed number of tasks) and then ends.

I'm not sure I understood this. Would you mind elaborating?
Alright. I think I'll avoid the breaking changes for now then. Will just focus on documentation.
Currently, the workers are designed as continuously running processes (possibly daemons). This ensures that tasks get executed as fast as possible.
I have another use case for workers, though. As stated above, some tasks like uploading or downloading are not particularly frequent. Also, I would like these tasks to run at a time when the network load is low, so as not to affect the users on my network.
Another use case is a single instance that executes several kinds of tasks. Let's assume I have a single machine that I am using to do image processing and image compression. These are both fairly resource heavy, and I don't want them running at the same time. So first I want to execute all the image processing tasks, and then move on to the compression tasks.
For this I propose an alternate way to use the ConductorWorker. Along with the start method, there will be a consume method. The consume method will first check if there are any tasks to execute. If not, it will end. If there are any tasks to execute, it will execute them (or a fixed number of them) and then end.
So these tasks will only be executed when the worker is run, as opposed to having a worker that is always running.
This is my reference implementation: https://github.com/skylarkdrones/pyconductor/blob/worker-consume/pyconductor/ConductorWorker.py#L142
I don't know if Conductor was designed with this use case in mind, but I have been using it this way, and it has been working well.
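As a minimal sketch of the consume behaviour described above (this is not the linked reference implementation): ConsumeSketch, its in-memory queue, the max_tasks parameter, and the task names are all hypothetical, and the queue only stands in for polling the Conductor server.

```python
from collections import deque


class ConsumeSketch:
    """Hypothetical stand-in for ConductorWorker with a consume method."""

    def __init__(self, pending_tasks):
        # An in-memory queue stands in for polling the Conductor server.
        self._queue = deque(pending_tasks)

    def _poll(self):
        # Return the next pending task, or None when nothing is waiting.
        return self._queue.popleft() if self._queue else None

    def consume(self, exec_function, max_tasks=None):
        """Run pending tasks until none remain or max_tasks is reached.

        Returns the number of tasks executed, then the caller is free to exit.
        """
        executed = 0
        while max_tasks is None or executed < max_tasks:
            task = self._poll()
            if task is None:
                break  # nothing left to do, so end instead of waiting
            exec_function(task)
            executed += 1
        return executed


# Usage for the single-machine case: drain all image processing tasks
# first, then move on to compression, so the two never overlap.
processing = ConsumeSketch(["img-1", "img-2"])
compression = ConsumeSketch(["img-1"])
log = []
processing.consume(lambda task: log.append(("process", task)))
compression.consume(lambda task: log.append(("compress", task)))
```

A script like this can be run from cron at night. If there are no pending tasks, consume returns 0 immediately and the process ends.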
I have added the docstrings for ConductorWorker in #1010.