Job's ID should be specifiable at creation-time

Open UnitedMarsupials-zz opened this issue 7 years ago • 3 comments

The typical example of setting job.id is:

    job = cluster.submit(...)
    job.id = 'Meow'

However, as is also known, because of the race between threads, the job may already have completed (or failed) by the time the id-setting line runs. It should be possible to set the identifier at the moment the job is created. Maybe something like:

    job = cluster.submit(id = 'Meow', ...)

or, to stay compatible with any existing code that already sends something called id to the remote, a separate method:

    job = cluster.submit_with_id('Meow', ...)
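The race described above can be reproduced without dispy. The following is only a sketch: `ToyJob`, `ToyCluster`, and the `job_id` keyword are hypothetical stand-ins, not dispy's API, and `worker.join()` is used to force the "job finished first" ordering that would otherwise happen only occasionally.

```python
import threading


class ToyJob:
    """Hypothetical job object; only the id attribute matters here."""

    def __init__(self, job_id=None):
        self.id = job_id
        self.result = None


class ToyCluster:
    """Hypothetical scheduler that runs each job in its own thread."""

    def __init__(self, callback):
        self.callback = callback

    def submit(self, func, *args, job_id=None):
        job = ToyJob(job_id)

        def run():
            job.result = func(*args)
            self.callback(job)  # fires as soon as the job is done

        worker = threading.Thread(target=run)
        worker.start()
        return job, worker


seen_ids = []
cluster = ToyCluster(callback=lambda job: seen_ids.append(job.id))

# ID assigned after submit: the callback may already have run.
job, worker = cluster.submit(pow, 2, 3)
worker.join()        # force the unlucky ordering for the demonstration
job.id = 'Meow'      # too late, the callback saw id=None

# ID supplied at creation: the callback always sees it.
job2, worker2 = cluster.submit(pow, 2, 3, job_id='Meow')
worker2.join()

print(seen_ids)
```

In the first submission the callback records `None`, in the second it records `'Meow'`, which is the behaviour the proposed creation-time ID is meant to guarantee.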

UnitedMarsupials-zz • Sep 22 '18 23:09

I considered this in early versions but dropped it for a couple of reasons: delaying the scheduling of a job until its ID is set is clumsy and inefficient, getting the ID from the user requires either an extra parameter or another step, etc. Since this is an issue that can happen only when jobs fail, I thought it was a compromise worth the trouble.

One idea is to accept, say, a dispy_job_id keyword parameter in submit. However, that would mean looking for it in every job submitted, which would burn a few cycles for every job in every application! Is it possible not to rely on the application setting the job's ID, and instead use id(job) in application logic where a unique ID is required? (Internally dispy uses uid, which uses the id function.)
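For scale, the per-job cost of the keyword idea is roughly one dictionary lookup. A minimal sketch, assuming a hypothetical `submit` wrapper and `ToyJob` class (neither is dispy's real code), of pulling `dispy_job_id` out of the keyword arguments before they are forwarded to the computation:

```python
class ToyJob:
    """Hypothetical job record holding the ID and the forwarded arguments."""

    def __init__(self, job_id, args, kwargs):
        self.id = job_id
        self.args = args
        self.kwargs = kwargs


def submit(*args, **kwargs):
    # One dict.pop per job: removes the reserved key if present,
    # so the remote computation never sees it.
    job_id = kwargs.pop('dispy_job_id', None)
    return ToyJob(job_id, args, kwargs)


with_id = submit(1, 2, scale=3, dispy_job_id='Meow')
without_id = submit(1, 2, scale=3)
```

Jobs submitted without the keyword simply get `id = None`, so existing callers would be unaffected under this assumption.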

pgiri • Sep 25 '18 02:09

> One idea is to take, say, dispy_job_id keyword parameter to submit

Yes, that's what I suggested.

> would mean burning a few cycles for every job

?? That's like arguing for tabs over spaces, because you have to burn more cycles to skip the multiple blanks :)

But if that's a concern, how about the other proposal: a separate method for people who want to be able to rely on the IDs. Something like: cluster.submit_with_id('My Id', ...)?

UnitedMarsupials-zz • Sep 25 '18 15:09

I agree with @UnitedMarsupials. Another problem is that the argument passed to the callback function is a copy of the job object (see line 1780 of the code linked below), which makes it impossible for the callback function (in another thread) to wait until the __main__ thread assigns a job.id.

https://github.com/pgiri/dispy/blob/f74cdd20951a2ba2c05ead69813d0fa642db1d8d/py3/dispy/init.py#L1775-L1780
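Why the copy defeats waiting can be shown in a few lines. This is a sketch under the assumption that the copy behaves like `copy.copy` on a plain object; `ToyJob` is a hypothetical stand-in, not dispy's job class.

```python
import copy


class ToyJob:
    """Hypothetical job object with an ID that starts out unset."""

    def __init__(self):
        self.id = None


job = ToyJob()

# Assumed stand-in for what the callback receives: a separate object.
job_seen_by_callback = copy.copy(job)

# The main thread assigns the ID afterwards, on the original only.
job.id = 'Meow'

print(job.id)                   # Meow
print(job_seen_by_callback.id)  # None
```

However long the callback waits, the object it holds never receives the assignment made on the original.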

I propose an ugly solution in https://github.com/pgiri/dispy/issues/153#issuecomment-470100608.

I vote for designing something like cluster.submit_with_id(id=xxx), with users taking responsibility for maintaining their own ID system.

xptree • Mar 09 '19 04:03