`AsyncTAPJob` makes unnecessary network calls in property getter.
It looks like most if not all of the mutable properties of the class AsyncTAPJob make network calls to update the job info. Most of the time this seems an unnecessary overhead. At the very minimum, the code could be changed to avoid the _update calls if the job is in a final state, in which case the remote call is redundant.
On Wed, Sep 21, 2022 at 03:12:35PM -0700, Adrian wrote:
> time this seems an unnecessary overhead. At the very minimum, the code could be changed to avoid the
> _update calls if the job is in a final state, in which case the remote call is redundant.
The trouble with this is that in UWS 1.1, there is the state ARCHIVED, which some services actually implement (cf. https://ivoa.net/documents/UWS/20161024/REC-UWS-1.1-20161024.html#ExecutionPhase ). Since ARCHIVED is the only true end state but in practice will almost never be seen by pyVO, this would make this change somewhat useless.
Of course: do we want to capture transitions to ARCHIVED? I'd not argue against allowing ABORTED and friends to be end states, too.
Another sore spot with this plan is that right now, if the server has removed the job, doing .phase (probably -- I have not tried) raises an exception. After this change, it will still pretend the job is there.
Again, I'm not saying that's a show stopper. But if we make that change, we should explicitly say how to test for the server-side existence of the job. And warn against this little trap.
Summing up: I won't speak loudly against the change, but I can't say I'm a big fan, either.
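To make the trade-off above concrete, here is a minimal sketch of what "skip the update once the job has settled, but keep an explicit way to ask the server" could look like. Everything in it is hypothetical and not pyVO code: `CachedJob`, `JobGone`, `SETTLED_PHASES`, and the `fetch` callable (a stand-in for the GET on the job URL that raises `LookupError` when the job is gone) are names invented for illustration. Which phases count as settled is likewise an assumption.

```python
from typing import Callable, Dict

# Phases after which the job does not change on its own, apart from a
# possible later move to ARCHIVED or deletion. Assumption of this sketch.
SETTLED_PHASES = {"COMPLETED", "ERROR", "ABORTED", "ARCHIVED"}


class JobGone(Exception):
    """Raised by refresh() when the server no longer knows the job."""


class CachedJob:
    """Hypothetical wrapper; fetch() stands in for the GET on the job URL."""

    def __init__(self, fetch: Callable[[], Dict[str, str]]):
        self._fetch = fetch
        self._info = fetch()  # one request when the object is created

    def refresh(self) -> None:
        """Explicitly re-read the job; this is the server-side existence test."""
        try:
            self._info = self._fetch()
        except LookupError as exc:
            raise JobGone("job no longer exists on the server") from exc

    @property
    def phase(self) -> str:
        # Poll only while the job can still change without user action.
        if self._info["phase"] not in SETTLED_PHASES:
            self.refresh()
        return self._info["phase"]
```

With this shape, a COMPLETED job that has since been archived or deleted keeps reporting COMPLETED until the caller runs `refresh()`, which is exactly the trap that would need documenting.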
The way that people work in the Rubin Science Platform will tend to produce quite a bit of contact between PyVO and jobs in the ARCHIVED state, so I would definitely not be happy with an approach that doesn't take that into account. (The use case is people looking at their query history, and deciding whether or not a - potentially very time-consuming - query needs to be re-run by whether the results are still available, i.e., exactly on COMPLETED vs. ARCHIVED.)
The main point here is the frequency at which it fetches the data from the server. If the user creates an instance of AsyncTAPJob and tries to access its attributes, a separate GET call is generated every time an attribute is accessed.
If I'm not mistaken, the following code will produce 5 server calls. The info might not be consistent as the state of the job could change between calls.
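from pyvo.dal.tap import AsyncTAPJob
job = AsyncTAPJob(url)  # url: the URL of an existing UWS job
# each property read below may trigger its own GET on the job URL
print(f'Job {job.job_id} owned by {job.owner} runs query {job.query}, is in phase {job.phase} and estimated runtime {job.quote}')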
I don't think this is the intention of either the caller or UWS, which provides a snapshot of the job and returns all these attributes at once. Why can't the caller control when the state of a job is fetched from the server?
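One possible shape for caller-controlled fetching is a snapshot object: a single request fills an immutable record, and the caller asks for a new one when fresher data is wanted. The sketch below is only an illustration under stated assumptions. `JobSnapshot`, `take_snapshot`, and the `fetch` callable (standing in for one GET plus parsing of the job document into a dict) are hypothetical names, not part of pyVO.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class JobSnapshot:
    """Attributes taken from one job document, all read at the same instant."""
    job_id: str
    owner: Optional[str]
    query: str
    phase: str
    quote: Optional[str]


def take_snapshot(fetch: Callable[[], Dict[str, Any]]) -> JobSnapshot:
    """Call fetch() exactly once and freeze what it returned."""
    info = fetch()
    return JobSnapshot(
        job_id=info["job_id"],
        owner=info.get("owner"),
        query=info["query"],
        phase=info["phase"],
        quote=info.get("quote"),
    )
```

Printing the five attributes from one `JobSnapshot` then costs a single request and the values are mutually consistent. Polling becomes an explicit loop that calls `take_snapshot` again.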